TyS

A Framework to Facilitate the Development of Object-Oriented Type Checkers

As an example to show how to use the TyS framework, we are going to develop a simple type Calculator, being able to calculate a logical or arithmetical value, depending on type inference. We are going to use a YACC version for Java: BYACC/Java.

This calculator accepts logic and arithmetical statements separated by a form feed. Each statement is a mathematical expression. If its syntax and semantic is correct, it would be evaluated and the resulting value will be shown in the standard output.

The calculator has three types: integer, real and boolean. It supports the following arithmetic operations: addition, subtraction, multiplication, division, and modulus. It also has boolean and, or and not operations.

You can download the full example source code.

Lexical Analysis

At first, we are going to define the lexical components (tokens) of our small language. The next table shows language lexemes and the tokens associated.

Lexical items Token
'+' '-' '*' '/' '%' '\n' '(' ')' '!' '|' '&' Its ASCII value
Integer Constant CONST_INT
Real Constant CONST_REAL
"true" TRUE
"false" FALSE

This defines our lexical specification. As we are going to use YACC, the scanner must be implemented by the yylex function. As its specification is pretty simple, we have developed it by hand.

Syntax Analysis

We will use a LALR(1) grammar, resolving the ambiguity conflict of infix operator notation following the conventional operator associativity and precedence:

Operator Number of Operands Associativity Syntax
| & 2 left infix
+ - 2 left infix
/ * % 2 left infix
! 1 none prefix
( ) 1 none surrounding the expression

This is the LALR(1) grammar of the calculator language:

		   <calc> :: <calc> <stament> 
				  | <stament>
				  ;
		   
		   <stament> ::= <expression> '\n'
					 |   '\n' 
					 ;
					   
		   <expression> ::= <expression> '+'<expression>
						|   <expression> '-' <expression>
						|   <expression> '*' <expression>
						|   <expression> '/'<expression>
						|   <expression> '%' <expression>
						|   <expression> '&' <expression>
						|   <expression> '|' <expression>
						|   '!' <expression> 
						|   '+' <expression> 
						|   '-' <expression> 
						|   '(' expression  ')'
						|   CONST_INT
						|   CONST_REAL
						|   TRUE
						|   FALSE
						;
			

Semantic Specification

In view of the fact that we are going to use BYACC/Java, we should define a class that represent the semantic value of every terminal or non-terminal grammar symbol. These semantic values are the ones to reference on the type and token sections of the yacc grammar. The class is called SemanticValue.

These attributes of the objects SemanticValue are going to represent with their attributes the classical values to be assigned to YACC %union. The interesting attribute we are going to employ on the semantic analysis is Type: it represents the type of every expression. The complete specification can be obtained from the file SemanticValue.java.

We are about to specify the semantic specification of our language in a natural (informal) way. Let's show every operation defined and type inference to be developed. We also identified each method of the Evaluator class explained later on. SV stands for Semantic Value and e1, e2, and e represents language expressions.

Operators Semantics

Operation Semantics Evaluator Method
e1 & e2 e1, e2: SV Boolean type.
Resulting type: Boolean.
Evaluation: Logical AND.
logical(e1, e2:SV, operator:char):SV
e1 | e2 e1, e2: SV Boolean type.
Resulting type: Boolean.
Evaluation: Logical OR.
logical(e1, e2:SV, operator:char):SV
! e e: SV Boolean type.
Resulting type: Boolean.
Evaluation: Logical NOT.
not(e:SV):SV
e1 + e2 e1, e2: SV Real or Integer types.
Resulting type: Major of both.
Evaluation: Addition.
arithmetical(e1, e2: SV, operator:char):SV
e1 - e2 e1, e2: SV Real or Integer types.
Resulting type: Major of both.
Evaluation: Subtraction.
arithmetical(e1, e2: SV, operator:char):SV
e1 * e2 e1, e2: SV Real or Integer types.
Resulting type: Major of both.
Evaluation: Multiplication.
arithmetical(e1, e2: SV, operator:char):SV
e1 / e2 e1, e2: SV Real or Integer types.
Resulting type: Major of both.
Evaluation: Division.
arithmetical(e1, e2: SV, operator:char):SV
e1 % e2 e1, e2: SV Integer type.
Resulting type: Integer.
Evaluation: Modulus.
mod(e1, e2:SV):SV
+ e e: SV Real or Integer type.
Resulting type: The same as e.
Evaluation: Do nothing.
unarySign(e:SV, operator:char)
- e e: SV Real or Integer type.
Resulting type: The same as e.
Evaluation: Multiplication by -1.
unarySign(e:SV, operator:char)

In case previous conditions are not satisfied, a semantic error must be displayed.

TypeChecking

It is time to use our TyCC processor of the TyS system in order to automatically develop type checking of the semantic analysis.

Type Specification

We create type specification in a simple file: CalcTypes.txt. A type specification file is composed of three sections:

  1. Preamble
  2. Types Declaration
  3. Types Definition
Preamble

This section might be used to insert any kind of code between the %{ and %} pairs of characters. This code will be simply copied into the beginning of the destination file Type.java generated by TyCC.

In our example, we have used it import the necessary packages:

        %{ 
            import java.lang.io.*; 
        %} 
			
Types Declaration

In this section types are declared at the same time as type promotion: rules that give a type the ability to implicitly convert to another type. By this declaration, code to automatically make this type promotion is generated. Our language has three types: Int, Real and Bool.

For numeric types, integer may be implicitly converted to real type. The same way, Integer may promote to Boolean. We just have to write:

        Types = { Int < Real Bool } 
			

The "less than" operator represents promotion. If promotion does not exist, types are separated by white space or tab character. So, why have not we represented Boolean to Integer promotion? Every type has to be introduced in this declaration, but they can just appear once. We will see how our tool offers enough flexibility to solve this limitation.

Implicit type conversion is managed by the method implicitCast(Type): Type generated by TyCC. In case we want to modify how this mechanism works, we just have to modify this method implementation in a specific class. In our example, we ought to override the implicitCast method of the Int class.

Types Definition

Types are defined following a syntax very similar to the Java programming language.

At first it is written the reserved word class, followed by the unique type identifier. Optionally, after the identifier, a comma-separated type list could be located between parentheses. Finally, method definition or embedded code could be written between braces.

        ClassDefinition ::= CLASS ID '(' [ID{, ID}] ')' '{' {MethodDefinition|EmbbededCode} '}'
			

Depending on the target language selected, code will be represented following the specific language rules.

Types following type identifier means that the type is a type constructor. Types that this constructor will be applied to, are the comma-separated identifiers. In this tutorial we are not going to illustrate this capability. For more information, please read the user guide or samples.

In the calculator, we define three types:

        class Bool {
          ...
        } 
        class Int {
          ...
        } 
        class Real {
          ... 
        }
			

Retrieving types: getType()

In order to obtain any type implementation, TyS offers a types table by means of the Type.typesTable reference. Its main method getType receives as its sole parameter a type expression, returning the appropriate Type object.

TyC generates a type object for each type specified. If we want to directly use a type table, the correct reference is Type.TypesTableAdapter.

Our system uses a Type Expression Language (TEL). If you want to know its specification, check out the TyS User's Guide. A type is basically represented by its name and components (attributes) --if any exists.

In our example, we have three types with no components. If would like to obtain the Boolean type, the correct statement is:

        Type t=Type.typesTable.getType("Bool");
			

Type identifiers are case sensitive.

Typechecking Operations Implementation

Following the Java syntax, we will define methods of the three types declared.

mod():Type

The modulus operation is just defined for integer type. We will, therefore, define it only in the Int type. The rest of types will not have defined this operation as a correct one. This way, if the modmessage is sent to another type, the type checker would generate a semantic error.

        class Int {
            Type  mod() { 
                return Type.typesTable.getType("Int"); 
             } 
            ...
        }
			

What we have done to access the integer type is used the types table. TyS has a type table from which we can obtain any type implementation by using its type expression. The only way that the user may interact with the system is by means of the Type Facade class. Its static attribute typesTable offers every type defined in TyS.

arithmetical(Type):Type

We have grouped every arithmetic operation (except modulus) whose operators are two into a unique method arithmetical. This operation checks the semantic restrictions imposed to three operators: +, -, *, and /. It also infers the resulting type. We will apply it both to integer and real types.

Type inference is based in promotion. Real type is commonly represented as major than integer. This way, we define a method majorTyppe that checks if two types are compliant and, if so, it infers the major one. It will be defined later on.

        class Int {
            Type arithmetical (Type t){
                if (majorType(t) == null)
                     throw new TypeException("Operation arithmetical not allowed between Int and " + t.getName());
                return t;		 
            }
         ...
        } 

        class Real {
            Type arithmetical (Type t){
                if (majorType(t) == null)
                    throw new TypeException("Operation arithmetical not allowed between Real and " + t.getName());
                return majorType(t);
            }
         ... 

        } 
			

Boolean type does not implement this type. A semantic error would be thrown when trying to use it.

unarySign():Type

Next method represents - and + unary operators. It is defined in number types (integer and real), returning the same type.

        class Int {
            Type unarySign() {
                return Type.typesTable.getType("Int");
            }
          ...
        } 

        class Real {
            Type unarySign() {
                return Type.typesTable.getType("Real");
            }
          ... 
        } 
			
implicitCast(Type):Type

In the type declaration section, we could not have defined promotion from integer to both real and Boolean. Since the specified promotion is performed by the implicitCast method of BaseType (any Type extends BaseType), we just have to override this method implementation in the Int class.

Note that, in case that the parameter is not Boolean type, we call the base method --that is, the default implicit cast behavior.

        class Int {
            Type implicitCast(Type t) {
                if (t.getName().equals("Bool")) 
                    return Type.typesTable.getType("Bool");
                // Default behavior
                return super.implicitCast(t);
                }
             ...
            }
        }
			
makeBool(SemanticValue):boolean

This method concerns to calculator expressions evaluation. Depending on the expression value, it would be represented as a logical expression. In our language, integers (but not real numbers) may be represented as booleans.

Thus, we implement a makeBool method in Integer and Boolean that makes explicit this conversion and evaluates it. Note that the parameter is not a Type, but a SemanticValue representing expression value.

        class Int {
            boolean makeBool(SemanticValue v) {
                return v.dval != 0;
            }
          ...
        } 

        class Bool {
            boolean makeBool(SemanticValue v) {
                return v.bval;
            }
          ... 
        } 
			
print(SemanticValue):void

At runtime, we will show the expression value of each of the three types in our language. We will also display a message representing the type --this is performed by the info implemented afterwards.

        class Int {
            void print(SemanticValue v) {
                System.out.println(info() + ", Value = " + (int)v.dval);
            }
          ...
        } 

        class Real {
            void print(SemanticValue v) {
                System.out.println(info() + ", Value = " + v.dval);
            }
          ...
        } 

        class Bool {
            void print(SemanticValue v) {
                System.out.println(info() + ", Value = " + v.bval);
            }   
         ... 
        } 
			

BaseType

As we have seen, each type identified by the user will be translated into a class representing the appropriate type implementation. Each type will be derived from the BaseType class. Its functionality is refactoring every common behavior that exists in all the types.

This class should never be declared in the preamble. However, it is defined as another type.

class BaseType {
         ...
        }
			

Since the rest of the operations are going to be defined in the BaseType, they will be shared to every type.

majorType(Type):Type

This method is used by the arithmetical method to represent type promotion. It returns the major type or nullif there is no promotion.

As we have previously mentioned, type promotion defined by the user is automatically implemented by the implicitCast method of this class. So, the implementation of majorType is pretty simple:

   class BaseType {
            Type majorType(Type t) {	
                Type ret = implicitCast(t);
                if (ret == null) 
                    ret = t.implicitCast(Type.typesTable.getType(getName()));
                return ret;	   
            }
            ...
        }
			

getName returns the type expression.

info():void

This method returns the type expression in order to trace a specific implementation.

   class BaseType {
            String info() { 
                return "Type -> " + getName();
            }
            ...
        }
			
not():Type

We could have implemented this method in every Type that admits the logical not operation --in our example, Integer and Boolean types. However, as it is exactly the same implementation, it always returns the Boolean type, and type promotion is offered by implicitCast, we can develop it at the BaseType class:

   class BaseType {
            Type not() {
                Type t = implicitCast(Type.typesTable.getType("Bool"));
                if (t == null) 
                    throw new TypeException("Cannot make operation upon a " + getName());
                return t;
            }
            ...
        }
			
logical(Type):Type

Following the same idea, we may identify the logical binary operators as a BaseType method. We just have to check if both operands can be implicitly converted to Booleans and, if so, this type is returned. In other occasion, a semantic error is thrown.

   class BaseType {
            Type logical(Type t) {
                if (implicitCast(Type.typesTable.getType("Bool")) == null) 
                    throw new TypeException("Cannot make logical operations upon a " + getName());
                if (t.implicitCast(Type.typesTable.getType("Bool")) == null) 
                    throw new TypeException("Cannot make logical operations upon a " + getName());
                return Type.typesTable.getType("Bool");
            }
            ...
        }
			

Evaluator

YACC Grammar

Now that we have specified the semantic constraints and we have implemented types in TyS, we just have to define the rest of the language processor. This is the Parser.y YACC grammar:

        %token CONST_INT CONST_REAL TRUE FALSE 
        %token ERROR
        %left '&' '|'
        %left '-' '+'
        %left '*' '/' '%'
        %nonassoc '!'

        %%

        calc
            :  calc stament
            | 
            ;
        stament
            : expression '\n' { eval.print($1); }
            | '\n'
            ;

        expression
              : CONST_REAL                { $$ = eval.getConst( $1, "Real" ); }
              | CONST_INT                 { $$ = eval.getConst( $1, "Int" ); }
              | TRUE                      { $$ = eval.getConst( $1, "Bool" ); }
              | FALSE                     { $$ = eval.getConst( $1, "Bool" ); }
              | '+' expression %prec '!'  { $$ = eval.unarySign( $2, '+' ); }
              | '-' expression %prec '!'  { $$ = eval.unarySign( $2, '-' ); }
              | '!' expression %prec '!'  { $$ = eval.not( $2 ); }
              | expression '+' expression { $$ = eval.arithmetical( $1, $3, '+' ); }
              | expression '-' expression { $$ = eval.arithmetical( $1, $3, '-' ); }
              | expression '*' expression { $$ = eval.arithmetical( $1, $3, '*' ); }
              | expression '/' expression { $$ = eval.arithmetical( $1, $3, '/' ); }
              | expression '%' expression { $$ = eval.mod ( $1, $3 ); }
              | expression '&' expression { $$ = eval.logical( $1, $3, '&' ); }
              | expression '|' expression { $$ = eval.logical( $1, $3, '|' ); }
              | '(' expression ')'        { $$ = $2; }
              ;

        ...

        Evaluator eval = new Evaluator();

        %%
			

Remember that every $x is a reference to a SemanticValue . What we have added to the parser is an evaluator object. This object's main aim is:

  • Since we are implementing a calculator interpreter, it evaluates each value of different expressions.
  • Type checking but just calling the type checker generated by TyCC.

Semantic operations list

Evaluator.java is the module were the evaluation is performed. These are all its operations:

arithmetical(op1, op2: SemanticValue, operator:char):SemanticValue

This method first checks that type semantics is correct, and then evaluates the arithmetical binary operation. Type checking is so simple as calling the arithmetical method created by TyC.

Here is the code:

       SemanticValue aritmetical(SemanticValue v1, SemanticValue v2, char operator) {
            SemanticValue r = new SemanticValue(v1.dval);    
            try {
                r.type = v1.type.aritmetical(v2.type);//TypeChecking
                switch (operator) {
                    case '+' : r.dval += v2.dval; break;
                    case '-' : r.dval -= v2.dval; break;
                    case '/' : r.dval /= v2.dval; break;
                    case '*' : r.dval *= v2.dval; break;
                    default: throw new TypeException("Unknown operator " + operator);
                }
            } catch (TypeException e) { System.err.println(e); }
            return r;
         }
			

First we create a new SemanticValue object to be returned. Then we call the type checking appropriate operation and finally the evaluation is performed. Note that our error handler just shows the semantic error by console and follows processing expressions --more sophisticated ones could be developed modifying exception handlers.

logical(op1, op2: SemanticValue, operator: char): SemanticValue

Following the same scheme, logical checks semantic constraints and then evaluates the logical expression.

       SemanticValue logical(SemanticValue v1, SemanticValue v2, char operator) { 
            SemanticValue r = new SemanticValue(v1.dval); 
            try { 
                r.type = v1.type.logical(v2.type); //TypeChecking 
                // Evaluation
                boolean b1 = makeBool(v1); 
                boolean b2 = makeBool(v2);// If typechecking is OK continue with evaluation 
                switch (operator) {
                    case '&' : r.bval = b1 && b2; break; 
                    case '|' : r.bval = b1 || b2; break;
                    default: throw new TypeException("Unknown operator " + operator);
                    }
                . . .
              . . . 
        }
			

In this example, we have used TyS, besides for type checking, as part of expression evaluator. If you remember, makeBool takes a Semantic Value and returns, if possible, its respective Boolean. This shows us the high versatility of this system.

mod(op1, op2: SemanticValue): SemanticValue
       SemanticValue mod(SemanticValue v1, SemanticValue v2) {
            SemanticValue r = new SemanticValue(v1.dval); 
            try { 
                r.type = v1.type.mod(v2.type);//TypeChecking 
                r.dval = (int)v1.dval % (int)v2.dval;
            ....
        }							
			
unarySign(op1:SemanticValue, operator: char): SemanticValue

This method checks and evaluates the unary operators + and -:

       SemanticValue unarySign(SemanticValue v, char operator) { 
            SemanticValue r  = new SemanticValue(v.dval);
            try { 
                r.type = v.type.unarySign();//TypeChecking 
                if (operator == '-') r.dval = -r.dval;
            ...
        }
			
not(op1: SemanticValue): SemanticValue
      SemanticValue not(SemanticValue v) { 
            SemanticValue r = new SemanticValue(v.dval); 
            try {
                r.type = v.type.not(); //TypeChecking 
                r.bval = !makeBool(v);//TypeChecking Interpreter Interaction } 
          ...
        }
        

Building the Calculator

Once we have created all the modules, it is time to build our typed calculator.

At first, first compile the parser grammar with BYACC/J. We have to make explicit the use of SemanticValue:

      yacc -JSemantic=SemanticValue Parser.y

If everything goes well, a Parser.java file is generated.

Once the parser is generated, we produce the type checker with TyC. In this case, there is no need of specific parameters, simply:

       TyCC CalcTypes.txt

A Type.java file is created with our type checker. System's facade is Type.

Next step is application (language processor) compilation. Everything is Java source, so we have:

       javac Type.java Parser.java Evaluator.java

To run the calculator, we write:

       java Parser

Sample valid programs are:

        2 + 2 
        3 * 5.6 
        3 & 4 | true
        3 % 2

Wrong ones and, therefore, semantic errors are displayed in these ones:

         2 % 2.2
         ! 2.3
         4 | 2.3
        

 

Conclusions

It has been shown how TyS makes possible the isolation of type systems specification (the most important part of semantic analysis). The user, employing object orientation, defines types, its defined operations and promotion. The checker is automatically generated and the language processor just uses it.

As YACC is a well-known parser generator tool, we have done this tutorial with YACC. However, using more sophisticated tools (like ANTLR) demonstrate that the rest of semantic analysis is commonly reduced to symbol table management --see TyS user's guide.