Semantic Analysis Simple Hindley-Milner Type Checking

CSE340 Spring 2015 - Project 3: Semantic AnalysisSimple Hindley-Milner Type CheckingDue: March 25, 2015 by 11:59 PMAbstractThe goal of this project is to finish an incomplete parser and write a semanticchecker for a language. The input to the semantic checker will be a programand the output will be information about the types of the variables in theprogram or an error message if there is type mismatch or other semantic error.This project is conceptually more involved than previous projects. It will takearound 20 hours to fi nish this project.1 OverviewThe goal of this project is to finish an incomplete parser and write a semantic checkerfor a language. The input to the semantic checker will be a program and the outputwill be information about the types of the variables in the program or an errormessage if there is type mismatch or other semantic error. This is an illustration:__________________| | /------ Error MessageProgram ----| Semantic Checker |----||__________________| \------ Type InformationThe semantic checker enforces semantic rules on the input program. The rulesconstrain variable usage; depending on the type of a variable, some operations onthe variable are allowed or disallowed.You are provided with an incomplete parser for a language and asked to com-plete it and enforce the semantic rules of the language. The language is specfi edby a grammar (see below) and the parser we provide already parses all rules ex-cept while stmt, condition, do stmt, switch stmt, case list and case. Com-pleting the parser means adding code to parse while stmt, condition, do stmt,switch stmt, case list and case. You are not allowed to modify otherparts of the provided parser. In particular, you should not modify howexpressions are parsed.1The provided code also builds an abstract syntax tree. While you do not need touse an abstract syntax tree for your solution, looking at the implementation can beuseful. The abstract syntax tree building is also not complete! It does not handlewhile stmt, condition, do stmt, switch stmt, case list and case.The provided parser also rewrites the input program so that all expressions arechanged to pre x notation. This functionality is also not needed for your implemen-tation but it can be useful as an example.The language and the rules to be enforced are described in the following sections.2 GrammarThe grammar of the language is the following:(1) program - decl body(2) decl - type_decl_section var_decl_section(3) type_decl_section - TYPE type_decl_list(4) type_decl_section - // epsilon(5) type_decl_list - type_decl type_decl_list(6) type_decl_list - type_decl(7) type_decl - id_list COLON type_name SEMICOLON(8) type_name - REAL(9) type_name - INT(10) type_name - BOOLEAN(11) type_name - STRING(12) type_name - LONG(13) type_name - ID(14) var_decl_section - VAR var_decl_list(15) var_decl_section - // epsilon(16) var_decl_list - var_decl var_decl_list(17) var_decl_list - var_decl(18) var_decl - id_list COLON type_name SEMICOLON(19) id_list - ID COMMA id_list(20) id_list - ID(21) body - LBRACE stmt_list RBRACE(22) stmt_list - stmt stmt_list(23) stmt_list - stmt(24) stmt - while_stmt(25) stmt - assign_stmt(26) stmt - do_stmt(27) stmt - switch_stmt(28) while_stmt - WHILE condition body(29) assign_stmt - ID EQUAL expr SEMICOLON(30) do_stmt - DO body WHILE condition SEMICOLON(31) switch_stmt - SWITCH ID LBRACE case_list RBRACE(32) case_list - case case_list(33) case_list - case2(34) case - CASE NUM COLON body(35) expr - term PLUS expr(36) expr - term MINUS expr(37) expr - term(38) term - factor MULT term(39) term - factor DIV term(40) term - factor(41) factor - LPAREN expr RPAREN(42) factor - NUM(43) factor - REALNUM(44) factor - ID(45) condition - ID(46) condition - primary relop primary(47) primary - ID(48) primary - NUM(49) primary - REALNUM(50) relop - GREATER(51) relop - GTEQ(52) relop - LESS(53) relop - NOTEQUAL(54) relop - LTEQThe following sections will go into the details of semantic checking that needs tobe done.3 ProgramA program consists of two sections: a declaration section (decl) that contains typeand variable declarations and a body section (body) that contains statements of theprogram. The top-level grammar rules for a program are:(1) program - decl body(2) decl - type_decl_section var_decl_sectionIn what follows we elaborate on each program section.3.1 Types and Variables3.1.1 TypesThe language has a number of built-in types. These are: INT, REAL, BOOLEAN, STRINGand LONG. In addition to the built-in types, the language allows the declaration ofuser-de ned types. User-de ned types are declared in a type decl section (typedeclaration section). The syntax for type declarations is given by the followinggrammar rules:3(3) type_decl_section - TYPE type_decl_list(4) type_decl_section - // epsilon(5) type_decl_list - type_decl type_decl_list(6) type_decl_list - type_decl(7) type_decl - id_list COLON type_name SEMICOLON(8) type_name - REAL(9) type_name - INT(10) type_name - BOOLEAN(11) type_name - STRING(12) type_name - LONG(13) type_name - IDThe rules for id list are rules (19) and (20) above. They are also repeated belowwith variable declarations.User-declared types can be explicit or implicit:? Explicit user-declared types are names that are not built-in types and thathave their rst appearance in the program as part of id list of a declarationaccording to rule (7).? Implicit user-declared types are not built-in types and not explicit user-declared types. Implicit user-declared types have their rst appearance astype name in a declaration according to rules (7) or (18) (see below also).Example: The following is an example program. We added line numbers tomore easily refer to the declarations and statement. Line numbers are not part ofthe input.In the example, types at ( rst appearance in line 02) and bt ( rst appearance inline 03) are explicitly declared. at is explicitly declared because it is not the nameof a built-in type and it rst appears in at : INT; (line 02). In the parsing of theprogram at : INT; is parsed as rule (7) and at is part of the id list.Type ct ( rst appearance in line 07) is an implicit user-declared type becauseit has its rst appearance as type name in bv : ct; which is parsed according torule (18) for variable declarations.01: TYPE02: at : INT;03: bt : at;04:05: VAR06: av : at;07: bv : ct;408: {09: av = 1;10: bv = av + lv;11: WHILE jv < iv12: {13: av = av+1;14: }15: }Constraints for type declaration section(C1) Declaration of a built-in type should result in a syntax error. This is alreadyhandled by the parser that I provide.(C2) A user-declared type name should not be declared more than once in the TYPEsection. The following should hold:? An explicit declared type should not appear in two di erent id list(s)(in the TYPE section) or more than once in the same id list (in the TYPEsection).? An implicit user-declared type should not appear in an id list of adeclaration according to (7).Example In the following example at appears in two di erent id list(s) in theTYPE section, thereby rule (C2) is violated. Also, et appears twice in the sameid list in the TYPE section, a violation of rule (C2). Built-in type REAL is explicitlydeclared (line 04), thereby rule (C1) is violated. Finally, implicitly user-declaredtype bt appears in an id list of a declaration according to (7) (line 05) which is aviolation of rule (C2).01: TYPE02: at,et,et : INT;03: at : bt;04: REAL : at;05: bt : dt;06:07: VAR08: av : at;09: bv : ct;09: {10: av = 1;11: }53.2 VariablesThe language allows the explicit declaration of variables. This is done in a var decl section(variable declaration section) whose syntax is de ned as follows:(14) var_decl_section - VAR var_decl_list(15) var_decl_section - // epsilon(16) var_decl_list - var_decl var_decl_list(17) var_decl_list - var_decl(18) var_decl - id_list COLON type_name SEMICOLON(19) id_list - ID COMMA id_list(20) id_list - IDVariables are declared explicitly or implicitly: A variable is declared explicitlyif it appears in an id list in a declaration according to (18). A variable is declaredimplicitly if it is not declared explicitly but appears in the program body (see below).Example: In the following example av and bv are declared explicitly. Variable evis declared implicitly.01: VAR02: av : at;03: bv : ct;04: {05: av = 1;06: ev = 2;07: }Constraints for variable declaration section(C3) Types declared in TYPE section should not be re-declared as a variable. Atype name that is introduced in a type decl (implicit or explicit) should notappear as part of an id list of rule (18)(C4) Multiple declarations of the same variable name not allowed. The same vari-able name should not appear in two di erent var decl or more than once inthe same var decl.(C5) Types declared in VAR section should not be re-declared as a variable. Aname that rst appears as type name in rule (18) should not be declaredas a variable (note that a name can appear as a type name without beingexplicitly or implicitly declared as a type in the type decl section. This6rule is the same as rule (C3) but applied to implicitly declared types that are rst introduced in the var decl section).(C6) Variables should not be re-declared as type names. A name that rst appearsin an id list of rule (18) should not appear as a type name in rule (18).Example: In the following example the re-declaration of the explicitly-declaredtype bt ( rst introduced in line 03) as a variable in line 06 violates rule (C3). There-declaration of the implicitly declared type it in line 05 is also a violation ofrule (C3).In the example, the two declarations of av in line 05 is a violation of rule (C4).The re-declaration of bv ( rst declared in line 05) in line 06 is also a violation ofrule (C4). The re-declaration of implicit type dt ( rst introduced in line 05) as avariable in line 06 is a violation of rule (C5). The re-declaration of variable x ( rstintroduced in line 07) as a type name in line 08 is a violation of rule (C6).01: TYPE02: at : INT;03: bt : it;04: VAR05: it, av, av, bv : dt;06: bt, bv, dt : INT;07: x, y : REAL;08: z : x;09: {10: av = 1;11: ev = 2;12: }4 Program BodyThe body of the program consists of a list of statements between two braces. Thegrammar rules are:(21) body - LBRACE stmt_list RBRACE(22) stmt_list - stmt stmt_list(23) stmt_list - stmt5 StatementsA statement can be either an assignment (assign stmt), a while statement (while stmt),a do-while statement (do stmt) or a switch statement (switch stmt). An assign stmt7assigns an expression to a variable. A while statement has 3 parts: (1) the WHILEkeyword, (2) a condition, and (3) a body (this is a recursive de nition). A do stmthas 5 parts: (1) the DO keyword, (2) a body, (3) the WHILE keyword, (4) a conditionand (5) a SEMICOLON. A switch stmt has 5 parts: (1) the SWITCH keyword, (2) avariable name (ID), (3) an LBRACE i.e. f, (4) a case list and (5) an RBRACE i.e. gThe grammar rules for stmt are the following:(24) stmt - while_stmt(25) stmt - assign_stmt(26) stmt - do_stmt(27) stmt - switch_stmt(28) while_stmt - WHILE condition body(29) assign_stmt - ID EQUAL expr SEMICOLON(30) do_stmt - DO body WHILE condition SEMICOLON(31) switch_stmt - SWITCH ID LBRACE case_list RBRACE(32) case_list - case case_list(33) case_list - case(34) case - CASE NUM COLON body(35) expr - term PLUS expr(36) expr - term MINUS expr(37) expr - term(38) term - factor MULT term(39) term - factor DIV term(40) term - factor(41) factor - LPAREN expr RPAREN(42) factor - NUM(43) factor - REALNUM(44) factor - ID(45) condition - ID(46) condition - primary relop primary(47) primary - ID(48) primary - NUM(49) primary - REALNUM(50) relop - GREATER(51) relop - GTEQ(52) relop - LESS(53) relop - NOTEQUAL(54) relop - LTEQConstraints on statements and expressions(C7) Assignment cannot be made between variables of di erent types8(C8) Operations (PLUS, MINUS, MULT, DIV) cannot be applied to operands of di erenttypes (but can be applied to operands of same type including STRING andBOOLEAN)(C9) Relational operators (relop) cannot be applied to operands of di erent types(but can be applied to operands of same type including STRING and BOOLEAN)(C10) NUM constants are of type INT(C11) REALNUM constants are of type REAL(C12) The result of primary relop primary is of type BOOLEAN (and both primarieshave to have the same type)(C13) condition should be of type BOOLEAN (this is automatically satis ed for rule(46) when both primaries have the same type)(C14) The type of an expression is the same as the type of its operands (which musthave the same type by (C8))(C15) Names that are explicitly or implicitly declared as types should not appearin the program body. This error is considered a type name re-declared asvariable.(C16) The variable that follows the SWITCH keyword according to rule (31) shouldbe of type INT6 RequirementsThe goal is to complete the parser and write a type checker. The type checker willenforce all the constraints above and the constraints described in this section. Inthis section, I will just describe the constraints. Examples are given in the nextsection.The goal of the type checker is to determine which types and variables havethe same "type". We will express the type of a type name or a variable name asan integer number. Built-in types have prede ned numbers. The goal of the typechecker is to assign type numbers to user de ned types and to variables so thatnames that have the same "type" also have the same type number.We have the following main rules (MR) to be enforced:(MR1) Structural equivalence is enforced.(MR2) Each type is assigned an integer value (type number).(MR3) Each variable is assigned an integer type number.9(MR4) Each variable (whether or not it is explicitly declared) has a xed "type" thatdoes not change. This means that there should be one type number that isassigned to the variable in the whole program and that is consistent with thedeclaration of the variable and all uses of the variable.(MR5) Each type name (whether or not it is explicitly declared) has a xed "type"that does not change. This means that there should be one type number thatcan be assigned to the type name and that is consistent with the declarationof the type name and all the uses of the type name.(MR6) If a : b; is a type declaration, then #a = #b, where #x is the type numberassigned to x.(MR7) If a : t; is a variable declaration, then #a = #t, where #x is the type num-ber assigned to x.7 Implementation SuggestionsAs the declaration sections are parsed, type numbers are assigned to type namesand variable names in a way that satisfy the constraints above.(IS1) For built-in types, you can use xed numbers. We suggest the followingnumbers: INT is assigned 10, REAL is assigned 11, STRING is assigned 12, BOOLEAN isassigned 13 and LONG is assigned 14.(IS2) Implicit types are assigned unique numbers when they are introduced. Thenumbers should be greater than 14.(IS3) Variables that are implicitly declared are assigned unique numbers whenthey are introduced. The numbers should be greater than 14.When an assignment statement is parsed, the numbers of the left hand side and theright hand side should be the same. If the two sides have the same number, nothingneeds to be done (constraint (C7) is satis ed). Otherwise, due to rule (C7), we needto make sure they get the same number. There are three cases to consider:(IS4) The two numbers are numbers of two di erent built-in types: type mismatcherror, since two built-in types cannot have the same type number.(IS5) One number is the number of a built-in type and the other number is thenumber of an implicit type: The implicit type number should be changed tomatch that of the built-in type.10(IS6) The two numbers are the numbers of implicit types: One of the implicit typenumbers should be changed to match the other one.Similar changes are done when processing conditions or expressions. This is illus-trated in a detailed example below.8 WHAT YOUR PROGRAM SHOULD DO?Analyze the input program and either:1. Print out semantic error message if there is an errorOR2. For every type and variable, print all the variables and types with the sametype number together. The detailed format is given belowIf there is a semantic error, you should not print any variables or types. As in allprojects in this class, your program should read input from standard input, andwrite output to standard output.11OUTPUTError codes:? type name declared more than once: ERROR CODE 0? type re-declared as variable: ERROR CODE 1? variable declared more than once: ERROR CODE 2? type mismatch: ERROR CODE 3? variable re-declared as a type name: ERROR CODE 4Non-error output:? For every type name that is built-in, explicitly or implicitly declared, the se-mantic checker should output, in the order they are introduced in the program,all type names (built-in rst, then explicit, then implicit) and variable names(explicit rst then implicit) that have the same integer value (type number).The format is:typename_1 : typename_2 typename_3 ... typename_n variablename_1variablename_2 ... variablename_k #Note that if built-in types appear in a list, they always appear to the left of acolon.Your should read this rule very carefully. It is not obvious. Readexactly what it says.? For built-in types the order is according to their type numbers, e.g. 10, 11,...So you would list INT before REAL in the output.? If a type name already appears in a list, do not create a list for it? For every variable name that is implicitly declared, output, in the order theyare introduced in the program all variable names that have the same integernumber. Do not create a list for an implicitly declared variable if it alreadyappears in an earlier list. The format is:variablename_1 : variablename_2 variablename_3 ... variablenam_l #Note that explicitly-declared variables will already appear in one of the \type"lists and therefore do not need separate lists.129 ExamplesHere are some example input programs and detailed explanations of type checkingprocedure. Note that gray boxes are input programs and brown boxes are outputboxes.{a = b;}In this example:? type of a = 15? type of b = 16) we change type of a to 16 because of the assignment statement.Output:a : b #Note that no built-in types are listed in the output because there are no variablesor types that have the same type numbers as built-in types.VARa : INT;b : REAL;{a = b;}In this example:? type of a = 10? type of b = 11) type mismatch (assignment statement)13Output:ERROR CODE 3{WHILE a{b = b+1;}}In this example:? type of a = 13 (BOOLEAN) since it is used as a condition? type of b = 10 (INT) since it is used in an expr of type INT) type of a = 13 and type of b = 10Output:INT : b #BOOLEAN : a #Note that in the above example there is no output for REAL or STRING orLONG because there are no variables or types that have the same type number asREAL or STRING or LONG.{a = 1;b = a;WHILE b{c = a;}}14In this example:? a is INT ( rst statement)? b is INT (second statement)? b is BOOLEAN (condition of the while)) type mismatchOutput:ERROR CODE 3{a = b+c;b = 1;c = 1.5;}In this example:? a, b, and c are the same type ( rst statement)? b is INT (second statement) this also makes a and c INT? c is REAL (from the third statement), type mismatch between INT and REAL) type mismatchOutput:ERROR CODE 315{DO{a = a + 1;SWITCH d{CASE 5:{b = a * 2;}CASE 3:{b = a / 2;}}}WHILE a < c;}In this example:? type of a = 10 (INT) since it is used in an expr of type INT? type of d = 10 (INT) since it is used as switch variable? type of b = 10 (INT) since it is used in an expr of type INT (b = a * 2;)? No conict in the assignment statement b = a / 2;? type of c = 10 (INT) since it is compared to a which is of type INT) all variables are of type INTOutput:INT : a, d, b, c #16Detailed exampleNote that line numbers are for reference only and not part of the input.01: TYPE02: at : INT;03: bt : at;04: ct : dt;05: et : BOOLEAN;06: ft : STRING;07: gt : REAL;08:09: VAR10: av : at;11: bv : bt;12: cv : ct;13: ev : et;14: fv : ft;15: gv : gt;16: hv : t1;17: iv : t1;18: jv : t219: kv : t2;20: lv : t3;21:22: {23: av = 1;24: bv = av + lv;25: WHILE jv < iv26: {27: av = av+1;28: }29:30: WHILE jv31: {32: av = av+1;33: }34:35: kv = kv+1;36: bv = av;37: cv = 1.0;38: cv = bv;39: mv = nv;40: }17Here is a line-by-line type analysis of the above program. Note that there is atype mismatch at line 35 but the analysis is continued after that line, this is justfor information purposes and normally the type checking should be stopped afterdetecting an error.line 02: at : INT;#at = #INT // by (MR6)#INT = 10 // INT is a built-in type. See (IS1)#at <- 10line 03: bt : at;#bt = #at // by (MR6)#at = 10 // assigned previously (line 2)#bt <- 10line 04: ct : dt;#ct = #dt // by (MR6)#dt <- 15 // dt is an implicit type. See (IS2)#ct <- 15line 05: et : BOOLEAN;#et = #BOOLEAN // by (MR6)#BOOLEAN = 13 // BOOLEAN is a built-in type. See (IS1)#et <- 13line 06: ft : STRING;#ft = #STRING // by (MR6)#STRING = 12 // STRING is a built-in type. See (IS1)#ft <- 12line 07: gt : REAL;#gt = #REAL // by (MR6)#REAL = 11 // REAL is a built-in type. See (IS1)#gt <- 11line 10: av : at;#av = #at // by (MR7)#at = 10 // assigned previously (line 2)#av <- 10line 11: bv : bt;#bv = #bt // by (MR7)#bt = 10 // assigned previously (line 3)#bv <- 1018line 12: cv : ct;#cv = #ct // by (MR7)#ct = 15 // assigned previously (line 4)#cv <- 15 // note that the base type is unknownline 13: ev : et;#ev = #et // by (MR7)#et = 13 // assigned previously (line 5)#ev <- 13line 14: fv : ft;#fv = #ft // by (MR7)#ft = 12 // assigned previously (line 6)#fv <- 12line 15: gv : gt;#gv = #gt // by (MR7)#gt = 11 // assigned previously (line 7)#gv <- 11line 16: hv : t1;#hv = #t1 // by (MR7)#t1 <- 16 // t1 is an implicit type. See (IS2)#hv <- 16line 17: iv : t1;#iv = #t1 // by (MR7)#t1 = 16 // assigned previously (line 16)#iv <- 16line 18: jv : t2#jv = #t2 // by (MR7)#t2 <- 17 // t2 is an implicit type. See (IS2)#jv <- 17line 19: kv : t2;#kv = #t2 // by (MR7)#t2 = 17 // assigned previously (line 18)#kv <- 17line 20: lv : t3;#lv = #t3 // by (MR7)#t3 <- 18 // t3 is an implicit type. See (IS2)#lv <- 1819line 23: av = 1;#(1) = 10 // by (C10)#av = 10 // assigned previously in line 10no mismatch // no rules violatedline 24: bv = av + lv;#av = 10 // assigned previously in line 10#lv = 18 // assigned previously in line 20#av = #lv // by (C8)#lv <- 10 // See (IS5)#t3 <- 10 // since #lv = #t3#bv = 10 // assigned previously in line 11no mismatch // rule (C7) appliesline 25: WHILE jv < iv#jv = 17 // assigned previously in line 18#iv = 16 // assigned previously in line 17#jv = #iv // by (C9)#jv <- 16 // See (IS6)#t2 <- 16 // since #jv = #t2#kv <- 16 // since #kv = #t2line 27: av = av+1;#av = 10 // assigned previously in line 10#(1) = 10 // by (C10)no mismatch // rules (C7) and (C8) applyline 30: WHILE jv#jv = #BOOLEAN // by (C13)#jv = 16 // assigned previously in line 25#BOOLEAN = 13 // BOOLEAN is a built-in type. See (IS1)#jv <- 13 // See (IS5)#t1 <- 13 // change all 16's to 13's#t2 <- 13 // for all types and#hv <- 13 // variables#iv <- 13#kv <- 13line 32: av = av+1;#av = 10 // assigned previously in line 10#(1) = 10 // by (C10)no mismatch // rules (C7) and (C8) apply20line 35: kv = kv+1;#kv = 13 // from line 30#(1) = 10 // by (C10)#kv = #(1) // by (C8)Type mismatch // type mismatch since 13 and 10 are// built-in type numbers. See (IS4)line 36: bv = av;#av = 10 // assigned previously in line 10#bv = 10 // assigned previously in line 11no mismatch // rule (C7) appliesline 37: cv = 1.0;#cv = 15 // assigned previously in line 12#(1.0) = 11 // by (C11)#cv = #(1.0) // by (C7)#cv <- 11 // See (IS5)#ct <- 11 // change all 15's to 11's#dt <- 11line 38: cv = bv;#cv = 11 // assigned previously in line 37#bv = 10 // assigned previously in line 11#cv = #bv // by (C7)Type mismatch // type mismatch since 11 and 10 are// built-in type numbers. See (IS4)line 39: mv = nv;#mv <- 19 // mv is an implicit variable. See (IS3)#nv <- 20 // nv is an implicit variable. See (IS3)#mv = #nv // by (C7)#nv <- 19 // See (IS6)2110 GradingYour program will be executed on a number of test cases using the shell scriptprovided for project 1. The grade is based on the test cases that your programcorrectly passes. To pass a test case, the output of the program should match theexpected output. It is not enough that the right information be in the output thatyour program generates. The format and order of the output is essential.11 Submission1. Only C or C++ can be used for this project2. You should submit all your code on the course website by 11:59:59 pm on duedate.3. Make sure your submission has no compile problems. If after submission youget a compiler error on the website, x the problem and submit again. Asubmission that does not compile will not be given any credit4. You can submit any number of times you need, but remember that we onlygrade your last submission5. DO NOT include test scripts or test cases with your submission6. DO NOT use any whitespace characters in your le names22
Powered by