a small compiler that will read an input program and represents it in an internal data structure

CSE 340  Project 4

Abstract

The goal of this project is to give you some hands-on experience with implementing a smallcompiler. You will write a compiler for a simple language. You will not be generating lowlevel code. Instead, you will generate an intermediate representation (a data structure thatrepresents the program). The execution of the program will be done after compilation byinterpreting the generated intermediate representation.

1 Introduction

You will write a small compiler that will read an input program and represents it in an internaldata structure. The data structure will contain representation of instructions to be executed aswell as a part that represents the memory of the program (space for variables). Then your compilerwill execute the data structure (interpret it). This means that the program will traverse the datastructure and at every node it visits, it will execute the node by changing appropriate memorylocations and deciding what is the next instruction to execute (program counter). The output ofyour compiler is the output that the input program should produce. These steps are illustrated inthe following gureThe remainder of this document is organized as follows:

1. Grammar De nes the programming language syntax including grammar.

2. Execution Semantics Describe statement semantics for if, while, switch and print state-ments.
3. How to generate the intermediate representation Explains step by step how to generatethe intermediate representation (data structure). You should read this sequentially and notskip around.

4. Executing the intermediate representation Basically, you have two options. If you areusing C or C++, you are only allowed to use the code we provide to execute the intermediaterepresentation. If you are using Java, it describes the strict rules to follow in executing theintermediate representation. Those rules will be enforced.

5. Input/Output Reminds you to only use standard input and output.

6. Requirements Lists the programming languages allowed (C/C++ or Java) and other re-quirements.

7. Submission Instructions for submitting your project.

8. Grading Describes the grading scheme.

9. Bonus Project Describes the requirements for the bonus project.

2 Grammar

The grammar for this project is a simpli ed form of the grammar from the previous project, butthere are a couple extensions.

program ! var section bodyvar section ! id list SEMICOLONid list ! ID COMMA id list j IDbody ! LBRACE stmt list RBRACEstmt list ! stmt stmt list j stmtstmt ! assign stmt j print stmt j while stmt j if stmt j switch stmtassign stmt ! ID EQUAL primary SEMICOLONassign stmt ! ID EQUAL expr SEMICOLONexpr ! primary op primaryprimary ! ID j NUMop ! PLUS j MINUS j MULT j DIVprint stmt ! print ID SEMICOLONwhile stmt ! WHILE condition bodyif stmt ! IF condition bodycondition ! primary relop primaryrelop ! GREATER j LESS j NOTEQUAL2switch stmt ! SWITCH ID LBRACE case list RBRACEswitch stmt ! SWITCH ID LBRACE case list default case RBRACEcase list ! case case list j casecase ! CASE NUM COLON bodydefault case ! DEFAULT COLON body

Some highlights of the grammar:

1. Expressions are greatly simpli ed and are not recursive.

2. There is no type declaration section.

3. Division is integer division and the result of the division of two integers is an integer.

4. if statement is introduced. Note that if stmt does not have else. Also, there is no SEMI-COLON after the if statement.

5. A print statement is introduced. Note that the print keyword is in lower case.

6. There is no variable declaration list. There is only one id list in the global scope and thatcontains all the variables.

7. There is no type speci ed for variables. All variables are INT by default.

8. All terminals are written in capital in the grammar and are as de ned in the previous projects(except the print keyword)

3 Execution Semantics

All statements in a statement list are executed sequentially according to the order in which theyappear. Exception is made for body of if stmt, while stmt and switch stmt as explained below.

3.1 Boolean ConditionA boolean condition takes two operands as parameters and returns a boolean value. It is used tocontrol the execution of while and if statements.

3.2 If statementif stmt has the standard semantics:

1. The condition is evaluated

.2. If the condition evaluates to true, the body of the if stmt is executed, then the next statementfollowing the if is executed

.3. If the condition evaluates to false, the statement following the if in the stmt list is executedThese semantics apply recursively to nested if stmt.3

3.3 While statementwhile stmt has the standard semantics:

1. The condition is evaluated.
2. If the condition evaluates to true, the body of the while stmt is executed, then the conditionis evaluated again and the process repeats.
3. If the condition evaluates to false, the statement following the while stmt in the stmt list isexecuted.These semantics apply recursively to nested while stmt. The code block:WHILE conditionfstmt listgis equivalent to:label :

IF conditionfstmt listgoto labelgNote that goto statements do not appear in the input program, but our intermediate represen-tation includes GotoStatement which is used in conjunction with IfStatement to represent whileand switch statements.

3.4 Switch statementswitch stmt has the standard semantics:

1. The value of the switch variable is checked against each case number in order.

2. If the value matches the number, the body of the case is executed, then the statement followingthe switch stmt in the stmt list is executed.

3. If the value does not match the number, next case is evaluated.

4. If a default case is provided and the value does not match any of the case numbers, then thebody of the default case is executed and then the statement following the switch stmt in thestmt list is executed.

5. If there is no default case and the value does not match any of the case numbers, then thestatement following the switch stmt in the stmt list is executed.

4These semantics apply recursively to nested switch stmt. The code block:SWITCH var fCASE n1 : f stmt list 1 g...CASE nk : f stmt list k ggis equivalent to:IF var == n1 fstmt list 1goto labelg...IF var == nk fstmt list kgoto labelglabel :And for switch statements with default case, the code block:SWITCH var fCASE n1 : f stmt list 1 g...CASE nk : f stmt list k gDEFAULT : f stmt list default ggis equivalent to:IF var == n1 fstmt list 1goto labelg...IF var == nk fstmt list kgoto labelgstmt list defaultlabel :

5 3.5 Print statementThe statementprint a;prints the value of variable a at the time of the execution of the print statement.

4 How to generate the codeThe intermediate code will be a data structure (a graph) that is easy to interpret and execute. Iwill start by describing how this graph looks for simple assignments then I will explain how to dealwith while statements.Note that in the explanation below I start with incomplete data structures then I explain whatis missing and make them more complete. You should read the whole explanation.

4.1 Handling simple assignments

A simple assignment is fully determined by: the operator (if any), the id on the left-hand side, andthe operand(s). A simple assignment can be represented as a node:struct AssignmentStatement {struct ValueNode* left_hand_side;struct ValueNode* operand1;struct ValueNode* operand2;int op; // operator}For assignment without an expression on the right-hand side, the operator is set to 0 and thereis only one operand. To execute an assignment, you need the values of the operand(s), applythe operator, if any, to the operands and assign the resulting value of the right-hand side to theleft hand side. For literals (NUM), the value is the value of the number.

For variables, the value isthe last value stored in the variable. Initially, all variables are initialized to 0.Multiple assignments are executed one after another. So, we need to allow multiple assignmentnodes to be linked to each other. This can be achieved as follows:struct AssignmentStatement {struct ValueNode* left_hand_side;struct ValueNode* operand1;struct ValueNode* operand2;int op; // operatorstruct AssignmentStatement* next;}

This structure only accepts ValueNode as operands. To handle literal constants (NUM), you need tocreate ValueNode for them and store them in the created ValueNode while parsing.This will now allow us to execute a sequence of assignment statements represented in a linked-list: we start with the head of the list, then we execute every assignment in the list one after theother. This is simple enough, but does not help with executing other kinds of statements. Weconsider them one at a time.

64.2 Handling print statementsThe print statement is straightforward. It can be represented asstruct PrintStatement{struct ValueNode* id;}Now, we ask: how can we execute a sequence of statements that are either assignment or printstatement (or other types of statements)?

We need to put both kinds of statements in a list and notjust the assignment statements as we did above.

So, we introduce a new kind of node: a statementnode. The statement node has a eld that indicates which type of statement it is. It also has eldsto accommodate the remaining types of statements. It looks like this:struct StatementNode {int type; // NOOP_STMT, GOTO_STMT, ASSIGN_STMT, IF_STMT, PRINT_STMTunion {struct AssignmentStatement* assign_stmt;struct PrintStatement* print_stmt;struct IfStatement* if_stmt;struct GotoStatement* goto_stmt;};struct StatementNode* next;}This way we can go through a list of statements and execute one after the other. To executea particular node, we check its type.

If it is PRINT STMT, we execute the print stmt eld, if it isASSIGN STMT, we execute the assign stmt eld and so on. With this modi cation, we do not needa next eld in the AssignmentStatement structure (as we explained above), instead, we put thenext eld in the statement node.This is all ne, but we do not yet know how to generate the list to execute later. The idea is tohave the functions that parses non-terminals return the code for the non-terminals. For examplefor a statement list, we have the following pseudecode (missing many checks):struct StatementNode* parse_stmt_list(){struct StatementNode* st; // statementstruct StatementNode* stl; // statement listst = parse_stmt();if (nextToken == start of a statement list){stl = parse_stmt_list();append stl to st; // this is pseudecodereturn st;}else{ungetToken();return st;}}7And to parse body we have the following pseudecode:struct StatementNode* parse_body(){struct StatementNode* stl;match LBRACEstl = parse_stmt_list();match RBRACEreturn stl;}

4.3 Handling

if and while statementsMore complications occur with if and while statements. The structure for an if statement can beas follows:struct IfStatement {int condition_op;struct ValueNode* condition_operand1;struct ValueNode* condition_operand2;struct StatementNode* true_branch;struct StatementNode* false_branch;}The condition op, condition operand1 and condition operand2 elds are the operator andoperands of the condition of the if statement.

To generate the node for an if statement, we need toput together the condition, and stmt list that are generated in the parsing of the if statement.The true branch and false branch elds are crucial to the execution of the if statement. Ifthe condition evaluates to true then the statement speci ed in true branch is executed otherwisethe one speci ed in false branch is executed. We need one more type of node to allow loop backfor while statements.
This is a GotoStatement.struct GotoStatement {struct StatementNode* target;}To generate code for the while and if statements, we need to put a few things together.

The outlinegiven above for stmt list needs to be modi ed as follows (this is missing details and shows only themain steps).8struct StatementNode* parse_stmt(){...create statement node stif next token is IF{st-type = IF_STMT;create if-node; // note that if-node is pseudecode and is not// a valid identifier in C, C++ or Javast-if_stmt = if-node;parse the condition and set if-node-operator, if-node-op1 and if-node-op2if-node-true_branch = parse_body(); // parse_body returns a pointer to a list of statementscreate no-op node // this is a node that does not result// in any action being takenappend no-op node to the body of the if // this requires a loop to get to the end of// if-node-true_branch by following the next field// you know you reached the end when next is NULL// it is very important that you always appropriately// initialize fields of any data structures// do not use uninitialized pointersset if-node-false_branch to point to no-op nodeset st-next to point to no-op node...} else ...}
The following diagram shows the desired structure for the if statement:9The stmt list code should be modi ed because of the extra no-op node:struct StatementNode* parse_stmt_list(){struct StatementNode* st; // statementstruct StatementNode* stl; // statement listst = parse_stmt();if (nextToken == start of a statement list){stl = parse_stmt_list();if st-type == IF_STMT{append stl to the no-op node that follows st// st// |// V// no-op// |// V// stl}else{append stl to st;// st// |// V// stl}return st;}else{ungetToken();return st;}}Handling while statement is similar. Here is the outline for parsing a while statement and creatingthe data structure for it:...create statement node stif next token is WHILE{st-type = IF_STMT; // handling WHILE using if and goto nodescreate if-node // if-node is not a valid identifier see// corresponding comment abovest-if_stmt = if-nodeparse the condition and set if-node-operator, if-node-op1 and if-node-op2if-node-true_branch = parse_body();create a new statement node gt // This is of type StatementNodegt-type = GOTO_STMT;create goto-node // This is of type GotoStatementgt-goto_stmt = goto-node;goto-node-target = st; // to jump to the if statement after// executing the body10append gt to the body of the while // append gt to the body of the while// this requires a loop. check the comment// for the if above.create no-op nodeset if-node-false_branch to point to no-op nodeset st-next to point to no-op node}...The following diagram shows the desired structure for the while statement:

4.4 Handling switch statement

You can handle the switch statement similarly. Use a combination of IfStatement and GotoStatementto support the semantics of the switch statement. See section 3.4 for more information.

5 Executing the intermediate representation

After the graph data structure is built, it needs to be executed. Execution starts with the rstnode in the list. Depending on the type of the node, the next node to execute is determined. Thegeneral form for execution is illustrated in the following pseudo-code.11pc = first nodewhile (pc != NULL){switch (pc-type){case ASSIGN_STMT: // code to execute pc-assign_stmt ...pc = pc-nextcase IF_STMT: // code to evaluate condition ...// depending on the result// pc = pc-if_stmt-true_branch// or// pc = pc-if_stmt-false_branchcase NOOP_STMT: pc = pc-nextcase GOTO_STMT: pc = pc-goto_stmt-targetcase PRINT_STMT: // code to execute pc-print_stmt ...pc = pc-next}}Executing the graph should be done non-recursively and without any function calls.

Even helper functions are not allowed for the execution of the graph. This is a re-quirement that will be checked by inspecting your code. Little credit will be assignedif this requirement is not met.C/C++ implementations If you are developing in C or C++, we have provided you withthe data structures and the code to execute the graph and you must use it. There are two les compiler.h and compiler.c, you need to write your code in separate le(s) and includecompiler.h. The entry point of your code is a function declared in compiler.h:struct StatementNode* parse_generate_intermediate_representation();You need to implement this function.The main() function is given in compiler.c:int main(){struct StatementNode *

program;program=parse_generate_intermediate_representation();execute_program(program);return 0;}It calls the function that you will implement which is supposed to parse the program andgenerate the intermediate representation, then it calls the execute program function to executethe program.

You should not modify any of the given code. In fact if you write your program in Cor C++, you should only submit the le(s) that contain your own code and we will add the givenpart and compile the code before testing. If you write your program in Java, you should strictlyfollow the guidelines for executing the intermediate representation.
126 Input/Output

The input will be read from standard input. We will test your programs by redirecting the standardinput to an input le. You should NOT specify a le name from which to read the input. Outputshould be written to standard output.

7 Requirements
1. Write a compiler that generates intermediate representation for the code and write an inter-preter to execute the intermediate representation. The interpreter is provided for C/C++implementations.
2. Language: You can use Java, C, or C++ for this assignment.
3. Any language other than Java, C or C++ is not allowed for this project.
4. If you use C or C++ for this project, you should use the provided code and only implementthe required functions.
5. If you use Java, you will need to write everything yourself but the requirements on how toexecute the intermediate representation will be checked manually when grading.
6. Platform: As previous projects, the reference platform is CentOS 6.6

7. You can assume that there are no syntax or semantic errors in the input program.

 Bonus Project10.1

Bonus Project Options

You have three options for the bonus project:1. Resubmit project 2. For this option you are given another chance to submit project 2. Thegrade you obtain will be reduced by 30% (as if it is 3 days late) and replaces the grade youalready obtained on project 2. So, if your grade for project 2 was 20 and your grade for thereplacement is 90, the grade for project 2 will be changed from 20 to 63 (63 = 90 reduced by30%)2. Resubmit project 3. For this option you are given another chance to submit project 3. Thegrade you obtain will be reduced by 30% (as if it is 3 days late) and replaces the grade youalready obtained on project 3. So, if your grade for project 3 was 20 and your grade for thereplacement is 100, the grade for project 3 will be changed from 20 to 70 (70 = 100 reducedby 30%)3. Do a new project that replaces the lowest grade between project 2 and project 3. The gradeyou obtain under this option will replace the lower of the two grades that you obtained onproject 2 or 3 (if the grade you obtain on the bonus is lower than what you already got onprojects 2 and 3, no replacement is made).If you make multiple submissions, only the last submission will count.10.2 New Bonus Project (option 3)Support the following grammar:program ! var section bodyvar section ! VAR int var decl array var declint var decl ! id list SEMICOLONarray var decl ! id list COLON ARRAY LBRAC NUM RBRAC SEMICOLONid list ! ID COMMA id list j IDbody ! LBRACE stmt list RBRACEstmt list ! stmt stmt list j stmtstmt ! assign stmt j print stmt j while stmt j if stmt j switch stmtassign stmt ! var access EQUAL expr SEMICOLONvar access ! ID j ID LBRAC expr RBRACexpr ! term PLUS exprexpr ! termterm ! factor MULT termterm ! factorfactor ! LPAREN expr RPARENfactor ! NUMfactor ! var access14print stmt ! print var access SEMICOLONwhile stmt ! WHILE condition bodyif stmt ! IF condition bodycondition ! expr relop exprrelop ! GREATER j LESS j NOTEQUALswitch stmt ! SWITCH var access LBRACE case list RBRACEswitch stmt ! SWITCH var access LBRACE case list default case RBRACEcase list ! case case list j casecase ! CASE NUM COLON bodydefault case ! DEFAULT COLON bodyNote that LBRAC is "[" and LBRACE is "f". The former is used for arrays and the latter isused for body. Assume that all arrays are integer arrays and are indexed from 0 to size - 1, wheresize is the size of the array speci ed in the var section after the ARRAY keyword and between "["and "]".The data structures and code that we have provided for the regular assignment will not beenough for the bonus, you will need to modify those to support arrays. Submit all code les forthe bonus project (including the modi ed compiler.h and compiler.c).All restrictions imposed on the execution of the intermediate representation for theregular project apply to the bonus project as well. You are not allowed to call anyfunctions while executing the intermediate representation. You are not allowed toexecute the program recursively.15
Powered by