Project 3

IntroductionYour goal is to write, in C or C++, a program that reads in a description of a context free grammar, then, depending on the command line argument passed to the program, outputs either the FIRST sets for all non-terminals in the grammar or FOLLOW sets for all non-terminals in the grammar or other information about the grammar (read on for details). The input grammar will be given on standard input (stdin), and your program should output to standard output (stdout). Specifications for the input grammar and expected output follow, and they must be followed precisely.Reading Command-line ArgumentsThe following piece of C code shows how to read the first command line argument and call a function based on the argument value:Grammar DescriptionThe grammar specification has multiple sections separated by the # symbol. The grammar specification is terminated with ##. If there are any symbols after the ##, they are ignored. All grammar symbols, as well as the # and ## symbols, are whitespace-separated. The grammar description is a context-free grammar as follows:S → Non-terminal-list Rule-list DOUBLEHASH
Non-terminal-list → ID-list HASH
ID-list → ID ID-list | ID
Rule-list → Rule Rule-list | Rule
Rule → ID ARROW Right-hand-side HASH
Right-hand-side → ID-list | ε
The tokens used in the grammar description are:ID = letter(letter|digit)*
HASH = #
DOUBLEHASH = ##
ARROW = -
where (note that digit is 0-9 and letter is all uppercase and lowercase letters, same as we have been using in class, however we define them here to be precise):digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

letter = a | b | c | d | e | f | g | h | i | j | k | l | m | n | o
| p | q | r | s | t | u | v | w | x | y | z | A | B | C | D | E
| F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V
| W | X | Y | Z
In this project, we assume that there is at least one whitespace character (where whitespace is defined as an input that causes the isspace function in ctype.h to return 1) between any two tokens. Tokens are case-sensitive.SemanticsThe first section of the input lists all the non-terminals of the grammar. The first non-terminal in this list is the start symbol of the grammar. The following sections each represent a grammar rule.A grammar rule starts with a non-terminal symbol (the left-hand side of the rule) followed by -, then followed by a sequence of zero of more terminals and non-terminals, which represent the right-hand side of the rule. If the sequence of terminals and non-terminals in the right-hand side of a rule is empty, and if the left-hand side of the rule is the non-terminal A, then this represents a rule of the form A → ε.ExampleHere is an example input grammar:decl idList1 idList #
decl - idList colon ID #
idList - ID idList1 #
idList1 - #
idList1 - COMMA ID idList1 # ##
The first section is the non-terminals list:Non-Terminals = { decl, idList1, idList }
The rest of the input (terminated by the double-hash) specifies the grammar rules:decl → idList colon ID
idList → ID idList1
idList1 → ε
idList1 → COMMA ID idList1
The set of terminals of the grammar consist of all ID tokens that appear in the grammar rules that are not non-terminals (once again, case sensitive). In the example:Terminals = { colon, ID, COMMA }
Note that even though the example shows that each rule is on a line by itself, a rule can be split into multiple lines, or even multiple rules can be on the same line, according to the formal specification. This input is an equivalent grammar:decl idList1 idList # decl - idList colon ID # idList - ID idList1 #
idList1 - # idList1 - COMMA ID idList1 #

##
ImplementationYou should test your code on CentOS 6.7 as before. You should submit all your source files and there are no automatically-added source code this time. You might use a Makefile for this project if needed. You are welcome to use code from lexer.c and lexer.hfrom the previous project, however be warned that lexer.c and lexer.h are for a different grammar, so you must modify them to accomidate the project 3 grammar.RequirementsYour program should read in the input grammar from standard input, and read the requested task number from the command line arguments as described earlier, then calculate the requested output according to the task number and print the results in the exact format for each task to standard output (stdout). The following specifies the exact requirements for each task number.Task 0: Information about the grammarList all the terminals of the grammar separated by space in the order they appear in the input grammar followed by a new line character. Then output for each non-terminal of the input grammar in the order they appear in the first section of the input (list of non-terminals), the non-terminal followed by a colon : followed by the number of rules in which that non-terminal appears on the left-hand-side followed by a new line character.OutputThe output for the example grammar given previously when run with the command line argument of 0 like ./a.out 0:Task 1: FIRST SetsFor each of the non-terminals of the input grammar, in the order that they appear in the non-terminal section of the input, compute the FIRST set for that non-terminal and output one line in the following format:FIRST(<symbol) = { <set_items }
Where <symbol should be replaced by the non-terminal and <set_items should be replaced by the comma-separated list of elements of the FIRST set ordered in the following manner:
  • If ε belongs in the set, represent it as #
  • If ε belongs in the set, it should be listed before any other elements
  • All other elements of the set should be sorted in dictionary order (specifically, the C function strcmp from string.h defines the sorting order)
OutputThe output for the example grammar given previously when run with the command line argument of 1 like ./a.out 1:FIRST(decl) = { ID }
FIRST(idList1) = { #, COMMA }
FIRST(iDList) = { ID }
Task 2: FOLLOW SetsFor each of the non-terminals of the input grammar, in the order that they appear in the non-terminals section of the input, compute the FOLLOW set for that non-terminal and output one line in the following format:FOLLOW(<symbol) = { <set_items }
Where <symbol should be replaced by the non-terminal and <set_items should be replaced by the comma-separated list of elements of the FOLLOW set ordered in the following manner:
  • If EOF belongs in the set, represent it as $
  • If EOF belongs in the set, it should be listed before any other elements
  • All other elements of the set should be sorted in dictionary order (specifically, the C function strcmp from string.h defines the sorting order)
OutputThe output for the example grammar given previously when run with the command line argument of 2 like ./a.out 2:FOLLOW(decl) = { $ }
FOLLOW(idList1) = { colon }
FOLLOW(idList) = { colon }
EvaluationYour submission will be graded on passing the automated test cases.The test cases (there will be multiple test cases in each category, each with equal weight) will be broken down in the following way (out of 105 points):
  • Task 0 for all grammars: 20 points
  • Calculating FIRST sets for grammars without ε: 30 points
  • Calculating FIRST sets for grammars with ε: 20 points
  • Calculating FOLLOW sets for grammars without ε: 25 points
  • Calculating FOLLOW sets for grammars with ε: 10 points
Note that the output produced by your program should exactly match the expected output, and no partial credit will be given.For every test case, there are 3 expected output files respectively for each task. For example, for the input file test01.txt, there is test01.txt.expected0 for the expected output of task 0, test01.txt.expected1 for the expected output of task 1 and test01.txt.expected2 for the expected output of task 2. There is also a modified version of the test script test1.sh that you should use to test your code with the provided test cases for this project. You can find this test script along with the project material.SubmissionSubmit your code on the course submission website:https://cse340.fulton.asu.edu/cse340/
Powered by