Starting from:
$25

$19

Algorithm Implementation Assignment#1 Solution

OVERVIEW



Purpose: To implement backtracking algorithms and search trees.

Task 1: The first task of the assignment is to create a backtracking algorithm that finds one legal filling of the squares of a given crossword puzzle (if a legal filling exists), as specified in detail below.

Task 2: The second task is to use de la Briandais trees to improve the search efficiency in Task 1 and hence be able to produce all possible legal fillings of a given crossword puzzle.

BACKGROUND



Crossword puzzles are challenging games that test both our vocabularies and our reasoning

skills. However, creating a legal crossword puzzle is not a trivial task. This is because the words both across and down must be legal, and the choice of a word in one direction restricts the possibilities of words in the other direction. This restriction progresses recursively, so that some word choices "early" in the board could make it impossible to complete the board successfully. For example, look at the simple crossword puzzle below (note: no letter Xs appear in the actual puzzle shown below – in this example X is always used as a variable):



Assume that the word LENS has been selected for row 0 of the puzzle, as shown in Figure 1 above. Now, the word in column 1 must begin with an E, the word in column 2 must being with an N and the word in column 3 must begin with an S. All single characters are valid words in our dictionary so the L in column 0 is a valid word but is also irrelevant to the rest of the puzzle, since its progress is blocked by a filled in square. There are many ways to proceed from this point, and finding a good way is part of the assignment. However, if we are proceeding character by character in a row-wise fashion, we now need

Assignment adapted from Dr. John Ramirez’s CS 1501 class.


a letter X1 such that EX1 is a valid prefix to a word. Several letters will meet this criterion (EA, EB and EC are all valid prefixes, just to pick the first three letters of the alphabet). Once a possibility is selected, there are now two restrictions on the next character X2: NX2 must be a valid word and X1X2 must be a valid prefix to a word (see Figure 1). Assume that we choose Q for X1 (since EQ is a valid prefix). We can then choose U for X2, (see Figure 2 (NU is a valid word in our dictionary)). Continuing in the same fashion, we can choose the other letters shown in Figure 2 (in our dictionary QUA, DU and DC are all legal words).

Unfortunately, in row 3, column 1 we run into a problem. There is no word in our dictionary EQUX3 for any letter X3 (note that since we are at a terminating block, we are no longer just looking for a prefix) so we are stuck. At this point we need to undo some of our previous choices (i.e. backtrack) in order to move forward again toward a solution. If our algorithm were very intelligent, it would know that the problem that we need to fix is the prefix EQU in the second column. However, based on the way we progressed in this example, we would simply go back to the previous square (row 3, column 0), try the next legal letter there, and move forward again. This would again fail at row 3, column 1, as shown in Figure 3. Note that the backtracking could occur many times for a given board, possibly going all the way back to the first word on more than one occasion. In fact, the general run-time complexity for this problem is exponential. However, if the board sizes are not too large we can likely solve the problem (or determine that no solution exists) in a reasonable amount of time. One solution (but not the only one) to the puzzle above is shown in Figure 4.

TASK 1 – FINDING A SINGLE SOLUTION



The first task of your assignment is to create a legal crossword puzzle (if it exists) in the following way:

Read a dictionary of words in from a file and form a MyDictionary of these words. The interface DictInterface (in DictInterface.java) and the class MyDictionary (in MyDictionary.java) are provided for you on CourseWeb, and you must use them in this assignment. Read over the code and comments carefully so that you understand what they do and how. The interface DictInterface will also be important for the second task of this assignment. The file used to initialize the MyDictionary will contain ASCII strings, one word per line. Use the file dict8.txt on CourseWeb. If you are unsure of how to use DictInterface and MyDictionary correctly, see the DictTest.java example program (and read the comments).
Read a crossword board in from a file. The name of the file should be specified by the user as a command-line argument. The crossword board will be formatted in the following way:
The first line contains a single integer, N. This represents the number of rows and columns that will be in the board. Since the dictionary will contain up to 8-letter words, your program should handle crosswords up to 8x8 in size.
The next N lines will each have N characters, representing the NxN total locations on the board. Each character will be either
+ (plus) which means that any letter can go in this square
2– (minus) which means that the square is solid and no letter can go in here
A..Z (a letter from A to Z) which means that the specified letter must be in this square (i.e. the square can be used in the puzzle, but only for the letter
indicated)

For the board shown above, the file would be as follows:

4

++++

-+++

++-+

++++

Some test boards have been put onto the assignment page on CourseWeb.

Create a legal crossword puzzle for the given board and print it out to standard output. Many of the test files may have many solutions, but for this part of the project you only need to find one. See the second task of this assignment for information about finding multiple solutions. For example, one output to the crossword shown above in Figure 4 would be
LENS -TON AC-O THAW

Depending upon your algorithm, the single solution that you find may differ from that of my program or your classmates' programs. This is fine as long as all of the solutions are legal. Note that because of the severe performance limitations of the MyDictionary class, some of the run-times for the test files will be very long. See more details on this in testFiles.html available on CourseWeb.

Important Notes on Task 1:

To help you to get started,
first think of boards with all squares open (you can consider filled in squares later). In this case a solution for a KxK board will consist of K words of length K in the columns of the board and K words of length K in the rows of the board.
Construct an array of K StringBuilders for the columns (call it colStr) and an array of K StringBuilders for the rows (call it rowStr), each initially empty. Now consider a single recursive call at square (i,j) on the board. For the current character at position (i,j) to be valid the following must be true:
If j is not an end index, then rowStr[i] + the current char must be a valid prefix in the dictionary
If j is an end index, then rowStr[i] + the current char must be a valid word in the dictionary
If i is not an end index, then colStr[j] + the current char must be a valid prefix in the dictionary
If i is an end index, then colStr[j] + the current char must be a valid word in the dictionary
3
If the character is valid you append it to both corresponding StringBuilders and recurse to the next square. If it is not valid you try the next character at that square or, if all have been tried, you backtrack.
For consistency, call your main program Crossword.java.
Search algorithm details: Carefully consider the algorithm to fill the words into the board. Make sure it potentially considers all possibilities yet does not waste time rechecking prefixes that have already been checked. Although you are not required to use the exact algorithm described above, your algorithm must be a recursive backtracking algorithm that uses pruning. The algorithm you use can vary greatly in its efficiency. If your algorithm is very inefficient or otherwise poorly implemented, you will lose some style points. This algorithm is a significant part of the overall assignment, so put a good amount of effort into doing it correctly. For guidance on your board-filling algorithm, it is strongly recommended that you attend recitation.
The MyDictionary implementation of the DictInterface that is provided to you should work correctly, but it is not very efficient. Note that it is doing a linear search of an ArrayList to determine if the argument is a prefix or word in the dictionary. In the second task of this assignment you will write a more efficient implementation of the DictInterface.
Be sure to thoroughly document your code, especially the code that fills the board.
TASK 2 – FINDING ALL SOLUTIONS USING DE LA BRIANDAIS TREES



In Task 1 of this assignment, you were asked to complete a recursive, backtracking solution to filling in the squares of a crossword puzzle. To do this you utilized the DictInterface interface and the MyDictionary class, both of which were provided for you. Unfortunately, the MyDictionary class implements the DictInterface in a somewhat primitive and inefficient way, utilizing a linear search of the dictionary array. This can lead to long run-times when many searches are required (as in the crossword problem).

Consider de la Briandais (DLB) trees, which we discussed in lecture. Since these trees allow a string to be tested as a word or as a prefix in the dictionary in (typically) time proportional to the length of the string, they appear to be a superior way to implement the DictInterface interface. In Task 2 of this assignment you will do the following:

Implement a DLB as a class, as discussed in lecture and shown in the online notes. This requires the "nodelets" within the DLB to contain a character field, a child reference and a sibling reference. Your class(es) should be written in reasonably good object-oriented style. You may have a number of methods in your DLB class, but it must minimally implement the DictInterface as specified. Verify that this implementation works by testing your DLB using the DictTest.java program that I have provided for you. For consistency in grading and testing, you must call your class DLB and it must be in a file named DLB.java.
Modify your main program (Crossword.java) in the following ways:
Have the type of the DictInterface object be a command-line argument so that the user can run the same program with either a MyDictionary or a DLB. Look over DictTest.java to see how this would be done.
Update your program so that if the DictInterface object is a DLB it will find ALL of the solutions rather than just the first solution. This can be done with two simple changes to your main program:
4
When a solution is output, test again to see the type of the DictInterface object. If the type is MyDictionary, stop the program execution, but if the type is DLB, continue execution.
Rather than having your backtracking algorithm complete (terminate) once a solution is found, have it backtrack and continue to search for solutions. This is the approach taken in the EightQueens.java program. Look over that to get an idea of how to do this.
Since some of the test files will have thousands and even millions of solutions, in this case do not print out every solution. Rather, only print solutions whose number == 0 (mod 10000). In other words, you should print out solution 0, solution 10000, solution 20000, etc. At the end of your execution, print out the total number of solutions found. Don't forget to do this as it will be used in evaluating the correctness of your algorithm.

Compare the execution of your crossword solution with the MyDictionary to that using the DLB. Do this by informally recording the amount of time required for the first solution to be found (or for the program to terminate if no solutions exist) with both dictionaries. To informally time the programs, run them while watching your clock/watch/etc. We are not worried about exact times here – just orders of magnitude. Based on your comparisons of the various test files, determine if there is a significant difference in the run-times. Below are some additional notes about the comparisons:
Ideally you should have only one main program to solve your crossword puzzle problem. The choice whether to use a MyDictionary object or a DLB object for your DictInterface should be an input decision. Since both classes implement DictInterface, they should be interchangeable within your crossword generation algorithm. The choice of which class to use is to be done as a command-line option. This choice should also be used to determine whether to find one solution or all solutions. However, as mentioned above, if you are more comfortable making a second main program for Task 2 that is acceptable.
For some test files, one or perhaps both versions will take several minutes or perhaps hours or even days. Based on some worst-case analysis and some real timing of the smaller files you should be able to get an idea of how long the larger files will take to run. If a program takes more than a few hours to run you can abort the execution.
Once you have completed your comparison runs, write a short paper (2-3 pages, double-spaced) that summarizes your project in the following ways:
Discuss how you solved the crossword-filling problem in some detail. Include both how you set up the data structures necessary for the problem and how your algorithm proceeded. Also indicate any coding or debugging issues you faced and how you resolved them. If you were not able to get the program to work correctly, still include your approach and speculate as to what still needs to be corrected.
Discuss the differences (if any) between the run-times of the program using the MyDictionary and the program using the DLB. Include the (approximate) run-times for the programs for the various files in a table. Include in this discussion an asymptotic analysis of the worst-case run-time for each version of the program. Some values to consider in this analysis may include:

Number of words in the dictionary
Number of characters in a word
Number of possible letters in a crossword location
Number of crossword locations in the puzzle
5
If you were unable to complete the crossword solving program, speculate (using some intelligent guessing) for the actual run-times, but still include the comparison in your paper.

W SECTIONS ONLY: In addition to the paper requirements in 4) above, you must add a section that compares in detail the MyDictionary implementation of DictInterface to the DLB implementation. Discuss how the data is stored, how a search is done, and how long this will take. In particular, elaborate on how (if at all) the DLB is superior to the MyDictionary in testing for prefixes.
Note: The paper will count more toward the overall project grade for the W section than for the regular section. It will also be graded more strictly for spelling, grammar, etc. for the W section than for the regular section. You may also have to submit a revision of this paper at some point later in the term. Make sure the tone of this paper is somewhat formal and technical.
you must submit in a .doc or .docx format. This is required so that we can make electronic comments in your paper and so "track changes" can be used.
Important Notes on Task 2:

If you implement both tasks of this assignment correctly and everything works, you should only need one main program. However, if you are concerned about "breaking" a working crossword program from Task 1 by incorporating logic to find multiple solutions, then you may write a second main program for Task 2 if you wish. If you do this, call your original (from Task 1) program Crossword.java and call your second main program CrosswordB.java.
When grading your assignments, your TAs may redirect your output to a file so that they can refer to it at a later point. To make sure this will work correctly, make sure that once your search algorithm begins there is NO INPUT or anything that would make your program require any user interaction (since the TA will not see any prompts given that they will be sent to a file rather than the display).
Read over the test file page testFiles.html for important information on the test files and some expected results.
Be sure to thoroughly document your code.
This assignment was devised so that Task 1 and Task 2 could be implemented independently. If you are unable to complete one task don't let that prevent you from completing the other. Also, be sure to try each and submit something so that you can get some partial credit.
EXTRA CREDIT



If you are interested in some extra credit for this assignment, here are some possibilities:

Create a third implementation of the DictInterface and compare its performance to that of the other two implementations. One possibility is to use binary search of a sorted array. However, searching for a prefix is tricky if binary search is used, so be careful with this approach (i.e. make sure it is correct).
Do more detailed analysis of the run-times of the two different DictInterface implementations. Add timing to your program and do multiple runs to obtain (reasonably) accurate values of the actual run-times. Create a graph to show how the run-times increase as the crossword board size increases.
6
Add and demonstrate (through a driver program that you write yourself) the delete() method for your DLB (as discussed in lecture).
SUBMISSION REQUIREMENTS



You must submit the following to CourseWeb in a single zipped file:

All source files of program (including files provided to you)
All input data files for the puzzles [including the dict8.txt file]
All output files
Well written/formatted paper explaining your search algorithm and results (see Task 2 above for details on the paper)
Assignment Information Sheet (including compilation and execution information).
The idea from your submission is that your TA can unzip your .zip file, then compile and run your programs from the command line WITHOUT ANY additional files or changes, so be sure to test it thoroughly before submitting it. If the TA cannot compile or run your submitted code it will be graded as if the program does not work.

If you cannot get the programs working as given, clearly indicate any changes you made and clearly indicate why on your Assignment Information Sheet. You will lose some credit for not getting it to work properly, but getting the main programs to work with modifications is better than not getting them to work at all. A template for the Assignment Information Sheet can be found in the assignment’s CourseWeb folder. You do not have to use this template but your sheet should contain the same information.

Note: If you use an IDE such as NetBeans, Eclipse, or IntelliJ, to develop your programs, make sure they will compile and run on the command line before submitting – this may require some modifications to your program (such as removing some package information).

RUBRICS



Please check the grading rubrics on CourseWeb.
7

More products