Data Structures Lab 4

605.202 Data Structures Lab 4

This lab assignment requires you to compare the performance of two distinct sorting algorithms to obtain some appreciation for the parameters to be considered in selecting an appropriate sort. Write a Heap Sort and a Shell Sort. They should both be recursive or both be iterative, so that the overhead of recursion will not be a factor in your comparisons. In this case, iteration is recommended. Be sure to justify your choice. Also, consider how your code would have differed if you had made the other choice.
The strategy behind a Shell Sort is to create a more nearly optimal environment for a simple, relatively inefficient sort technique, namely Simple Insertion Sort. This optimal environment allows the simple strategy to be efficient. Use the following sets of increments:
1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524 (Knuth’s sequence)
1, 5, 17, 53, 149, 373, 1123, 3371, 10111, 30341
1, 10, 30, 60, 120, 360, 1080, 3240, 9720, 29160
One or more sets of increments of your choice. 
Please note that the increment sets will need to be supplemented if you use data sets larger than 29000. Use the other increment sets the same way you use Knuth’s increments: find the first value larger than the file size, then move back two increments to find the starting increment. So in the third sequence, for a file of size 1200, you would use increments of 360, 120, 60, 30, 10, 1, in that order. You will have four different Shell sorts to run.
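The increment rule above can be sketched as follows. This is only an illustrative sketch in Python; the names `starting_gaps`, `shell_sort`, `KNUTH`, `SET_2`, and `SET_3` are assumptions made for the example, and raising an error when a set is too short is one possible choice, not a requirement of the lab.

```python
# The three increment sets listed in the handout, smallest first.
KNUTH = [1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524]
SET_2 = [1, 5, 17, 53, 149, 373, 1123, 3371, 10111, 30341]
SET_3 = [1, 10, 30, 60, 120, 360, 1080, 3240, 9720, 29160]


def starting_gaps(increments, n):
    """Return the increments to use for n items, largest first.

    Rule: find the first increment larger than n, then move back two
    positions to get the starting increment.
    """
    first_larger = next(
        (i for i, h in enumerate(increments) if h > n), None
    )
    if first_larger is None:
        # Assumption: a set that is too short is treated as an error.
        raise ValueError("increment set must be supplemented for this size")
    start = max(first_larger - 2, 0)
    # Slice backwards from the starting increment down to 1.
    return increments[start::-1]


def shell_sort(data, gaps):
    """Sort data in place: one gapped insertion-sort pass per increment."""
    for gap in gaps:
        for i in range(gap, len(data)):
            item = data[i]
            j = i
            # Shift larger elements of this gap-chain to the right.
            while j >= gap and data[j - gap] > item:
                data[j] = data[j - gap]
                j -= gap
            data[j] = item
    return data
```

For a file of size 1200, `starting_gaps(SET_3, 1200)` gives the increments 360, 120, 60, 30, 10, 1 described above. The final increment must be 1 so that the last pass is an ordinary insertion sort.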
Heap Sort is a practical sort to know and is based on the concept of a heap. It has two phases: Build the heap and extract the elements in sorted order from the heap. Altogether, you will have five sorts. 
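The two phases can be sketched iteratively as follows. This is an illustrative sketch in Python under the assumption of an array-based max-heap; the names `sift_down` and `heap_sort` are hypothetical and other designs are equally acceptable.

```python
def sift_down(data, root, end):
    """Restore the max-heap property below root, within data[0:end]."""
    while True:
        child = 2 * root + 1              # left child index
        if child >= end:
            return                        # root is a leaf
        # Pick the larger of the two children.
        if child + 1 < end and data[child + 1] > data[child]:
            child += 1
        if data[root] >= data[child]:
            return                        # heap property already holds
        data[root], data[child] = data[child], data[root]
        root = child


def heap_sort(data):
    """Sort data in place in ascending order."""
    n = len(data)
    # Phase 1: build the heap, starting from the last parent node.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(data, root, n)
    # Phase 2: repeatedly move the maximum to the end of the array
    # and restore the heap over the remaining prefix.
    for end in range(n - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        sift_down(data, 0, end)
    return data
```

Because the heap lives inside the array being sorted, this version needs no extra storage beyond a few index variables, which is relevant to the space-efficiency discussion in your analysis.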
Create input files of five sizes: 50, 500, 1000, 2000, and 5000 integers. For each file size, make three versions. On the first, use a randomly ordered data set. On the second, use the integers in reverse order. On the third, use the integers in normal ascending order. (You may use a random number generator or shuffle function to create the randomly ordered file. It is important to avoid too many duplicates. Keep them to about 1%.) This means you have an input set of 15 files plus whatever you deem necessary and reasonable. Files are available on the course site if you want to copy them. Your data should be formatted so that each number is on a separate line with no leading blanks. There should be no blank lines in the file.
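One way to generate the 15 files is sketched below. This is an illustrative sketch in Python; the function names, the file-name pattern, and the choice of the distinct integers 1 to N (which gives no duplicates at all) are assumptions, not requirements.

```python
import random
from pathlib import Path

SIZES = (50, 500, 1000, 2000, 5000)


def make_data_sets(size, seed=0):
    """Return the three orderings of the integers 1..size."""
    ascending = list(range(1, size + 1))
    reverse = ascending[::-1]
    shuffled = ascending[:]
    # A seeded generator makes the random file reproducible.
    random.Random(seed).shuffle(shuffled)
    return {"random": shuffled, "reverse": reverse, "ascending": ascending}


def write_input_files(out_dir, sizes=SIZES):
    """Write one file per (order, size) pair and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for size in sizes:
        for order, values in make_data_sets(size).items():
            # Hypothetical naming scheme, e.g. "reverse500.txt".
            path = out_dir / f"{order}{size}.txt"
            # One number per line, no leading blanks, no blank lines.
            path.write_text("\n".join(str(v) for v in values) + "\n")
            paths.append(path)
    return paths
```

Calling `write_input_files` with a directory of your choice produces five sizes times three orders, or 15 files.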
Each sort must be run against all the input files. This will give you at least 75 runs. For grading purposes, for each sort, generate output only from the files of size 50. You will have 15 sets of output to turn in for the size 50 files. Your code needs to print out the sorted values and the times for each of the Shell Sorts and the Heap Sort for each of the three orders for size 50. 
Your program should access the system clock to get some time values for the different runs. The call to the clock should be placed as close as possible to the beginning and the end of each sort. If other code is included, it may have a large fixed cost, which would tend to drown out the differences between the runs, if any. Why take a chance? If you get too many zero time values or any negative time values, then you must fix the problem. One way to do this is to use larger files than those specified. Another solution is to perform the sorting in a loop, N times, and calculate an average value. You would need to be careful to start over with unsorted data each time through the loop.
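The loop-and-average approach might look like the sketch below. This is an illustrative sketch in Python using the standard `time.perf_counter_ns` clock; the name `time_sort` and the default of 100 repetitions are assumptions.

```python
import time


def time_sort(sort_fn, data, repeats=100):
    """Return the average running time of sort_fn in nanoseconds."""
    total_ns = 0
    for _ in range(repeats):
        # Start over with a fresh unsorted copy each time. The copy is
        # made outside the timed region so its cost is not measured.
        work = list(data)
        start = time.perf_counter_ns()
        sort_fn(work)
        total_ns += time.perf_counter_ns() - start
    return total_ns / repeats
```

Only the call to the sort sits between the two clock readings, and the caller's data is never modified, so every repetition sorts the same unsorted input.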
Turn in an analysis comparing the two sorts and their performance. Be sure to comment on the relative runtimes of the various runs, the effect of the order of the data, the effect of different size files, and the effect of different increment sizes for the Shell Sort. Which factor has the most effect on the efficiency? Be sure to consider both time and space efficiency. Be sure to justify your data structures. As time permits, consider implementing a Straight Insertion Sort to compare with Shell Sort. Also, consider files of size 10,000 or additional random files, perhaps with 15-20% duplicates. Your write-up must include a table of the times obtained.
In developing this assignment, please keep in mind that you will be turning in your source code to be run against my input. This is in addition to the runs you will need to make for analysis purposes. Your program needs to print out the sorted values. For grading purposes, it does not need to print the times, but the times should be printed in the sample runs you turn in.
Late Policy 
Programming assignments must be submitted to the course website by midnight of the last night of the specified module.
Assignments received after that are considered late and will be penalized 5 points for each day late. Assignments more than one week late cannot be accepted, except by prior arrangement with the instructor. Problems with your system do not constitute a legitimate excuse for lateness, so make plans to deal with the unavailability of your system. Programs must compile and produce legitimate output.
 
Style  
Your code must incorporate a consistent, well-documented style.  Required points of style include: 
• Include clear, concise, and adequate inline comments
• Write substantive, descriptive blocks for each module
• Write a substantive, descriptive block for the entire program.
• Comments should describe the function performed, not restate the code in English.
• Comments should explain the purpose of a function or method, its inputs, and outputs.
• Comments should explain the algorithm being applied, a particular approach to a problem, or restrictions in using the code
• Use white space liberally.
• Write one driver that executes your entire code.
• Consistent, well-delineated use of both upper and lower case is encouraged.
• Using indentation to show nesting of control statements is encouraged.
• Set tabs to only 2-3 spaces and keep line lengths to about 78 columns to reduce wrap around.
• Keep code modular. One page is a good rough guide to module size.
• Do not use GOTOs or global variables
• Use include files so that a link step is not necessary. 
Input 
You must correctly handle any required input, turning in output to show that it does.
• Generate your own test cases. You will lose points for not providing adequate additional input.
• Generate input that checks extreme cases
• Generate input with errors that might reasonably occur due to typos or a novice user. Assume if it is possible to make a stupid mistake, someone will do so.
• When testing, approach your code as a total novice, then on a second pass, as though you are an experienced end-user.
• Show that your code does everything it is supposed to do
• Show all reasonable error cases are handled.
• Use named files to handle I/O.  There will be a penalty for hardcoded file names.
• Do NOT use GUI/console input.  
• If you want credit for something, then it is important to demonstrate it with an appropriate I/O set.
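Several of the points above (named files, no console input, handling of reasonable input errors) can be sketched as follows. This is an illustrative sketch in Python; the names `parse_integers` and `main`, the command-line layout, and the output labels are all assumptions.

```python
import sys


def parse_integers(lines):
    """Split input lines into valid integers and reported errors."""
    values, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            values.append(int(line.strip()))
        except ValueError:
            # Blank lines and non-numeric text are recorded, not fatal.
            errors.append((line_no, line.rstrip("\n")))
    return values, errors


def main(argv):
    """Read the input file named in argv[1], report to argv[2]."""
    if len(argv) != 3:
        print("usage: PROGRAM INPUT_FILE OUTPUT_FILE", file=sys.stderr)
        return 2
    in_path, out_path = argv[1], argv[2]
    try:
        with open(in_path) as src:
            values, errors = parse_integers(src)
    except OSError as exc:
        print(f"cannot read {in_path}: {exc}", file=sys.stderr)
        return 1
    with open(out_path, "w") as out:
        out.write(f"Input file: {in_path}\n")
        out.write(f"Values read: {len(values)}\n")
        for line_no, text in errors:
            out.write(f"Line {line_no} ignored: {text!r}\n")
    return 0
```

A driver would call `main(sys.argv)`, so both file names come from the command line rather than being hardcoded or typed at a prompt.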

Output 
Output is not part of the analysis. Output files must:
• Echo the input and contain the answers for the required input.
• Be user friendly with additional labels, lines, and white space
• Have statistical information as needed. 
 
Source Code 
You are expected to write your own code except as provided in a specific assignment. Do NOT use code from the internet or other sources to be part of your assignment, except Lab 4. Use of standard libraries is restricted to standard I/O calls and standard math functions, etc. In other words, you can't use the library stack code.
 
Analysis
The analysis should discuss the following points
• Description of your data structures
• Justification of your data structure choices and implementation
• Discussion of the appropriateness to the application
• Description and  justification of your design decisions
• Efficiency with respect to both time and space
• What you learned
• What you might do differently next time
• Specific requirements in the lab handout
• Discussion of anything you did as an enhancement
• Do not reiterate the requirements of the assignment
• Do not include code or output
• Use Times New Roman Size 12 font, single spaced, with .75-inch margins on all sides.
• Include your name inside the file. 
