Computer Project #05   SOLUTION

Assignment Deliverable

The deliverable for this assignment is the following file:

  proj05.py – the source code for your Python program

Be sure to use the specified file name and to submit it for grading via the handin system before the project deadline.

Assignment Background  
What do people earn in different jobs?  What is the average annual pay for all full-time employees in the US?  These are interesting questions.  We grabbed a file of data named national_M2014_dl.txt from the data.gov website so you can answer those questions.

Assignment Specifications
1. The program will prompt for a data file name.  If the file is not found, keep prompting until the file is found. (Hint: See Code Listing 5.6 in the text, i.e. use a while True loop with a try-except block in the loop and a break to exit the loop.  Note that since the text was written, a new, more specific exception, FileNotFoundError, has been added; use it instead of the broader IOError used in that code listing.)
2. The input file national_M2014_dl.txt is formatted by columns, so string slicing is the way to extract data. Except for the header line (first line), all lines are formatted as follows:
• Occupation code: columns 0-10  
• Occupation title: columns 10-120  
• All other fields each occupy 13 columns 
 
• Entry 7 (starting in column 172) is the Average Salary—the value used for this programming project.  (Hint: these values are read as strings so they must be converted to numbers.)  
• Entry 3 (starting in column 120) is the Occupational Group.  For this project we are only interested in rows that are coded “detailed” in this column; data in all other rows is ignored.
• Entry 2 (starting in column 10) is the Occupation description.  For this project that is the entry that will be referenced when searching for keywords (see item 3 next).  
3. The program will prompt the user for a keyword.  Your program will print a table listing all the occupations that contain the keyword, print the highest and lowest paying occupations that have that keyword in their titles, and print the average salary and number of occupations having that keyword.  Salaries and occupations should be formatted nicely in columns with commas in the numerical values.  If there is only one matching occupation, neither a max nor a min should be output; also, the word “occupation” should be singular in that case. If at the keyword prompt you simply hit enter, the keyword will be the empty string and all occupations will be selected. See the sample below. If the keyword doesn’t exist in the file, print an error message and do not do any calculations.
(Mathematical note: the average we are expecting in this project is a simple average of the values from the rows selected by the keyword.  This is a simplification to make your programming easier, and it is mathematically incorrect because we should be taking into account how many people have that average salary in calculating the overall average; that is, taking the average of averages is usually meaningless.  You should convince yourself of why this is so.  Feel free to ask your instructor or TA for clarification.)
4. The inputs are case insensitive, so if the user enters “computer”, “COMPUTER” or “CompUter”, your program should find all the occupations having “computer” in the title. Keep in mind that the occupation titles in the file are not all in lower case.
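The file prompt, the column layout, and the case-insensitive keyword rule above can be sketched in Python as follows. This is a minimal sketch, not a required structure: the function names (open_data_file, title_of, group_of, salary_text_of, is_match) and the prompt wording are made up for illustration, and the slice end points assume that each 13-column field ends where the next one begins.

```python
# Column positions taken from the specification; the end points of the
# 13-column fields are an assumption (start + 13).
TITLE = slice(10, 120)    # Entry 2: occupation title
GROUP = slice(120, 133)   # Entry 3: occupational group
SALARY = slice(172, 185)  # Entry 7: average salary


def open_data_file(prompt="Enter a file name: "):
    """Keep prompting until a file can be opened; return the open file."""
    while True:
        name = input(prompt)
        try:
            return open(name)
        except FileNotFoundError:
            print("File not found. Please try again.")


def title_of(line):
    """Return the occupation title with padding removed."""
    return line[TITLE].strip()


def group_of(line):
    """Return the occupational group code with padding removed."""
    return line[GROUP].strip()


def salary_text_of(line):
    """Return the average salary field as a string (it may be '*')."""
    return line[SALARY].strip()


def is_match(line, keyword):
    """True for a 'detailed' row whose title contains keyword, ignoring case.

    The empty keyword is contained in every title, so it selects every
    detailed row.
    """
    return group_of(line) == "detailed" and keyword.lower() in title_of(line).lower()
```

In a full program you would call open_data_file() once, skip the header line with readline(), and then test each remaining line with is_match() inside a for loop over the file.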
Assignment Notes  
1. You cannot use Python’s lists for this project.  
2. Items 1-6 of the Coding Standards will be enforced for this project.  
3. Lab06 should be a big help, so check it out before doing this project.
4. The project should work with any input file that has the same format. Your project will be tested using multiple input files that have different numbers of lines.
5. Sometimes there is no data in the file for average salary: there will be a single asterisk (‘*’) character in the field. Ignore those values in your calculations.  Trying to convert that string to a number will generate an error, so you need to check for that case before trying to convert.  You can use a conditional (if) or use a try-except block, whichever you prefer.
6. Keyword input must be case insensitive, so if a user enters “computer”, “COMPUTER” or “CompUter”, your program should find all the occupations having “computer” in the title. Keep in mind that the occupation titles in the file are not all in lower case.
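Note 5 offers two ways to guard the salary conversion, and both are sketched below. The function names are hypothetical. The sketch also assumes the salary field holds a plain number, possibly with commas as thousands separators; that is an assumption about the file, so check it against your data.

```python
def salary_or_none(text):
    """Conditional version: return the salary as a float, or None for '*'."""
    text = text.strip()
    if text == "*":
        return None  # no data for this occupation; the caller skips it
    # Assumption: any commas are thousands separators and can be dropped.
    return float(text.replace(",", ""))


def salary_or_none_try(text):
    """try-except version: return the salary as a float, or None on failure."""
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        # float("*") raises ValueError, so missing data ends up here.
        return None
```

The conditional version only skips the asterisk and still raises an error on any other unexpected text, while the try-except version quietly skips anything that is not a number. Either one satisfies the note; which you prefer depends on whether you want surprises in the data to be loud or silent.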
Suggested Procedure  
• Solve the problem using pencil and paper first.  You cannot write a program until you have figured out how to solve the problem.  This first step may be done collaboratively with another student.  However, once the discussion turns to Python specifics and the subsequent writing of Python statements, you must work on your own.  
• Divide-and-conquer!  I start programs that read data from a file by simply opening the file and printing all the contents.  Since there is a header line I skip that next (hint: try readline).  Next I print out all the data from the columns that I am interested in (Entries 2, 3, and 7 in this project).  With that framework you can now select by keyword and print out those columns.  Then count and find the average.  Then min and max.  
• Here is an algorithm to find the minimum value of a collection of numbers that you are reading through.  Start with a huge value and whenever you find a smaller value make that new value the minimum value.  After working your way through all the numbers you end up with the minimum value.  Finding the maximum is similar.

    minimum_value = 10e10  # something huge
    if new_value < minimum_value:
        minimum_value = new_value

  If you want to also gather some information about the minimum value, e.g. the occupation title in this case, simply add a variable to hold that information and update the information whenever you have found a new minimum value.
• Use the handin system to turn in the first version of your solution.
• Cycle through the steps to incrementally develop your program:
  o Edit your program to add new capabilities.
  o Test the program and fix any errors.
  o Regularly use the handin system to submit the current version of your solution.  That way, if something happens and you cannot submit a final version there will be a partial version within handin so you can get partial credit.
• Be sure to log out when you leave the room, if you’re working in a public lab.  
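Putting the counting, averaging, and min/max steps of the procedure together: all of the running values can be kept in plain variables, so no lists are needed. The sketch below is one possible arrangement under stated assumptions. The names summarize and summary_text are hypothetical, the rows are passed in as (title, salary) pairs only so the logic can be tried without a file, and the output wording is invented; follow the sample output for the exact format.

```python
def summarize(rows):
    """Return count, average, and the min and max (title, salary) of rows.

    rows is any iterable of (title, salary) pairs; in the real program the
    loop body would sit inside the loop that reads the file.
    """
    count = 0
    total = 0.0
    min_salary, min_title = 10e10, ""    # something huge
    max_salary, max_title = -10e10, ""   # something tiny
    for title, salary in rows:
        count += 1
        total += salary
        if salary < min_salary:
            min_salary, min_title = salary, title
        if salary > max_salary:
            max_salary, max_title = salary, title
    average = total / count if count else 0.0
    return count, average, min_title, min_salary, max_title, max_salary


def summary_text(count, average, min_title, min_salary, max_title, max_salary):
    """Build the summary lines (hypothetical wording)."""
    if count == 0:
        return "No occupations match that keyword."
    noun = "occupation" if count == 1 else "occupations"
    text = f"Average salary: ${average:,.2f} over {count:,} {noun}"
    if count > 1:  # with a single match, neither max nor min is printed
        text += f"\nHighest: {max_title} ${max_salary:,.2f}"
        text += f"\nLowest: {min_title} ${min_salary:,.2f}"
    return text
```

The comma in the format specifications (:,.2f and :,) is what inserts the thousands separators the specification asks for.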
Sample Output   
   