# Programming Project 004  SOLUTION

Assignment Overview
The goal of this project is to gain more practice with file I/O, lists and functions.
Background
Data mining is the process of sorting through large amounts of data and picking out relevant information. Everyone from financial analysts to scientists uses it to extract information from enormous data sets. These large data sets, and the trend of analyzing them, have come to be known as "Big Data": http://en.wikipedia.org/wiki/Big_data
In this project, we want to do some preliminary data mining of the prices of Apple stock. Your program will calculate the monthly average prices of Apple stock from 1984 to 2013. You will report facts about the monthly highs and lows for this data.
Project Specifications
1. You will be given a file of Apple's daily stock prices named table.csv (we pulled it off the web). The file is comma-delimited and can be opened with Notepad or a similar text editor. If you open it with Excel, it will show you the data as a spreadsheet.
2. You must implement the following functions:
a) get_input_descriptor()
In this function, you are required to repeatedly prompt for the name of an input file until the user enters a file name and the file can be opened for input. Return a file descriptor attached to the opened file.
b) get_data_list(file_object, column_number)
In this function, you are required to read the file of Apple's data. The function is flexible, as it can read the data for any column of the data (0 through 6). If you read column 6, you are gathering the data for the "Adjusted Daily Close". If you read column 5, you are gathering data for the "Volume" that day. The function returns a list of tuples. Each tuple is of the form (date, column_data): the first value is a string, the second is a float. For example: ('2013-02-08', 474.98) if we were collecting data from column 6.
c) average_data(list_of_tuples)
In this function the parameter is a list: the list of tuples generated by get_data_list above. You will average the data for each month and generate a list of tuples. A tuple here will have the form (data_avg, date): the first value is a float, the second is a string. For example: (2972945.4545454546, '07:1985'). Note that the date in the returned list no longer contains a day.
Because each month has multiple entries, the biggest challenge is to collect the data for each month together. One way is to have variables “current_month” and “current_year” and update them when the month changes. That is, read lines, summing data for the “current_month”, until you encounter a new month. Encountering a new month means that you are done summing data for the “current_month”, so you can calculate its average. After calculating the average, set “current_month” to the new month and start summing values for it. Remember that the last month in the data is never followed by a new month, so its average must be calculated after the loop ends.
d) main()
In this function, you:
• call get_input_descriptor to get a file descriptor
• prompt for the column to average
• call the get_data_list function
• call the average_data function
• print the highest 6 averages (for the column selected) and the lowest 6 averages. Print that data with the month-year information.
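The four functions described above might be sketched as follows. This is one possible approach, not the required one. It assumes that the first line of table.csv is a header row, that column 0 holds the date, and that dates are written as 'YYYY-MM-DD'. The helper highest_and_lowest, the prompt strings, the names of columns 1 through 4 in the header, and the SAMPLE data are invented for illustration.

```python
import io


def get_input_descriptor():
    """Prompt repeatedly until a file can be opened; return the open file."""
    while True:
        filename = input("Enter the input file name: ")
        try:
            return open(filename, "r")
        except OSError:
            print("Cannot open that file; please try again.")


def get_data_list(file_object, column_number):
    """Return [(date, value), ...] for one column of the CSV data."""
    file_object.readline()  # assumption: the first line is a header row
    data_list = []
    for line in file_object:
        fields = line.strip().split(",")
        if len(fields) > column_number:  # skips blank or short lines
            # Column 0 is the date itself, so float() would fail on it.
            data_list.append((fields[0], float(fields[column_number])))
    return data_list


def average_data(list_of_tuples):
    """Return [(monthly_average, 'MM:YYYY'), ...], one tuple per month."""
    averages = []
    current_year, current_month = None, None
    total, count = 0.0, 0
    for date, value in list_of_tuples:
        year, month, _day = date.split("-")  # assumes 'YYYY-MM-DD'
        if (year, month) != (current_year, current_month):
            if count > 0:  # close out the month that just ended
                averages.append((total / count, current_month + ":" + current_year))
            current_year, current_month = year, month
            total, count = 0.0, 0
        total += value
        count += 1
    if count > 0:  # the final month is never followed by a new month
        averages.append((total / count, current_month + ":" + current_year))
    return averages


def highest_and_lowest(averages, how_many=6):
    """Hypothetical helper: return (highest first, lowest first) lists."""
    ordered = sorted(averages)  # tuples sort by the average first
    return ordered[::-1][:how_many], ordered[:how_many]


def main():
    file_object = get_input_descriptor()
    column_number = int(input("Enter the column number to average: "))
    averages = average_data(get_data_list(file_object, column_number))
    file_object.close()
    highest, lowest = highest_and_lowest(averages)
    for title, rows in (("Highest", highest), ("Lowest", lowest)):
        print(title, "6 monthly averages:")
        for data_avg, month_year in rows:
            print("{:>15.2f}  {}".format(data_avg, month_year))


# Made-up sample rows, used only to show the data flow without a real file.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2013-02-08,1.0,1.0,1.0,1.0,100,10.0
2013-02-07,1.0,1.0,1.0,1.0,300,20.0
2013-01-31,1.0,1.0,1.0,1.0,500,40.0
"""
sample_averages = average_data(get_data_list(io.StringIO(SAMPLE), 6))
print(sample_averages)  # [(15.0, '02:2013'), (40.0, '01:2013')]
```

Sorting the (data_avg, date) tuples works here because Python compares tuples element by element, so the average decides the order. Calling main() runs the interactive version against a real file.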
Deliverables
proj04.py – your source code solution (remember to include your section, the date, project number and comments).
1. Please be sure to use the specified file name, i.e., “proj04.py”.
2. Save a copy of your file in your CSE account disk space (H drive on CSE computers).
3. You will electronically submit a copy of the file using the “handin” program: http://www.cse.msu.edu/handin/webclient