small ETL program in python..solved

small ETL program in python..solved

This assignment is the second of a mini project where you are demonstrating your understanding of the modules of this class.

The mini project is about building a small ETL program in python. Each pat of the project focuses on different portions / stages of the ETL process.

Assignment Background

Summarizing your acquired knowledge from module 5, we are going to focus on the L part of the ETL process. For reference, ETL stands for Extract, Transform, Load. We are going to focus here on Load.

Assignment Statement

Save data in a database
Read data from a database
You are required to supply Last Name_First Name_Project_Part2.py and Last Name_First Name_create_dbs.py.  Please upload as a single zip file.

Requirements:

Let’s create a place to store each dataset separately.

1.   Create a “create_dbs.py” file.

2.   In this file created in step 1, write python code using sqlite3 to:

1.   Create a “baseball.db” database

1.   Create 1 table named “baseball_stats” with the following columns:

2.   player_name

3.   games_played

4.   average

5.   salary

The “baseball.db” SQLite3 database should look like this:

”’
Write a Python script to store baseball and stock data in SQLite
databases.

 

The “baseball.db” SQLite3 database file should have one table:

Baseball_stats ======================== player_name text games_played int average real salary real

 

The “stocks.db” SQLite3 database file should have one table:

stock_stats ======================== company_name text ticker text country text price real exchange_rate real shares_outstanding real net_income real market_value real pe_ratio real
”’

 

Baseball_stats

========================

player_name          text

games_played        int

average                  real

salary                     real

 

3.   In this file created in step 1, write python code using sqlite3 to:

1.   Create a “stocks.db” database:

2.   Create 1 table named “stock_stats”

1.   company_name

2.   ticker

3.   exchange_country

4.   price

5.   exchange_rate

6.   share_outstanding

7.   net_income

8.   market_value_usd

9.   pe_ratio

The ” stocks.db” SQLite3 database should look like this:

 

stock_stats

========================

company_name       text

ticker                       text

country                    text

price                        real

exchange_rate         real

shares_outstanding real

net_income             real

market_value          real

pe_ratio                   real

 

4.   Create AbstractDAO class. It should have the methods:

1.   it should have 1 (instance) member: db_name

2.   insert_records(records) – Should raise the NotImplementedError

3.   select_all() – Should raise the NotImplementedError

4.   connect():

1.   connect to the database identified by db_name

2.   returns the created connection

5.   Create BaseballStatsDAO class

1.   Class should inherit AbstractDAO

 

1.   Class should implement the methods listed:

1.   insert_records:

1.   takes a list of records as parameter ( BaseballStatsDAO takes BaseballStatRecord)

2.   call the method connect()

3.   using the returned connection, create a cursor.

4.   For each record in the list, write and execute an INSERT INTO statement to save the record’s information to the correct table.

Example for baseball: cursor.execute(“INSERT INTO baseball_stats VALUES ( ?, ? , ? , ?)”, (name, number_games_played, avg, salary))

5.   Commit the connection

6.   Close the connection

2.   select_all

1.   call the method connect()

2.   using the returned connection, create a cursor.

3.   create an empty deque to hold the records in memory

4.   write and execute a SELECT statement to get all the records of the table for the DAO

Example for baseball: cursor.execute(“SELECT player_name, games_played, average, salary FROM baseball_stats;”)

5.   For each row fetched, iterate with a for loop over the result of your select command

1.   Create a new record (BaseballStatRecord)

2.   Add the record to the deque

6.   Close the connection

7.   Return the deque

6.   Create StockStatsDAO class

1.   Class should inherit AbstractDAO

2.   Class should implement the methods listed:

1.   insert_records:

1.   takes a list of records as parameter (StockStatsDAO takes StockStatRecord)

2.   invokes the method connect()

3.   using the returned connection, create a cursor.

4.   For each record in the list, write and execute an INSERT INTO statement to save the record’s information to the correct table.

Example for stocks: see insert statement for baseball stats

5.   Commit the connection

6.   Close the connection

2.   select_all

1.   invokes the method connect()

2.   using the returned connection, create a cursor.

3.   create an empty deque to hold the records in memory

4.   write and execute a SELECT statement to get all the records of the table for the DAO

Example for stocks: refer to select statement for Baseball

5.   For each row fetched, iterate with a for loop over the result of your execute:

1.   Create a new record (StockStatRecord)

2.   Add the record to the deque

6.   Close the connection

7.   Return the deque

 

Use the code written for Project Part 1.  If needed, a partial solution can be provided by your facilitator.  This will only be done on a case by case basis and only in extreme circumstances.  A 10-point deduction will be made if a solution is needed.

This section loads the records into the correct database using the classes from Project Part1

7.   load MLB2008.csv using the BaseballCSVReader

8.   load StockValuations.csv using the StocksCSVReader

9.   Instantiate a new DAO instance for BaseballStats

10.                Instantiate a new DAO instance for StocksStats

11.                Insert the loaded records into baseball database using Baseball DAO’s insert_records.

12.                Insert the loaded records into stocks database using Stocks DAO’s insert_records.

 

Awesome! Now we have data in the database. Let’s use it!

 

Stocks stats:

13.                Using the instance of StockStatsDAO select_all the records

1.   Calculate and print the number of tickers by exchange_country using a dictionary.

Baseball stats:

14.                Using the instance of BaseballStatsDAO select_all the records

1.   Compute the average salary and enter into a dictionary by batting average

2.   Print the dictionary formatting the salary to 2 decimal places.

 

NOTE: use round(record.avg, 3)) The rounding needs to happen at average time

Code/Comment Format

 

Good code includes well named variables that are consistent from the beginning to the end of the program.  Naming of objects should be self-explanatory.  For instance, iterator_for_noun_list is much better than i.

 

Every program consists of a sequence of paragraphs, each of which has objectives, and which builds on the previous paragraphs. We are mostly interested in objectives that are valid at the end of the program so we can verify the program’s design. The following is a preferred form for such paragraph headings.  The # sign is adequate when the comment is a single line.

 

#This is an in-line comment – used to document the code for you, or anyone else, that intends

#To extend the code

 

In-line comments are helpful when one has to go back to the code 6 months later to make changes.

 

For doc strings, python allows the use of triple quotes.  The triple quotes can be either single or double quotes.  A doc sting is generally used as user documentation.  It does not need to include details of the implementation of the program, but instead it provides documentation as how to use the API for the program (input, output etc.)

 

For example:

 

“””

This is an example of a doc string

It allows multiple lines within the string.

 

“””

‘’’

This is an example of a doc string

It allows multiple lines within the string.

 

‘’’

This becomes significant when using functions, classes etc. as the triple quotes help to self-document the parameters and return values of the function.

 

Sample Output: The output is a sample and is not complete

 

0.083 400,000.00

0.130 600,000.00

0.132 1,500,000.00

0.147 6,333,333.00

0.158 14,726,910.00

0.159 3,850,000.00

 

IE 1

CA 10

ZA 1

JP 37

NO 2

SE 7
Powered by