Web Scraping I ISTA 350 Hw6

Introduction. This homework is the first part of an introduction to the art of web scraping. It is a three-part assignment. In the first part, you will grab the data you need from the web, put it into an HTML parser, and save the result into a file. Along the way, you will meet the os, requests, bs4, and zlib modules.

Instructions. Create a module named hw6.py. Below is the spec for three functions and main. Implement them and upload your module to the D2L dropbox.

Testing. Download hw6_test.py and the associated files necessary for testing and put them in the same folder as your hw6.py module. Run it from the command line to see your current correctness score. Each of the four functions in hw6.py is worth 25% of your correctness score. The test file we will use to grade your program will be different and may uncover failings in your work not evident upon testing with the provided file. Add any necessary tests to make sure your code works in all cases.
Documentation. Your module must contain a header docstring containing your name, your section leader’s name, the date, ISTA 350 Hw6, and a brief summary of the module. Each method/function must contain a docstring. Each docstring should include a description of the function’s purpose, the name, type, and purpose of each parameter, and the type and meaning of the function’s return value.

Grading. Your module will be graded on correctness, documentation, and coding style. Code should be clear and concise. You will only lose style points if your code is a real mess. Include inline comments to explain tricky lines and summarize sections of code (not necessary on this assignment).

Collaboration. Collaboration is allowed. You are responsible for your learning. Depending too much on others will hurt you on the tests. “Helping” others too much harms them in reality. Cite any sources/collaborators in your header docstring. Leaving this out is dishonest.

Resources.
https://docs.python.org/3.4/library/os.html
https://stackoverflow.com/questions/27803503/get-html-using-python-requests
http://docs.python-requests.org/en/latest/index.html
http://docs.python-requests.org/en/latest/user/quickstart/
https://pypi.python.org/pypi/beautifulsoup4/
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
http://www.zlib.net/manual.html
https://docs.python.org/3.4/library/zlib.html

hw6.py:

get_soup: this function has three parameters. The first is a string named url representing a URL, with a default argument of None. The second is a string named fname representing a filename, also with a default argument of None. The third is a Boolean named gzipped with a default value of False. True is passed to this parameter if the HTML to be parsed is gzipped. If the filename is not None, read the file, pass its contents to the BeautifulSoup constructor, and return the resulting object. Otherwise, if the url is None, raise a RuntimeError with the message 'Either url or filename must be specified.' (I have noticed that many of you are not raising your errors with meaningful messages.) If the url is not None, send a GET request to the server. If the response content is gzipped (the third parameter is True), decompress it. Pass the content to the BeautifulSoup constructor and return the resulting object.

save_soup: this function takes two arguments, a filename and a soup object. Save a textual representation of the soup object in the file.

scrape_and_save: this function scrapes the following addresses, soupifies the contents, and stores a textual representation of these objects in the files 'wrcc_pcpn.html', 'wrcc_mint.html', and 'wrcc_maxt.html', respectively:
http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+pcpn+none+msum+5+01+F
http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+mint+none+mave+5+01+F
http://www.wrcc.dri.edu/WRCCWrappers.py?sodxtrmts+028815+por+por+maxt+none+mave+5+01+F

main: check the current directory for any one of the files that scrape_and_save creates. If it is not there, print '---- scraping and saving ----' and scrape and save the addresses.
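
Two of the steps above use only the standard library: decompressing gzipped response content with zlib, and checking with os whether a saved file already exists. They might be sketched as follows. This is a minimal sketch, not the required implementation. The helper names gunzip_if_needed and needs_scraping are hypothetical and do not appear in the spec, and the sketch assumes the gzipped content arrives as a bytes object (for example, the content attribute of a requests response).

```python
import os
import zlib


def gunzip_if_needed(content, gzipped=False):
    """Return content unchanged, or decompressed if gzipped is True.

    Hypothetical helper (not part of the spec).
    content: bytes, the raw body to be parsed.
    gzipped: bool, True if content is gzip-compressed.
    Returns: bytes ready to hand to an HTML parser.
    """
    if gzipped:
        # Adding 16 to MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(content, 16 + zlib.MAX_WBITS)
    return content


def needs_scraping(fname, directory='.'):
    """Return True if fname is not among the entries of directory.

    Hypothetical helper (not part of the spec).
    fname: str, one of the files scrape_and_save would create.
    directory: str, the folder to check; defaults to the current directory.
    """
    return fname not in os.listdir(directory)
```

In main, a call such as needs_scraping('wrcc_pcpn.html') could decide whether to print the message and call scrape_and_save.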