Lab 3: "Thanks for All the Fish!!!" solution

In this lab you will develop a software system to collect data from a portion of the book War of
the Worlds by H.G. Wells to verify the experiments performed by Dr. Lawrence. Ultimately, Dr.
Lawrence is trying to apply information theory to see if the chatter of dolphins constitutes a
complex language.
You will create a software system that will analyze English text by creating word frequency
histograms. Some general program specifications are as follows:
1. You must develop a C++ class library to work with the main.cc file provided to
you in Appendix A. Your libraries will be used to create a word frequency histogram
from a text file that is provided to you on t-square called ProcessedWOW.dat. This
file contains modified text for War of the Worlds; as you see in the file it contains
only lower case letters and no punctuation marks.
2. Your program will create two output files that contain the histograms sorted in two
different fashions -- one sorted alphabetically and one sorted by frequency.
3. You must create at least TWO C++ classes, which are discussed in the next section.
4. We are limited to data structures that we have talked about so far; therefore, you
can only use static C++ arrays to store your word frequency histograms. A discussion
of a strategy to do this appears later in this lab.
5. You must have three specific files to manage your system. The header file
histogram.h contains your class interfaces for all your classes, the source file
histogram.cc will contain the implementations of your member functions of all
your C++ classes, and the source file main.cc will contain the main function that is
provided to you in Appendix A.
The objectives of this lab are to give you practice:
1. Using basic C++ classes;
2. Creating basic arrays of user-defined objects;
3. Creating constructors and overloading constructors;
4. Using and creating set and get member functions in C++ classes;
5. Using C++ string objects; and
6. Using basic text file I/O objects and operators.
This lab has been tested using the g++ compiler on deepthought.cc.gatech.edu. Please see
Appendix B to see your turn in options.
ECE2036 FALL2017 2
C++ Class Requirements
I would like for you to create at least two separate classes for your program. The specifications
are below.
WordUnit
This class will be called WordUnit, and you will need to create a class with data members that
correspond to a string that contains a single word of text and an integer that contains the number
of times that the word appears in a given book. You can choose your own data member names.
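As a minimal sketch of what such a class could look like, the following pairs a word with its count and provides an overloaded constructor plus set and get member functions. The member names word_ and count_ and the function names are illustrative assumptions; the lab lets you choose your own.

```cpp
#include <string>

// Sketch of a WordUnit: one word plus the number of times it appears.
// The names word_, count_, setWord, etc. are illustrative, not required.
class WordUnit {
public:
    // Default constructor: represents an unused array slot.
    WordUnit() : word_(""), count_(0) {}

    // Overloaded constructor: start with a known word and count.
    WordUnit(const std::string& w, int c) : word_(w), count_(c) {}

    // Set member functions.
    void setWord(const std::string& w) { word_ = w; }
    void setCount(int c) { count_ = c; }

    // Get member functions.
    std::string getWord() const { return word_; }
    int getCount() const { return count_; }

private:
    std::string word_;  // a single word of text
    int count_;         // number of times the word appears
};
```

The default constructor matters here because an array of WordUnit objects can only be declared if each element can be constructed without arguments.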
WordHistogram
This class will be called WordHistogram, and it will contain the entire word histogram for the
book that you are analyzing. A primary data member will be an array of WordUnits. We have
not discussed dynamic arrays at this point, so I would like for you to have a static fixed array
with a large number of entries. For this exercise, assume that you have 10,000 elements in your
array. In addition, you will need to have an integer variable that contains the number of elements
that you are storing in this array. Furthermore, I would like for you to have a string that contains
the file name of the text file that contains your book.
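The data members described above could be laid out roughly as follows. This is only a sketch: the names units_, size_, fileName_, and MAX_WORDS are assumptions, the accessors are added for illustration, and a bare-bones WordUnit stand-in is included so the fragment compiles on its own.

```cpp
#include <string>

const int MAX_WORDS = 10000;  // fixed capacity given in the lab text

// Minimal stand-in for WordUnit so this sketch is self-contained.
struct WordUnit {
    std::string word;
    int count;
    WordUnit() : word(""), count(0) {}
};

class WordHistogram {
public:
    // The file name is passed to the constructor; nothing is read yet.
    explicit WordHistogram(const std::string& fileName)
        : size_(0), fileName_(fileName) {}

    // Illustrative accessors (not required by the lab text).
    int getSize() const { return size_; }
    std::string getFileName() const { return fileName_; }

private:
    WordUnit units_[MAX_WORDS];  // static, fixed-size array of word entries
    int size_;                   // number of entries currently in use
    std::string fileName_;       // name of the text file holding the book
};
```

Because the array lives inside the object, a WordHistogram is fairly large; declaring it at file scope or as a static variable avoids placing it on a small stack.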
You may need other member functions, but as you see in Appendix A, you will definitely need
to create the following member functions in the WordHistogram class.
void makeHistogram()
This member function belongs to the WordHistogram class and it creates the histogram from
the name of the text file that is passed to the constructor. You will need to open the text file and
populate the static array of WordUnits. Because we have not discussed dynamic arrays yet,
please use an array over-allocation strategy as seen below. You will need to keep track of how
much of the array is being used as you build the histogram.
Figure 1: Illustration of using an array with a fixed size to contain a list with an initially
unknown length.
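The over-allocation strategy can be sketched as a single "add one word" step: search the used part of the array, increment the count if the word is already there, and otherwise claim the next unused slot. The Entry struct and the addWord function are hypothetical names used only for this illustration.

```cpp
#include <string>

const int MAX_WORDS = 10000;  // fixed capacity from the lab description

// Stand-in for WordUnit (hypothetical name) to keep the sketch short.
struct Entry {
    std::string word;
    int count;
};

// Records one occurrence of w in the first `size` slots of table.
// Returns false only when the word is new and the array is already full.
bool addWord(Entry table[], int& size, const std::string& w) {
    // Linear search through the used elements only.
    for (int i = 0; i < size; ++i) {
        if (table[i].word == w) {
            ++table[i].count;
            return true;
        }
    }
    if (size >= MAX_WORDS) {
        return false;  // no unused elements left
    }
    // Claim the next unused element and grow the used region by one.
    table[size].word = w;
    table[size].count = 1;
    ++size;
    return true;
}
```

Calling addWord once per word read from the file leaves size equal to the number of distinct words seen so far.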
void sortAlphaHistogram()
This member function belongs to the WordHistogram class and it will sort the WordUnit
array in alphabetical order according to the word in each WordUnit. This arrangement of the
array will be useful if you want to quickly find a frequency of a given word.
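One way to do this with only the tools covered so far is a hand-written insertion sort over the used portion of the array. The sketch below also shows why the sorted order helps with lookups: a binary search can then find a word's frequency in far fewer comparisons than a linear scan. Entry, sortAlpha, and findCount are hypothetical names for illustration.

```cpp
#include <string>

// Stand-in for WordUnit (hypothetical name).
struct Entry {
    std::string word;
    int count;
};

// Insertion sort of the first `size` entries into ascending word order.
void sortAlpha(Entry table[], int size) {
    for (int i = 1; i < size; ++i) {
        Entry key = table[i];
        int j = i - 1;
        // Shift entries that belong after key one slot to the right.
        while (j >= 0 && table[j].word > key.word) {
            table[j + 1] = table[j];
            --j;
        }
        table[j + 1] = key;
    }
}

// Binary search on an alphabetically sorted table.
// Returns the word's count, or 0 if the word is not present.
int findCount(const Entry table[], int size, const std::string& w) {
    int lo = 0;
    int hi = size - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (table[mid].word == w) {
            return table[mid].count;
        }
        if (table[mid].word < w) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return 0;
}
```

Only the first size elements are sorted; the unused elements at the end of the array are left untouched.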
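Example of Array Over-Allocation Strategy (content of Figure 1): an array with max_size = 11
whose first three elements are used (size = 3) and hold "a" 5, "the" 12, and "he" 3. The
remaining eight unused elements each hold an empty string "" and a count of 0.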
void sortFreqHistogram()
When this member function is called, the WordUnit array will be sorted by the frequency
parameter. This will be useful when making a plot of the histogram to see if a slope of -1 in a
log-log plot describes the data.
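A frequency sort could be written as a simple selection sort, sketched below. The lab text does not say whether the order should be ascending or descending; this sketch assumes descending order (most frequent word first), which matches the usual rank-versus-frequency plot. Words with equal counts may end up in any relative order. Entry and sortByFrequency are hypothetical names.

```cpp
#include <string>

// Stand-in for WordUnit (hypothetical name).
struct Entry {
    std::string word;
    int count;
};

// Selection sort of the first `size` entries, largest count first.
// Assumption: descending order; the lab text does not fix the direction.
void sortByFrequency(Entry table[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        // Find the entry with the largest count in table[i..size-1].
        int best = i;
        for (int j = i + 1; j < size; ++j) {
            if (table[j].count > table[best].count) {
                best = j;
            }
        }
        // Swap it into position i.
        if (best != i) {
            Entry tmp = table[i];
            table[i] = table[best];
            table[best] = tmp;
        }
    }
}
```

After this sort, the array index plus one is the rank of each word, so plotting log(rank) against log(count) gives the log-log plot mentioned above.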
void exportHistogram(string filename)
This member function in WordHistogram will create an output file that contains the word
histogram. The argument of this function is a string that contains the name of the output file. For
the file that is alphabetized by each word, the format of the output should look something like the
following:
a 937
abandoned 3
abandoning 1
abart 1
ability 1
ablaze 1
able 8
aboard 1
about 112
above 16
Please note that the word and its frequency are separated by a single space.
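The output format above, one word and its count per line separated by a single space, could be produced by a loop like the one in this sketch. Entry and exportToFile are hypothetical names; in the lab this logic would live inside exportHistogram and read from the class's own array.

```cpp
#include <fstream>
#include <string>

// Stand-in for WordUnit (hypothetical name).
struct Entry {
    std::string word;
    int count;
};

// Writes the first `size` entries to filename, one "word count" pair per line.
// Returns false if the output file could not be opened.
bool exportToFile(const std::string& filename, const Entry table[], int size) {
    std::ofstream out(filename.c_str(), std::ios::out);
    if (!out) {
        return false;
    }
    for (int i = 0; i < size; ++i) {
        // Exactly one space between the word and its frequency.
        out << table[i].word << ' ' << table[i].count << '\n';
    }
    return true;
}
```

The file is written in whatever order the array currently has, so calling a sort first determines which of the two output files is produced.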
C++ String Objects
In this lab, you will need to manipulate strings in a very basic way. For completeness, I have
included in Appendix E a list of a variety of member functions for C++ string objects. You can
use any of these you like, but I believe that you may find the following operators that can be used
with C++ string objects more useful.
string1 == string2
The equality operator can be used to compare two strings. If ALL characters are the same, then
this operation returns a true value; otherwise, it will return false.
string1 = string1 + string2;
Both the assignment (=) operator and the addition (+) operator can be used with strings. When
used with strings, the + operator will concatenate the two string operands.
string1 > string2
The greater than (and less than) operators can both be used with C++ string objects. One string is
greater than the other when it appears later in an alphabetized list. For example, "cat" is greater than
"apple" because "cat" is listed after "apple" in an alphabetized list.
string1[number]
You can also use the indexing operator [] to access each character in the string. This is similar to
accessing an array of characters with zero indexing.
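The operators above can be tried out with a few small helper functions, sketched here. The function names are hypothetical. Note that the comparison operators compare character codes, so they agree with alphabetical order only when both strings use the same letter case, as the all-lower-case data file for this lab does.

```cpp
#include <string>

// True when a would be listed before b in an alphabetized list
// (assumes both strings use the same letter case).
bool comesBefore(const std::string& a, const std::string& b) {
    return a < b;
}

// Concatenates two words with a single space between them using +.
std::string joinWithSpace(const std::string& a, const std::string& b) {
    return a + " " + b;
}

// Returns the first character using the zero-based [] operator.
// Assumes s is not empty.
char firstChar(const std::string& s) {
    return s[0];
}
```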
Input and Output Text Files
For this lab, you will do some basic manipulation of text files. In this section, I will show you
how to instantiate an input or output text file object. In addition, you can use the insertion stream
(<<) and extraction stream (>>) operators to send data to an output file or receive data from an
input file, respectively.
Preprocessor Directive
You will need to have the following include statement in your header file that allows you to use
the C++ standard library for I/O files.
#include <fstream>
Instantiating Output File Objects
To instantiate an output file object that you can use to manipulate your output text file, you will
need something like the following.
std::ofstream YourOutputFileObject("outputfile.dat", std::ios::out);
The "outputfile.dat" name is arbitrary, and it will create a file in the local directory that you
execute your program in. The std::ios::out is a designation that you are creating an output file;
any existing file with the same name will be overwritten.
Instantiating Input File Objects
To instantiate an input file object from which you can read data, you can do the following:
std::ifstream YourInputFileObject("inputfile.dat", std::ios::in);
Like the output file example, the "inputfile.dat" name is arbitrary and specifies the name of the
file in the local directory from which you would like to read. As you see in the main file, you
can use this file object with the ! operator to check to see if the file is valid.
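A validity check with the ! operator might look like the following sketch. The function name canOpenForReading is hypothetical; the main file provided in Appendix A may perform the check differently.

```cpp
#include <fstream>
#include <string>

// Returns true if the named file can be opened for reading.
bool canOpenForReading(const std::string& filename) {
    std::ifstream in(filename.c_str(), std::ios::in);
    if (!in) {
        // !in is true when the stream failed to open the file.
        return false;
    }
    return true;
}
```

The stream closes the file automatically when the ifstream object goes out of scope at the end of the function.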
Insertion Stream Operator
Just like with the cout object, you can use the << operator to write data to an output file. You
use the object name in place of cout in the following way.
YourOutputFileObject << "Hello File! " << std::endl;
Extraction Stream Operator
Furthermore, just like the cin object, you can use the >> operator to read data from an input file.
You can use the object name in the following way.
YourInputFileObject >> string1;
This command will read a single string from the input file that is delineated by white space. For
example, if the input file has the following text:
hello from professor Snape
After the statement above executes, string1 holds "hello" with no spaces. Furthermore, the
expression itself evaluates to true if a string is successfully read in from the input file. This
can enable you to embed this statement in a while loop condition to access all the strings in a file
sequentially. For example, the following while loop will continue until all the strings have been
read into the program one at a time.
while (YourInputFileObject >> string1)
{
//manipulate value in string1
}
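Put together with a file object, the loop above could be used as in this sketch, which simply counts how many whitespace-separated strings a file contains. The function name countWordsInFile and the -1 error value are assumptions for illustration; in the lab, the loop body would update the histogram instead of a counter.

```cpp
#include <fstream>
#include <string>

// Counts the whitespace-separated strings in a text file.
// Returns -1 if the file cannot be opened (illustrative error convention).
int countWordsInFile(const std::string& filename) {
    std::ifstream in(filename.c_str(), std::ios::in);
    if (!in) {
        return -1;
    }
    int total = 0;
    std::string word;
    // The extraction succeeds once per word and fails at end of file,
    // which ends the loop.
    while (in >> word) {
        ++total;
    }
    return total;
}
```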