
Naive Bayes classifier Solution

Implement a Naive Bayes classifier for text classification. This classifier will be used to classify fortune cookie messages into two classes: messages that predict what will happen in the future and messages that just contain a wise saying. We will label messages that predict what will happen in the future as class 1 and messages that contain a wise saying as class 0. For example,







"Never go in against a Sicilian when death is on the line" would be a message in class 0.







"You will get an A in SENG 474" would be a message in class 1.
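The task statement does not fix a particular Naive Bayes variant, so the following is only a minimal sketch under stated assumptions: a multinomial bag-of-words model, lowercase whitespace tokenization, and add-one (Laplace) smoothing. The function names `train` and `predict` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train(messages, labels):
    """Collect the vocabulary, per-class word counts, and per-class message counts."""
    vocab = set()
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter(labels)      # label -> number of messages
    for text, label in zip(messages, labels):
        tokens = text.lower().split()   # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return vocab, word_counts, class_counts


def predict(text, vocab, word_counts, class_counts):
    """Return the label with the highest log posterior for one message."""
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_messages)
        # Denominator for add-one smoothing: class token count + vocabulary size.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word that appears in only one class from forcing the other class's probability to zero.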




You can use any language you wish. There are two sets of data files provided:




1. The training data:




 
traindata.txt: This is the training data consisting of fortune cookie messages.




 
trainlabels.txt: This file contains the class labels for the training data.




 
2. The testing data:




 
testdata.txt: This is the testing data consisting of fortune cookie messages.




 
testlabels.txt: This file contains the class labels for the testing data. These are only used to determine the accuracy of the classifier.
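The file layout is not spelled out above, so the loader below is a sketch that assumes one message per line in the data file and one integer label (0 or 1) per line in the labels file, in the same order. The helper name `load_dataset` is hypothetical.

```python
from pathlib import Path


def load_dataset(data_path, labels_path):
    """Read messages and labels, assuming line i of each file refers to the same example."""
    data_text = Path(data_path).read_text(encoding="utf-8")
    label_text = Path(labels_path).read_text(encoding="utf-8")
    # Assumption: blank lines carry no data and can be dropped.
    messages = [line.strip() for line in data_text.splitlines() if line.strip()]
    labels = [int(line) for line in label_text.splitlines() if line.strip()]
    if len(messages) != len(labels):
        raise ValueError(
            f"{len(messages)} messages but {len(labels)} labels; the files do not line up"
        )
    return messages, labels
```

A call such as `load_dataset("traindata.txt", "trainlabels.txt")` would then return the two parallel lists, provided the files follow the assumed layout.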




Your results must be stored in a file called results.txt.




 
Run your classifier by training on traindata.txt and trainlabels.txt, then testing on traindata.txt and trainlabels.txt. Report the accuracy in results.txt (along with a comment saying what files you used for the training and testing data). In this situation, you are training and testing on the same data. This is a sanity check: your accuracy should be very high, i.e. 90% or higher.




 
Run your classifier by training on traindata.txt and trainlabels.txt then testing on testdata.txt and testlabels.txt. Report the accuracy in results.txt (along with a comment saying what files you used for the training and testing data). We will not be letting you know beforehand what your performance on the test set should be.
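The two runs above both end with an accuracy figure and a comment naming the files used. The exact layout of results.txt is not specified, so the format below (a `#` comment line followed by an accuracy line) is an assumption, as are the helper names `accuracy`, `format_result`, and `write_results`.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def format_result(train_files, test_files, acc):
    """Build one results.txt entry: a comment naming the files, then the accuracy."""
    comment = f"# trained on {', '.join(train_files)}; tested on {', '.join(test_files)}"
    return f"{comment}\naccuracy: {acc:.4f}\n"


def write_results(path, entries):
    """Write every (train_files, test_files, acc) entry to the results file."""
    with open(path, "w", encoding="utf-8") as out:
        for train_files, test_files, acc in entries:
            out.write(format_result(train_files, test_files, acc))
```

For instance, `write_results("results.txt", [(("traindata.txt", "trainlabels.txt"), ("testdata.txt", "testlabels.txt"), acc)])` would record the second run, where `acc` is whatever your classifier achieved.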




Submit your source code and the results.txt file.




Reference: This exercise is adapted from Weng-Keen Wong, Oregon State University.
