Starting from:
$30

$24

Assignment 01 Solution

Text count helper




Recently, Melbourne Literature Society has announced that they are hunting for young story writer. The society has asked all young teenager to submit a story. 17 years old Jonathan is interested to participate, but he is a bit tensed. In this competition, there are some conditions. He needs a program to check his writing is meeting the conditions or not. From his school friend, he learnt that in this semester you are focusing on developing e cient algorithms and data structures. He really hopes that you can help him and has come to seek your help. (Please read the conversation very carefully and retrieve the tasks.) You: \Hi Jonathan, How’s going?"




Jonathan:\Hi, I am ok.". You said \Tensed?"




Jonathan:\Hmm". \No worries, tell me how can I help you"- you said to Jonathan.




\You know MLS are hunting for young story writer and I have a dream to be a writer like JK Rowling. I love Harry Porter stories".\I am going to send them my story, but they have some conditions".\My story should meet these conditions, but I do not know how I am going to check these conditions in my writings"- Jonathan said.









You: \Ok, tell me the conditions, I will develop a fast and memory e cient program for you which will check your writing."




Jonathan:\Thanks".\The essay should be within 10000 words. No words will appear more than 50 times".\So I need a program to nd out the number of words in my story including each word count".




You:\Well, that can be done".




Jonathan :\But the word count shall not include the following auxiliary verbs: am, is, are, was, were, has, have, had, been, will, shall, may, can, would, should, might and could; and articles: a, an and the".\It’s really easy task, but what types of punctuation marks are you going to use in our writing"- you asked.




\I will use only comma(,), period(.), question(?), exclamation(!), colon(:), semicolon(;), and quotation(")"




\Ok, I will check these punctuation marks in the program".




\But there is more"-Jonathan.\Ok go ahead"- you.




\The program will show the list of the words, sorted in alphabetical order with their counts. Can I also ask you to nd out the k top most frequent words appears that in my writing."-Jonathan.




You:\Ok, consider it done"




\Thanks. You are great!" replied by Jonathan.




\By the way, when do you need this program?"-You. \I think I need this before 24 March 2019.




I am sorry, it is very short time"-Jonathan.




You:\It’s ok". \Thanks"-Jonathan.\No worries, bye"-You. Jonathan-\Bye".




He is full of excitement and you do not want to disappoint him. You do not have much time, so start working on it now.




Input




The input le named Writing.txt consists of the variable number of words not sorted in alphabetical order. Each word consists of only lowercase English characters, i.e., there is no uppercase letter. Moreover, Sentences of the input are separated by newline.







an aggressive alien creature visited our planet.




it was ugly, with a big nose, pinkish hairy skin, and feet that smelled.




it was frightened of us for no reason.




it resented our differences and laid claim to our planet.




this strange alien was an earth human.




it called me ‘‘alien’’.










Input size. Let n be the total number of words in the writing and m be the maximum number of characters in a word. The upper bound of the input size is O(nm). Your program must be able to do several tasks as described below.




Task 1: Preprocessing




In this task, your program will preprocess the input from Writing.txt by removing the aux-iliary verbs, articles and punctuation marks mentioned in the conversation between you and Jonathan. You will write a function named preprocess for this task. This function will take the lename of the input le as a parameter and will return a list of words after preprocessing.



2Keep in mind: words are separated by whitespace characters (space (‘ ’), tab (‘nt’), and newline (‘nn’)).
Example of parameter and return values:




Parameter: ‘Writing.txt’




Return: [‘aggressive’,‘creature’,‘alien’,‘big’,....]




Important: Any di erence from the above mentioned function name, parameter and return type will e ect the test scripts and your program will NOT pass.




Complexity requirement: The input size is O(nm). So that your program should read and preprocess the input in O(nm) worst-case time complexity. The space complexity must be O(nm).




Task 2: Sorting the words




In this task, you will sort the preprocessed words in alphabetical order in faster way. You will create another function named wordSort to sort the all words in alphabetical order. The parameter of the function will be the list of words which are preprocessed in Task 1. The function will return the sorted list of word which are remained after preprocessing.




Example of parameter and return values:




Parameter: [‘aggressive’,‘creature’,‘alien’,‘big’,....]




Return: [‘aggressive’,‘alien’,‘big’,‘creature’....]




Important: Any di erence from the above mentioned function name, parameter and return type will e ect the test scripts and your program will fail to pass.




Complexity requirement: Let consider the size of preprocessed words is O(nm). You need to develop the faster wordSort to sort the words. The worst case time complexity of function wordSort should be O(nm). Keep in your mind, the space complexity must be O(nm).






Task 3: Showing the number of total words including the frequency of each word




In task 3, you need to calculate and display the number of total words in the writing after pre-processing and the count of each word. To do this, you will write a function named wordCount to calculate and print the total number of words including the frequency (count) of each word. The parameter of the wordCount is the sorted list of words of Task 2. This function will print the total number of words and each word’s counts. Finally, it will return a list with two values:




the total number of words and (b) a list of words with their count. The list of words should be remain sorted.



Example of parameter and return values:




Parameter: [‘aggressive’,‘alien’,‘big’,‘creature’....]




Return: [44,[‘aggressive’,1],[‘alien’,3],[‘big’,1],[‘creature’,1],....]




Important: Any di erence from the above mentioned function name, parameter, return type and ordering of return values will e ect the test scripts and your program will NOT pass.




Complexity requirement: Let consider, the number of preprocessed words is n. To count the number of each word, the worst case time complexity of your function should not be more than O(nm). The space complexity should also be O(nm).






Task 4: Showing the k top most words appears in the writing




This is your nal task in this assignment. In this task, you have to nd out and display the k top-most frequent words in the writing after removing auxiliary verb and articles. You will construct a method named kTopWords to nd out and display the k top most frequent words in the writing. There are two parameters of the function: (a) The value of k and (b) the list of sorted words with their frequencies which will be similar like: [[‘aggressive’,1],[‘alien’,3],[‘big’,2],....],




where the rst element of the list is the word and second element is it’s count. The function will return similar list of its parameter, but will only contain the k number of top-most frequent words. Keep in your mind: if the count of two words are same, the word comes earlier in the sorted list will have higher priority over another.




Example of parameter and return values:




Parameters: 4 and [[‘aggressive’,1],[‘alien’,3],[‘big’,1],[‘creature’,1],....]




Return: [[‘it’,4],[‘alien’,3],[‘our’,3],[‘and’,2]]




Important: Any di erence from the above mentioned function name, parameters, ordering of parameters and return type will e ect the test scripts and your program will NOT pass.




Complexity requirement: To nd the k top most words, the worst case time complexity of your function should not be more than O(n log k). The space complexity should also be O(km).




Output




The output of your program of the above shown input will be as below:







Words are preprocessed..




Do I need to display the remaining words: Y




aggressive




alien




creature




visited




our




planet




it




ugly




with




big




nose




pinkish




hairy




skin




and




feet




that




smelled




it




frightened




of




us




for



4no



reason




it




resented




our




differences




and




laid




claim




to




our




planet




this




strange




alien




earth




human




it




called




me




alien




The remaining words are sorted in alphabetical order




Do you want to see: Y




aggressive




alien




alien




alien




and




and




big




called



claim




creature




differences




earth




feet




for




frightened




hairy




human




it




it




it




it




laid




me




no







5
nose




of




our




our




our




pinkish




planet




planet




reason




resented




skin




smelled




strange




that




this




to




ugly




us




visited




with




The total number of words in the writing: 44




The frequencies of each word:




aggressive : 1




alien : 3




and : 2




big : 1




called : 1




claim : 1




creature : 1




differences : 1




earth : 1




feet : 1




for : 1




frightened : 1




hairy : 1




human : 1




it : 4




laid : 1




me : 1




no : 1




nose : 1




of : 1




our : 3




pinkish : 1




planet : 2




reason : 1




resented : 1







6
skin : 1




smelled : 1




strange : 1




that : 1




this : 1




to : 1




ugly : 1




us : 1




visited : 1




with : 1




How many top-most frequent words do I display: 4




4 top most words appear in the writing are:




it : 4




alien : 3




our : 3




and : 2










If the Writing.txt is empty or there is no word remaining after preprocessing, the output should be exact same as below:







Unable to continue:




Writing.txt is empty or



There is no word remaining after preprocessing.









Note: Your program will be tested on a di erent writing le, i.e., di erent types of writing with di erent number of words.




Things to note




If you decide to use in-built Python functions and structures, you must carefully consider their worst-case complexities. For example, inserting/retrieving an element in Python dictionary (which uses hashing) takes O(n) in worst-case. This is because, as we will later see in the lecture, although hashing is quite e cient in practice, its worst-case complexity for insertion/retrieval is still O(n). So DO NOT use hashing and python dictionary. It may not always be easy to determine the worst-case complexities of all in-built functions and structures. Therefore, it is strongly recommended that you use only the basic data structures (such as Python lists). This assignment can be easily completed using the basic data structures without using any advanced in-built functions.







-=o0o=-




END




-=o0o=-

























7

More products