CS471: Parallel Processing Assignment-2

 Search Engine Helper:


Write a parallel program that searches a given corpus and returns the most relevant search results. You are given a corpus called the Aristo Mini Corpus (https://www.kaggle.com/allenai/aristo-mini-corpus).


Aristo Mini Corpus:


The Aristo Mini Corpus contains 1,197,377 science-relevant sentences drawn from public data. It provides simple science-relevant text that may be useful for answering elementary science questions. You will work on only 1,500 sentences, divided across 50 files of 30 lines each.


Input: a query in the form of a sentence or a question.

Output: search results that contain all the words of the query.




Search query:

Capital of Egypt


If the corpus has the following sentences:



There is a capital for each country.

Capital of Egypt is Cairo.



The Capital of Egypt is Cairo.

You can visit the country you want.


Output should be:


Capital of Egypt is Cairo.  


The Capital of Egypt is Cairo.



Pseudocode of the search steps applied to each file:


For each Sentence in File:

    Match = true;

    For each Word in the Query:

        IF Word not in Sentence:

            Match = false;

    IF Match is true:

        Store Sentence;

        ResultsFound += 1;
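
The search steps above can be sketched in Python as follows. This is a minimal illustration, not a reference solution. The helper names `tokenize` and `search_sentences` are hypothetical. The sketch also assumes case-insensitive, whole-word matching with punctuation ignored, which the assignment does not specify.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: matching is case-insensitive and ignores punctuation.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def search_sentences(sentences, query):
    """Return the sentences that contain every word of the query."""
    query_words = set(tokenize(query))
    results = []
    for sentence in sentences:
        sentence_words = set(tokenize(sentence))
        # A sentence matches only if no query word is missing from it.
        if query_words <= sentence_words:
            results.append(sentence)
    return results
```

Applied to the four example sentences above with the query "Capital of Egypt", this returns only the two sentences that mention Cairo. The number of results is the length of the returned list.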


Parallel Scenario:


•      You will use the Master-Slave paradigm.

•      The master will distribute the corpus files among the slaves.

•      Each slave will search its assigned part of the corpus.

•      Each slave will return the number of search results found and the corresponding relevant sentences.

•      The master will collect the search results and their total count and write them to a file.
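
One way to plan the distribution, including the case where the number of files is not a multiple of the number of slaves, is sketched below in plain Python. The function name `split_counts` is hypothetical. The sketch assumes the files are numbered consecutively and that the first few workers each take one extra file. In an MPI program, count and displacement arrays of this kind are what a call such as MPI_Scatterv expects.

```python
def split_counts(num_items, num_workers):
    """Split num_items as evenly as possible across num_workers.

    Returns (counts, displs): counts[r] is how many items worker r gets,
    displs[r] is the index of its first item.
    Assumption: the first `extra` workers absorb the remainder.
    """
    base, extra = divmod(num_items, num_workers)
    counts = [base + 1 if rank < extra else base for rank in range(num_workers)]
    displs = [sum(counts[:rank]) for rank in range(num_workers)]
    return counts, displs
```

For example, `split_counts(50, 4)` gives counts `[13, 13, 12, 12]` and displacements `[0, 13, 26, 38]`, so no file is left unassigned.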


Expected input/output format:


Enter your query: sunlight energy nutrients


Output File:

Search Results Found = 2


Chlorophyll can make food the plant can use from carbon dioxide, water, nutrients, and energy from sunlight.

A process by which a plant produces its food using energy from sunlight, carbon dioxide from the air,and water and nutrients from the soil.
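
The output file shown above could be assembled on the master roughly as follows. This is only a sketch. The function name `merge_and_format` is hypothetical, and the exact blank-line layout is an assumption based on the sample.

```python
def merge_and_format(per_worker_results):
    """Combine per-worker result lists into the output file text.

    per_worker_results: one list of matching sentences per worker,
    in rank order (assumption: results are written in that order).
    """
    merged = [sentence for worker in per_worker_results for sentence in worker]
    lines = [f"Search Results Found = {len(merged)}", ""] + merged
    return "\n".join(lines) + "\n"
```

The returned string can then be written to the output file with a single `write` call.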




1-      Study the MPI lab on the scatter and gather methods.

2-      You have one week for questions about the assignment and the lab (22 Mar. to 28 Mar.).

3-      Use all the functions you have learned so far in the MPI library. (Using Allreduce and Allgather is optional.)

4-      Choose your functions carefully: if a value should be sent to all slaves, use MPI_Bcast; if values are to be combined with a specific operator, use MPI_Reduce; and so on.

5-      Calculate the running time of the parallel program.

6-      Run your code on the attached test cases to verify that your results are correct.
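
For the running-time requirement, one common convention is to time each process separately and report the slowest one, since the program finishes only when its last process does. The plain-Python sketch below illustrates the idea. The names `timed` and `parallel_time` are hypothetical. In an MPI program the same measurement is typically taken with MPI_Wtime and combined with MPI_Reduce using the MPI_MAX operator.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def parallel_time(per_rank_seconds):
    """Parallel running time, taken here as the slowest process's time."""
    return max(per_rank_seconds)
```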





Grading Criteria:


Master workload distribution across slaves: using suitable MPI functions.

Slave work:

•      Reading files and tokenizing queries.

•      Performing the search and sending the results back to the master.

Master collection of results: writing them to a file (the number of search results and the results themselves).

Handling remaining workload.

Running and valid output.

Calculating the parallel running time.