Starting from:

$30

COMP5112- Assignment 1 MPI Programming Solved

Assignment Description
The Smith-Waterman algorithm identifies the similar regions in two genome sequences. In this assignment, you will implement an MPI version of the Smith-Waterman algorithm to measure the similarity between two strings.

The pseudocode of the Smith-Waterman algorithm is shown in Algorithm 1. Given string A = a1,a2,...,an and string B = b1,b2,...,bm, we first construct a scoring matrix H of size (n + 1) ∗ (m+1) and set the first row and column to zero. Then, we iteratively compute the scoring matrix using the following dynamic programming formula:

 Hi−1,j−1 + s(ai,bj)  Hi−1,j − w

Hij = max

Hi,j−1 − w

 0
(1 ≤ i ≤ n,1 ≤ j ≤ m)
where s is the substitution matrix is the match score, v is the

mismatch score, and w is the gap penalty. The value of u, v and w are constant and are give as part of the input to the algorithm. Finally, we return the highest score in the scoring matrix as the similarity score between A and B.

Input and Output
The input file will be in the following format:

1.   The first line contains two integers a_len and b_len (1 ≤ a_len ≤ 20000,1 ≤ b_len ≤ 20000), separated by a space. They represent the length of string a and b, respectively.

2.   The second line is a string a of length a_len, consisting of four letters A,T,G,C.

3.   The third line is a string b of length b_len, consisting of four letters A,T,G,C.




Algorithm 1: The Smith-Waterman Algorithm

Input : string A = a1,a2,...,an and B = b1,b2,...,bm, match score u, mismatch score v, gap penalty w

Output: the similarity score between A and B

1   int score[|A| + 1][|B| + 1];

2   for i ← 0 to |A| do

3score[i][0] ← 0;

4   end

5   for i ← 0 to |B| do

6score[0][i] ← 0;

7             end

8             int max_score ← 0;

9             for i ← 1 to |A| do

10           for j ← 1 to |B| do

11score[i][j] ← max(0,

12score[i − 1][j] − w,

13score[i][j − 1] − w,

14score[i − 1][j − 1] + sub_mat(ai,bj));

15max_score ← max(max_score,score[i][j]);

16           end

17           end



The output of the program consists of two lines:

1.   The similarity score between a and b, i.e., the highest score in the scoring matrix.

2.   The elapsed time in seconds of the execution.

We assume that the match score is 3, the mismatch score is -3, and the gap penalty is 2.

Here is an example input/output for your reference:

Input:

9 8

GGTTGACTA TGTTACGG
Output:

13

Time: 0.00001 s
Submission and Grading
The code skeleton mpi_smith_waterman_skeleton.cpp is provided. Your task is to complete the following function in the code:

int smith_waterman(int my_rank, int p, MPI_Comm comm, char *a, char *b, int a_len, int b_len);



The description of the parameters is as follows:

Parameter
Description
int my_rank
The rank (ID) of current process
int p
Number of processes
MPI_Comm comm
The MPI communicator
char *a
The string a
char *b
The string b
int a_len
The length of string a
int b_len
The length of string b
Notes:

1.   You will be given three files mpi_smith_waterman_skeleton.cpp, main.cpp and mpi_smith_ waterman.h. You only need to complete and submit the mpi_smith_waterman_skeleton.cpp to the Canvas.

2.   The sequential Smith-Waterman algorithm code is also provided for your reference. You can use it to learn the logic flow and verify the correctness of your MPI version. You can also compare the sequential running time with your parallel program performance.

3.   You can add helper functions and variables as you wish in the mpi_smith_waterman_skeleton .cpp file, but keep the other two files: main.cpp and mpi_smith_waterman.h, unchanged.

4.   We will use different input files and specify different numbers of processes (p 0 and p <= 8 in mpiexec –n <p) to test your program. We will also use multiple MPI nodes to test the correctness of your program.

5.   The correctness, running time and speedup of your program will be considered in grading.

6.   We will perform code similarity checks. In case a submission has code similarity issues, we may request clarification and deduct partial marks or full marks on a case-by-case basis.

More products