# Common elements

Common elements in K

In the National Football League, when teams are seeded for the playoffs, there is always the possibility that two or more teams may have the same won-lost-tied record. There is an ordered, step-by-step, tiebreaker process by which such ties are broken. If the first step in the process fails to break the tie, the next step is applied, and so on, until the tie is broken. This assignment focuses on one particular step, which happens to be the third step in the process: the won-lost-tied percentages in common games between the two teams are compared. Two games are considered common between two teams if each of those teams faced the same opponent. This problem can be abstracted into a more general problem: Given two collections of elements, A and B, generate a third collection, C, containing only those elements common in collections A and B, with duplicates allowed. Note that any type of collection (list, array, etc.) can be used, and no guarantees are made regarding the order of the elements in either collection. If we build a collection for each team in question that contains the name or identifier of the opponent that was faced in each game the team played, we end up with two separate collections of opponents. The goal is to generate a third collection that contains only the common opponents faced by both teams. For example, consider two teams from the AFC North Division from the 2011 season, the Pittsburgh Steelers and the Baltimore Ravens. Both teams finished the regular season with won-lost-tied records of 12-4-0. As it turns out, the Ravens won the tiebreaker not on the rule we’re describing in this assignment, but on the rule that the Ravens won both head-to-head matchups during the season. But let’s assume the tiebreaker had to be resolved by the won-lost-tied percentage in common games. Table 1 shows the entire regular season schedules for both teams, and indicates common opponents in bold. Although in this example, most of the games are common;i.e., both teams faced common opponents, this is not a requisite condition for the problem. The result returned by an algorithm solving this problem should return a third collection containing only those teams which both the Steelers and the Ravens faced during the season. The order in which these common teams are present in this collection is not important, and duplicates should be included. Table 2 shows one possible correct result.

Game Pittsburgh Steelers Baltimore Ravens

1 Baltimore Ravens Pittsburgh Steelers

2 Seattle Seahawks Tennessee Titans

3 IndianapolisColts St. Louis Rams

4 Houston Texans New York Jets

5 Tennessee Titans bye week

6 Jacksonville Jaguars Houston Texans

7 Arizona Cardinals Jacksonville Jaguars

8 New England Patriots Arizona Cardinals

9 Baltimore Ravens Pittsburgh Steelers

10 Cincinnati Bengals Seattle Seahawks

11 bye week Cincinnati Bengals

12 Kansas City Chiefs San Francisco 49ers

13 Cincinnati Bengals Cleveland Browns

14 Cleveland Browns Indianapolis Colts

15 San Francisco 49ers San Diego Chargers

16 St. Louis Rams Cleveland Browns

17 Cleveland Browns Cincinnati Bengals

Table 1. Regular season opponents of the Pittsburgh Steelers and Baltimore Ravens, listed in the order in which the games were played. Note that each team had a “bye” week, during which they did not play.

Seattle Seahawks

Indianapolis Colts

Houston Texans

Tennessee Titans

Jacksonville Jaguars

Arizona Cardinals

Cincinnati Bengals

Cincinnati Bengals

Cleveland Browns

San Francisco 49ers

St. Louis Rams

Cleveland Browns

Table 2. One possible correct result showing all common elements (duplicates included) between the two collections in Table 1.

A Quadratic Solution This problem can be solved in a number of ways. The most straightforward approach to this problem would be to set up a nested loop structure, where the collection #1 is traversed, and for each element in collection #1, collection #2 is traversed to see whether or not the element is present. Table 3 shows the regular season schedules for both the Steelers and the Ravens as generically labeled collections. Below is the algorithm, as pseudocode. Note that once a common element has been found and added to the collection of common elements, the inner loop is immediately terminated, to avoid adding the same element to the common collection multiple times.

for each element X in C1

{

for each element Y in C2 {

if X.compareTo(Y) == 0 {

add X to common collection Z break

}

}

}

The running time for this algorithm is quadratic. For 2 collections, both of length N, the worst case running time would be: O(N2). The running time worsens linearly with each additional collection. It should be apparent that for each additional collection, the worst case running time effectively adds 1 to the coefficient of the quadratic term in the Big-Oh formula, assuming all collections are of the same length. For example, for 3 collections, all of length N, the running time would be O(N * 2N), or O(2N2); for 4 collections, all of length N, the running time would be O(3N2); and so on. Note that the coefficient of the quadratic term is always equal to the number of collections minus 1. This is due to the fact that if any element comparisons are performed, ther must be at least 2 colelctions. If there is only a single collection, then the collection of common elements is simply the collection itself. The general formula, allowing for an arbitrary number of collections of differing lengths, is:

Formula 1: For k collections C1, C2, C3, ..., Ck, of lengths N1, N2, N3, ..., Nk, respectively, where one collection of which is chosen as the query collection, Cq, of length Nq, the worst case running time for finding all elements common to all k collections is proportional to: O(Nq * (∑[N1, N2, N3, ..., Nk] - Nq)). In Formula 1, the length, Nq, is the length of the collection chosen to be the “query” collection, which corresponds to the collection labeled C1 in the pseudocode. This designation of a query collection is essential for the algorithm to function, since any comparison requires two elements (not counting selfcomparisons).

If all the collections have the same length, the general formula becomes more specific as follows:

Formula 2: For k collections C1, C2, C3, ..., Ck, all of which have length = N, the worst case running time for finding all elements common to all k collections is proportional to: O(N * (kN - N)), or O((k - 1)N2). Although any one of the collections can be designated as the query collection, the running time will be minimized if the smallest collection is used as the query collection.

This is stated more generally by Corollary 1: Corollary 1: For any k collections, where Nq represents the size of the smallest collection, the number of elements common to all collections will be less than or equal to Nq.

C1 C2

Baltimore Ravens Pittsburgh Steelers

Seattle Seahawks Tennessee Titans

Indianapolis Colts St. Louis Rams

Houston Texans New York Jets

Tennessee Titans bye week

Jacksonville Jaguars Houston Texans

Arizona Cardinals Jacksonville Jaguars

New England Patriots Arizona Cardinals

Baltimore Ravens Pittsburgh Steelers

Cincinnati Bengals Seattle Seahawks

bye week Cincinnati Bengals

Kansas City Chiefs San Francisco 49ers

Cincinnati Bengals Cleveland Browns

Cleveland Browns Indianapolis Colts

San Francisco 49ers San Diego Chargers

St. Louis Rams Cleveland Browns

Cleveland Browns Cincinnati Bengals

What You Need to Do: If we were only interested in two relatively small collections, as shown in Table 3, the quadratic algorithm described above may be adequate. However, if N is very large, and/or there are many collections (i.e., k is very large), the quadratic algorithm may be too slow to be useful. There are a number of strategies that can be employed to reduce the running time, and at this point we have covered enough topics to design a more efficient algorithm. Based on the material covered thus far in this course, design and implement a more efficient algorithm for finding the common elements of a set of collections. Your algorithm should satisfy the following criteria:

1. It should be able to accept as input 0 to k collections, stored as simple arrays.

2. The elements of the collections should all be of type Comparable, and they should all be derived from the same base class (not counting the Object class). Implementation of the Comparable interface is necessary since the elements must be compared to each other in order to determine commonality. They must all be derived from the same base class in order for comparisons between two items to be meaningful.

3. Duplicate elements should be allowed; e.g., if there are M instances of the value, “XYZ”, in all the input collections, there should be M instances of the value, “XYZ”, in the collection of common elements.

4. The collections should be allowed to be of varying lengths; i.e., some collections may have more items than others.

5. One of the collections must be designated as the “query” collection, which is the collection containing the elements to which the elements in the other collections are compared.

6. The total number of element comparisons performed should be less than the value for the quadratic solution described above. That is, the total number of comparisons in the worst case should be less than kN2. Do not be concerned about average performance. Also, the total number of comparisons is defined, for this assignment, to be only those comparisons that are performed once the traversal of the query collection begins, and the other collections are checked for the presence of the elements in the query collection.

The framework for your algorithm should satisfy the following criteria, for ease in testing:

1. Create a class called CommonElements, to contain your algorithm and associated methods and attributes.

2. In your CommonElements class, encapsulate your algorithm within a method called findCommonElements, that has the following signature: public Comparable[] findCommonElements(Object[] collections). The argument to this method, collections, will be the set of k collections discussed earlier. Each collection will be represented as an array of objects of type Comparable.

3. The value returned by your findCommonElements method should be a collection of Comparable elements that contains only the elements common to all the input collections.

4. Since you are being asked to evaluate your algorithm based on the number of comparisons performed, you will need to have your findCommonElements method maintain a running total of comparisons performed for each set of collections tested. You should create an attribute called comparisons in your CommonElements class to store the number of comparisons, and provide a getter method called getComparisons() to return this value. In order to keep a running total of comparisons, you will need to instrument your code by incrementing the comparisons attribute each time a comparison between two elements is made. Since element comparisons are typically performed in if statements, you may need to increment comparisons immediately before each comparison is actually performed. Although that may sound counter-intuitive, if you try to increment comparisons inside the if statement, after the element comparison has been made, you will miss all the comparisons that cause -the condition inside the if statement to evaluate to false.

It is important that you adhere to the framework specification above. To facilitate testing of your program, I will use a test harness that will do the following:

1. Creates an instance of your CommonElements class.

2. Calls your findCommonElements method with a set of test collections as input.

3. Verifies that the collection that is returned by findCommonElements correctly contains the elements common to all the input collections.

4. Retrieves the number of comparisons that were performed, via your getComparisons() method.

5. Compares the number of comparisons performed to the target value stated in criterion #5 above for the algorithm. Thus, it is essential that you name your class and methods as described above, or my test harness will not work, and it will take longer to test your program.

A Note About Testing You will need to develop several sets of test collections for testing your algorithm. The grading rubric mentions covering the case where all the test collections have the same length, as well as covering the case where the test collections are of different lengths. You will also need to think about what constitutes the worst case scenario for this algorithm, since only that scenario will make your analysis of total comparisons performed a meaningful one. You can use the formulas in the grading rubric to tell you how many comparisons you should expect in the quadratic and linear cases. For example, if you have 5 total collections (1 query collection and 4 test collections), each of which contains 10 elements, the total number of comparisons performed in the worst case should be: (k - 1)N2 , which for k = 5 and N = 10 is: (5 - 1)102 , or 400 comparisons. For the linear algorithm, you should only have N*(k - 1), which is 10*(5 - 1), or 40 comparisons

In the National Football League, when teams are seeded for the playoffs, there is always the possibility that two or more teams may have the same won-lost-tied record. There is an ordered, step-by-step, tiebreaker process by which such ties are broken. If the first step in the process fails to break the tie, the next step is applied, and so on, until the tie is broken. This assignment focuses on one particular step, which happens to be the third step in the process: the won-lost-tied percentages in common games between the two teams are compared. Two games are considered common between two teams if each of those teams faced the same opponent. This problem can be abstracted into a more general problem: Given two collections of elements, A and B, generate a third collection, C, containing only those elements common in collections A and B, with duplicates allowed. Note that any type of collection (list, array, etc.) can be used, and no guarantees are made regarding the order of the elements in either collection. If we build a collection for each team in question that contains the name or identifier of the opponent that was faced in each game the team played, we end up with two separate collections of opponents. The goal is to generate a third collection that contains only the common opponents faced by both teams. For example, consider two teams from the AFC North Division from the 2011 season, the Pittsburgh Steelers and the Baltimore Ravens. Both teams finished the regular season with won-lost-tied records of 12-4-0. As it turns out, the Ravens won the tiebreaker not on the rule we’re describing in this assignment, but on the rule that the Ravens won both head-to-head matchups during the season. But let’s assume the tiebreaker had to be resolved by the won-lost-tied percentage in common games. Table 1 shows the entire regular season schedules for both teams, and indicates common opponents in bold. Although in this example, most of the games are common;i.e., both teams faced common opponents, this is not a requisite condition for the problem. The result returned by an algorithm solving this problem should return a third collection containing only those teams which both the Steelers and the Ravens faced during the season. The order in which these common teams are present in this collection is not important, and duplicates should be included. Table 2 shows one possible correct result.

Game Pittsburgh Steelers Baltimore Ravens

1 Baltimore Ravens Pittsburgh Steelers

2 Seattle Seahawks Tennessee Titans

3 IndianapolisColts St. Louis Rams

4 Houston Texans New York Jets

5 Tennessee Titans bye week

6 Jacksonville Jaguars Houston Texans

7 Arizona Cardinals Jacksonville Jaguars

8 New England Patriots Arizona Cardinals

9 Baltimore Ravens Pittsburgh Steelers

10 Cincinnati Bengals Seattle Seahawks

11 bye week Cincinnati Bengals

12 Kansas City Chiefs San Francisco 49ers

13 Cincinnati Bengals Cleveland Browns

14 Cleveland Browns Indianapolis Colts

15 San Francisco 49ers San Diego Chargers

16 St. Louis Rams Cleveland Browns

17 Cleveland Browns Cincinnati Bengals

Table 1. Regular season opponents of the Pittsburgh Steelers and Baltimore Ravens, listed in the order in which the games were played. Note that each team had a “bye” week, during which they did not play.

Seattle Seahawks

Indianapolis Colts

Houston Texans

Tennessee Titans

Jacksonville Jaguars

Arizona Cardinals

Cincinnati Bengals

Cincinnati Bengals

Cleveland Browns

San Francisco 49ers

St. Louis Rams

Cleveland Browns

Table 2. One possible correct result showing all common elements (duplicates included) between the two collections in Table 1.

A Quadratic Solution This problem can be solved in a number of ways. The most straightforward approach to this problem would be to set up a nested loop structure, where the collection #1 is traversed, and for each element in collection #1, collection #2 is traversed to see whether or not the element is present. Table 3 shows the regular season schedules for both the Steelers and the Ravens as generically labeled collections. Below is the algorithm, as pseudocode. Note that once a common element has been found and added to the collection of common elements, the inner loop is immediately terminated, to avoid adding the same element to the common collection multiple times.

for each element X in C1

{

for each element Y in C2 {

if X.compareTo(Y) == 0 {

add X to common collection Z break

}

}

}

The running time for this algorithm is quadratic. For 2 collections, both of length N, the worst case running time would be: O(N2). The running time worsens linearly with each additional collection. It should be apparent that for each additional collection, the worst case running time effectively adds 1 to the coefficient of the quadratic term in the Big-Oh formula, assuming all collections are of the same length. For example, for 3 collections, all of length N, the running time would be O(N * 2N), or O(2N2); for 4 collections, all of length N, the running time would be O(3N2); and so on. Note that the coefficient of the quadratic term is always equal to the number of collections minus 1. This is due to the fact that if any element comparisons are performed, ther must be at least 2 colelctions. If there is only a single collection, then the collection of common elements is simply the collection itself. The general formula, allowing for an arbitrary number of collections of differing lengths, is:

Formula 1: For k collections C1, C2, C3, ..., Ck, of lengths N1, N2, N3, ..., Nk, respectively, where one collection of which is chosen as the query collection, Cq, of length Nq, the worst case running time for finding all elements common to all k collections is proportional to: O(Nq * (∑[N1, N2, N3, ..., Nk] - Nq)). In Formula 1, the length, Nq, is the length of the collection chosen to be the “query” collection, which corresponds to the collection labeled C1 in the pseudocode. This designation of a query collection is essential for the algorithm to function, since any comparison requires two elements (not counting selfcomparisons).

If all the collections have the same length, the general formula becomes more specific as follows:

Formula 2: For k collections C1, C2, C3, ..., Ck, all of which have length = N, the worst case running time for finding all elements common to all k collections is proportional to: O(N * (kN - N)), or O((k - 1)N2). Although any one of the collections can be designated as the query collection, the running time will be minimized if the smallest collection is used as the query collection.

This is stated more generally by Corollary 1: Corollary 1: For any k collections, where Nq represents the size of the smallest collection, the number of elements common to all collections will be less than or equal to Nq.

C1 C2

Baltimore Ravens Pittsburgh Steelers

Seattle Seahawks Tennessee Titans

Indianapolis Colts St. Louis Rams

Houston Texans New York Jets

Tennessee Titans bye week

Jacksonville Jaguars Houston Texans

Arizona Cardinals Jacksonville Jaguars

New England Patriots Arizona Cardinals

Baltimore Ravens Pittsburgh Steelers

Cincinnati Bengals Seattle Seahawks

bye week Cincinnati Bengals

Kansas City Chiefs San Francisco 49ers

Cincinnati Bengals Cleveland Browns

Cleveland Browns Indianapolis Colts

San Francisco 49ers San Diego Chargers

St. Louis Rams Cleveland Browns

Cleveland Browns Cincinnati Bengals

What You Need to Do: If we were only interested in two relatively small collections, as shown in Table 3, the quadratic algorithm described above may be adequate. However, if N is very large, and/or there are many collections (i.e., k is very large), the quadratic algorithm may be too slow to be useful. There are a number of strategies that can be employed to reduce the running time, and at this point we have covered enough topics to design a more efficient algorithm. Based on the material covered thus far in this course, design and implement a more efficient algorithm for finding the common elements of a set of collections. Your algorithm should satisfy the following criteria:

1. It should be able to accept as input 0 to k collections, stored as simple arrays.

2. The elements of the collections should all be of type Comparable, and they should all be derived from the same base class (not counting the Object class). Implementation of the Comparable interface is necessary since the elements must be compared to each other in order to determine commonality. They must all be derived from the same base class in order for comparisons between two items to be meaningful.

3. Duplicate elements should be allowed; e.g., if there are M instances of the value, “XYZ”, in all the input collections, there should be M instances of the value, “XYZ”, in the collection of common elements.

4. The collections should be allowed to be of varying lengths; i.e., some collections may have more items than others.

5. One of the collections must be designated as the “query” collection, which is the collection containing the elements to which the elements in the other collections are compared.

6. The total number of element comparisons performed should be less than the value for the quadratic solution described above. That is, the total number of comparisons in the worst case should be less than kN2. Do not be concerned about average performance. Also, the total number of comparisons is defined, for this assignment, to be only those comparisons that are performed once the traversal of the query collection begins, and the other collections are checked for the presence of the elements in the query collection.

The framework for your algorithm should satisfy the following criteria, for ease in testing:

1. Create a class called CommonElements, to contain your algorithm and associated methods and attributes.

2. In your CommonElements class, encapsulate your algorithm within a method called findCommonElements, that has the following signature: public Comparable[] findCommonElements(Object[] collections). The argument to this method, collections, will be the set of k collections discussed earlier. Each collection will be represented as an array of objects of type Comparable.

3. The value returned by your findCommonElements method should be a collection of Comparable elements that contains only the elements common to all the input collections.

4. Since you are being asked to evaluate your algorithm based on the number of comparisons performed, you will need to have your findCommonElements method maintain a running total of comparisons performed for each set of collections tested. You should create an attribute called comparisons in your CommonElements class to store the number of comparisons, and provide a getter method called getComparisons() to return this value. In order to keep a running total of comparisons, you will need to instrument your code by incrementing the comparisons attribute each time a comparison between two elements is made. Since element comparisons are typically performed in if statements, you may need to increment comparisons immediately before each comparison is actually performed. Although that may sound counter-intuitive, if you try to increment comparisons inside the if statement, after the element comparison has been made, you will miss all the comparisons that cause -the condition inside the if statement to evaluate to false.

It is important that you adhere to the framework specification above. To facilitate testing of your program, I will use a test harness that will do the following:

1. Creates an instance of your CommonElements class.

2. Calls your findCommonElements method with a set of test collections as input.

3. Verifies that the collection that is returned by findCommonElements correctly contains the elements common to all the input collections.

4. Retrieves the number of comparisons that were performed, via your getComparisons() method.

5. Compares the number of comparisons performed to the target value stated in criterion #5 above for the algorithm. Thus, it is essential that you name your class and methods as described above, or my test harness will not work, and it will take longer to test your program.

A Note About Testing You will need to develop several sets of test collections for testing your algorithm. The grading rubric mentions covering the case where all the test collections have the same length, as well as covering the case where the test collections are of different lengths. You will also need to think about what constitutes the worst case scenario for this algorithm, since only that scenario will make your analysis of total comparisons performed a meaningful one. You can use the formulas in the grading rubric to tell you how many comparisons you should expect in the quadratic and linear cases. For example, if you have 5 total collections (1 query collection and 4 test collections), each of which contains 10 elements, the total number of comparisons performed in the worst case should be: (k - 1)N2 , which for k = 5 and N = 10 is: (5 - 1)102 , or 400 comparisons. For the linear algorithm, you should only have N*(k - 1), which is 10*(5 - 1), or 40 comparisons

You'll get a 1.4KB .ZIP file.