Starting from:

$30

COMP2210-Assignment 3 Identifying Line Segments in 2D Data Solved

This will explore an example feature extraction problem. Feature extraction is a subproblem of pattern recognition and is also used in areas such as statistical analysis, computer vision, and image processing. For example, an image processing problem may use a feature extraction algorithm to identify particular shapes or regions in a digitized image.

In this , we’re going to focus on a very simple feature extraction problem: Given a set of points in two-dimensional space, identify every subset of four or more points that are collinear. For example, given the set of points depicted in Figure 1, your program would detect the three groups of collinear points



 
 


as depicted by the line segments in Figure 2.

Figure 1: A set of 13 points.



Figure 2: Three collinear groups identified.

As always, we want our solution to be useful at scale. For example, Figure 3 plots ⇠100,000 points and Figure 4 shows the 34 collinear groups identified by blue line segments. Each collinear group in Figure 4 is composed of far more than four points; four is just the minimum number of points to qualify for the collinear pattern that we’re looking for. In the general problem statement we will refer to line segments instead of collinear groups, where each line segment must contain at least four points.

Problem Statement: Given a set of N distinct points in Quadrant I of the Cartesian plane, identify every line segment that connects a subset of four or more of the points. Each point will be specified as an (x,y) pair where x and y are non-negative int values. For example, the thirteen points in Figures 1 and 2 are: (1, 7), (2, 2), (2, 5), (3, 1), (4, 4), (5, 3), (5, 6), (6, 6), (7, 1), (7, 3), (7, 4), (7, 9), (8, 8).

You must solve this problem in terms of the classes and methods described in the following sections.

1



Figure 3: A set of ⇠100,000 points.

The Point class



Figure 4: 34 collinear groups identified.

You must create an immutable[1] data type Point that represents a point in Quadrant I of the Cartesian plane. A shell of the Point class is provided for you, and you must meet the requirements specified in this document and the provided source code comments.

Some fields and methods have been completed for you and must not be changed. Some methods have incomplete bodies and you must complete them yourself. You may add any number of private methods that you like, but you may not add any public method or constructor, nor may you change the signature of any public method or constructor. You may not add any fields to this class.

A few aspects of the Point class are described in more detail below.

The compareTo method.

This method must compare points by y-coordinates and then, if needed, by x-coordinates. Thus, the invoking point (x0,y0) is less than the parameter point (x1,y1) if and only if either y0 < y1 or if y0 = y1 and x0 < x1. For example, by this natural order of points, (0, 1) is less than (0, 2), (7, 1) is less than (5, 3), and (3, 0) is less than (4, 0). (See Figure 5.) Two points are equal if and only if y0 = y1 and x0 = x1. This is consistent with the equals method, which is provided for you and must not be changed.

The slopeTo method.

This method must return the slope between the invoking point (x0,y0) and the parameter point (x1,y1), which is given by the formula:



For example, for the point (3, 3), the slope to (1, 1) is 1.0, the slope to (4, 5) is 2.0, and the slope to (5, 2) is -0.5. (See Figure 6.)

Treat the slope of a horizontal line segment (e.g., {(1,3), (3,3)}) as positive zero[2]; treat the slope of a vertical line segment (e.g., {(3,3), (3,5)}) as positive infinity2; treat the slope of a degenerate line segment (between a point and itself, e.g., {(3,3), (3,3)}) as negative infinity2. (See Figure 6.)

The slopeOrder Comparator

This field of the Point class must compare two points by the slopes they make with the invoking point (x0,y0). Thus, the point (x1,y1) is less than the point (x2,y2) if and only if



For example, if the invoking point is (3, 3), then (5, 2) is less than (1, 1), and (1, 1) is less than (4, 5).

(See Figure 6.)

          1           

0    1             2             3             4             5             6             7             1             2      3             4             5 x          x

            Figure 5: Reference for Point natural order. Figure 6: Reference for Point slope and slope order

The Line Class
You must create a data type Line that models a line segment as a set of points. A set is an appropriate model since a point only appears once on a line segment and the order in which points are listed is irrelevant. That is, {(1,7), (4,4), (5,3), (7,1)} and {(4,4), (7,1), (1,7), (5,3)} each define the same line segment. However, note that Point has a total order defined so we will make a Line a sorted set of Points. This is convenient since it means that there is exactly one representation of a line segment – one with the points in ascending natural order. Thus, the line segment above would be represented as {(7,1), (5,3), (4,4), (1,7)}.

A shell of the Line class is provided for you, and you must meet the requirements specified in this document and the provided source code comments. Some fields and methods have been completed for you and must not be changed. Some methods have incomplete bodies and you must complete them yourself. You may add any number of private methods that you like, but you may not add any public method or constructor, nor may you change the signature of any public method or constructor. You may not add any fields to this class.

The field line in the Line class has been declared for you as a java.util.SortedSet. You must use TreeSet as the implementing class.

The compareTo method
This method must compare lines by first points and then, if needed, by last points. Thus, line1 < line2 if line1.first < line2.first or line1.first = line2.first and line1.last < line2.last. (See Figure 7 where l1 < l2 < l3.) Two lines line1 and line2 are equal if and only if line1.first = line2.first and line1.last = line2.last. This is consistent with the equals method, which is provided for you and must not be changed.

5

2

Figure 7: Reference for Line natural order.

The Extractor Class
You must create a class Extractor that finds all line segments of four or more collinear points in provided data. A shell of the Extractor class is provided for you, and you must meet the requirements specified in this document and the provided source code comments. A few aspects of the Extractor class are discussed in more detail below.

The Constructors

The first constructor for the Extractor class takes a single parameter of type String. This parameter is a filename for a file of Point data formatted as follows: The first line of the file contains a single int value N that is the number of lines Point data that follow. Each of the following N lines contains two int values separated by one or more blanks. The first int is the x value of a Point and the second int is the y value of a Point. There may be lines of text past these first N + 1 lines of data, but they should be ignored. A sample input file is shown below.

5

11000 11000 12000 10000 13000 10000 14000 10000 15000 10000

Instantiating an Extractor object with this data file would ensure that five distinct instances of the Point class are stored in a suitable data structure inside the new Extractor object.

The second constructor takes a Collection of points and creates an Extractor for this data.

The getLinesBrute method
This method implements a straight-forward, brute force approach to extracting the feature that we’re interested in. Since any combination of four distinct points that are collinear qualify as our feature, we can generate all combinations of four distinct points and check each combination to see if those four points are collinear. This brute force solution is a combinatoric approach to the problem: We generate all the combinations of N things taken four at a time and test each combination based on our feature criteria

(collinearity).

For example, let’s name the points in the given sample input file p1 through p5, as shown below.

5

11000 11000 (p1)

12000 10000 (p2)

13000 10000 (p3)

14000 10000 (p4)

15000 10000 (p5)

The table below shows all the combinations of these five points taken four at a time, along with the result of testing each combination for collinearity. Note that to check if four points p, q, r, and s are collinear, we check whether the slope between p and q, between p and r, and between p and s are all equal.

Combination
Collinear?
p1,p2,p3,p4
no
p1,p2,p3,p5
no
p1,p2,p4,p5
no
p1,p3,p4,p5
no
p2,p3,p4,p5
yes
The advantage of this brute force approach is that it’s simple to code. In this assignment, the number of points being selected out of the set of N total points is fixed at four, so four nested for loops can be used to generate all the possible combinations. That’s also the problem with this brute force approach: four nested loops each dependent on N will have O(N4) time complexity.

For a given number of points N, we know exactly how many combinations of four points will be computed. That is given by Nk where k = 4.



For our example above, this would give 5!4! = 5 combinations. If the input had 10 points, the brute force solution would have to test 4! different combinations of four points. For N = 20, the brute force solution would generate and test 4845 combinations of four points. For N = 1000, over 41 billion combinations of four points would be generated and tested by our program. You can see how this escalates very quickly and the brute force solution becomes impractical to apply as the problem size scales up.

The getLinesFast method
We can solve this problem much more efficiently if we use sorting as part of our solution. In a group of collinear points, the slope that any two points make with respect to each other is, by definition, the same. For example, in Figure 7 the slope between any two points on l1 is positive 1, the slope between any two points on l2 is 1.0, and the slope between two points on l3 is -1.0. Given a set of points, if we select a reference point and sort all other points with respect to the slope they make with that reference point, then all points mutually collinear with the reference point would be “duplicates” with respect to that ordering and would therefore be in arranged in continguous groups.

For example, here are the points in Figure 7.



          (1, 6)      (2, 2)      (2, 5)      (3, 1)      (3, 2)      (3, 3)      (3, 4)      (3, 5)      (4, 3)      (4, 4)      (5, 2)  (5, 5)



And here are the points in Figure 7 sorted with respect to the slope they make with (3, 4).



Note how all the points that are mutually collinear with (3, 4) are now in contiguous groups (l1 is highlighted in light red and l3 is highlighted in light blue). Underneath each point is the slope it makes with (3, 4)[3]. Similarly, here are the points in Figure 1.



          (1, 7)      (2, 2)      (2, 5)      (3, 1)      (4, 4)      (5, 3)      (5, 6)      (6, 6)      (7, 1)      (7, 3)      (7, 4)          (7, 9)      (8, 8)



And here are the points in Figure 1 sorted with respect to the slope they make with (7, 1).



All the points that are mutually collinear with (7, 1) are now in contiguous groups (highlighted). Underneath each point is the slope it makes with (7, 1).

We can now apply the following strategy to solve the problem.

1.   Sort the N points with respect to the slope that they make with one of the points p.

2.   Scan the sorted points to find all groups of three or more consecutive points having the same slope to p. Each such group is collinear with p and thus, together with p, form a line segment of at least four points.

3.   Repeat steps 1 and 2 for the remaining N          1 points.

How much faster is this sort-and-scan approach? We can sort in O(N logN) time and the subsequent scan is O(N). We have to perform these operations for all N points, so the total cost of this sort and scan approach is N ⇥ (NOlog(N2Nlog+NN)) =ON(N2 log4), and the clock-time diN + N2 which is O(Nfference is dramatic. Problem sizes that are2 logN). This is a significant asymptotic

improvement since

impractical for the brute force solution are solved quickly (or at least in a reasonable amount of time) by this sort-and-scan solution.

But there is an additional benefit of this approach: We are no longer limited to identifying only four-point line segments. We can now identify maximal line segments of four or more collinear points.

Notes and other requirements
Here are a couple of extra requirements plus a few things to keep in mind.

•    Start this one earlythis assignment. Read this handout carefully. Ask questions of your TA and of me. Ask questions on. There’s more reading, thinking, and up-front understanding to take care of on

Piazza. Start early and be proactive.

•    You’ve been provided with shells of thethat has already been done for you.            Point, Line, and Extractor classes. Don’t modify anything

•    Start by completing the remaining methods of theList or Extractor class before this is complete and correct, since both depend onPoint class. There’s no point[4] in attempting thePoint.

•    Complete theboth Line andLinePointclass after, you must have these working first.Point but before Extractor. Again, since Extractor depends on

•    When you begin work on theshorter and easier to get correct quickly. Once you are satisfied thatExtractor class, start with the getLinesBrutegetLinesBrutemethod. This isis correct, turn your attention to getLinesFast.

•    Inexample, if the line segmentgetLinesFast, do not print subsegments of lines containing five or more collinear points. Forp ! q ! r ! s ! tpexists in the data, you must identify it but you must! q ! r ! s. not identify any four-point subsegment such as



[1] This means that once you create a Point you can’t change its x or y value. Thus, the class is final, the fields are final and private, and there are no setter methods.
[2] See the Java documentation of the Double class for a discussion of positive zero, positive infinity, and negative infinity.
[3] Now do you see why we defined the slope of degenerate lines as negative infinity?
[4] Sorry; couldn’t resist. Here’s more: www.punoftheday.com

More products