# Geospatial-Operation: CSE 512 Course Project Operation

## 1. Operation Checklist

1) Geometry union
2) Geometry convex hull
3) Geometry farthest pair
4) Geometry closest pair
5) Spatial range query
6) Spatial join query

## 2. Operation Requirement

### 1) Geometry union

Definition: The union of a set S of polygons is the set of all points that lie in at least one of the polygons in S. Only the outer boundary of this region is kept; interior segments are removed.

Example: The figures below show the input and output of a union operation. The input is a set of polygons, and the output is the perimeter of the area these polygons cover.

Function Name: GeometryUnion

Arguments:

1. String InputLocation: the location of the input in HDFS
2. String OutputLocation: the location of the output in HDFS

Requirement: Load a set of polygons and output the union of this set.

Input Dataset Schema: x1, y1, x2, y2 // Every row is a pair of points (longitude, latitude) that defines a rectangle. The dataset contains many rectangles.

Output Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains many points, which together form the result polygon.

Figure 1 Union Input Polygons

Figure 2 Union Result

Note: If the input contains rectangles that do not intersect one another, you should still output a single polygon that covers all rectangles.

### 2) Geometry convex hull

Definition: The convex hull of a set of points P is the smallest convex polygon that contains all points in P. The output of the convex hull operation is the list of points forming the convex hull, ordered clockwise.

Example: The figures below show the input and output of a convex hull operation. The input is a set of points, and the output is the subset of points that form the convex hull.

Function Name: GeometryConvexHull

Arguments:

1. String InputLocation: the location of the input in HDFS
2. String OutputLocation: the location of the output in HDFS

Requirement: Load a set of points and output the convex hull of this set.
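The convex hull definition above can be illustrated with a small in-memory sketch. It uses Andrew's monotone chain algorithm, which is one common choice and not necessarily the method the course expects. The function names `cross` and `convex_hull_clockwise` are hypothetical, and the sketch reads points from a Python list rather than from HDFS. "Clockwise" here assumes the usual orientation with x (longitude) increasing to the right and y (latitude) increasing upward.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) = (longitude, latitude)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_clockwise(points: List[Point]) -> List[Point]:
    """Return the hull vertices in clockwise order (hypothetical helper).

    Collinear points on a hull edge are dropped; this is an assumption,
    since the specification does not say how to treat them.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull left to right, discarding non-left turns.
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull right to left in the same way.
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The chains share their end points; joined, they run counter-clockwise.
    ccw = lower[:-1] + upper[:-1]
    # Keep the starting vertex and reverse the rest to get clockwise order.
    return [ccw[0]] + ccw[:0:-1]


hull = convex_hull_clockwise([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
print(hull)  # [(0, 0), (0, 2), (2, 2), (2, 0)]
```

The interior point (1, 1) is not part of the result, and the remaining four corners are listed clockwise starting from the lowest-leftmost point.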
Input Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains many points.

Output Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains the points that form the convex hull.

Figure 3 Convex Hull Input Points

Figure 4 Convex Hull Result

### 3) Geometry farthest pair

Definition: Given a set of points P, the farthest pair is the pair of points with the largest Euclidean distance between them.

Example: As Figure 5 shows, the two points forming the farthest pair must lie on the convex hull. The input of the farthest pair operation is a set of points, and the output is the pair of points with the greatest distance between them.

Function Name: GeometryFarthestPair

Arguments:

1. String InputLocation: the location of the input in HDFS
2. String OutputLocation: the location of the output in HDFS

Requirement: Load a set of points and output the farthest pair of this set.

Input Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains many points.

Output Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains exactly two points, which form the farthest pair.

### 4) Geometry closest pair

Definition: Given a set of points P, the closest pair is the pair of points with the smallest Euclidean distance between them. Both points lie inside or on the convex hull; unlike the farthest pair, they need not be hull vertices.

Example: Figure 5 shows the closest pair within the convex hull. The input of the closest pair operation is a set of points, and the output is the pair of points with the smallest distance between them.

Function Name: GeometryClosestPair

Arguments:

1. String InputLocation: the location of the input in HDFS
2. String OutputLocation: the location of the output in HDFS

Requirement: Load a set of points and output the closest pair of this set.
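The farthest-pair and closest-pair definitions above can both be checked with a brute-force sketch that compares every pair of points. This is only a reference implementation for small inputs: it runs in quadratic time, works on an in-memory list rather than HDFS, and the names `farthest_pair` and `closest_pair` are hypothetical. It relies on `math.dist`, which is available in Python 3.8 and later.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) = (longitude, latitude)
Pair = Tuple[Point, Point]


def pair_distance(pair: Pair) -> float:
    """Euclidean distance between the two points of a pair."""
    return math.dist(pair[0], pair[1])


def farthest_pair(points: List[Point]) -> Pair:
    """Pair with the largest Euclidean distance (brute force, O(n^2))."""
    return max(combinations(points, 2), key=pair_distance)


def closest_pair(points: List[Point]) -> Pair:
    """Pair with the smallest Euclidean distance (brute force, O(n^2))."""
    return min(combinations(points, 2), key=pair_distance)


sample = [(0, 0), (10, 0), (10, 12), (0, 10), (4, 5), (5, 5)]
print(farthest_pair(sample))  # ((0, 0), (10, 12))
print(closest_pair(sample))   # ((4, 5), (5, 5))
```

For large datasets, a sketch like this is mainly useful as a correctness check against a faster implementation, for example one that restricts the farthest-pair search to convex hull vertices.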
Input Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains many points.

Output Dataset Schema: x, y // Every row is a point (longitude, latitude). The dataset contains exactly two points, which form the closest pair.

Figure 5 Farthest Pair and Closest Pair in Convex Hull

### 5) Spatial range query

Definition: A spatial range query asks for the spatial objects that lie in a given query window. Here we are interested in finding objects within a rectangular window in space. Only objects falling inside this window are returned.

Example: The figures below show an example of a rectangular window query. Input 1 is a set of points, input 2 is a rectangle, and the output is the set of points that lie inside the rectangle.

Function Name: SpatialRangeQuery

Arguments:

1. String InputLocation1: the location of input 1 in HDFS
2. String InputLocation2: the location of input 2 in HDFS
3. String OutputLocation: the location of the output in HDFS

Requirement: Load a set of points and a query window, and output the points that fall inside the window.

Input1 Dataset Schema: id, x1, y1 // Every row represents one point (longitude, latitude). The dataset contains many points.

Input2 Dataset Schema: x1, y1, x2, y2 // The dataset contains a pair of points (longitude, latitude) that defines a rectangle. This rectangle is the query window of the range query.

Output Dataset Schema: id // Every row is the id of a point. The dataset contains many ids.

Figure 6 Range Query Input

Figure 7 Range Query Output

### 6) Spatial join query

Definition: A spatial join combines two or more datasets with respect to a spatial predicate. A typical example of a spatial join query is “Find all pairs of rivers and cities that intersect”.

Example: The figures below show an example of a join query. Input 1 is a set of points, and input 2 is a set of rectangles. The output lists each rectangle together with the points it contains.
Function Name: SpatialJoinQuery

Arguments:

1. String InputLocation1: the location of input 1 in HDFS
2. String InputLocation2: the location of input 2 in HDFS
3. String OutputLocation: the location of the output in HDFS
4. String input1Type: indicates whether input 1 consists of points or rectangles (“point”, “rectangle”)

Requirement: Load the two input datasets and output the result of the join query.

Input1 Dataset Schema, one of the following:

(1) A-id, x1, y1, x2, y2 // Every row is a pair of points (longitude, latitude) that defines a rectangle. The dataset contains many rectangles. Both containment and overlap count as “contained”, i.e., both should appear in the result.

(2) A-id, x1, y1 // Every row is a pair (longitude, latitude) that defines a point. The dataset contains many points. Points that lie on the boundary of a rectangle in input 2 count as “contained”, i.e., they should appear in the join result.

Input2 Dataset Schema: B-id, x1, y1, x2, y2 // Every row is a pair of points (longitude, latitude) that defines a rectangle. The dataset contains many rectangles.

Output Dataset Schema: B-id, A-id1, A-id2, A-id3, … A-idn // Every row has one B-id followed by 0 to n A-ids: the rectangle (B-id) and the points or rectangles (A-ids) it contains. The dataset contains many rows. If the join set is empty, you should output: Aid, NULL.

Note: The first input dataset will consist either entirely of points or entirely of rectangles.

Figure 8 (a) Join Query Input Points

Figure 8 (b) Join Query Input Rectangles

Figure 8 (c) Join Query Effect

Figure 9 Join Query Result
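The join rules above (boundary points count as contained, and overlapping rectangles count as contained) can be sketched as a nested-loop join over in-memory rows. Everything here is illustrative: the names `normalize`, `contains_point`, `intersects`, and `spatial_join` are hypothetical, the data comes from Python lists rather than HDFS, and treating rectangles that merely touch as overlapping is an assumption the specification does not settle. The sketch returns a mapping from B-id to matching A-ids and leaves the serialization of rows, including the empty-join case, to the caller. The point-in-window test is the same predicate a range query would apply with a single window.

```python
from typing import Dict, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # x1, y1, x2, y2 (two corners)


def normalize(rect: Sequence[float]) -> Rect:
    """Return (min_x, min_y, max_x, max_y) regardless of corner order."""
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def contains_point(window: Sequence[float], x: float, y: float) -> bool:
    """True if (x, y) is inside the window or on its boundary."""
    min_x, min_y, max_x, max_y = normalize(window)
    return min_x <= x <= max_x and min_y <= y <= max_y


def intersects(window: Sequence[float], other: Sequence[float]) -> bool:
    """True if the rectangles overlap or one contains the other.

    Rectangles that only touch also match (assumption, see lead-in).
    """
    a, b = normalize(window), normalize(other)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(input1: List[tuple], input2: List[tuple],
                 input1_type: str) -> Dict[str, List[str]]:
    """Map every B-id in input2 to the A-ids from input1 that it contains.

    input1 rows are (A-id, x1, y1) for "point" or
    (A-id, x1, y1, x2, y2) for "rectangle"; input2 rows are
    (B-id, x1, y1, x2, y2). The A-id list may be empty.
    """
    if input1_type not in ("point", "rectangle"):
        raise ValueError("input1_type must be 'point' or 'rectangle'")
    result: Dict[str, List[str]] = {}
    for b_id, *window in input2:
        matches = []
        for a_id, *coords in input1:
            if input1_type == "point":
                hit = contains_point(window, coords[0], coords[1])
            else:
                hit = intersects(window, coords)
            if hit:
                matches.append(a_id)
        result[b_id] = matches
    return result


windows = [("B1", 0, 0, 10, 10), ("B2", 20, 20, 30, 30)]
points = [("A1", 5, 5), ("A2", 10, 3), ("A3", 15, 15)]
print(spatial_join(points, windows, "point"))
# {'B1': ['A1', 'A2'], 'B2': []}
```

In the example, A2 lies exactly on the right edge of B1 and is therefore reported, while A3 falls in neither rectangle. A nested loop compares every pair of rows, so a distributed implementation would normally partition or index the data first.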
