COMP135 - Assignment 01 Solved

This assignment will get you started using NumPy. If you are new to using that library (or Python in general), you probably want to start with the tutorial provided at:

https://numpy.org/devdocs/user/quickstart.html

You have been supplied with some starter code, including the prototype for two functions that you need to complete. The code also contains detailed documentation of how the functions should operate, and some examples of their use. You can (and should) test your code to confirm that it does what is asked.

1.   (5 pts.) Along with your code, submit a completed version of the COLLABORATORS.txt file.

An example has been provided, which you should edit appropriately to include:

•   Your name.

•   The time it took you to complete the assignment.

•   Any resources you used to complete the assignment, including discussions with the instructor, TAs, or fellow students, and any online or offline resources consulted. If you did not need to consult any outside resources, you can say so.

•   A brief description of what parts, if any, of the assignment caused you to seek help.

2.   (10 pts.) The first part of the code asks you to write a function to split data into a training set and a testing set (a very common task in ML). Your code will use basic NumPy array indexing and random number generation to take an input array containing L instances of F-dimensional data, and divide it into two mutually exclusive arrays of size M (for training) and N (for testing).

As part of its input, the function takes the parameter frac_test, specifying the overall fraction of the data-set to use for testing purposes. It will use this fraction to determine the size N, rounding up to the nearest whole number: $N = \lceil \text{frac\_test} \times L \rceil$
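As a small illustration of the rounding rule above, the sketch below computes N with np.ceil and takes M as the remaining L - N rows. The helper name compute_test_size is hypothetical and is not part of the starter code.

```python
import numpy as np


def compute_test_size(n_examples, frac_test):
    """Hypothetical helper: number of test rows, rounded up to a whole number."""
    return int(np.ceil(frac_test * n_examples))


L = 10
N = compute_test_size(L, 0.25)  # 0.25 * 10 = 2.5, which rounds up to 3
M = L - N                       # the remaining rows go to the training set
print(M, N)
```

Note that np.ceil returns a float, so the result is converted to int before it is used as an array size or slice bound.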

The function will also use NumPy functions like shuffle or permutation for doing random sampling of the data, so that the test/train instances are uniformly selected from the data-set:

https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html

Furthermore, we want the results of the function to be reproducible, for scientific purposes. That means that there should be a way to specify a source of randomness (a seed) such that it is possible to duplicate any random selection. In NumPy, this is generally handled by using a RandomState instance, or via an integer seed. See the linked discussion from the source code comments for insight into using such seeds in your code.

Notes: read the documentation of the function supplied with the starter code carefully. Ensure that your code meets the requirements as given. In addition, be sure that your code uses only basic Python and functions from NumPy. Do not call functions for data-set generation and manipulation from libraries like sklearn.
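To make the description above concrete, here is a minimal sketch of one way such a split could be written with NumPy alone. The function name, argument names, and the handling of random_state (None, an integer seed, or a RandomState instance) are assumptions made for illustration; the documentation in the starter code is the authority on the required interface.

```python
import numpy as np


def split_train_test_sketch(x_all_LF, frac_test=0.5, random_state=None):
    """Illustrative split of an (L, F) array into train (M, F) and test (N, F)."""
    # Assumed convention: None -> global NumPy randomness, int -> fresh seeded generator,
    # anything else is treated as a RandomState-like object with a permutation method.
    if random_state is None:
        random_state = np.random
    elif isinstance(random_state, (int, np.integer)):
        random_state = np.random.RandomState(int(random_state))

    L = x_all_LF.shape[0]
    N = int(np.ceil(frac_test * L))  # test size, rounded up

    # A random permutation of the row indices gives a uniform sample without replacement.
    shuffled_ids = random_state.permutation(L)
    test_ids = shuffled_ids[:N]
    train_ids = shuffled_ids[N:]

    # Integer-array indexing returns new arrays, so the input is left unmodified.
    return x_all_LF[train_ids], x_all_LF[test_ids]


x_LF = np.arange(20).reshape(10, 2)
train_MF, test_NF = split_train_test_sketch(x_LF, frac_test=0.25, random_state=0)
print(train_MF.shape, test_NF.shape)
```

Calling the sketch twice with the same integer seed yields the same split, which is the reproducibility property described above.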

3.   (10 pts.) The other function you are to complete finds the nearest neighbors of given data instances. That is, given a set of F-dimensional data of size N, and Q query instances (also F-dimensional), we want to compute the K closest vectors found in the data, for some integer value K, and for each query instance (for a total of Q × K neighbors).

In computing “closeness,” we will use the Euclidean distance between vectors. For two vectors $x = (x_1, \ldots, x_F)$ and $x' = (x'_1, \ldots, x'_F)$, this distance is given by:

$$d(x, x') = \sqrt{\sum_{f=1}^{F} (x_f - x'_f)^2}$$

Your function will take in a data-set (a 2-dimensional array of size N × F) and a query-set (a 2-dimensional array of size Q × F), and return a 3-dimensional array (of size Q × K × F), where each row (indexed by Q) consists of the K nearest neighbors of the corresponding query vector. These neighbors should appear in order, closest to least-close.

Notes: it is possible that there will be ties among neighbors. If this occurs, such ties can be broken however you like (randomly or not). Again, be sure that your code uses only basic Python and functions from NumPy. Do not call functions from libraries like sklearn.
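The computation described above can be sketched with NumPy broadcasting as follows. This is one possible approach under stated assumptions, not the required implementation: the function and argument names are hypothetical, and ties are broken by original data order through a stable sort, which is only one of the allowed choices.

```python
import numpy as np


def k_nearest_neighbors_sketch(data_NF, query_QF, K=1):
    """Illustrative K-nearest-neighbor lookup returning an array of shape (Q, K, F)."""
    # Broadcasting (Q, 1, F) against (1, N, F) gives all pairwise differences: (Q, N, F).
    diffs_QNF = query_QF[:, np.newaxis, :] - data_NF[np.newaxis, :, :]

    # Euclidean distance from every query to every data vector: (Q, N).
    dists_QN = np.sqrt(np.sum(diffs_QNF ** 2, axis=2))

    # Sort each row by distance; a stable sort keeps tied neighbors in data order.
    order_QN = np.argsort(dists_QN, axis=1, kind="stable")
    nearest_ids_QK = order_QN[:, :K]

    # Indexing with a (Q, K) integer array selects rows of data_NF: result is (Q, K, F).
    return data_NF[nearest_ids_QK]


data_NF = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
query_QF = np.array([[0.9, 0.0], [9.0, 0.0]])
neighbors_QKF = k_nearest_neighbors_sketch(data_NF, query_QF, K=2)
print(neighbors_QKF.shape)
```

The intermediate (Q, N, F) array can become large for big data-sets; looping over the queries one at a time trades some speed for lower memory use.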
