Assignment 8: K-D Trees and Nearest Neighbor Search
K-D Trees and Nearest Neighbor Search
For this assignment you will be implementing a K-D tree and comparing its performance to a brute force method for nearest neighbor searching.
The assignment will have three parts.
Part 1: K-D Tree and Brute Force Nearest Neighbor Implementation
Create an implementation of a K-D tree. Your tree should be able to work with points of any dimension ($k$). You can assume a Euclidean distance as the measure of similarity.
You should support the following capabilities.
- Starting from an empty tree and given a list of k-dimensional points, build a K-D tree.
- Given a single, k-dimensional query point, return the nearest neighbor in the K-D tree (you must use the structure of your tree and not default to brute force search).
You DO NOT need to support any of the following capabilities
- Incrementally add new points to your K-D tree
- Delete points from your K-D tree
- Look for nearest neighbors of multiple points simultaneously
For extra credit you can implement the following
- 1 point of extra credit: implement the Quickselect method ($\Theta(n)$ runtime versus the typical $\Theta(n \log n)$) for median finding.
For your brute force search, you must support the following capability
- Given a list of k-dimensional points and a query, return the closest point to the query. The brute force approach involves searching through every point and calculating its distance to the query.
Part 2: Benchmarking Infrastructure
Create a function called runExperiment with the following capabilities.
/**
* Benchmark [KDTree] against brute force nearest neighbor.
* 1000 test points will be generated to test against the training
* points.
*
* @param k: the dimensionality of the dataset to create
* @param numPoints: the number of points to use to match against
* (1000 test points will be used)
* @return the triple of [Duration] objects where the first specifies
* the time to build the tree, the second specifies the time it takes
* to query the tree with 1,000 points, and the third is the time it
* takes to find the nearest neighbor to these points using the brute
* force approach.
*/
fun runExperiment(k: Int, numPoints: Int): Triple<Duration, Duration, Duration> {
// your implementation here
}
You can create the data points using a random number generator.
In your fun main(), use nested loops to test various values of k and numPoints. Make sure you allow numPoints to range over different orders of magnitude (e.g., 10, 100, 1000, 10000) rather than over a linear range (e.g., 10, 20, 30, 40).
Part 3: Running Some Experiments
In a markdown file or other document format, use your code from parts 1 and 2, run some experiments to compare the performance of brute force search to your k-d tree. Write up your data in a clear manner (either a table or a plot). In a brief report, summarize the situations where k-d trees are superior to a brute force search.