Assignment 8: K-D Trees and Nearest Neighbor Search

K-D Trees and Nearest Neighbor Search

For this assignment you will be implementing a K-D tree and comparing its performance to a brute force method for nearest neighbor searching.

The assignment will have three parts.

Part 1: K-D Tree and Brute Force Nearest Neighbor Implementation

Create an implementation of a K-D tree. Your tree should be able to work with points of any dimension ($k$). You can assume a Euclidean distance as the measure of similarity.

You should support the following capabilities.

Starting from an empty tree and given a list of k-dimensional points, build a K-D tree.
Given a single, k-dimensional query point, return the nearest neighbor in the K-D tree (you must use the structure of your tree and not default to brute force search).

You DO NOT need to support any of the following capabilities

Incrementally add new points to your K-D tree
Delete points from your K-D tree
Look for nearest neighbors of multiple points simultaneously

For extra credit you can implement the following

1 point of extra credit: implement the Quickselect method ($\Theta(n)$ runtime versus the typical $\Theta(n \log n)$) for median finding.

For your brute force search, you must support the following capability

Given a list of k-dimensional points and a query, return the closest point to the query. The brute force approach involves searching through every point and calculating its distance to the query.

Part 2: Benchmarking Infrastructure

Create a function called runExperiment with the following capabilities.

/**
 * Benchmark [KDTree] against brute force nearest neighbor.
 * 1000 test points will be generated to test against the training
 * points.
 *
 * @param k: the dimensionality of the dataset to create
 * @param numPoints: the number of points to use to match against
 *   (1000 test points will be used)
 * @return the triple of [Duration] objects where the first specifies
 * the time to build the tree, the second specifies the time it takes
 * to query the tree with 1,000 points, and the third is the time it
 * takes to find the nearest neighbor to these points using the brute
 * force approach.
 */
fun runExperiment(k: Int, numPoints: Int): Triple<Duration, Duration, Duration> {
    // your implementation here
}

You can create the data points using a random number generator.

In your fun main(), use nested loops to test various values of k and numPoints. Make sure you allow numPoints to range over different orders of magnitude (e.g., 10, 100, 1000, 10000) rather than over a linear range (e.g., 10, 20, 30, 40).

Part 3: Running Some Experiments

In a markdown file or other document format, use your code from parts 1 and 2, run some experiments to compare the performance of brute force search to your k-d tree. Write up your data in a clear manner (either a table or a plot). In a brief report, summarize the situations where k-d trees are superior to a brute force search.