One generalization of binary trees is the k-d tree, which stores k-dimensional data. Every internal node of a k-d tree indicates the dimension d and the value v in that dimension that it discriminates by. An internal node has exactly two children: the left child contains data whose value in dimension d is less than or equal to v, and the right child contains data whose value in dimension d is greater than v. For example, if dimensions are numbered from 0 as (x, y, ...) and a node discriminates on dimension 1 with value 107, then the left child is for data with y value less than or equal to 107, and the right child is for data with y value greater than 107. Each leaf node represents a bucket containing no more than b elements of k-dimensional data. All data are found in the leaves.
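One possible in-memory layout for such a tree is sketched below in C++. It rests on a few assumptions: dimensions are numbered from 0 (so dimension 1 is y), and the names Point, Node, and findLeaf are illustrative, not prescribed by the assignment. A single node type serves as both internal node and leaf, and findLeaf walks from the root to the bucket in which a probe would land.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Point = std::vector<int>;  // one k-dimensional data point (assumed type)

struct Node {
    // Internal-node fields; meaningful only when both children are set.
    int dim = 0;                  // dimension d this node discriminates on (0-based)
    int value = 0;                // discriminating value v
    std::unique_ptr<Node> left;   // data with point[dim] <= value
    std::unique_ptr<Node> right;  // data with point[dim] >  value

    // Leaf field: the bucket, holding at most b points.
    std::vector<Point> bucket;

    // An internal node always has exactly two children, so a node with
    // no children is a leaf.
    bool isLeaf() const { return !left && !right; }
};

// Follow the discriminants from `node` down to the leaf whose bucket
// would contain `probe`.
const Node* findLeaf(const Node* node, const Point& probe) {
    assert(node != nullptr);
    while (!node->isLeaf()) {
        node = (probe[node->dim] <= node->value) ? node->left.get()
                                                 : node->right.get();
    }
    return node;
}
```

Keeping the bucket inside the node avoids a second node type at the cost of a few unused fields per node; a tagged union or two separate structs would work just as well.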
There are several strategies for building k-d trees. The offline method (1) accumulates all the data in an array, (2) finds the best dimension to discriminate on, namely, the one with the widest range (break ties by choosing the earliest dimension that has the widest range), (3) finds the best value of that dimension to discriminate on, namely, the median value in that dimension (using the QuickSelect algorithm with Lomuto's partitioning method), (4) separates the data into two subarrays based on that discriminant, (5) recurses back to step 2 on each subarray. Recursion terminates when an array has size b or smaller. One can also devise online methods that add to existing trees.
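Steps 2 and 3 of the offline method might look like the following C++ sketch. The function names, the inclusive index range [lo, hi], and the choice of the last element as Lomuto's pivot are assumptions made for illustration. The caller decides which index counts as the median; lo + (hi - lo) / 2 is one plausible choice, but the assignment does not spell it out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Point = std::vector<int>;  // one k-dimensional data point (assumed type)

// Step 2: return the dimension with the widest range over pts[lo..hi].
// The strict '>' keeps the earliest dimension when ranges tie.
int widestDimension(const std::vector<Point>& pts, int lo, int hi) {
    assert(lo <= hi);
    int best = 0;
    long long bestRange = -1;
    for (int d = 0; d < static_cast<int>(pts[lo].size()); ++d) {
        int mn = pts[lo][d], mx = pts[lo][d];
        for (int i = lo + 1; i <= hi; ++i) {
            if (pts[i][d] < mn) mn = pts[i][d];
            if (pts[i][d] > mx) mx = pts[i][d];
        }
        long long range = static_cast<long long>(mx) - mn;
        if (range > bestRange) { bestRange = range; best = d; }
    }
    return best;
}

// Lomuto partition of pts[lo..hi] on dimension d, using the last element
// as the pivot. Returns the pivot's final index; every point to its left
// has a key <= the pivot's key, every point to its right a larger key.
int lomutoPartition(std::vector<Point>& pts, int lo, int hi, int d) {
    int pivot = pts[hi][d];
    int store = lo;
    for (int i = lo; i < hi; ++i) {
        if (pts[i][d] <= pivot) std::swap(pts[i], pts[store++]);
    }
    std::swap(pts[store], pts[hi]);
    return store;
}

// Step 3: rearrange pts[lo..hi] so that index `target` holds the point it
// would hold if the range were sorted on dimension d.
void quickSelect(std::vector<Point>& pts, int lo, int hi, int target, int d) {
    assert(lo <= target && target <= hi);
    while (lo < hi) {
        int p = lomutoPartition(pts, lo, hi, d);
        if (p == target) return;
        if (target < p) hi = p - 1; else lo = p + 1;
    }
}
```

After quickSelect returns, the points left of target have keys no larger than the one at target and the points right of it have keys no smaller, so step 4 needs no further data movement: the two subarrays are simply the index ranges on either side of the split.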
Write a program called kd that
(1) takes three parameters, all positive integers: k specifies the number of dimensions, n specifies how many data points are to be placed in the tree, and p specifies the number of probes into the tree;
(2) reads from standard input a list of n k-dimensional integer data points;
(3) builds a k-d tree with those n values using the offline method, with b set to 10 (and ties going to the left subtree);
(4) reads p k-dimensional data values, called probes, and for each probe, lists all the data points stored in the bucket where the probe would be found if it were in the tree.
In step 3, you may assume that all integer data are distinct.
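The input handling in steps 1, 2, and 4 could be sketched in C++ as follows. The helper names parsePositive and readPoints are hypothetical, and the sketch assumes that the probes follow the data points on the same input stream, each point given as k whitespace-separated integers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Point = std::vector<int>;  // one k-dimensional data point (assumed type)

// Parse a command-line argument as a positive integer.
// Returns -1 if the text is not a positive integer.
int parsePositive(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    char extra;
    if (!(in >> value) || (in >> extra) || value <= 0) return -1;
    return value;
}

// Read `count` points of `k` integers each from `in`.
// Returns fewer points if the input ends early or is malformed.
std::vector<Point> readPoints(std::istream& in, int count, int k) {
    assert(count >= 0 && k > 0);
    std::vector<Point> pts;
    pts.reserve(count);
    for (int i = 0; i < count; ++i) {
        Point p(k);
        for (int d = 0; d < k; ++d) {
            if (!(in >> p[d])) return pts;  // stop at end of input or bad token
        }
        pts.push_back(std::move(p));
    }
    return pts;
}
```

In a full program, one would call readPoints(std::cin, n, k) for the data, build the tree, and then call readPoints(std::cin, p, k) for the probes; because both calls share the stream, the second picks up exactly where the first stopped.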
You have access to some useful tools. First, there is a sample Makefile at http://www.cs.uky.edu/~raphael/courses/CS315/prog3/Makefile. It has a run target that compiles your program (either kd.c or kd.cpp) and runs it. It also has a runWorking target that gets and runs a working program so you can compare your output against it.
A second tool is randGen.pl, which you used in the previous assignments. As before, if you invoke it with no parameters, it chooses a random seed and produces non-negative integers. If you invoke it with one parameter s, then s is the seed for the pseudo-random number generator, and the stream of numbers is therefore deterministic. If you invoke it with two parameters s and m, then m is a modulus limiting the size of the outputs; they will range from 0 to m-1.
Here is how you can invoke randGen.pl with your program, seeding the random-number generator with 42, limiting the random numbers to the range [0 .. 9999], setting k to 3, n to 64, and p to 2:
./randGen.pl 42 10000 | ./kd 3 64 2
Warning: If you run randGen.pl by itself, it generates an unbounded list of numbers. You should always pipe its output into another program, such as kd or less.
You can also get a working program that satisfies the specifications at http://www.cs.uky.edu/~raphael/courses/CS315/prog3/workingKD. The Makefile mentioned above automatically gets a copy of this file for you if you make runWorking.
Submit via Canvas a copy of your program, any external documentation (including a Makefile), and the output your program produces when you run
% randGen.pl 43 1000 | kd 3 64 10
The Makefile mentioned above has a zipAll target that creates a package ready to submit.