Problem 9.3
Small order statistics
We showed that the worst-case number $T(n)$ of comparisons used by
SELECT
to select the $i$th order statistic from $n$ numbers satisfies $T(n) = \Theta(n)$, but the constant hidden by the $\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that usesSELECT
as a subroutine but makes fewer comparisons in the worst case.
- Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where $$ U_i(n) = \begin{cases} T(n) & \text{if } i \ge n/2 \\ \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) & \text{otherwise} \end{cases} $$ (Hint: Begin with $\lfloor n/2 \rfloor$ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)
- Show that, if $i < n/2$, then $U_i(n) = n + \O(T(2i)\lg(n/i))$.
- Show that, if $i$ is a constant less than $n/2$, then $U_i(n) = n + \O(\lg{n})$.
- Show that, if $i = n/k$ for $k \ge 2$, then $U_i(n) = n + \O(T(2n/k)\lg{k})$.
The algorithm
This is a modified version of SELECT
. Not only it finds the $i$th order
statistic, but it also partitions the array, thus finding the $i-1$ smaller
elements.
- If $i \ge n/2$, we just use
SELECT
- Otherwise, split the array in pairs and compare each pair.
- We take the smaller elements of each pair, but keep track of the other one.
- We recursively find the first $i$ elements among the smaller elements
- The $i$th order statistic is among the pairs containing the smaller
elements we found in the previous step. We call
SELECT
on those $2i$ elements. That's the final answer.
Just picking the smaller element of each pair is not enough. For example, if
we're looking for the 2nd order statistic and our pairs are 1, 2
, 3, 4
,
5, 6
, 7, 8
, 9, 10
, the answer is in the larger part of the first pair.
That's why we need to keep track and later perform SELECT
on $2i$ elements.
Steps 1-4 can be implemented in place by modifying the algorithm to put the
larger elements of the pairs on the inactive side of the pivot and modifying
PARTITION
to swap the elements on the inactive side every time it swaps
elements on the active side. More details can be found in the Instructor's
Manual.
The math
We can prove (b) by induction. This is the step:
$$ \begin{aligned} U_i(n) &= \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) \\ &= \lfloor n/2 \rfloor + \lceil n/2 \rceil + \O(T(2i)\lg(\lfloor n/2 \rfloor / i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) \end{aligned} $$
This is a bit more sloppy that doing it with the substitution method, but that feels like grunt work to me at this point.
The other two are obvious:
$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(\O(1)\lg(n/i)) \\ &= n + \O(\lg{n} - \lg{i}) \\ &= n + \O(\lg{n} - \O(1)) \\ &= n + \O(\lg{n}) \end{aligned} $$
$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(T(2n/k)\lg(n/(n/k))) \\ &= n + \O(T(2n/k)\lg{k}) \\ \end{aligned} $$
Again, this reasoning is sloppy, but I don't feel like applying the substitution method.