Problem 5.2

Searching an unsorted array

The problem examines three algorithms for searching for a value $x$ in an unsorted array $A$ consisting for $n$ elements.

Consider the following randomized strategy: pick a random index $i$ into $A$. If $A[i] = x$, then we terminate; otherwise, we continue the search by picking a new random index into $A$. We continue picking random indices into $A$ until we find an index $j$ such that $A[j] = x$ or until we have checked every element of $A$. Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

Write pseudocode for a procedure RANDOM-SEARCH to implement the strategy above. Be sure that your algorithm terminates when all indices into $A$ have been picked.

Suppose that there is exactly one index $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we find $x$ and RANDOM-SEARCH terminates?

Generalizing your solution to part (b), suppose that there are $k \ge 1$ indices $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we find $x$ and RANDOM-SEARCH terminates? Your answer should be a function of $n$ and $k$.

Suppose that there are no indices $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we have checked all elements of $A$ and RANDOM-SEARCH terminates?

Now consider a deterministic linear search algorithm, which we refer to as DETERMINISTIC-SEARCH. Specifically, the algorithm searches $A$ for $x$ in order, considering $A[1], A[2], A[3], \ldots, A[n]$ until either it finds $A[i] = x$ or it reaches the end of the array. Assume that possible permutations of the input array are equally likely.

Suppose that there is exactly one index $i$ such that $A[i] = x$. What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH?

Generalizing your solution to part (e), suppose that there are $k \ge 1$ indices $i$ such that $A[i] = x$. What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH? Your answer should be a function of $n$ and $k$.

Suppose that there are no indices $i$ such that $A[i] = x$. What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH?

Finally, consider a randomized algorithm SCRAMBLE-SEARCH that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuting array.

Letting $k$ be the number of indices $i$ such that $A[i] = x$, give the worst-case and expected running time of SCRAMBLE-SEARCH for the cases in which $k = 0$ and $k = 1$. Generalizing your solution to handle the case in which $k \ge 1$.

Which of the three searching algorithms would you use? Explain your answer.

RANDOM-SEARCH pseudocode

RANDOM-SEARCH(x, A, n):
  v = ∅
  while |∅| ≠ n
      i = RANDOM(1, n)
      if A[i] = x
          return i
      else:
          v = v ∩ i
  return ␀

v can be implemented in multiple ways - a hash table, a tree or a bitmap. The last one would probabily perform best and consume the least space.

RANDOM-SEARCH with one match

RANDOM-SEARCH is well-modelled by Bernoulli trials. The expected number of picks is $n$.

RANDOM-SEARCH with multiple matches

In similar fashion, the expected number of picks is $n/k$.

RANDOM-SEARCH with no matches

This is modelled by the balls and bins problem, explored in section 5.4.2. The answer is $n(\ln{n} + \O(1))$.

DETERMINISTIC-SEARCH with one match

The worst-case running time is $n$. The average-case is $(n+1)/2$ (obviously).

DETERMINISTIC-SEARCH with multiple matches

The worst-case running time is $n-k+1$. The average-case running time is $(n+1)/(k+1)$. Let $X_i$ be an indicator random variable that the $i$th element is a match. $\Pr\{X_i\} = 1/(k+1)$. Let $Y$ be an indicator random variable that we have found a match after the first $n-k+1$ elements ($\Pr\{Y\} = 1$). Thus:

$$ \E[X] = \E[X_1 + X_2 + \ldots + X_{n-k} + Y] = 1 + \sum_{i=1}^{n-k}\E[X_i] = 1 + \frac{n-k}{k+1} = \frac{n+1}{k+1} $$

DETERMINISTIC-SEARCH with no matches

Both the worst-case and average case is $n$.

SCRAMBLE-SEARCH matches

It's the same as DETERMINISTIC-SEARCH, only we replace "average-case" with "expected".

Best algorithm

Definitelly DETERMINISTIC-SEARCH. SCRAMBLE-SEARCH gives better expected results, but for the cost of randomly permuting the array, which is a linear operation. In the same time we could have scanned the full array and reported a result.