Problem 11.4

Hashing and authentication

Let $\mathscr{H}$ be a class of hash functions in which each hash function $h \in \mathscr{H}$ maps the universe $U$ of keys to $\{0, 1, \ldots, m - 1\}$. We say that $\mathscr{H}$ is k-universal if, for every fixed sequence of $k$ distinct keys $\langle x^{(1)}, x^{(2)}, \ldots, x^{(k)} \rangle$ and for any $h$ chosen at random from $\mathscr{H}$, the sequence $\langle h(x^{(1)}), h(x^{(2)}), \ldots, h(x^{(k)}) \rangle$ is equally likely to be any of the $m^k$ sequences of length $k$ with elements drawn from $\{0, 1, \ldots, m - 1\}$.

a. Show that if the family $\mathscr{H}$ of hash functions is 2-universal, then it is universal.

b. Suppose that the universe $U$ is the set of $n$-tuples of values drawn from $\mathbb{Z}_p = \{0, 1, \ldots, p - 1\}$, where $p$ is prime. Consider an example $x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \in U$. For any $n$-tuple $a = \langle a_0, a_1, \ldots, a_{n-1} \rangle \in U$, define the hash function $h_a$ by

$$ h_a(x) = \left( \sum_{j=0}^{n-1} a_j x_j \right) \bmod p $$

Let $\mathscr{H} = \{h_a\}$. Show that $\mathscr{H}$ is universal, but not 2-universal. (Hint: Find a key for which all hash functions in $\mathscr{H}$ produce the same value.)

c. Suppose that we modify $\mathscr{H}$ slightly from part (b): for any $a \in U$ and for any $b \in \mathbb{Z}_p$, define

$$ h_{ab}'(x) = \left( \sum_{j=0}^{n-1} a_j x_j + b \right) \bmod p $$

and $\mathscr{H}' = \{h_{ab}'\}$. Argue that $\mathscr{H}'$ is 2-universal. (Hint: Consider fixed $n$-tuples $x \in U$ and $y \in U$, with $x_i \ne y_i$ for some $i$. What happens to $h_{ab}'(x)$ and $h_{ab}'(y)$ and $a_i$ and $b$ over range $\mathbb{Z}_p$?)

d. Suppose that Alice and Bob secretly agree on a hash function $h$ from a 2-universal family $\mathscr{H}$ of hash functions. Each $h \in \mathscr{H}$ maps from a universe of keys $U$ to $\mathbb{Z}_p$, where $p$ is prime. Later, Alice sends a message $m$ to BoB over the Internet, where $m \in U$. She authenticates this message to Bob by also sending an authentication tag $t = h(m)$, and Bob checks that the pair $(m, t)$ he receives indeed satisfies $t = h(m)$. Suppose that an adversary intercepts $(m, t)$ en route and tries to fool Bob by replacing the pair $(m, t)$ with a different pair $(m', t')$. Argue that the probability that the adversary succeeds in fooling Bob into accepting $(m', t')$ is at most $1/p$, no matter how much computing power the adversary has, and even if the adversary knows the family $\mathscr{H}$ of hash functions used.

a. 2-universal implies universal

This is pretty much by the definition. 2-universal means that the tuple/pair $\langle a, b \rangle$ is equally likely to be any of the $m^2$ possible pairs, $m$ of which contain the same element repeating, placing the chance of collision at $1/m$, which is the requirement for "universal".

b. One possible family of hash functions

In order to convince ourselves that it's universal, we need to establish an upper bound on the probability of $h_a(x) = h_a(y)$ when $x \ne y$. This holds when:

$$ \sum_{j=0}^{n-1} (x_k - y_k)a_j = 0 \mod p $$

...or fully we're looking for:

$$ \Pr\{ \sum_{j=0}^{n-1} (x_k - y_k)a_j \bmod p = 0 \} \le \frac{1}{p} $$

Let's acknowledge that $(x_k - y_k)$ is fixed, and the only thing we're considering is the possible values for $a_j$. Furthermore, let's also note that $a_j < p$.

Now let's establish for which tuples the above condition holds. If we fix the first $n - 1$ elements of the tuple, we're left with a choice of the last one. Since $a_{n-1} < p$, there is only one possible value for the last element that will be produce a sum equal to 0 modulo $p$. All other will be $\ne p$. That is, $1$ in every $p$ functions will produce a collision, and the overall probability is $1/p$, which is the requirement for universality.

It's not 2-universal, however, because all functions of the family produce $h_a(x) = 0$ when $x = \langle 0, 0, \ldots, 0 \rangle$.

c. A better family of functions

At this point it gets pretty intuitive that this is 2-universal, because it eliminates the problem with the zeroes. Following the hint, if we have two tuples that differ only for some it, that is $x_i \ne y_i$, we'll have $h_{ab}'(x) = h_{ab}'(y)$ only when:

$$ a_i x_i + b = a_i y_i + b \mod p $$

Or rather:

$$ a_i (x_i - y_i) + b = 0 \mod p $$

Since $x_i - y_i$ is fixed, and both $a_i < p$ and $b < p$, there is only one value of $b$ that satisfies the equation for a given value of $a_i$. That is, there are $p/p^2 = 1/p$ pairs which collide.

This argument can be formalized, but honestly, it's not worth it.

d. Hash fingerprints, but with more words

Well, there if the adversary has $(m, t)$ and they would like to craft $m'$, they need to calculate $t'$ correctly. They can limit the family of functions in $\mathscr{H}$ to only those that produce $h(m) = t$. But even then, 2-universal implies that for $\langle m, m' \rangle$, any of the possible $\langle t, t' \rangle$ are equality likely (probability $1/p^2$), which in turn means that for any fixed $t$, any of the $p$ possible values of $t'$ is equally likely as well.

With no additional information, the only thing our adversary can do is pick any of the subset they identified, and they have only $1/p$ chance to get it right for the next message.

It's worth noting two things:

If the adversary had multiple pairs of $(m_i, t_i)$, they can narrow it down further, assuming they can compute the subset of the family $\mathscr{H}$. Now, if the family is $(i+1)$-universal, they are back to looking at $1/p$ probability after the $i$th message.
The functions in the family can be picked in a way, where the adversary cannot easily identify the ones for which $h(m) = t$ given $m$ and $t$, given that $P \ne NP$.