I had to implement reservoir sampling during an interview and I had no idea how to do that. So I had to figure it out on the spot. This is where all those years of induction paid off, since there’s a nice derivation based on writing down the definitions of everything and noticing that there’s a natural inductive pattern.

## The Problem

Sample \(k\) elements from a stream.

## Definitions

**Sample** - Uniformly at random.

**Stream** - A sequence of *finite* length (uniform sampling wouldn't make sense on a countably infinite sequence). It can only be accessed one element at a time in a linear order, and is exhausted by iteration.

Python iterators like `(i for i in range(5))` are good examples of streams.

## The Solution

The constraints of the stream data structure suggest a solution that
comes from considering simple cases and noticing that *you don’t know
how long the stream is*.

We need an algorithm that works for any length, but we don't know what
that length *is* up front. All we get is one element at a time (with
something like `StopIteration` when the stream is empty), and all we
can do with that element is store or ignore it.
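Concretely, this access pattern looks like the Python iterator protocol: repeated calls to `next` until the stream signals exhaustion.

```python
stream = iter([10, 20, 30])

# The only operation available: take the next element, in order.
a, b, c = next(stream), next(stream), next(stream)

# A fourth access raises StopIteration -- only now do we learn the length.
try:
    next(stream)
    exhausted = False
except StopIteration:
    exhausted = True
```

There is no `len(stream)` and no going back: once an element is consumed, our only choices are to store it or ignore it.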

Let’s call that unknown length \(N\) and consider simple cases.

For now, assume \(k = 1\). We only want to sample a single element.

### \(N = 1\)

Our stream has only one element. Let's call it \(x_1\). We only find out
that there's one element when we try to access it twice and get \(x_1\)
and then `StopIteration`.

So for our algorithm to cover this case, it *has* to store \(x_1\) no
matter what.

### \(N = 2\)

Since we don’t *know* that \(N = 2\), we have to cover the \(N = 1\)
case and keep \(x_1\).

Then to have a uniform sampling over \(\{x_1, x_2\}\), we have to discard \(x_1\) and keep \(x_2\) with probability \(\frac{1}{2}\).

### \(N = 3\)

This is the \(N = 2\) case with one additional element. So we keep \(x_1\) with probability 1 and \(x_2\) with probability \(\frac{1}{2}\).

To keep the sampling uniform, you’ll see that \(x_3\) needs to be kept with probability \(\frac{1}{3}\).
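As a check: if \(x_3\) is kept with probability \(\frac{1}{3}\), then whichever of \(x_1\) or \(x_2\) we were holding after the \(N = 2\) step survives with probability \(\frac{2}{3}\), and all three elements come out equally likely:

\[
P(x_1) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}, \qquad
P(x_2) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}, \qquad
P(x_3) = \frac{1}{3}.
\]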

This suggests a pattern of keeping the \(i\)-th element with probability \(\frac{1}{i}\). This is correct, and I’ll let you look up the proof on Wikipedia.
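The \(k = 1\) case can be sketched in a few lines (the function name `sample_one` is mine, not a standard API):

```python
import random

def sample_one(stream):
    """Uniformly sample one element from a stream of unknown length."""
    sample = None
    for i, x in enumerate(stream, start=1):
        # Keep the i-th element with probability 1/i.
        # When i == 1 this is probability 1, so x_1 is always stored.
        if random.random() < 1 / i:
            sample = x
    if sample is None:
        raise ValueError("empty stream")
    return sample
```

Note that the stream is consumed exactly once and only a single element is ever stored, which is the whole point.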

Now we consider how to sample \(k\) elements rather than just 1. We start with the smallest valid case.

### \(N = k\)

As before, we need to keep the first \(k\) elements no matter what in case there are no others.

(The data structure that stores these \(k\) elements is our
*reservoir*).

### \(N = k + 1\)

For the sampling to stay uniform over \(\{x_1, \ldots, x_{k+1}\}\), each element must end up in the reservoir with probability \(\frac{k}{k+1}\). So we keep \(x_{k+1}\) with probability \(\frac{k}{k+1}\), and in general the \(i\)-th element with probability \(\frac{k}{i}\).

If we keep it, we uniformly pick one element from our reservoir and replace it with the new element. Each earlier element then survives this step with probability \(1 - \frac{k}{k+1} \cdot \frac{1}{k} = \frac{k}{k+1}\), as required.
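Putting the pieces together gives the classic algorithm (this is a sketch; the names are mine):

```python
import random

def reservoir_sample(stream, k):
    """Uniformly sample k elements from a stream of unknown length."""
    reservoir = []
    for i, x in enumerate(stream, start=1):
        if i <= k:
            # Keep the first k elements no matter what.
            reservoir.append(x)
        elif random.random() < k / i:
            # Keep the i-th element with probability k/i,
            # evicting a uniformly chosen reservoir element.
            reservoir[random.randrange(k)] = x
    if len(reservoir) < k:
        raise ValueError(f"stream had fewer than {k} elements")
    return reservoir
```

Memory use is \(O(k)\) regardless of the stream's length, and the stream is traversed exactly once.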

And then we’re done. The stream data structure’s constraints of unknown length and nonrandom access basically solve the problem for us.