Importance Sampling, Trivialized


Algebraic manipulation is always a fun way to make something profound seem like mere trickery.

We’ll derive importance sampling by showing that it reduces to multiplying and dividing by the same thing (so the two expectations are equal, because all we did was multiply by 1).

We have some random variable \(x\) with PDF \(p\) that we want to take the expectation of. My notation will be sloppy except where it counts.

\[\mathbb{E}_p(x) = \int x \cdot p(x) dx\]

We now introduce another PDF \(q\), which must be nonzero wherever \(x \cdot p(x)\) is nonzero so the division below is legal. By multiplying and dividing by it, we can get an expectation with respect to \(q\) instead of \(p\).

\[\begin{align*}
\mathbb{E}_p(x) &= \int x \cdot p(x)\, dx && \text{Definition of expectation with respect to PDF } p \\
&= \int x \cdot p(x) \cdot \frac{q(x)}{q(x)}\, dx && \text{Multiply and divide by } q(x) \\
&= \int \left( x \cdot \frac{p(x)}{q(x)} \right) \cdot q(x)\, dx && \text{Move one of the } q\text{'s over and notice an expectation with respect to } q \\
&= \mathbb{E}_q\!\left( x \cdot \frac{p(x)}{q(x)} \right) && \text{Rejoice}
\end{align*}\]
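To make the identity concrete, here's a minimal numerical sketch (my own toy example, not from the derivation above): \(p\) is a standard normal, \(q\) is a normal shifted to mean 2, and we estimate \(\mathbb{E}_p(x) = 0\) using only samples from \(q\), reweighted by \(p(x)/q(x)\).

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)

# Toy choice: target p = N(0, 1), proposal q = N(2, 1).
# We want E_p[x] = 0, but we only draw samples from q.
n = 100_000
samples = rng.normal(loc=2.0, scale=1.0, size=n)  # draws from q

# Importance weights w(x) = p(x) / q(x)
weights = normal_pdf(samples, 0.0, 1.0) / normal_pdf(samples, 2.0, 1.0)

# E_p[x] = E_q[x * p(x)/q(x)], so average x * w(x) over the q-samples.
estimate = np.mean(samples * weights)
print(f"importance-sampling estimate of E_p[x]: {estimate:.3f}")  # near 0
```

The further \(q\) drifts from \(p\), the more the weights vary and the noisier this estimate gets, which is why the choice of proposal matters in practice.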

Thankfully, remembering the trick makes the formula easy to re-derive. I never remember the formula itself, and I had to do this derivation twice in the ten minutes it took to write this post.
