Perplexity


Perplexity is commonly used to evaluate language models.

Definition and Calculation

First, we need the relationship between a probability and the equivalent number of equally likely choices:

\(\begin{align} &\quad {\rm probability}\ p = \frac{1}{\rm choices}, \\ &\Leftrightarrow {\rm choices}\ c = \frac{1}{p}. \end{align}\)
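For example, a fair six-sided die assigns \(p = \tfrac{1}{6}\) to each outcome, which corresponds to \(c = 6\) equally likely choices.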

Perplexity is the geometric mean of the number of choices, where \(N\) is the number of data points (e.g., tokens) and \(c_i\) is the number of choices for the \(i^{\rm th}\) data point:

\(\begin{align} \left( \prod_{i=1}^{N} c_i \right)^{\frac{1}{N}} &= \left( \prod_{i=1}^{N} \frac{1}{p_i} \right)^{\frac{1}{N}} \\[10pt] &= \exp \left( \log \left( \prod_{i=1}^{N} \frac{1}{p_i} \right)^{\frac{1}{N}} \right) \\[10pt] &= \exp \left( \dfrac{\sum_{i=1}^N -(\log p_i)}{N} \right) \\[10pt] &= \exp \left( \frac{-{\rm loglik}}{N} \right) \end{align}\)
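As a quick sanity check of the formula, here is a minimal Python sketch that computes perplexity directly from per-token probabilities (the probabilities are made up for illustration):

```python
import math

def perplexity(probs):
    """Geometric mean of the choices 1/p_i, computed as exp(-loglik / N)."""
    n = len(probs)
    loglik = sum(math.log(p) for p in probs)
    return math.exp(-loglik / n)

# A model that assigns probability 0.25 to every token is as confused
# as choosing uniformly among 4 options, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```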

Perplexity itself is never negative, since it is an exponential. If the quantity \(-{\rm loglik}/N\) comes out negative (that is, some \(p_i > 1\)), you probably need to take normalization constants into account. Also note that if you calculate the perplexity right after initializing the model (with randomly filled parameters), it can be greater than the number of unique words in the corpus.

Explanation

For a model with latent variables \(z\), we ideally want the marginal likelihood \(p(w)\), but what we can actually evaluate is the complete-data likelihood \(p(w,z)\). So we approximate the marginal by averaging over samples of \(z\):

\(\begin{align} p(w) = \sum_z p(w,z) \approx \frac{1}{S} \sum_{s=1}^S p\left(w, z^{(s)}\right) \end{align}\)

\(S\) is the number of samples drawn after a sufficient number of iterations (burn-in), and \(z^{(s)}\) is the value of the latent variable in the \(s^{\rm th}\) sample. Averaging the perplexity over these samples can serve as an approximation over all possible \(z\).
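A minimal sketch of this averaging, assuming we already have the complete-data log-likelihoods \(\log p(\mathbf{w}, z^{(s)})\) from \(S\) posterior samples (e.g., Gibbs samples after burn-in); the numbers below are made up:

```python
import math

def perplexity_from_samples(sample_logliks, n_tokens):
    """Approximate p(w) by the mean of p(w, z^(s)) over S samples,
    then turn the approximate log-likelihood into a perplexity."""
    S = len(sample_logliks)
    # log of the mean of the likelihoods, evaluated in log space for stability
    m = max(sample_logliks)
    log_p_w = m + math.log(sum(math.exp(l - m) for l in sample_logliks) / S)
    return math.exp(-log_p_w / n_tokens)

# Hypothetical complete-data log-likelihoods from S = 3 samples
# over a 100-token test set.
print(perplexity_from_samples([-460.0, -455.0, -458.0], n_tokens=100))
```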

Test Perplexity

\(\begin{align} p(\mathbf{w}^{\rm Test} | \mathbf{w}^{\rm Train}) = \int p(\mathbf{w}^{\rm Test} | \theta) p(\theta | \mathbf{w}^{\rm Train}) d\theta \end{align}\)

This is the test likelihood averaged over the posterior of the parameters learned on the training data. In the following example, we consider three topics for a single test word \(w\).

| Topic | \(p(z)\) | \(p(w \mid z)\) | \(p(w,z)\) |
|-------|----------|-----------------|------------|
| \(z=1\) | 0.5 | 0.05 | 0.025 |
| \(z=2\) | 0.2 | 0.03 | 0.006 |
| \(z=3\) | 0.1 | 0.01 | 0.001 |

We sum up \(p(w,z)\) over the topics: \(p(w) = 0.025 + 0.006 + 0.001 = 0.032\).
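The same calculation in a few lines of Python, using the numbers from the table:

```python
# Marginalize out the topic: p(w) = sum_z p(z) * p(w|z)
p_z = [0.5, 0.2, 0.1]
p_w_given_z = [0.05, 0.03, 0.01]

p_w = sum(pz * pwz for pz, pwz in zip(p_z, p_w_given_z))
print(f"{p_w:.3f}")  # 0.032
```

This \(p(w)\) is the per-word probability that then enters the perplexity formula above.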
