Perplexity is commonly used to evaluate language models.
Definition and Calculation
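First, we need to know the probability $p(w_n)$ that the model assigns to each observed word $w_n$.
Perplexity is the geometric mean of the number of choices: $\mathrm{perplexity} = \exp\left(-\frac{1}{N}\sum_{n=1}^{N}\log p(w_n)\right) = \prod_{n=1}^{N}\left(\frac{1}{p(w_n)}\right)^{1/N}$. $N$ is the number of data points, and $1/p(w_n)$ is the number of choices at position $n$.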
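Perplexity itself is always positive. If the log-perplexity becomes negative (that is, perplexity falls below 1), the probabilities are probably unnormalized, and you might need to take normalization constants into account. If you calculate the perplexity right after initializing the model with random parameters, it can be greater than the number of unique words in the corpus, which is the perplexity of a uniform guess over the vocabulary.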
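The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `perplexity`, the vocabulary size, and the probability values are all made up for the example. It shows that a uniform model over V words has perplexity V, and that a model placing very little mass on the observed words scores worse than that.

```python
import math

def perplexity(token_probs):
    """Perplexity from the model probability of each observed token."""
    n = len(token_probs)
    # Average negative log-likelihood per token.
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    # Exponentiating gives the geometric mean of 1/p.
    return math.exp(avg_nll)

vocab_size = 8  # hypothetical number of unique words

# A uniform model assigns 1/V to every token, so perplexity is (about) V.
uniform_probs = [1.0 / vocab_size] * 5
uniform_ppl = perplexity(uniform_probs)

# Hypothetical badly initialized model: tiny mass on the words that occurred.
bad_probs = [0.01, 0.02, 0.05, 0.01, 0.03]
bad_ppl = perplexity(bad_probs)

print(uniform_ppl, bad_ppl)
```

Working in log space avoids the underflow that multiplying many small probabilities directly would cause on a real corpus.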
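Ideally, we want to know the marginal likelihood $p(\mathbf{w})$, but we need to consider the complete-data log-likelihood $\log p(\mathbf{w}, \mathbf{z})$. So, we take $\frac{1}{S}\sum_{s=1}^{S}\mathrm{perplexity}(\mathbf{w} \mid \mathbf{z}^{(s)})$.
$S$ is the number of simulations kept after a sufficient number of iterations. $\mathbf{z}^{(s)}$ is the value of the latent variable in simulation $s$. Taking the mean of the perplexity over these simulations approximates an average over all possible $\mathbf{z}$.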
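A rough sketch of this averaging step follows. It assumes the sampler records one log-likelihood of the corpus per simulation; the names `mean_perplexity` and `burn_in`, and the numbers, are hypothetical and only illustrate discarding early iterations and averaging the perplexity over the rest.

```python
import math

def perplexity_from_loglik(log_lik, n_tokens):
    """Perplexity for one simulation, from its total log-likelihood."""
    return math.exp(-log_lik / n_tokens)

def mean_perplexity(sample_logliks, n_tokens, burn_in=0):
    """Average perplexity over the simulations kept after burn-in."""
    kept = sample_logliks[burn_in:]
    if not kept:
        raise ValueError("no simulations left after burn-in")
    total = sum(perplexity_from_loglik(ll, n_tokens) for ll in kept)
    return total / len(kept)

# Hypothetical log-likelihoods from five sweeps over a 100-token corpus.
# The first two sweeps are treated as burn-in and dropped.
logliks = [-460.0, -300.0, -230.0, -231.0, -229.0]
avg_ppl = mean_perplexity(logliks, n_tokens=100, burn_in=2)
print(avg_ppl)
```

How many iterations count as "enough" depends on the model and data; the burn-in of 2 here is only for illustration.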
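This is an average weighted by the trained parameters. In the following example, we consider three topics.
We sum up $\theta_{dk}\,\phi_{kw}$ over the topics $k = 1, 2, 3$, where $\theta_{dk}$ is the proportion of topic $k$ in document $d$ and $\phi_{kw}$ is the probability of word $w$ under topic $k$.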
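The three-topic sum can be illustrated as follows. This sketch assumes the usual topic-model form in which a word's probability in a document is the sum, over topics, of the document's topic proportion times the word's probability under that topic. The names `theta_d`, `phi`, and `word_prob`, the toy vocabulary, and all numbers are invented for the example.

```python
# Hypothetical topic proportions for one document (three topics, sum to 1).
theta_d = [0.5, 0.3, 0.2]

# Hypothetical word distributions, one per topic (each sums to 1).
phi = [
    {"cat": 0.6, "dog": 0.3, "fish": 0.1},
    {"cat": 0.1, "dog": 0.7, "fish": 0.2},
    {"cat": 0.2, "dog": 0.2, "fish": 0.6},
]

def word_prob(word, theta_d, phi):
    """Probability of a word in the document: topic-weighted average."""
    return sum(weight * topic[word] for weight, topic in zip(theta_d, phi))

# 0.5 * 0.6 + 0.3 * 0.1 + 0.2 * 0.2
p_cat = word_prob("cat", theta_d, phi)

# The mixture is still a proper distribution over the vocabulary.
total = sum(word_prob(w, theta_d, phi) for w in ("cat", "dog", "fish"))
print(p_cat, total)
```

These per-word probabilities are what would be fed into the perplexity calculation for the document.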