Entropy
LMs: Language Models
LMs may be evaluated extrinsically, through their performance when embedded in other tasks
- instead, an LM may be evaluated intrinsically, according to how well it predicts unseen text
Information
uncertainty, surprisal, information (three names for the same quantity)
encode an outcome as a path in a binary tree, i.e. the required bits for an outcome with probability $P(x)$ is its surprisal $-\log_2 P(x)$
- bits are additive for independent events: $-\log_2 P(x)P(y) = -\log_2 P(x) - \log_2 P(y)$; see the sketch below
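A minimal sketch of surprisal and its additivity (the function name is illustrative, not from the notes):

```python
import math

def surprisal(p: float) -> float:
    """Bits needed to encode an outcome of probability p: -log2 p."""
    return -math.log2(p)

# one fair coin flip carries 1 bit; two independent flips carry 2 bits,
# since surprisal adds over independent events
print(surprisal(0.5))        # 1.0
print(surprisal(0.5 * 0.5))  # 2.0 == surprisal(0.5) + surprisal(0.5)
```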
entropy: the average uncertainty/information/surprisal of a (discrete) random variable X.
- $H(X) = -\sum_x P(x) \log_2 P(x)$
- less average uncertainty (entropy) when the probabilities are skewed (the outcome is easy to predict, so fewer bits are needed)
- maximum entropy under a uniform distribution, i.e. for uniform $X$ with $V$ choices, $H(X) = \log_2 V$
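A quick numeric check of both properties (the example distributions are made up):

```python
import math

def entropy(probs):
    """H(X) = -sum_x P(x) log2 P(x), with 0 * log2(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

V = 4
print(entropy([1 / V] * V))               # uniform: log2(V) = 2.0 bits (maximum)
print(entropy([0.97, 0.01, 0.01, 0.01]))  # skewed: ~0.24 bits (easy to predict)
```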
Huffman coding gives an optimal prefix code; its expected code length lies within 1 bit of $H(X)$
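A minimal Huffman sketch (the function name and example distribution are illustrative, not from the notes); for dyadic probabilities the expected code length matches the entropy exactly:

```python
import heapq
import math

def huffman_code(probs):
    """Build a prefix code for a {symbol: probability} dict via Huffman's algorithm."""
    # heap entries: (total probability, tiebreak id, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)  # pop the two least probable subtrees
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
H = -sum(p * math.log2(p) for p in probs.values())
L = sum(p * len(code[s]) for s, p in probs.items())
print(code, H, L)  # H == L == 1.75 bits for this dyadic distribution
```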
An LM $P$ may generate one word after another indefinitely, which leads to an infinite sequence $x = x_1 x_2 \dots$
- A corpus $c$ is a prefix of $x$
- the entropy of the full sequence is unbounded, so we use the per-word entropy rate, i.e. $H(P) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n)$
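Sanity check (the i.i.d. case; added here, not in the original notes): when words are drawn independently and identically, the per-word rate reduces to the single-word entropy:

$$\frac{1}{n} H(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i) = H(X_1)$$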
Joint entropy: the average amount of information needed to specify multiple variables simultaneously: $H(X, Y) = -\sum_{x,y} P(x, y) \log_2 P(x, y)$
- maximal when the variables are independent: $H(X, Y) \le H(X) + H(Y)$, with equality iff $X \perp Y$
Conditional entropy: the average amount of information needed to specify one variable given that you know another: $H(Y \mid X) = H(X, Y) - H(X)$
Mutual information: the average amount of information shared between variables: $I(X; Y) = H(X) + H(Y) - H(X, Y)$
- the amount of uncertainty removed from $X$ if you know $Y$: $I(X; Y) = H(X) - H(X \mid Y)$; see the check below
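A small numeric check of these identities, using a made-up 2×2 joint distribution:

```python
import math

# hypothetical joint distribution P(x, y) over X in {0, 1}, Y in {0, 1}
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# marginals
Px = {x: sum(p for (a, _), p in P.items() if a == x) for x in (0, 1)}
Py = {y: sum(p for (_, b), p in P.items() if b == y) for y in (0, 1)}

H_XY = -sum(p * math.log2(p) for p in P.values())   # joint entropy, ~1.72 bits
H_X = -sum(p * math.log2(p) for p in Px.values())   # 1.0 bit
H_Y = -sum(p * math.log2(p) for p in Py.values())   # 1.0 bit

H_Y_given_X = H_XY - H_X       # conditional entropy via the chain rule
I_XY = H_X + H_Y - H_XY        # mutual information, ~0.28 bits shared

print(H_XY, H_Y_given_X, I_XY)
```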
Until now we have used the true distribution $P$, but for natural language $P$ is unknown and cannot be pinned down, so we approximate it with a model distribution $Q$.
Cross-entropy: measures the average uncertainty when samples are drawn from $P$ but scored with $Q$: $H(P, Q) = -\sum_x P(x) \log_2 Q(x)$
- we still take the expectation under $P$, so a bad approximation $Q$ is penalized exactly where $P$ puts mass
- as $Q$ nears $P$, cross-entropy nears entropy: $H(P, Q) \ge H(P)$, with equality iff $Q = P$
- we pay for the mismatch with added uncertainty: $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \| Q)$; see the check below
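A quick numeric check of this decomposition (both distributions are made up):

```python
import math

P = [0.5, 0.25, 0.25]   # "true" distribution (assumed for illustration)
Q = [0.4, 0.4, 0.2]     # model's approximation (assumed)

H_P = -sum(p * math.log2(p) for p in P)                    # entropy, 1.5 bits
H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))        # cross-entropy, ~1.57
KL = sum(p * math.log2(p / q) for p, q in zip(P, Q))       # mismatch cost, ~0.07

assert abs(H_PQ - (H_P + KL)) < 1e-12   # H(P,Q) = H(P) + D_KL(P || Q) >= H(P)
```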
Some notes:
- we can evaluate $Q$ but not $P$
- the corpus $c$ is drawn from $P$
prove (that the per-word NLL of a corpus under $Q$ estimates $H(P, Q)$):
- Let $s_1, \dots, s_n$ be $c$'s sentences, where $c$ contains $M$ words in total
- by the law of large numbers, the sample average $-\frac{1}{M} \sum_i \log_2 Q(s_i)$ converges to its expectation under $P$
- this sample average is the per-word negative log likelihood (NLL) of the corpus under $Q$
- with time invariance (stationarity) and ergodicity, the per-word NLL approaches $H(P, Q)$ as $M \to \infty$
- this convergence can be proved via Markov chains, Monte Carlo arguments, ... (see the simulation below)
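A minimal simulation of this argument (the vocabulary size and both distributions are made up for illustration): draw a corpus from a known unigram $P$, score it with $Q$, and watch the per-word NLL approach $H(P, Q)$ as the corpus grows:

```python
import math
import random

random.seed(0)

P = [0.6, 0.3, 0.1]   # "true" unigram distribution (assumed for illustration)
Q = [0.5, 0.3, 0.2]   # model distribution (assumed)

H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))   # target cross-entropy

for M in (100, 10_000, 1_000_000):
    corpus = random.choices(range(len(P)), weights=P, k=M)   # corpus drawn from P
    nll = -sum(math.log2(Q[w]) for w in corpus) / M          # per-word NLL under Q
    print(M, round(nll, 4), "target:", round(H_PQ, 4))
```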
KL divergence: $D_{\mathrm{KL}}(P \| Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$, the expected extra bits paid for using $Q$ instead of $P$; see more in CSC412
Perplexity: $\mathrm{PP}(P, Q) = 2^{H(P, Q)}$, the effective average branching factor
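Worked check (added, not in the original notes): a uniform model over $V$ words has per-word cross-entropy $\log_2 V$, so

$$\mathrm{PP} = 2^{\log_2 V} = V,$$

i.e. perplexity literally counts the number of equally likely choices per word.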
Decision
entropy, KL divergence and perplexity can all be used to justify a preference for one method/idea over another
shallow statistics are often not enough to be truly meaningful