Likelihood in Statistics

The likelihood function, $\mathcal{L}$, is defined as the probability of observing a specific set of data, $\mathcal{Y}$, under a probability distribution parameterized by $z$:

$$ \mathcal{L}(z) = p(\mathcal{Y}\mid z) $$

where $\mathcal{Y}$ is the observed data and $z$ is the parameter of the assumed distribution.

Example: Bernoulli Distribution (Coin Toss)

Suppose we have a sequence of independent coin tosses:

$$ \mathcal{Y}=[\text{heads}, \text{heads}, \text{tails}] $$

To model this, we treat each toss as a Bernoulli trial. The parameter $z$ represents the probability of the coin landing on heads. Therefore, the probabilities for single events are directly tied to this parameter:


$$ p(y^{(i)} = \text{heads} \mid z) = z \quad p(y^{(i)} = \text{tails} \mid z) = 1 - z $$

Because the coin tosses are independent events, the joint probability of the entire sequence is simply the product of their individual probabilities:

$$ \mathcal{L}(z) = p(\mathcal{Y}\mid z) = p(y^{(1)}=\text{heads}\mid z) \cdot p(y^{(2)}=\text{heads}\mid z) \cdot p(y^{(3)}=\text{tails}\mid z) $$
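This product rule is easy to compute directly. Here is a minimal sketch in plain Python (the function name and the string encoding of tosses are my own choices, not from any library):

```python
def bernoulli_likelihood(sequence, z):
    """Likelihood of a toss sequence: multiply z for each heads, 1 - z for each tails."""
    likelihood = 1.0
    for toss in sequence:
        likelihood *= z if toss == "heads" else 1 - z
    return likelihood

# The observed sequence from the running example.
Y = ["heads", "heads", "tails"]
```

Because the tosses are independent, the order of multiplication does not matter; only the counts of heads and tails affect the result.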

We can rewrite this likelihood strictly in terms of the distribution parameter $z$:

$$ \mathcal{L}(z) = z \cdot z \cdot (1 - z) = z^2(1 - z) $$

If we assume the coin is biased with the parameter $z = 0.8$ (meaning an 80% chance of heads and a 20% chance of tails), we can plug this value in to compute the overall likelihood of observing this exact sequence:

$$ p(\mathcal{Y} \mid z) = 0.8 \times 0.8 \times 0.2 = 0.128 $$
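The arithmetic above can be checked with the closed-form expression $z^2(1-z)$ (a trivial sketch; variable names are illustrative, and floating-point rounding makes the result only approximately 0.128):

```python
# Evaluate the closed-form likelihood z^2 * (1 - z) at the assumed bias.
z = 0.8
likelihood = z ** 2 * (1 - z)
print(likelihood)  # approximately 0.128
```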

A Better Guess of Distribution Parameters

In the previous example, we assumed we already knew the coin's bias ($z = 0.8$). But in real-world scenarios, we observe the data first and have to work backward to figure out the parameter.

Suppose we observe the sequence $\mathcal{Y}=[\text{heads}, \text{heads}, \text{tails}]$, but we don't know the true value of $z$. How do we decide which parameter best describes our coin?

Let's test three different hypotheses for the parameter $z$, using our likelihood formula $\mathcal{L}(z) = z^2(1 - z)$:
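Such a comparison can be sketched in a few lines of Python. Note that the three candidate values below (0.2, 0.5, 0.8) are illustrative choices of my own, not necessarily the hypotheses examined in the text:

```python
def likelihood(z):
    """Likelihood of the observed sequence [heads, heads, tails]: z^2 * (1 - z)."""
    return z ** 2 * (1 - z)

# Hypothetical candidate values for the coin's bias.
candidates = [0.2, 0.5, 0.8]
for z in candidates:
    print(f"z = {z}: L(z) = {likelihood(z):.4f}")

# The candidate that makes the observed data most probable.
best = max(candidates, key=likelihood)
```

Among these three candidates, $z = 0.8$ gives the largest likelihood; maximizing $\mathcal{L}(z) = z^2(1-z)$ exactly over all $z \in [0, 1]$ would give $z = 2/3$, matching the intuition of two heads in three tosses.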