Bayes’ Theorem is a fundamental identity in probability theory that helps us find a conditional probability $P(B \mid A)$ when we know the “reverse” conditional probability $P(A \mid B)$. Formally:
$$ P(B \mid A)\;=\;\frac{P(A, B)}{P(A)}\;=\;\frac{P(A \mid B)\, P(B)}{P(A)}. $$
This identity lets us “reverse” the condition, going from $P(A \mid B)$ to $P(B \mid A)$.
Key Point: Mathematically, there is no inherent “direction” in this formula—it’s purely a statement about how probabilities relate.
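To see that the formula is just algebra on a joint distribution, here is a minimal numeric check in Python. The 2×2 joint table below is an assumption chosen purely for illustration; any valid joint distribution works the same way:

```python
import numpy as np

# An arbitrary joint distribution over two binary events A and B;
# the numbers are illustrative and sum to 1.
joint = np.array([[0.10, 0.25],
                  [0.30, 0.35]])   # joint[a, b] = P(A=a, B=b)

P_A = joint.sum(axis=1)            # marginal P(A)
P_B = joint.sum(axis=0)            # marginal P(B)

a, b = 0, 1                        # any pair of outcomes works

# Definition of conditional probability: P(B | A) = P(A, B) / P(A)
direct = joint[a, b] / P_A[a]

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A)
p_a_given_b = joint[a, b] / P_B[b]
via_bayes = p_a_given_b * P_B[b] / P_A[a]

print(direct, via_bayes)           # equal: the identity has no built-in direction
assert np.isclose(direct, via_bayes)
```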
Imagine you have two bags, Bag A and Bag B, each containing some mix of red and blue balls.
You pick a ball at random from one bag, and it turns out to be blue. You want to know the probability that this blue ball came from Bag B. Bayes’ Theorem tells us:
$$ P(\text{Bag B} \mid \text{Blue}) \;=\; \frac{P(\text{Blue} \mid \text{Bag B}) \times P(\text{Bag B})}{P(\text{Blue})}. $$
Similarly, if the ball turns out to be red, you might ask for $P(\text{Bag A} \mid \text{Red})$. The same theorem applies, just switching the labels from “Blue” to “Red,” and from “Bag B” to “Bag A.”
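Here is a quick sketch of the calculation in Python. The bag contents and the uniform choice of bag below are assumptions made for illustration (the text above doesn’t fix them); the Monte Carlo simulation confirms the analytic answer from Bayes’ Theorem:

```python
import random

# Hypothetical bag compositions, assumed purely for illustration;
# each bag is picked with probability 1/2.
bags = {"Bag A": {"red": 7, "blue": 3},
        "Bag B": {"red": 2, "blue": 8}}

def p_color_given_bag(color, bag):
    counts = bags[bag]
    return counts[color] / sum(counts.values())

# Analytic answer via Bayes' theorem: P(Bag B | Blue)
p_bag = 0.5                                    # uniform prior over bags
p_blue = sum(p_color_given_bag("blue", b) * p_bag for b in bags)
posterior = p_color_given_bag("blue", "Bag B") * p_bag / p_blue
print(f"P(Bag B | Blue) = {posterior:.4f}")

# Monte Carlo check: among simulated blue draws, how often was it Bag B?
random.seed(0)
hits = blues = 0
for _ in range(100_000):
    bag = random.choice(list(bags))
    color = random.choices(["red", "blue"],
                           weights=[bags[bag]["red"], bags[bag]["blue"]])[0]
    if color == "blue":
        blues += 1
        hits += bag == "Bag B"
print(f"simulated       = {hits / blues:.4f}")
```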
In Bayesian statistics, the same formula is interpreted as a process of updating beliefs about a parameter $\theta$ in light of data:
$$ \underbrace{P(\theta)}_{\text{prior}} \;\to\; \underbrace{P(\text{data} \mid \theta)}_{\text{likelihood}} \;\to\; \underbrace{P(\theta \mid \text{data})}_{\text{posterior}}. $$
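As a concrete sketch of this prior-to-posterior update, here is a grid-based computation for a coin’s heads probability $\theta$. The uniform prior and the dataset (7 heads in 10 flips) are assumptions for illustration:

```python
import numpy as np

# Grid of candidate values for theta, the coin's heads probability.
theta = np.linspace(0, 1, 1001)

prior = np.ones_like(theta)              # P(theta): uniform prior
prior /= prior.sum()

heads, flips = 7, 10                     # assumed observations
likelihood = theta**heads * (1 - theta)**(flips - heads)  # P(data | theta)

posterior = likelihood * prior           # numerator of Bayes' theorem
posterior /= posterior.sum()             # normalizing divides by P(data)

print(f"posterior mean of theta: {(theta * posterior).sum():.3f}")
# With a uniform prior this matches the Beta(8, 4) mean, 8/12 ~ 0.667
```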