Bayes’ Theorem is a fundamental identity in probability theory that helps us find a conditional probability $P(B \mid A)$ when we know the “reverse” conditional probability $P(A \mid B)$. Formally:
$$ P(B \mid A)\;=\;\frac{P(A, B)}{P(A)}\;=\;\frac{P(A \mid B)\, P(B)}{P(A)}. $$
This identity lets us “reverse” the condition, going from $P(A \mid B)$ to $P(B \mid A)$.
Key Point: Mathematically, there is no inherent “direction” in this formula—it’s purely a statement about how probabilities relate.
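To see that the formula is just algebra on a joint distribution, here is a minimal numeric check in Python. The 2×2 joint table below is an assumption chosen purely for illustration; any valid joint distribution works the same way:

```python
import numpy as np

# An arbitrary joint distribution over two binary events A and B;
# the numbers are illustrative and sum to 1.
joint = np.array([[0.10, 0.25],
                  [0.30, 0.35]])   # joint[a, b] = P(A=a, B=b)

P_A = joint.sum(axis=1)            # marginal P(A)
P_B = joint.sum(axis=0)            # marginal P(B)

a, b = 0, 1                        # any pair of outcomes works

# Definition of conditional probability: P(B | A) = P(A, B) / P(A)
direct = joint[a, b] / P_A[a]

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A)
p_a_given_b = joint[a, b] / P_B[b]
via_bayes = p_a_given_b * P_B[b] / P_A[a]

print(direct, via_bayes)           # equal: the identity has no built-in direction
assert np.isclose(direct, via_bayes)
```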
Imagine you have two bags, Bag A and Bag B, each containing some mix of red and blue balls.
You pick a ball at random from one bag, and it turns out to be blue. You want to know the probability that this blue ball came from Bag B. Bayes’ Theorem tells us:
$$ P(\text{Bag B} \mid \text{Blue}) \;=\; \frac{P(\text{Blue} \mid \text{Bag B}) \times P(\text{Bag B})}{P(\text{Blue})}. $$
Similarly, if the ball turns out to be red, you might ask for $P(\text{Bag A} \mid \text{Red})$. The same theorem applies, just switching the labels from “Blue” to “Red,” and from “Bag B” to “Bag A.”
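Here is a quick sketch of the calculation in Python. The bag contents and the uniform choice of bag below are assumptions made for illustration (the text above doesn’t fix them); the Monte Carlo simulation confirms the analytic answer from Bayes’ Theorem:

```python
import random

# Hypothetical bag compositions, assumed purely for illustration;
# each bag is picked with probability 1/2.
bags = {"Bag A": {"red": 7, "blue": 3},
        "Bag B": {"red": 2, "blue": 8}}

def p_color_given_bag(color, bag):
    counts = bags[bag]
    return counts[color] / sum(counts.values())

# Analytic answer via Bayes' theorem: P(Bag B | Blue)
p_bag = 0.5                                    # uniform prior over bags
p_blue = sum(p_color_given_bag("blue", b) * p_bag for b in bags)
posterior = p_color_given_bag("blue", "Bag B") * p_bag / p_blue
print(f"P(Bag B | Blue) = {posterior:.4f}")

# Monte Carlo check: among simulated blue draws, how often was it Bag B?
random.seed(0)
hits = blues = 0
for _ in range(100_000):
    bag = random.choice(list(bags))
    color = random.choices(["red", "blue"],
                           weights=[bags[bag]["red"], bags[bag]["blue"]])[0]
    if color == "blue":
        blues += 1
        hits += bag == "Bag B"
print(f"simulated       = {hits / blues:.4f}")
```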
In Bayesian statistics, the same formula is interpreted as a process of updating beliefs about a parameter $\theta$ in light of data:
$$ \underbrace{P(\theta)}_{\text{prior}} \;\to\; \underbrace{P(\text{data} \mid \theta)}_{\text{likelihood}} \;\to\; \underbrace{P(\theta \mid \text{data})}_{\text{posterior}}. $$
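As a concrete sketch of this prior-to-posterior update, here is a grid-based computation for a coin’s heads probability $\theta$. The uniform prior and the dataset (7 heads in 10 flips) are assumptions for illustration:

```python
import numpy as np

# Grid of candidate values for theta, the coin's heads probability.
theta = np.linspace(0, 1, 1001)

prior = np.ones_like(theta)              # P(theta): uniform prior
prior /= prior.sum()

heads, flips = 7, 10                     # assumed observations
likelihood = theta**heads * (1 - theta)**(flips - heads)  # P(data | theta)

posterior = likelihood * prior           # numerator of Bayes' theorem
posterior /= posterior.sum()             # normalizing divides by P(data)

print(f"posterior mean of theta: {(theta * posterior).sum():.3f}")
# With a uniform prior this matches the Beta(8, 4) mean, 8/12 ~ 0.667
```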