Maximum A Posteriori (MAP) estimation can be viewed as an extension of Maximum Likelihood Estimation (MLE). In statistical inference, the key distinction between the two lies in how each method incorporates prior knowledge.

Essentially, while MLE takes a purely data-driven view, MAP offers a balanced perspective, combining the observed data with prior knowledge to estimate parameter values.

Formal Definition of MAP

MAP estimation seeks the parameter values $\theta$ that maximize the posterior probability given the observed data. By Bayes' theorem, this posterior is proportional to the product of the likelihood and the prior probability.

Standard Notation:

The standard way to express the posterior probability in MAP estimation is:

$$ P(\theta | X = x) = \frac{P(X = x | \theta) \cdot P(\theta)}{P(X = x)} $$

Where:

- $P(\theta | X = x)$ is the posterior probability of the parameters $\theta$ given the observed data $x$.
- $P(X = x | \theta)$ is the likelihood of the data under the parameters.
- $P(\theta)$ is the prior probability of the parameters.
- $P(X = x)$ is the evidence, the marginal probability of the data.
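
To make these terms concrete, the short sketch below evaluates each of them on a grid of $\theta$ values for a hypothetical coin-flip example: 7 heads in 10 tosses, with an assumed Beta(2, 2) prior. The data, the prior, and the grid resolution are all illustrative choices, not part of the formal definition.

```python
import numpy as np

# Hypothetical example: 7 heads observed in 10 coin tosses (assumed data).
heads, n = 7, 10

# Evaluate every term of Bayes' formula on a dense grid over theta.
theta = np.linspace(0.001, 0.999, 999)
likelihood = theta**heads * (1 - theta)**(n - heads)           # P(X = x | theta)
prior = 6 * theta * (1 - theta)                                # P(theta), Beta(2, 2) density
evidence = np.sum(likelihood * prior) * (theta[1] - theta[0])  # P(X = x), constant in theta

posterior = likelihood * prior / evidence                      # P(theta | X = x)
print(f"grid MAP estimate: {theta[np.argmax(posterior)]:.3f}") # ~0.667
```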

It is worth noting that although the evidence (the denominator in Bayes' formula) is required to normalize the posterior, it does not affect which value of $\theta$ maximizes it: the evidence is a constant with respect to $\theta$. For the purposes of MAP optimization, we can therefore focus solely on the numerator, the product of the likelihood and the prior:

$$ \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(\theta | X = x) = \arg\max_{\theta} P(X = x | \theta) \cdot P(\theta) $$

For convenience, we can define a new term, $\mathcal{B}$, for this product, keeping in mind that it is simply the unnormalized version of the posterior probability:

$$ \mathcal{B}(\theta) = P(X = x|\theta) \cdot P(\theta) $$

Where $\mathcal{B}(\theta)$ is the unnormalized posterior: it is proportional to $P(\theta | X = x)$ but omits the constant evidence term.
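
Here is a minimal sketch of this optimization, reusing the same hypothetical coin-flip setup as above (7 heads in 10 tosses, Beta(2, 2) prior, both illustrative assumptions). It maximizes $\log \mathcal{B}(\theta)$ numerically and checks the result against the closed-form mode of the conjugate Beta posterior.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Same hypothetical setup as above: 7 heads in 10 tosses, Beta(a, b) prior with a = b = 2.
heads, n, a, b = 7, 10, 2, 2

def neg_log_B(theta):
    """Negative log of B(theta) = likelihood * prior. Constant factors are
    dropped, which is safe: they do not move the location of the maximum."""
    log_likelihood = heads * np.log(theta) + (n - heads) * np.log(1 - theta)
    log_prior = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)
    return -(log_likelihood + log_prior)

# Maximize log B(theta) by minimizing its negative over the open interval (0, 1).
result = minimize_scalar(neg_log_B, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(f"numerical MAP:   {result.x:.4f}")
# With a conjugate Beta prior, the MAP estimate is the posterior mode in closed form.
print(f"closed-form MAP: {(heads + a - 1) / (n + a + b - 2):.4f}")  # both ~0.6667
```

Working with $\log \mathcal{B}(\theta)$ rather than $\mathcal{B}(\theta)$ itself is the usual practice: the product of many small probabilities underflows numerically, and since the logarithm is monotonic, maximizing the log leaves the optimal $\theta$ unchanged.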