The Normal (i.e., Gaussian) distribution is characterised by two parameters, the mean and the variance, i.e., $\theta=\{\mu, \sigma^2\}$.
Suppose we do NOT know the formulas for computing the mean $\mu$ and the variance $\sigma^2$ of a Normal distribution. Can we derive them using MLE?
We are given some observed events/samples from a Normal distribution, $Z = [z^{(1)}, z^{(2)}, ..., z^{(N)}]$ — for example, the weights of apples of a certain category, which usually follow a Normal distribution.
So the likelihood of observing these events (e.g., the first apple has weight $z^{(1)}$) can be calculated as the product of the individual densities,
$$ \begin{aligned} L(\mu, \sigma)&=\mathcal{N}(z^{(1)}|\mu, \sigma^2) \cdots \mathcal{N}(z^{(N)}|\mu, \sigma^2)\\ &= \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{z^{(1)}-\mu}{\sigma}\right)^{\!2}} \cdots \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{z^{(N)}-\mu}{\sigma}\right)^{\!2}} \end{aligned} $$
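As a minimal numeric sketch of why this raw product is awkward to work with, we can evaluate it directly (the sample here is hypothetical: 1000 apple weights drawn from $\mathcal{N}(150, 10^2)$ grams):

```python
import numpy as np

# Hypothetical sample: weights (in grams) of N apples, drawn from N(150, 10^2).
rng = np.random.default_rng(0)
z = rng.normal(loc=150.0, scale=10.0, size=1000)

def likelihood(z, mu, sigma):
    """Product of Gaussian densities N(z_i | mu, sigma^2)."""
    dens = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.prod(dens)

# Each density is a small number (here < 0.04), so the product of 1000 of
# them underflows to exactly 0.0 in double precision -- one practical reason
# to work with log L instead of L.
print(likelihood(z, 150.0, 10.0))  # → 0.0
```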
In this case, to maximise $L$ with respect to $\mu$ and $\sigma$, one method is to set the first-order derivatives of $L$ with respect to $\mu$ and $\sigma$ to zero. Obviously, no one : ) wants to differentiate that product directly.
A more conventional approach is to maximise $\log(L)$, namely the log-likelihood function, which is usually written as $l(z \mid \mu,\sigma)$. The logarithmic function $\log(f)$ has a nice property: being monotonically increasing, it does not change the location of the maxima or minima of a positive-valued function.
For the example above with $N$ samples, the log likelihood is given below,
$$ \begin{aligned} l(\mu, \sigma) &=\ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right)+\ln\!\left( e^{ -\frac{1}{2}\left(\frac{z^{(1)}-\mu}{\sigma}\right)^{\!2}}\right) + \cdots + \ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right)+\ln\!\left( e^{ -\frac{1}{2}\left(\frac{z^{(N)}-\mu}{\sigma}\right)^{\!2}}\right) \\ &=N\ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right) -\frac{1}{2}\left( \left(\frac{z^{(1)}-\mu}{\sigma}\right)^{\!2}+ \cdots + \left(\frac{z^{(N)}-\mu}{\sigma}\right)^{\!2} \right) \end{aligned} $$
Here $1/(\sigma\sqrt{2\pi})$ can be written as $(2\pi\sigma^2)^{-1/2}$, so that $\ln\!\left((2\pi\sigma^2)^{-1/2}\right)=-\frac{1}{2}\ln(2\pi\sigma^2)$.
We can further tidy up the equation as below; in the second term, factor out the common $\frac{1}{\sigma^2}$,
$$ l(z|\mu, \sigma)=-\frac{N}{2}\ln({2\pi\sigma^2}) -\frac{1}{2\sigma^2}\left( \left(z^{(1)}-\mu\right)^{\!2}+\, \cdots + \left(z^{(N)}-\mu\right)^{\!2}\, \right) $$
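To sanity-check this simplification, we can compare the tidy closed form against a term-by-term sum of log densities on the same hypothetical apple-weight sample (1000 draws from $\mathcal{N}(150, 10^2)$; the sample and names are illustrative assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=150.0, scale=10.0, size=1000)  # hypothetical apple weights
N = z.size

def log_likelihood_naive(z, mu, sigma):
    """Sum of ln N(z_i | mu, sigma^2), computed term by term."""
    dens = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.sum(np.log(dens))

def log_likelihood_tidy(z, mu, sigma):
    """The simplified form: -N/2 ln(2 pi sigma^2) - sum (z_i - mu)^2 / (2 sigma^2)."""
    return -N / 2 * np.log(2 * np.pi * sigma ** 2) \
           - np.sum((z - mu) ** 2) / (2 * sigma ** 2)

# The two forms agree up to floating-point rounding.
print(np.isclose(log_likelihood_naive(z, 150.0, 10.0),
                 log_likelihood_tidy(z, 150.0, 10.0)))  # → True
```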
Let us take the partial derivative of $l$ with respect to $\mu$ and set it to zero,
$$ \frac{\partial{l}}{\partial{\mu}}=\frac{\partial}{\partial{\mu}}\left( -\frac{N}{2}\ln(2\pi\sigma^2)-\frac{1}{2\sigma^2}\left( \left(z^{(1)}-\mu\right)^{\!2}+\, \cdots + \left(z^{(N)}-\mu\right)^{\!2}\, \right) \right)=0 $$
Since we are calculating the derivative with respect to $\mu$, the term $-\frac{N}{2}\ln(2\pi\sigma^2)$ disappears because it does not contain $\mu$, i.e., it is a constant.
$$ \frac{\partial{l}}{\partial{\mu}}=\frac{\partial}{\partial{\mu}}\left( -\frac{1}{2\sigma^2}\left( \left(z^{(1)}-\mu\right)^{\!2}+\, \cdots + \left(z^{(N)}-\mu\right)^{\!2}\, \right) \right)=0 $$
Because $-\frac{1}{2\sigma^2}$ is a multiplicative constant when differentiating with respect to $\mu$, it does not affect the solution for $\mu$; therefore, the equation can be further reduced to,
$$ \frac{\partial{l}}{\partial{\mu}}=\frac{\partial}{\partial{\mu}}\left(
\left(z^{(1)}-\mu\right)^{\!2}+\, \cdots + \left(z^{(N)}-\mu\right)^{\!2}\, \right)=0 $$
Using the chain rule, you can easily get,
$$ -2\left((z^{(1)}-\mu)+ \cdots + (z^{(N)}-\mu)\right)=0 $$

Dividing both sides by $-2$ and rearranging gives $z^{(1)}+\cdots+z^{(N)}=N\mu$, hence

$$ \mu=\frac{z^{(1)}+\cdots+z^{(N)}}{N} $$

i.e., the maximum likelihood estimate of $\mu$ is simply the sample mean.
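We can verify this result numerically: on a hypothetical sample (1000 draws from $\mathcal{N}(150, 10^2)$; names and values are illustrative assumptions), the log-likelihood evaluated at the sample mean should beat any nearby value of $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=150.0, scale=10.0, size=1000)  # hypothetical apple weights

def log_likelihood(z, mu, sigma=10.0):
    """l(mu, sigma) = -N/2 ln(2 pi sigma^2) - sum (z_i - mu)^2 / (2 sigma^2)."""
    N = z.size
    return -N / 2 * np.log(2 * np.pi * sigma ** 2) \
           - np.sum((z - mu) ** 2) / (2 * sigma ** 2)

mu_mle = z.mean()  # the MLE derived above: (z1 + ... + zN) / N

# Nudging mu away from the sample mean in either direction lowers l.
for eps in (0.1, 1.0, 5.0):
    assert log_likelihood(z, mu_mle) > log_likelihood(z, mu_mle + eps)
    assert log_likelihood(z, mu_mle) > log_likelihood(z, mu_mle - eps)
print("sample mean maximises the log-likelihood")
```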