When the target is a probability, rate, or proportion, it lies strictly between $0$ and $1$, so $y^{(i)} \in (0,1)$.

To model this probabilistically, we assume that the target variable $y$, given the input $x$, follows a Beta distribution parameterized by two positive shape parameters $\alpha > 0$ and $\beta > 0$:

$$ y^{(i)} \mid x^{(i)},\theta \sim \text{Beta}(\alpha^{(i)}, \beta^{(i)}) $$

Our neural network processes the input $x^{(i)}$ and outputs the values used to derive the predicted shape parameters $\alpha^{(i)}$ and $\beta^{(i)}$, giving us $z^{(i)} = \{\alpha^{(i)}, \beta^{(i)}\} = f_\theta(x^{(i)})$.
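The text does not specify how the network's raw (real-valued) outputs are mapped to strictly positive shapes; a common choice, assumed here for illustration, is a softplus transform plus a small epsilon. A minimal stdlib sketch (`to_beta_params` and its arguments are hypothetical names, not from the original):

```python
import math

def softplus(u):
    # Numerically stable softplus: log(1 + e^u), always > 0.
    return math.log1p(math.exp(-abs(u))) + max(u, 0.0)

def to_beta_params(raw_alpha, raw_beta, eps=1e-6):
    """Map two unconstrained network outputs to valid Beta shapes.

    softplus plus a small epsilon keeps alpha and beta strictly
    positive, as the Beta distribution requires.
    """
    return softplus(raw_alpha) + eps, softplus(raw_beta) + eps

alpha, beta = to_beta_params(-1.3, 2.7)  # both strictly positive
```

Any other positivity-enforcing transform (e.g. `exp`) would serve the same purpose; softplus is often preferred because it grows linearly for large inputs and avoids overflow.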

Deriving the Final Loss Function

The likelihood of observing our target $y^{(i)}$ given the predicted shapes is defined by the Beta probability density function:

$$ p(y^{(i)} \mid x^{(i)}, \theta) = \frac{\Gamma(\alpha^{(i)}+\beta^{(i)})}{\Gamma(\alpha^{(i)})\Gamma(\beta^{(i)})} \left(y^{(i)}\right)^{\alpha^{(i)}-1} \left(1-y^{(i)}\right)^{\beta^{(i)}-1} $$
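As a quick sanity check of this density, we can evaluate it in log space with the standard-library `math.lgamma` (the log-Gamma function) and compare against a case with a simple closed form: $\text{Beta}(2,2)$ has density $6\,y\,(1-y)$.

```python
import math

def beta_pdf(y, alpha, beta):
    # Beta density evaluated in log space for numerical stability:
    # log p = lgamma(a+b) - lgamma(a) - lgamma(b)
    #         + (a-1)*log(y) + (b-1)*log(1-y)
    log_p = (math.lgamma(alpha + beta)
             - math.lgamma(alpha) - math.lgamma(beta)
             + (alpha - 1.0) * math.log(y)
             + (beta - 1.0) * math.log(1.0 - y))
    return math.exp(log_p)

# Beta(2, 2) has the closed form 6*y*(1-y): at y = 0.5 that is 1.5.
print(beta_pdf(0.5, 2.0, 2.0))  # → 1.5
```

Working in log space matters in practice: the Gamma function overflows quickly, while `lgamma` stays well-behaved even for large shapes.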

Apply Negative Log-Likelihood (NLL)

We take the negative natural logarithm of this likelihood and minimize its sum over all $N$ training examples:

$$ \min_{\theta} \sum_{i=1}^{N} -\log \left[ \frac{\Gamma(\alpha^{(i)}+\beta^{(i)})}{\Gamma(\alpha^{(i)})\Gamma(\beta^{(i)})} \left(y^{(i)}\right)^{\alpha^{(i)}-1} \left(1-y^{(i)}\right)^{\beta^{(i)}-1} \right] $$

Using logarithm rules to expand the Gamma-function ratio and the exponents, we get:

$$ \footnotesize \sum_{i=1}^{N} \left( \log\Gamma\left(\alpha^{(i)}\right) + \log\Gamma\left(\beta^{(i)}\right) - \log\Gamma\left(\alpha^{(i)}+\beta^{(i)}\right) - \left(\alpha^{(i)}-1\right)\log\left(y^{(i)}\right) - \left(\beta^{(i)}-1\right)\log\left(1-y^{(i)}\right) \right) $$

As with the Gamma distribution, every term in the expression depends on the model's predicted parameters $\alpha^{(i)}$ and $\beta^{(i)}$, so no terms can be dropped as constants with respect to $\theta$.

This leaves us with the final Beta Loss formula:

$$ \footnotesize \min_{\theta} \sum_{i=1}^{N} \overbrace{ \left( \log\Gamma\left(\alpha^{(i)}\right) + \log\Gamma\left(\beta^{(i)}\right) - \log\Gamma\left(\alpha^{(i)}+\beta^{(i)}\right) - \left(\alpha^{(i)}-1\right)\log\left(y^{(i)}\right) - \left(\beta^{(i)}-1\right)\log\left(1-y^{(i)}\right) \right) }^{\text{Beta Loss}} $$
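The final formula translates directly into code. A minimal stdlib sketch, term by term (`beta_nll` and `beta_loss` are hypothetical names for this illustration):

```python
import math

def beta_nll(y, alpha, beta):
    """Per-example Beta negative log-likelihood (the Beta Loss above)."""
    return (math.lgamma(alpha) + math.lgamma(beta)
            - math.lgamma(alpha + beta)
            - (alpha - 1.0) * math.log(y)
            - (beta - 1.0) * math.log(1.0 - y))

def beta_loss(ys, alphas, betas):
    # Sum of the per-example terms over the batch, matching the formula.
    return sum(beta_nll(y, a, b) for y, a, b in zip(ys, alphas, betas))

# With alpha = beta = 1 the Beta distribution is Uniform(0, 1),
# whose density is 1 everywhere, so the NLL is exactly 0.
print(beta_nll(0.3, 1.0, 1.0))  # → 0.0
```

In a deep-learning framework one would use the autodiff-aware equivalent of `lgamma` (e.g. `torch.lgamma`) so gradients flow back to the network producing $\alpha^{(i)}$ and $\beta^{(i)}$; the arithmetic is identical.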