NLL Modelling for Count Regression (Poisson)

In count regression, we aim to predict how many times an event happens in a fixed window, meaning the target is a non-negative integer $y^{(i)} \in \{0, 1, 2, \dots\}$.

Applications of Count Regression

Count regression is widely used across various fields to model phenomena where the outcome is a discrete count of events. Common real-world examples include:

Healthcare: Predicting the number of times a patient visits the emergency room in a given year or the number of asthma attacks a patient experiences per month.
Traffic & Transportation: Estimating the number of traffic accidents on a specific stretch of highway per week, or the number of bicycles crossing a bridge per day.
Retail & Business: Forecasting the number of customers entering a store in a given hour, or the number of product defects in a daily manufacturing batch.
Insurance: Modeling the number of insurance claims filed by a policyholder over the duration of their coverage.

Definition of the Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space.

It is built on the assumption that these events occur with a known constant mean rate and independently of the time since the last event. If a random variable $Y$ follows a Poisson distribution with a rate parameter $\lambda$, the probability of observing exactly $k$ events is given by the probability mass function (PMF):

$$ P(Y = k) = \frac{\lambda^k e^{-\lambda}}{k!} $$

Where:

$k \in \{0, 1, 2, \dots\}$ is the number of occurrences.
$\lambda > 0$ is the rate parameter, i.e. the expected number of occurrences in the interval. For the Poisson distribution, $\lambda$ is both the mean and the variance.
$e$ is Euler's number ($e \approx 2.71828$).
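
As a quick numerical check of the PMF and of the mean-equals-variance property, here is a minimal sketch using only the Python standard library. The function name `poisson_pmf`, the choice `lam = 3.0`, and the truncation of the support at 100 are illustrative assumptions, not part of the original text.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for Y ~ Poisson(lam), evaluated in log space to avoid overflow."""
    if k < 0:
        return 0.0
    # log P(Y = k) = k * log(lam) - lam - log(k!), where log(k!) = lgamma(k + 1)
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


lam = 3.0            # illustrative rate (assumption)
ks = range(0, 100)   # truncated support; the tail beyond 100 is negligible for lam = 3

probs = [poisson_pmf(k, lam) for k in ks]
total = sum(probs)
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

print(f"sum={total:.6f}  mean={mean:.6f}  var={var:.6f}")
```

Under these assumptions, the probabilities should sum to approximately 1, and the computed mean and variance should both come out approximately equal to `lam`.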

Conditioned Likelihood

To model this probabilistically, we assume that the target $y^{(i)}$, given the input $x^{(i)}$, follows a Poisson distribution with rate $\lambda = z^{(i)} > 0$. For consistency, the rest of the text writes the rate as $z^{(i)}$ rather than $\lambda$.

$$ y^{(i)} \mid x^{(i)},\theta \sim \text{Poisson}(z^{(i)}) $$

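Our neural network $f_\theta$ takes the input $x^{(i)}$ and outputs the predicted rate $z^{(i)} = f_\theta(x^{(i)})$. The output must satisfy $z^{(i)} > 0$, which is typically enforced with a positive output activation such as the exponential or softplus.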
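
To make the conditional model concrete, the following sketch stands in for $f_\theta$ with a single linear layer followed by an exponential, so the predicted rate is always positive. It then evaluates the Poisson log-probability of each observed count under its predicted rate. The names `predict_rate` and `poisson_log_prob`, the toy inputs, and the parameter values are all hypothetical; a real network would replace the linear score with a deeper model.

```python
import math


def predict_rate(x, w, b):
    """Hypothetical stand-in for f_theta: linear score, then exp so that z > 0."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return math.exp(score)


def poisson_log_prob(y, z):
    """log P(y | z) = y * log(z) - z - log(y!) for a Poisson with rate z."""
    return y * math.log(z) - z - math.lgamma(y + 1)


# Made-up toy data and parameters, for illustration only.
xs = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0]]
ys = [2, 0, 5]
w, b = [0.3, 0.4], 0.1

rates = [predict_rate(x, w, b) for x in xs]          # z^(i) = f_theta(x^(i))
log_probs = [poisson_log_prob(y, z) for y, z in zip(ys, rates)]

for x, y, z, lp in zip(xs, ys, rates, log_probs):
    print(f"x={x}  y={y}  z={z:.4f}  log P(y|x)={lp:.4f}")
```

The exponential is one common choice of positive link; softplus would serve the same purpose in this sketch.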

Deriving the Final Loss Function