The perceptron, also known as the single-layer perceptron, is a type of machine learning model that can be traced back to the 1950s and 1960s. It is a simple algorithm for binary classification, just like logistic regression.

The primary reason we explore perceptrons is that their core components, especially linear transformations and activation functions, are widely used in modern deep neural networks.

Notation Alert: In this context, we align our notation with that commonly found in Python-based neural network frameworks such as PyTorch. For instance, we express the operation as $x@\mathbf{W} + b$, diverging from the traditional linear algebra expression $\mathbf{w}^Tx + b$. This follows the Python data science convention where each row (or the last dimension) represents an input sample $x$; consequently, the weight matrix $\mathbf{W}$ is post-multiplied by the sample vector. This approach harmonizes the mathematical notation with coding practice.
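As a minimal sketch of this convention in PyTorch (the shapes and values here are arbitrary, chosen only for illustration), both expressions compute the same pre-activation:

```python
import torch

D = 3                   # number of input features (arbitrary example)
x = torch.randn(D)      # one input sample as a row vector
W = torch.randn(D, 1)   # weight matrix, post-multiplied by the sample
b = torch.randn(1)      # bias term

z_framework = x @ W + b            # x@W + b, the framework-style notation
z_textbook = W.squeeze() @ x + b   # w^T x + b, the textbook notation

print(torch.allclose(z_framework, z_textbook))  # True: same scalar result
```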

Original Perceptron

The original perceptron was developed for classification tasks. Its computing process, with 3 inputs and 3 weights, is visualized below.

[Figure: perceptron computation with 3 inputs and 3 weights]

The equation for the diagram above is as follows:


$$ y = \sigma(z)=\sigma(x@\mathbf{W}+b)= \sigma\left(\begin{bmatrix} x_1 & x_2 & \cdots & x_D \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \\ \vdots\\ w_D \end{bmatrix}+ b\right)=\sigma\left(\sum_{i=1}^D{w_ix_i} + b\right) $$

where:

- $x$ is the input row vector with $D$ features,
- $\mathbf{W}$ is the weight (column) vector,
- $b$ is the bias term,
- $\sigma$ is the activation function applied to the pre-activation $z$.

The activation function used in the original perceptron is a step function. This function is defined as follows:

$$ \hat{y} = \text{step}(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \leq 0 \end{cases} $$

In essence, the step activation function outputs 1 when its input $z$ is greater than zero, and 0 when it is less than or equal to zero. This binary output is the predicted class of the input data.
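Putting the linear transformation and the step activation together, here is a minimal sketch of the perceptron's forward pass in PyTorch (the input and weight values are made up for illustration):

```python
import torch

def step(z):
    """Step activation: 1 where z > 0, else 0."""
    return (z > 0).float()

def perceptron(x, W, b):
    """Original perceptron: linear transformation followed by a step activation."""
    z = x @ W + b
    return step(z)

# Example with 3 inputs and 3 weights, matching the diagram above
x = torch.tensor([0.5, -1.0, 2.0])        # input sample
W = torch.tensor([[0.4], [0.3], [0.2]])   # weight column vector
b = torch.tensor([0.1])                   # bias

print(perceptron(x, W, b))  # tensor([1.]): z = 0.4 > 0, so class 1
```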

Logistic Regression as a "Softer" Perceptron

Logistic regression and the original perceptron are fundamentally similar yet distinct in two key aspects. First, while the original perceptron employs a step function as its activation function, logistic regression uses the sigmoid function, defined as $\sigma(z) = \frac{1}{1 + e^{-z}}$. Second, the original perceptron's loss is a simple binary determination of correct or incorrect predictions, whereas logistic regression uses the negative log-likelihood, $-\left(y\log(\hat{y}) + (1 - y)\log(1 - \hat{y})\right)$, as its loss function.
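To make the contrast concrete, here is a small sketch (again in PyTorch, with illustrative values) of the sigmoid activation and the negative log-likelihood loss:

```python
import torch

def sigmoid(z):
    """Sigmoid activation: a smooth, differentiable 'soft' step."""
    return 1 / (1 + torch.exp(-z))

def nll_loss(y_hat, y):
    """Negative log-likelihood (binary cross-entropy) for one prediction."""
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))

z = torch.tensor(0.4)   # pre-activation from the linear step, x@W + b
y = torch.tensor(1.0)   # true label

y_hat = sigmoid(z)         # ~0.599: a probability rather than a hard 0/1
print(nll_loss(y_hat, y))  # ~0.513: differentiable w.r.t. the weights
```

Because the sigmoid output and the loss are both differentiable, logistic regression can be trained with gradient descent, which is exactly what the step function's hard 0/1 output rules out.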
