Let’s take a look at the geometric interpretation of logistic regression and discuss how we can extend it to support a nonlinear decision boundary, in contrast to its original linear one.

Since we cannot visualize high-dimensional data, we only consider the case of two-dimensional input data $\mathbf{x}=[x_0, x_1]$ (encoded as position) and a binary classification problem (encoded as color). Logistic regression first applies a linear transformation to $\mathbf{x}$, producing a scalar value $z$. This step can be written as:

$$ z = \mathbf{w}\cdot\mathbf{x} + b $$
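
As a minimal sketch (with hypothetical weights and samples, purely for illustration), this transformation amounts to one dot product per sample:

```python
import numpy as np

# Hypothetical parameters, standing in for a trained model.
w = np.array([2.0, -1.0])
b = 0.5

# A batch of 2D samples, one row per sample.
X = np.array([[1.0, 1.0],
              [-0.5, 2.0],
              [0.0, 0.5]])

# The linear transformation: each 2D point becomes a scalar z.
z = X @ w + b
print(z)  # [ 1.5 -2.5  0. ]
```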

To understand the meaning of $z$ in the context of logistic regression, we first visualize all data samples $\mathbf{x}$ and color them according to their labels $y \in \{0, 1\}$. In particular, positive samples ($y=1$) are colored yellow, while negative samples ($y=0$) are colored purple, as shown below.

*(Figure: two-dimensional data samples, colored yellow for $y=1$ and purple for $y=0$.)*

The value of $z$ itself does not appear directly in the figure; rather, $z/\|\mathbf{w}\|$ is the perpendicular (i.e., minimal) distance between each sample $\mathbf{x}$ and the decision boundary $\mathbf{w}\cdot\mathbf{x}+b=0$.

Note that after the model is fully trained, $\mathbf{w}$ is a constant, so $z$ is simply this distance scaled by $\|\mathbf{w}\|$; the scaling changes the magnitudes but not the relative ordering of the samples (order is preserved).

The distance can be either positive or negative: a positive distance means the sample $\mathbf{x}$ is predicted as positive, while a negative distance means it is predicted as negative.

In summary, to understand $z$ we only need to care about two things: its absolute value and its sign. The absolute value of $z$ relates to the distance from the corresponding sample to the decision boundary, and the sign of $z$ indicates which side of the boundary the sample lies on. The multi-class case is an extension of this picture: each class has its own decision plane, and the samples of that class should lie on the positive side of it.
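
Continuing the hypothetical NumPy sketch from above, both quantities fall out of $z$ directly:

```python
# Signed perpendicular distance of each sample to the boundary.
dist = z / np.linalg.norm(w)

# The sign of z (equivalently, of dist) gives the predicted side.
pred = (z > 0).astype(int)
print(dist)  # magnitudes: how far each sample is from the boundary
print(pred)  # [1 0 0]
```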

*(Figure: each 2D sample mapped to its position on the z-axis; purple dots and yellow crosses mark the two classes.)*

In this process, each 2D point is mapped onto a position along the z-axis. The dots (purple) and crosses (yellow) in the figure illustrate this mapping. Next, these z-values are processed through a sigmoid function $\sigma(z)$, assigning each sample a probability between 0 and 1. A z-value above zero indicates a probability greater than 0.5, suggesting the point belongs to the positive class. Conversely, a z-value below zero signifies a probability under 0.5, indicating the negative class.
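
A couple more lines of the same sketch turn the z-values into probabilities:

```python
def sigmoid(z):
    """Squash each z-value into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

probs = sigmoid(z)
print(probs)        # z > 0 maps above 0.5, z < 0 maps below 0.5
print(probs > 0.5)  # same predictions as checking the sign of z
```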

The critical decision point in z-space, where the class probability shifts, is at $z = 0$. This corresponds to the decision boundary $\mathbf{w}\cdot\mathbf{x} + b = 0$ in the original feature space. There, it manifests as a decision line (the dashed line in the diagram), dividing the space into regions of positive and negative classifications.
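
Concretely, writing $\mathbf{w} = [w_0, w_1]$, in our two-dimensional setting the boundary condition can be solved for $x_1$ to obtain the line we plot (assuming $w_1 \neq 0$):

$$ w_0 x_0 + w_1 x_1 + b = 0 \quad\Longrightarrow\quad x_1 = -\frac{w_0 x_0 + b}{w_1} $$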

From Logistic Regression to Nonlinear Classification

In logistic regression, the decision boundary is a straight line in 2D space. However, for nonlinear classification tasks such as the one illustrated below, a linear decision boundary (a straight line in 2D) proves insufficient for separating the purple and yellow samples.

*(Figure: a nonlinearly separable dataset where no single straight line can divide the purple and yellow samples.)*

In this section, we will explore how we can stack activation functions $\sigma$ with linear transformations $\mathbf{w}\cdot\mathbf{x}+b$ to create a nonlinear model that can effectively separate the yellow and purple samples.

The concept is similar to folding a piece of paper. One workable fold is visualized below.

*(Figure: folding the plane along $x_1=0$, bringing the left half of the data onto the right half.)*

This folding operation occurs along the line $x_1=0$ (the y-axis), effectively transferring all data samples from the left half of the space onto the right half. The folding process can be described mathematically as an absolute-value function applied to $x_1$:

$$ \hat{x}_1=\text{abs}(x_1) $$
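
As a minimal sketch (reusing the NumPy import from above), the fold is just a feature transformation applied before the usual linear model:

```python
def fold(X):
    """Fold the plane along x1 = 0: taking abs(x1) maps each
    sample with negative x1 onto its mirror image with positive x1."""
    X_hat = X.copy()
    X_hat[:, 1] = np.abs(X_hat[:, 1])
    return X_hat

print(fold(np.array([[0.5, -2.0], [0.5, 2.0]])))
# -> [[0.5 2. ]
#     [0.5 2. ]]  (the two mirrored samples now coincide)
```

After folding, a plain logistic regression (a straight line in the folded space) can separate classes that required a nonlinear boundary in the original space. As an aside, the fold itself fits the stacking recipe described above, since $\text{abs}(x_1) = \max(0, x_1) + \max(0, -x_1)$: two linear maps passed through a simple activation, then summed.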