In the previous examples, we explored the capabilities of shallow, wide neural networks as well as deep neural networks. Now, let's discuss why deep neural networks are currently favored over shallow, wide ones.

Depth: Reuse of Functions

The primary reason lies in the ability of deep neural networks to analyze data structurally and extract hierarchical patterns. Shallow networks are much weaker at this kind of structural analysis. The contrast between depth and shallowness can be understood through code abstraction: programmers strive to reduce duplication by encapsulating frequently used logic into functions.

These reusable functions greatly reduce code complexity, allowing a given amount of code to express more intricate logic. For example, the inner product is a commonly used operation in data analytics, machine learning, signal processing, and related fields.

(Figure: the inner product as a shared building block used by linear regression, PCA, and other methods.)

In a single project, we may need to implement functions ranging from linear regression to PCA. Since all of them rely on the inner product, it is good practice to encapsulate the inner product operation within its own function. Then, whenever one of the higher-level functions needs it, we simply call that function instead of reimplementing it each time.
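As a minimal sketch of this idea (the function names here are illustrative, not taken from any particular library), both linear-regression prediction and PCA projection can call one shared inner-product routine:

```python
def inner_product(a, b):
    """Shared building block: sum of element-wise products."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def linear_regression_predict(weights, bias, features):
    """Linear regression prediction reuses the inner product."""
    return inner_product(weights, features) + bias

def pca_project(component, data_point):
    """Projecting a point onto a principal component is also an inner product."""
    return inner_product(component, data_point)

print(linear_regression_predict([2.0, 3.0], 1.0, [1.0, 1.0]))  # 6.0
print(pca_project([0.6, 0.8], [1.0, 2.0]))
```

Neither higher-level function contains its own copy of the multiply-and-sum loop; both delegate to the single `inner_product` building block.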

This concept forms the basis for the hierarchical structure found in deep neural networks, which have more layers rather than more neurons per layer. In these networks, the initial layers can be seen as fundamental sub-functions, akin to building blocks (e.g., the inner product) in code. Subsequent layers build upon these foundational components, combining them into more complex functions.
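The layer-as-function analogy can be sketched as plain function composition. This is a toy illustration, not a trained network; the ReLU and sum stand in for whatever features the layers actually learn:

```python
def layer1(x):
    # Early layer: extracts low-level features (think "edges").
    # ReLU is used here as a simple stand-in for a learned transformation.
    return [max(0.0, xi) for xi in x]

def layer2(features):
    # Later layer: combines the low-level features into something higher-level.
    return sum(features)

def network(x):
    # The deep network is just the composition of its layers.
    return layer2(layer1(x))

print(network([-1.0, 2.0, 3.0]))  # 5.0
```

The key point is that `layer2` never touches the raw input; it only consumes what `layer1` has already computed, just as a high-level function calls a lower-level one.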

Let's take cat recognition as an example:

(Figure: hierarchical cat recognition: edges, then shapes such as triangular ears and circular eyes, then the cat itself.)

The process of recognizing a cat involves identifying shapes such as triangles for the ears and circles for the eyes. To discern these shapes, the network first learns to recognize lines and curves (edges). In this hierarchy, "Cat Recognition" corresponds to the final output layer of the network, while detecting a "45° Line" is the job of earlier hidden layers processing edge information.
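A low-level detector like the "45° Line" unit can be sketched as correlating an image patch with a diagonal kernel and thresholding the response. This is a hand-written idealization for illustration, assuming 3×3 binary patches, not a learned filter:

```python
# Idealized 45° line kernel (ones along the diagonal).
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

def detects_line(patch, threshold=3):
    # Correlate the patch with the kernel and threshold the score.
    score = sum(kernel[i][j] * patch[i][j] for i in range(3) for j in range(3))
    return score >= threshold

diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
horizontal = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]

print(detects_line(diagonal))    # True
print(detects_line(horizontal))  # False
```

Later layers would then consume the outputs of many such edge detectors to assemble triangles, circles, and eventually the cat.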

Problem of Shallow Structure

A shallow neural network is constrained much like a programmer who is forbidden from calling functions within functions and must instead write all the logic inline in a single script.

In the context of neural networks, "shallow" refers to the absence (or minimal use) of hierarchical structure. In a shallow network, each higher-level feature-extraction task must re-implement its own lower-level feature extraction, leading to a proliferation of redundant neurons that wastes capacity and constrains the network's learning potential.

(Figure: a shallow network re-implementing the same low-level features separately for each high-level feature.)
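A back-of-the-envelope count makes the waste concrete. Under purely hypothetical numbers (n inputs, m shared low-level features, k high-level features, counting only the weights described here):

```python
n, m, k = 100, 10, 10  # hypothetical sizes, for illustration only

# Deep: one shared low-level layer (n*m weights), then a high-level
# layer built on top of it (m*k weights).
deep_weights = n * m + m * k

# Shallow: every high-level feature re-implements its own private copy
# of the m low-level detectors over all n inputs, so nothing is shared.
shallow_weights = k * (n * m)

print(deep_weights)     # 1100
print(shallow_weights)  # 10000
```

Even in this toy accounting, sharing the low-level layer cuts the weight count by roughly an order of magnitude; the gap widens as more high-level features reuse the same primitives.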

Visualization of CNN Kernels

Let's take a practical example: a CNN designed for car recognition. In the figure below, you can see how each layer within the neural network handles information. The initial layers primarily focus on extracting fundamental patterns, such as edges at various angles, while the later layers identify more intricate patterns specific to cars, such as wheels, lights, and the front bumper. Moving from the lower layers to the higher ones, you'll notice composition at work: the later layers behave like combinations of 'AND' and 'NAND' logic gates. If certain patterns are detected while others are not, the network classifies the input accordingly, determining whether or not it is a car.

(Figure: visualization of CNN kernels for car recognition, from low-level edge patterns to car-specific parts such as wheels and lights.)
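The gate analogy can be made concrete with threshold neurons. As a sketch (real CNN units are learned and continuous, not hand-set hard thresholds), a neuron over binary "pattern detected" signals can implement AND and NAND:

```python
def neuron(weights, bias, inputs):
    # Hard-threshold unit: fires (1) if the weighted sum exceeds zero.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def AND(a, b):
    # Fires only when both input patterns are present.
    return neuron([1, 1], -1.5, [a, b])

def NAND(a, b):
    # Fires unless both input patterns are present.
    return neuron([-1, -1], 1.5, [a, b])

# Toy decision: "car" if wheels AND lights are detected,
# but NOT a cat-face pattern (NAND on the same signal acts as NOT).
wheels, lights, cat_face = 1, 1, 0
is_car = AND(AND(wheels, lights), NAND(cat_face, cat_face))
print(is_car)  # 1
```

A later layer combining earlier feature maps behaves much like this: certain detections must be present and others absent for the "car" unit to fire.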