Input Layers and Data Types

The design of an input layer in deep learning depends on the type and dimensionality of the data at hand. Each data type, whether the flat structure of tabular data, the pixel grid of an image, or the sequential flow of text and video, calls for an approach that preserves its intrinsic structure.

Dimensionality is the starting point for that choice: the shape of a single sample determines which layers can capture its essential patterns while respecting the natural order of the information.

Traditional Tabular Data

Data Dimensions: Typically a two-dimensional array or matrix.
Structure Description: Each row represents a data sample, and each column represents a feature.

Price	Make	Year	Mileage	Fuel Type	Engine Size (Litres)	Power
26950	KIA	2019	96560.4	Diesel	1.6	115
12690	Volkswagen	2013	96560.4	Petrol	1.4	122
12250	Volkswagen	2013	125000	Petrol	1.2	85
Nissan	Qashqai	2016	159750	Diesel	1.5	110

Per Sample Dimension: $[D]$ where $D$ is the number of features.
Application Scenarios: Useful for tasks such as classification (e.g., predicting loan approval) and regression (e.g., estimating prices).
Input Layer Design Consideration: The flat structure of tabular data typically leads to the use of dense layers. However, if the data exhibits an inherent order (such as time series in a tabular format), the sequential information can be captured using architectures like recurrent or transformer-based layers.
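As a minimal sketch of how such rows can reach a dense layer, the NumPy example below turns three rows of the table above into an $[N, D]$ matrix and applies one dense layer with ReLU. The choice of feature columns, the one-hot encoding of Fuel Type, the standardization step, and the layer width `hidden_units` are all assumptions made for illustration. Price is left out because it would be the regression target.

```python
import numpy as np

# Numeric columns from three rows of the table above:
# year, mileage, engine size (litres), power.
numeric = np.array([
    [2019, 96560.4, 1.6, 115],
    [2013, 96560.4, 1.4, 122],
    [2013, 125000.0, 1.2, 85],
], dtype=np.float32)
fuel = ["Diesel", "Petrol", "Petrol"]

# One-hot encode the categorical column (category order is an arbitrary choice).
categories = sorted(set(fuel))  # ['Diesel', 'Petrol']
one_hot = np.array(
    [[float(f == c) for c in categories] for f in fuel], dtype=np.float32
)

# Standardize numeric columns so they share a comparable scale.
std = numeric.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns
numeric_scaled = (numeric - numeric.mean(axis=0)) / std

# Each sample is now a flat vector of length D; the batch has shape [N, D].
X = np.concatenate([numeric_scaled, one_hot], axis=1)
N, D = X.shape

# One dense layer (hypothetical width of 8 units): hidden = ReLU(X W + b).
rng = np.random.default_rng(0)
hidden_units = 8
W = rng.normal(scale=0.1, size=(D, hidden_units)).astype(np.float32)
b = np.zeros(hidden_units, dtype=np.float32)
hidden = np.maximum(X @ W + b, 0.0)

print(X.shape)       # (3, 6)
print(hidden.shape)  # (3, 8)
```

The dense layer treats the $D$ features as an unordered vector, which is why it suits flat tabular data and why ordered data may call for a different first layer.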

Image Data

Data Dimensions: A three-dimensional array (height, width, color channels).
Structure Description: Image data is composed of multiple pixels, each containing information for different color channels (e.g., RGB).

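Per Sample Dimension: $[C, H, W]$ where $C$ is the number of channels, $H$ is the height, and $W$ is the width. This is the channels-first layout; some frameworks use the channels-last layout $[H, W, C]$ instead.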
Application Scenarios: Image classification, object detection.
Input Layer Design Consideration: CNNs are designed to exploit the spatial hierarchy of images. Convolutions preserve the spatial arrangement of pixels by extracting local features from small neighborhoods, while deeper layers combine these features to capture broader contextual relationships.
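To make the layout and the local-feature idea concrete, here is a small NumPy sketch. It converts a randomly generated stand-in image from channels-last to channels-first and slides a single 3x3 kernel over it. The image size, the random kernel, and the helper name `conv2d_single` are assumptions for illustration. As in most deep learning libraries, the operation is strictly a cross-correlation (the kernel is not flipped).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a 32x32 RGB image in channels-last layout [H, W, C].
image_hwc = rng.random((32, 32, 3), dtype=np.float32)

# Reorder the axes to channels-first layout [C, H, W].
image_chw = np.transpose(image_hwc, (2, 0, 1))


def conv2d_single(image, kernel):
    """Stride-1, no-padding convolution of a [C, H, W] image with one [C, kH, kW] kernel."""
    c, h, w = image.shape
    kc, kh, kw = kernel.shape
    assert c == kc, "kernel must have one slice per input channel"
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value depends only on a local kH x kW patch.
            out[i, j] = np.sum(image[:, i:i + kh, j:j + kw] * kernel)
    return out


kernel = rng.normal(size=(3, 3, 3)).astype(np.float32)
feature_map = conv2d_single(image_chw, kernel)

print(image_chw.shape)    # (3, 32, 32)
print(feature_map.shape)  # (30, 30)
```

The explicit loops are for readability only; a real model would use a framework's optimized convolution layer.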

Audio Data

Data Dimensions: Waveforms are usually one-dimensional arrays (mono) or two-dimensional arrays (channels by time), while spectrograms are two-dimensional arrays (frequency by time).
Structure Description: Audio signals can be represented as amplitude values over time (a time series), or converted into a spectrogram that shows how the amplitude of each frequency component changes over time.
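The two representations can be sketched with NumPy as follows. The example builds a synthetic mono waveform and computes a magnitude spectrogram with a basic short-time Fourier transform. The sample rate, the 440 Hz test tone, the frame length, the hop length, and the Hann window are all assumed values chosen for illustration.

```python
import numpy as np

# Synthetic mono waveform: one second of a 440 Hz sine tone (assumed values).
sample_rate = 8000  # samples per second
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)  # shape [T]

# Short-time Fourier transform: split into overlapping windowed frames,
# then take the magnitude of the FFT of each frame.
frame_length = 256
hop_length = 128
window = np.hanning(frame_length)

num_frames = 1 + (len(waveform) - frame_length) // hop_length
frames = np.stack([
    waveform[i * hop_length: i * hop_length + frame_length] * window
    for i in range(num_frames)
])  # shape [num_frames, frame_length]

# Transpose so rows are frequency bins and columns are time frames.
spectrogram = np.abs(np.fft.rfft(frames, axis=1)).T

print(waveform.shape)     # (8000,)
print(spectrogram.shape)  # (129, 61)
```

The waveform keeps the raw time series, while the spectrogram trades time resolution for an explicit frequency axis, which makes it possible to treat audio much like a single-channel image.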