The design of input layers in deep learning depends on the type and dimensionality of the data. Each data type, whether the flat structure of tabular data, the pixel grids of images, or the sequential flow of text and video, calls for an approach that preserves its intrinsic structure.
The dimensionality of each sample determines the input shape a network expects, and so guides the choice of architectures that capture essential patterns while respecting the natural order of the information.
Traditional Tabular Data
- Data Dimensions: Typically a two-dimensional array or matrix.
- Structure Description: Each row represents a data sample, and each column represents a feature.

| Price | Make | Year | Mileage | Fuel Type | Engine Size (Litres) | Power |
|---|---|---|---|---|---|---|
| 26950 | KIA | 2019 | 96560.4 | Diesel | 1.6 | 115 |
| 12690 | Volkswagen | 2013 | 96560.4 | Petrol | 1.4 | 122 |
| 12250 | Volkswagen | 2013 | 125000 | Petrol | 1.2 | 85 |
| Nissan | Qashqai | 2016 | 159750 | Diesel | 1.5 | 110 |

- Per Sample Dimension: $[D]$ where $D$ is the number of features.
- Application Scenarios: Useful for tasks such as classification (e.g., predicting loan approval) and regression (e.g., estimating prices).
- Input Layer Design Consideration: The flat structure of tabular data typically leads to the use of dense layers. However, if the data exhibits an inherent order (such as time series in a tabular format), the sequential information can be captured using architectures like recurrent or transformer-based layers.
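
To make the $[D]$ shape concrete, the following is a minimal NumPy sketch, not a definitive pipeline. It encodes the three table rows that have a listed price into a feature matrix of shape $[N, D]$ and passes it through a single dense layer. The choice of features, the two-entry fuel vocabulary, the hidden size of 8, and the helper names `encode_row` and `dense_relu` are all assumptions made for illustration; the price column is treated as the regression target and left out of the input.

```python
import numpy as np

# Rows with a listed price, as (make, year, mileage, fuel, engine_litres, power).
rows = [
    ("KIA", 2019, 96560.4, "Diesel", 1.6, 115),
    ("Volkswagen", 2013, 96560.4, "Petrol", 1.4, 122),
    ("Volkswagen", 2013, 125000.0, "Petrol", 1.2, 85),
]
FUEL_TYPES = ["Diesel", "Petrol"]  # assumed vocabulary for one-hot encoding


def encode_row(row):
    """Hypothetical helper: turn one table row into a flat list of D floats."""
    _make, year, mileage, fuel, engine_litres, power = row  # make dropped for brevity
    one_hot = [1.0 if fuel == f else 0.0 for f in FUEL_TYPES]
    return [float(year), float(mileage), float(engine_litres), float(power)] + one_hot


X = np.array([encode_row(r) for r in rows])  # shape [N, D] = [3, 6]

# Standardize each column so features on very different scales are comparable.
mean, std = X.mean(axis=0), X.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns
X = (X - mean) / std


def dense_relu(x, weights, bias):
    """Dense layer with ReLU: every output unit sees every input feature."""
    return np.maximum(0.0, x @ weights + bias)


rng = np.random.default_rng(0)
n_features, hidden_units = X.shape[1], 8  # hidden size is an arbitrary choice
W = rng.normal(scale=0.1, size=(n_features, hidden_units))
b = np.zeros(hidden_units)

hidden = dense_relu(X, W, b)
print(X.shape, hidden.shape)  # (3, 6) (3, 8)
```

The weight matrix has one row per input feature, so the input layer's size is fixed by $D$ alone; adding or removing a column in the table changes the first dimension of `W` and nothing else.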
Image Data
- Data Dimensions: A three-dimensional array (height, width, color channels).
- Structure Description: Image data is composed of multiple pixels, each containing information for different color channels (e.g., RGB).


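- Per Sample Dimension: $[C, H, W]$ where $C$ is the number of channels, $H$ is the height, and $W$ is the width in pixels. This is the channels-first layout; some frameworks store the same data as $[H, W, C]$.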
- Application Scenarios: Image classification, object detection.
- Input Layer Design Consideration: CNNs are designed to exploit the spatial hierarchy of images. The spatial arrangement of pixels is handled by convolutions that extract local features, while deeper layers capture broader contextual relationships.
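
As a rough illustration of how a convolutional input layer consumes a $[C, H, W]$ sample, here is a deliberately naive NumPy sketch. It assumes stride 1 and no padding, and the function name `conv2d`, the 32×32 image size, and the 16 filters are arbitrary choices for the example. Real frameworks use far faster implementations, but the shape bookkeeping is the same.

```python
import numpy as np


def conv2d(image, kernels, bias):
    """Naive 2D convolution (cross-correlation), stride 1, no padding.

    image:   [C, H, W]
    kernels: [F, C, kH, kW]  (F filters, each spanning all C channels)
    bias:    [F]
    returns: [F, H - kH + 1, W - kW + 1]
    """
    c, h, w = image.shape
    f, kc, kh, kw = kernels.shape
    assert c == kc, "kernel channels must match image channels"
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((f, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Local patch: each output value depends only on a small neighbourhood.
            patch = image[:, i:i + kh, j:j + kw]  # [C, kH, kW]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


rng = np.random.default_rng(0)
image = rng.random((3, 32, 32))            # one RGB sample in [C, H, W] layout
kernels = rng.normal(size=(16, 3, 3, 3))   # 16 filters of size 3x3 over 3 channels
bias = np.zeros(16)

features = conv2d(image, kernels, bias)
print(features.shape)  # (16, 30, 30)
```

Each filter slides the same small set of weights across every position, which is why the layer's parameter count depends on $C$ and the kernel size but not on $H$ or $W$.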
Audio Data
- Data Dimensions: Waveforms are usually one- or two-dimensional arrays (samples, optionally with a channel axis), while spectrograms are two-dimensional arrays (frequency by time).
- Structure Description: Audio signals can be represented as amplitude values over time (a time series), or converted into a spectrogram that shows how the amplitude at each frequency changes over time.