Modern deep learning models keep pushing the boundaries of size and complexity, from massive language models with hundreds of billions of parameters to specialized architectures such as mixtures of experts. The performance gains are clear, but these larger, more intricate models also bring significant challenges in training cost, memory usage, and deployment feasibility.

In this tutorial, we’ll discuss a systematic way to analyze and design neural network architectures by:

  1. Introducing the “Four Quadrants” framework, which classifies parameterization strategies along two major axes (Dense vs. Partial, Independent vs. Shared).
  2. Exploring three key “dimensions” along which parameters are arranged and reused (Data Features, Data Sequences/Relations, and Model Depth).

By understanding these design axes and dimensions, you can make more informed decisions that balance expressiveness (capacity to capture complex patterns) and efficiency (speed, scalability, and resource usage).

The Four Quadrants - Quadrant Axes

This figure is just an example; it is not meant to be accurate.

Dense vs. Partial Parameterization

In a dense setup, every weight is potentially involved in processing any given input. Transformer encoders, for instance, use the same attention layers for all tokens, and each weight in these layers remains active for every token processed.
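
As a minimal sketch of what "dense" means in code (PyTorch here; the layer and sizes are illustrative, not taken from any particular model), the snippet below applies a single fully connected layer to every token, so the entire weight matrix participates in every token's computation.

```python
import torch
import torch.nn as nn

# Dense parameterization in miniature: one weight matrix, and every
# weight participates in processing every token.
d_model = 64
dense = nn.Linear(d_model, d_model)      # a single dense weight matrix

tokens = torch.randn(8, 16, d_model)     # (batch, sequence, features)
out = dense(tokens)                      # the full matrix multiplies every token
print(out.shape)                         # torch.Size([8, 16, 64])
```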

In partial architectures, the model explicitly divides or gates subsets of parameters so that only some of them are used for a given input or task. Mixture-of-experts layers are the canonical example: a routing network sends each token to a small subset of expert sub-networks, leaving the remaining experts' parameters untouched for that token, as sketched below.
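
Below is a minimal, illustrative sketch of such a gated layer (in PyTorch; the sizes, names, and top-1 routing are assumptions, not a reference implementation): a router picks one expert per token, so each token only touches that expert's weights.

```python
import torch
import torch.nn as nn

# Toy top-1 mixture-of-experts layer (illustrative sizes and names):
# the router selects one expert per token, so each token uses only a
# subset of the layer's parameters.
d_model, num_experts = 64, 4
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_experts)])
router = nn.Linear(d_model, num_experts)

tokens = torch.randn(8, 16, d_model)          # (batch, sequence, features)
expert_idx = router(tokens).argmax(dim=-1)    # chosen expert per token, shape (8, 16)

out = torch.zeros_like(tokens)
for i, expert in enumerate(experts):
    routed = expert_idx == i                  # tokens assigned to expert i
    out[routed] = expert(tokens[routed])      # only these tokens touch expert i's weights
```

Real mixture-of-experts layers typically use learned top-k routing with load-balancing objectives, but the structural point is the same: only a fraction of the parameters is active for any given token.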

Independent vs. Shared Weights

An architecture with fully independent weights has a unique parameter for each connection or neuron. While this maximizes representational power, it also inflates the parameter count dramatically. A simple MLP with a weight matrix of size