From a software engineering perspective, a machine learning model is much like a function $f$ in a computer program: it describes a system or process that maps inputs (arguments) $x$ to outputs (return values) $y$, i.e., $f: x \rightarrow y$.
A simple mathematical function looks like this:
def f(x):
    # Quadratic mapping from input x to output y
    y = 3 * x**2 + x + 9
    return y
In this example, the input-output relationship is given explicitly, so we can easily calculate $y$ for any input $x$. Other examples include well-understood physical and mechanical systems, e.g., the relationship between time and speed in free fall, $v = g \times t$, provided we know the gravitational acceleration $g$.
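The free-fall relationship can be written as a function in the same way. The snippet below is a minimal sketch: the name `free_fall_speed` and the value $g = 9.81\ \text{m/s}^2$ are assumptions for illustration, not values given in the text.

```python
G = 9.81  # assumed gravitational acceleration in m/s^2 (not specified in the text)


def free_fall_speed(t, g=G):
    """Return the speed v = g * t after t seconds of free fall from rest.

    Air resistance is ignored, as in the formula above.
    """
    return g * t


# Speed after 2 seconds of free fall under the assumed g
print(free_fall_speed(2.0))
```

Because the rule is known in advance, no data is needed: the function is written down directly from the physics.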
Programming is never easy. In many cases, especially when analysing unstructured data such as images, audio, and text, it is very difficult to explicitly design logic that correctly predicts the output $y$ from the input $x$. This is where machine learning shows its strength.
Machine learning is a data-driven methodology aiming to approximate or replicate the target function $f$ by utilizing gathered data $x$ and/or $y$, that is, $\text{ML}: x, y \rightarrow f$. This approach can also be seen as a problem of system (function, process, etc.) identification. In the next section, we will examine the various complexity levels of the system identification problem.
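To make the direction $\text{ML}: x, y \rightarrow f$ concrete, here is a minimal sketch that reuses the free-fall relation $v = g \times t$ but pretends $g$ is unknown and recovers it from data. Everything specific is an assumption for illustration: the measurements are synthetic, the noise level and `TRUE_G` are made up, and `estimate_g` is a hypothetical helper that applies ordinary least squares for a line through the origin.

```python
import random


def estimate_g(times, speeds):
    """Least-squares estimate of g in v = g * t (a line through the origin).

    Minimising sum((v - g * t)**2) over g gives g = sum(t * v) / sum(t * t).
    """
    numerator = sum(t * v for t, v in zip(times, speeds))
    denominator = sum(t * t for t in times)
    return numerator / denominator


# Synthetic "measurements" (assumed): the true rule plus small Gaussian noise.
random.seed(0)
TRUE_G = 9.81
times = [0.5 * i for i in range(1, 11)]  # 0.5 s, 1.0 s, ..., 5.0 s
speeds = [TRUE_G * t + random.gauss(0, 0.05) for t in times]

g_hat = estimate_g(times, speeds)


def f_hat(t):
    """Approximation of the target function, built from data rather than by hand."""
    return g_hat * t


print(g_hat)
```

The output of the procedure is itself a function, `f_hat`, which is the sense in which data $(x, y)$ is mapped to $f$. Here the structure of the function was assumed to be known and only one parameter was estimated.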
Preliminary Note: While training neural networks fundamentally involves estimating parameters, in the following discussion, we will explore some non-parametric methods like function structure search. Although these methods aren't directly employed in neural networks, they are insightful for understanding the underlying operations of some neural networks.
In this scenario, understanding the input-output relationship is relatively simple, even with minimal data. Consider the following diagram. Can you deduce and program the function that maps $x$ to $y$?
Drawing on high school mathematics, we know the formula for a linear function is:
$$ y = kx +b $$
Here, $k$ represents the slope, and $b$ is the y-intercept. In this example, the intercept is $2$ and the slope, calculated as $2 / 1$, is $2$, as determined from the graph. Therefore, the final function is:
$$ y = 2x+2 $$
This function was derived primarily from domain knowledge, with no data collected from the line (apart from perhaps two points estimated visually).
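The same reasoning can be sketched in code. The two points $(0, 2)$ and $(1, 4)$ below are assumed to be the ones read off the graph; they are chosen to agree with the intercept of $2$ and the slope of $2 / 1$ stated above. The helper name `line_through` is hypothetical.

```python
def line_through(p1, p2):
    """Return slope k and intercept b of the line y = k * x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as y = k * x + b")
    k = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - k * x1            # solve y1 = k * x1 + b for b
    return k, b


# Two points assumed to be read visually from the diagram
k, b = line_through((0, 2), (1, 4))


def f(x):
    """The recovered linear function."""
    return k * x + b


print(k, b, f(3))
```

Two points are enough only because the linear structure $y = kx + b$ was assumed up front; the data merely pins down $k$ and $b$.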
This linear example is the simplest case. In reality, most functions cannot be derived directly by reading a chart. Instead, we need to make assumptions about the structure of the function and then recover its parameters from data. This leads us to the challenge of parameter estimation. Consider the toy example illustrated below. With some prior knowledge, can you deduce the function that maps $x$ to $y$ from the following diagram?