What is Machine Learning? A Gentle Introduction

Machine learning is fundamentally about learning a function—similar to how we define simple mathematical functions. For example:

def f(x):
    y = 3 * x**2 + x + 9
    return y

In this case, the function is explicitly defined by a set of rules we coded ourselves. The input-output relationship is clear, so given an input $x$, we can directly compute the output $y$. This is typical of well-understood systems, such as physical or mechanical models. For instance, for an object in free fall starting from rest, the relationship between velocity and time is $v = g \times t$, assuming we know the gravitational acceleration $g$.
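The free-fall rule can be coded the same way. The sketch below is only an illustration: the name `free_fall_velocity` and the value `9.81` (an approximate figure for gravitational acceleration near Earth's surface, in m/s^2) are assumptions added here, and air resistance is ignored.

```python
# Approximate gravitational acceleration near Earth's surface (m/s^2).
# This value is an assumption made for the example.
G = 9.81


def free_fall_velocity(t, g=G):
    """Velocity (m/s) after t seconds of free fall from rest, ignoring air resistance."""
    return g * t


print(free_fall_velocity(2.0))  # 19.62
```

Nothing is learned here: the rule $v = g \times t$ is known in advance, so we simply write it down.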

Why Not Rule-based Coding?

However, in many real-world scenarios, writing such rules by hand can be tedious—or even impossible. Imagine you want to teach a computer to recognize a picture of a cat. You could try to write down a set of rules, like "if it has pointy ears, whiskers, and fur, then it's a cat." But what about all the different cat breeds, poses, and backgrounds? Writing rules for every possibility would be nearly impossible.

So What?

This is where machine learning comes into play. Instead of giving the computer explicit instructions, we give it data—in this case, thousands of pictures labeled as "cat" or "not a cat." The computer then "learns" the patterns on its own.

From a technical standpoint, a machine learning model is like a function; let's call it $f$. This function takes an input $x$ (the image) and produces an output $y$ (the label, "cat" or "not a cat").

Machine learning is the process of using data, that is, pairs of inputs and outputs $(x, y)$, to figure out what this function $f$ looks like.
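To make "using data to figure out $f$" concrete, here is a toy sketch. It assumes a handful of made-up $(x, y)$ pairs and a small, hand-picked set of candidate functions, then keeps the candidate whose predictions are closest to the data. The names `data`, `candidates`, and `mean_squared_error` are illustrative; real machine learning methods search a far larger family of functions, usually by numerical optimization rather than by trying a short list.

```python
# Made-up labeled examples: each pair is (input x, observed output y).
data = [(0, 9), (1, 13), (2, 23), (3, 39)]

# A tiny, hand-picked set of candidate functions (an assumption for illustration).
candidates = {
    "x + 9": lambda x: x + 9,
    "2*x**2 + 9": lambda x: 2 * x**2 + 9,
    "3*x**2 + x + 9": lambda x: 3 * x**2 + x + 9,
}


def mean_squared_error(f, data):
    """Average squared gap between f's predictions and the observed outputs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)


# "Learning" here means picking the candidate that best matches the data.
best_name = min(
    candidates,
    key=lambda name: mean_squared_error(candidates[name], data),
)
print(best_name)  # 3*x**2 + x + 9
```

The key point is the direction of reasoning: we start from the data and end with a function, instead of starting from a function and computing outputs.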

How-to?

Now that we understand that the goal is to determine the function from data, the question is: how is this possible? The following sections present several examples and strategies for identifying the function.

Level 0: Look at the Pictures and Talk

Sometimes the function is simple enough, such as a linear function, that we can read it directly off a plot. Look at the diagram below: can you figure out and code the function that maps $x$ to $y$?

[Figure: graph of a linear function of $x$]

We know a linear function is written as:

$$ y = kx +b $$

Here, $k$ is the slope and $b$ is the $y$-intercept. Reading from the graph, the intercept is $2$ and the slope is $2$. Therefore, the final function is:

$$ y = 2x+2 $$

This function was determined by interpreting the figure (data visualization) and applying some domain knowledge, without collecting actual data from the line—aside from perhaps visually estimating two points.
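Reading two points off the line is already enough to pin down $k$ and $b$. The sketch below assumes the two points $(0, 2)$ and $(1, 4)$, which are consistent with $y = 2x + 2$ but are chosen here for illustration and not taken from the original figure. The helper name `line_through` is also hypothetical.

```python
def line_through(p1, p2):
    """Return (k, b) for the line y = k*x + b passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as y = k*x + b")
    k = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - k * x1            # intercept: solve y1 = k*x1 + b for b
    return k, b


# Two points assumed to be read off the graph.
k, b = line_through((0, 2), (1, 4))
print(k, b)  # 2.0 2.0


def f(x):
    return k * x + b


print(f(3))  # 8.0
```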

Level 1: Parameter Estimation - Model is Known

This simple linear example is easy to interpret. In many real cases, we know the form of the function but need to estimate its parameters from data. This is called parameter estimation.
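As a minimal sketch of parameter estimation, suppose we know the form is $y = kx + b$ but only have noisy measurements. One common choice, assumed here and not necessarily the method used later in this text, is ordinary least squares, which has a closed-form solution for a straight line. The data values and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate (k, b) in y = k*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


# Made-up noisy measurements scattered around y = 2x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

k, b = fit_line(xs, ys)
print(round(k, 2), round(b, 2))  # 1.99 2.04
```

The estimates land close to the slope and intercept used to make up the data, but not exactly on them, because the measurements are noisy.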