A machine learning tutorial part 1: Basic Guide

In this article, I have put together a quick and easy machine learning tutorial for anyone who wants to learn about AI from the beginning.

What is AI:

Artificial Intelligence, commonly known as AI, is the area of research that enables machines and computers to mimic natural human intelligence. The main purpose of AI is to give machines the ability to make decisions on their own by analyzing data, much as humans do. AI has several subfields.

Put simply, the main purpose of AI is to artificially build an intelligence that works similarly to natural human intelligence. The major and most common subfield of AI is machine learning.

What is Machine Learning:

Machine learning is one of the major subfields of modern AI. It enables computers to learn by analyzing data and detecting patterns in it, so that they can perform specific tasks without human intervention. The core principle of machine learning is to build and apply algorithms (ML algorithms) that analyze data and make decisions based on the patterns they find.

There are several types of ML techniques.

Supervised learning
Semi-supervised learning
Unsupervised learning
Reinforcement learning
Deep learning

Supervised Learning:

Supervised learning is an ML technique that uses labeled datasets to train ML algorithms to detect patterns that uncover the relationship between input and output data. The primary goal is to create a model that, having learned from labeled data, can accurately predict outputs for new data.

Semi-supervised learning:

Semi-supervised learning is a hybrid approach that combines elements of both supervised and unsupervised learning. In this method, a model is trained on a dataset that contains a small amount of labeled data (where the correct outputs are known) and a much larger amount of unlabeled data (where the outputs are unknown). The idea is to leverage the labeled data to guide the learning process while using the unlabeled data to uncover additional patterns or structure in the data. This approach is particularly useful when labeling data is expensive or time-consuming, as it reduces the need for extensive labeled datasets. For example, in a semi-supervised scenario, you might have a few labeled images of cats and dogs, and a large pool of unlabeled animal images. The model uses the labeled examples to start learning and then refines its understanding by exploring the unlabeled data.
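
One simple way to picture this is self-training: start from the few labeled examples and repeatedly give the closest unlabeled point the label of its nearest labeled neighbor. The sketch below is only an illustration under made-up assumptions (a single feature, body weight in kg, and invented numbers); real semi-supervised methods are considerably more careful about which pseudo-labels they trust.

```python
# A few labeled examples and a larger unlabeled pool (all values are made up).
labeled = [(4.0, "cat"), (30.0, "dog")]
unlabeled = [5.0, 6.5, 28.0, 25.0, 8.0, 22.0]


def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point closest to a labeled one."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Find the (unlabeled point, labeled example) pair with the smallest distance.
        x, (_, label) = min(
            ((u, l) for u in pool for l in labeled),
            key=lambda pair: abs(pair[0] - pair[1][0]),
        )
        labeled.append((x, label))   # accept the most confident guess as a new label
        pool.remove(x)
    return labeled


result = dict(self_train(labeled, unlabeled))
print(result)
```

Each newly labeled point can in turn help label its own neighbors, which is how the unlabeled pool contributes to the final model.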

Unsupervised learning:

Unsupervised learning, on the other hand, involves training a model on a dataset that has no labeled data at all. The goal is to discover hidden patterns, groupings, or structures within the data without any predefined categories or outputs. Common techniques include clustering (e.g., grouping similar items together) and dimensionality reduction (e.g., simplifying the data while preserving its essence). For instance, if you give an unsupervised learning model a bunch of customer data with no labels, it might identify natural clusters of customers with similar buying habits.

Reinforcement learning:

Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. Unlike supervised learning, where the model is trained on labeled data, or unsupervised learning, where the model finds patterns in unlabeled data, reinforcement learning is about learning through trial and error to maximize a cumulative reward.

Here’s how it works: The agent observes the current state of the environment, takes an action, and receives feedback in the form of a reward (positive or negative) and a new state. Over time, the agent learns a policy—a strategy or set of rules—that tells it which actions to take in different states to achieve the highest total reward. The process is often modeled as a Markov Decision Process, where the future depends only on the current state, not the past.
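
The loop of observing a state, acting, and receiving a reward can be sketched with tabular Q-learning, one common RL algorithm. Everything in the example is an assumption made for illustration: the environment is a tiny corridor of five states with a reward of 1 at the right end, the agent explores with purely random actions, and the learning rate and discount factor are arbitrary choices.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (arbitrary choices)


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)                 # explore with random actions
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# The learned policy: in each non-terminal state, pick the action with the higher value.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

Because the update only looks at the current state, the action, the reward, and the next state, it fits the Markov Decision Process view described above. After training, the policy should prefer moving right in every state, since that is the shortest path to the reward.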

Deep learning:

Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers—hence the “deep”—to model and solve complex problems. It’s inspired by the way the human brain processes information, with interconnected nodes (neurons) that pass signals to one another. What sets deep learning apart from traditional machine learning is its ability to automatically learn features from raw data, rather than relying on humans to hand-engineer them.

In a deep learning model, data flows through layers of neurons. Each layer transforms the input in some way (extracting patterns, refining representations, or making abstractions) before passing it to the next layer. The early layers might detect simple features (like edges in an image), while deeper layers combine these into more complex concepts (like shapes or objects). This hierarchical feature learning makes deep learning especially good at tasks like image recognition, natural language processing, and speech synthesis.
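
To make the flow of data through layers concrete, here is a minimal sketch of a forward pass through a two-layer network. Note the assumptions: the weights are picked by hand rather than learned (a real deep learning model would learn them from data), and the task, deciding whether exactly one of two inputs is on, is chosen only because it is small enough to follow by eye.

```python
def relu(values):
    """Activation function: keep positive values, replace negatives with 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x):
    # Hidden layer: two simple features, "how many inputs are on" and "are both on".
    hidden = relu(dense(x, weights=[[1.0, 1.0], [1.0, 1.0]], biases=[0.0, -1.0]))
    # Output layer: combine the simple features into "exactly one input is on".
    return dense(hidden, weights=[[1.0, -2.0]], biases=[0.0])[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))
```

The output is 1.0 for [0, 1] and [1, 0] and 0.0 otherwise. The second layer never sees the raw inputs, only the features computed by the first layer, which is the layering idea described above in miniature.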

A brief history of the evolution of machine learning:

Supervised learning was among the earliest ML techniques to be put to use, and it can be seen as the first stage of machine learning for computers. Later, as technology advanced, semi-supervised and unsupervised learning became practical as well. AI researchers also wanted to build systems that can learn from the feedback they receive, much like teaching a child by giving candy as a reward; this is the idea behind reinforcement learning. The most recent of these developments to become prominent is deep learning, a more advanced approach to machine learning that uses neural network architectures loosely modeled on the human brain.

Now, let’s take a look at the major supervised learning algorithms.

Supervised learning problems come in two major types: classification and regression.

Classification:

Classification in machine learning (ML) is a type of supervised learning where the goal is to predict the category or class that a given input belongs to. It’s about teaching a model to assign labels to data points based on patterns it learns from a training dataset, where the correct labels are already known. Think of it as sorting things into buckets—like deciding whether an email is “spam” or “not spam,” or identifying if a photo contains a “cat” or a “dog.”

How It Works:

Training Data: You start with a dataset where each data point has features (e.g., email word count, presence of certain words) and a known label (e.g., “spam” or “not spam”).
Learning: The ML algorithm analyzes the features and figures out how they relate to the labels. It builds a model that captures these patterns—like noticing that emails with “win a prize” are often spam.
Prediction: When given a new, unlabeled data point, the model uses what it learned to predict its class. For example, a new email gets classified as “spam” based on its features.
Output: The prediction is a discrete label—usually from a fixed set of options (e.g., “positive,” “negative,” “neutral” for sentiment analysis).

Types of Classification:

Binary Classification: Two classes (e.g., “yes” vs. “no,” “true” vs. “false”).
Multi-Class Classification: More than two classes (e.g., “cat,” “dog,” “bird”).
Multi-Label Classification: Each data point can belong to multiple classes at once (e.g., a movie tagged as both “comedy” and “romance”).

Regression:

Regression in machine learning (ML) is a type of supervised learning where the goal is to predict a continuous output value rather than a discrete category. It’s about modeling the relationship between input features and a numeric target variable, so you can estimate things like prices, temperatures, or scores. If classification is sorting things into buckets, regression is drawing a line (or curve) to predict where a number falls.

How It Works:

Training Data: You start with a dataset where each data point has features (e.g., house size, number of bedrooms) and a known numeric outcome (e.g., price in dollars).
Learning: The algorithm analyzes the features and learns a mathematical function that maps them to the target value. For example, it might find that bigger houses with more bedrooms tend to cost more.
Prediction: For a new data point, the model uses this function to predict a number. Say a house is 2,000 sq ft with 3 bedrooms—the model might predict a price of $300,000.
Output: The result is a continuous value, not a fixed category.
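
The steps above can be sketched with the simplest case, a straight-line fit with one feature (house size only). This is a minimal illustration using the standard least-squares formulas; the sizes and prices are made up and happen to lie exactly on a line, which real data never does.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Training data (made up): house size in sq ft and price in dollars.
sizes = [1000, 1500, 2000, 2500]
prices = [150_000, 225_000, 300_000, 375_000]

slope, intercept = fit_line(sizes, prices)


def predict(size):
    """Use the learned function to estimate a price for a new house."""
    return slope * size + intercept


print(round(predict(1800)))   # 270000
```

The learned function here is price = 150 * size, so a 1,800 sq ft house is estimated at $270,000. Adding more features, such as the number of bedrooms, follows the same idea with more coefficients.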

Types of Regression:

Linear Regression: Assumes a straight-line relationship (e.g., price = a * size + b * bedrooms + c).
Polynomial Regression: Fits a curved line for more complex relationships.
Logistic Regression: Despite the name, used for classification; it predicts the probability of a class rather than a continuous value.
Ridge/Lasso Regression: Linear regression with regularization to prevent overfitting.
Nonlinear Regression: For when the relationship isn’t a simple line or polynomial.

The following are major supervised learning algorithms.

K-nearest neighbors algorithm:

The k-Nearest Neighbors (k-NN) algorithm is a simple, intuitive machine learning method used for classification and regression. It’s based on the idea that similar things are close to each other in a feature space. Imagine you’re trying to figure out if a new fruit is an apple or an orange—you look at the fruits it’s most similar to (its “neighbors”) and decide based on what they are.

Here is how it works:

Data Setup: You start with a labeled dataset where each data point has features (e.g., size, color) and a known label (e.g., “apple” or “orange”). These points are plotted in a multi-dimensional space based on their feature values.

New Point: When a new, unlabeled point comes in (e.g., a mystery fruit), k-NN measures the distance between this point and all the points in the dataset. Common distance metrics include Euclidean distance (straight-line distance) or Manhattan distance.

Find Neighbors: It picks the “k” closest points—the nearest neighbors. The value of “k” is a parameter you choose (e.g., k=3 means look at the 3 closest points).

Decision Rule:

For classification, it takes a majority vote among the k neighbors. If 2 out of 3 neighbors are “apples,” the new point is classified as an “apple.”
For regression, it averages the values of the k neighbors. If the neighbors have values of 5, 6, and 7, the prediction is 6.

Output: The algorithm assigns the new point a label (classification) or value (regression) based on its neighbors.
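
The whole process fits in a few lines of Python. The sketch below uses Euclidean distance and k=3; the fruit measurements (size in cm and a color score from 0 for red to 1 for orange) and the regression values are made up for illustration.

```python
import math
from collections import Counter


def nearest(train, point, k):
    """Return the k training rows closest to `point` by Euclidean distance."""
    return sorted(train, key=lambda row: math.dist(row[0], point))[:k]


def knn_classify(train, point, k=3):
    labels = [label for _, label in nearest(train, point, k)]
    return Counter(labels).most_common(1)[0][0]      # majority vote


def knn_regress(train, point, k=3):
    values = [value for _, value in nearest(train, point, k)]
    return sum(values) / len(values)                 # average of the neighbors


# Classification: (size, color score) -> fruit
fruits = [((7.0, 0.1), "apple"), ((7.5, 0.2), "apple"), ((6.8, 0.15), "apple"),
          ((8.0, 0.9), "orange"), ((8.5, 0.8), "orange"), ((7.8, 0.95), "orange")]
print(knn_classify(fruits, (7.2, 0.3)))              # apple

# Regression: the three nearest neighbors have values 5, 6 and 7
readings = [((1.0,), 5.0), ((2.0,), 6.0), ((3.0,), 7.0), ((10.0,), 20.0)]
print(knn_regress(readings, (2.0,)))                 # 6.0
```

Note that k-NN has no real training step: it simply stores the data and does all the work at prediction time. In practice, features are usually scaled first so that no single feature dominates the distance.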

Decision Tree algorithm:

The Decision Tree algorithm is a popular machine learning method used for both classification and regression tasks. It works by breaking down a dataset into smaller subsets based on feature values, creating a tree-like structure of decisions that lead to a final prediction. Think of it as a flowchart or a game of 20 questions: you ask yes/no questions to narrow down the possibilities until you reach an answer.

How It Works:

Root Node: Start with the entire dataset at the top (the “root”).
Splitting: Pick a feature and a threshold (e.g., “Is age > 30?”) to split the data into two or more branches. The goal is to make the resulting groups as pure as possible—meaning they contain mostly one class (for classification) or similar values (for regression).
Repeat: For each branch, repeat the splitting process using different features or thresholds, building the tree level by level. These splits form internal nodes (decision points) and branches (outcomes of the decision).
Leaf Nodes: Stop splitting when you reach a stopping criterion—like a maximum tree depth, a minimum number of samples per node, or when further splits don’t improve purity much. The end points are called leaf nodes, and each leaf represents a final prediction (e.g., a class label or a numeric value).
Prediction: To classify or predict for a new data point, follow the tree from the root, answering the questions at each node based on the point’s features, until you land at a leaf.
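
The steps above can be sketched as a small recursive program. This version is a simplified illustration, not a production implementation: it handles only numeric features and binary splits, measures purity with Gini impurity (one common choice), and stops at a maximum depth. The customer data is made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 means the group contains a single class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """Try every feature/threshold pair; keep the one with the lowest weighted impurity."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    return best


def build(rows, depth=0, max_depth=3):
    labels = [y for _, y in rows]
    split = best_split(rows) if depth < max_depth and gini(labels) > 0 else None
    if split is None:                          # leaf node: predict the majority class
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold, left, right = split
    return (f, threshold,
            build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))


def predict(node, x):
    while isinstance(node, tuple):             # internal node: answer its question
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


# Made-up data: (age, income in thousands) -> bought the product?
data = [((22, 30), "no"), ((25, 80), "yes"), ((35, 40), "yes"),
        ((45, 90), "yes"), ((28, 35), "no"), ((52, 20), "yes")]
tree = build(data)
print(tree)                    # (0, 28, (1, 35, 'no', 'yes'), 'yes')
print(predict(tree, (40, 30)))  # yes
```

The printed tree reads as: "Is age <= 28? If not, predict yes. If so, is income <= 35? Then predict no, otherwise yes."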

Logistic regression algorithm:

Logistic regression is a statistical method used for binary classification, which means it’s designed to predict the probability of an outcome that can fall into one of two categories—like “yes” or “no,” “true” or “false,” or “0” or “1.” Despite the name, it’s not a regression technique for predicting continuous values (like height or temperature); instead, it’s all about modeling the likelihood of a discrete outcome.
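
Under the hood, logistic regression computes a weighted sum of the features and passes it through the sigmoid function, which squashes any number into a probability between 0 and 1. The sketch below fits a one-feature model with plain gradient descent; the data (hours studied versus passing an exam), the learning rate, and the number of epochs are all made-up choices for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y     # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b


# Made-up data: hours studied -> passed the exam (1) or not (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = train(hours, passed)


def predict_proba(x):
    return sigmoid(w * x + b)


print(round(predict_proba(2), 2), round(predict_proba(7), 2))
```

To turn the probability into a class, a threshold is applied, typically 0.5: here 2 hours of study falls below it ("fail") and 7 hours falls above it ("pass").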

Naive Bayes algorithm:

The Naive Bayes algorithm is a simple yet powerful probabilistic machine learning method used primarily for classification tasks. It’s based on Bayes’ Theorem, which deals with calculating the probability of an event given prior knowledge. The “naive” part comes from its key assumption: all features (or predictors) in the dataset are treated as independent of each other given the class, which is rarely true in real-world scenarios but often works well enough in practice.
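
For text, the independence assumption means the score of a class is its prior probability multiplied by the probability of each word given that class. Here is a minimal sketch of that idea for spam filtering. The four training emails are made up, and add-one (Laplace) smoothing is an assumption added so that unseen words do not produce a probability of zero.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, label). Returns per-class word counts and document counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def predict(text, word_counts, class_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        # log P(class) + sum of log P(word | class); logs avoid tiny products
        score = math.log(n_docs / sum(class_counts.values()))
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


emails = [("win a prize now", "spam"), ("claim your free prize", "spam"),
          ("meeting agenda for monday", "not spam"), ("lunch on monday", "not spam")]
model = train(emails)
print(predict("free prize inside", *model))    # spam
print(predict("monday meeting", *model))       # not spam
```

The word "prize" appears often in the spam examples and never in the others, so any message containing it is pulled strongly toward the spam class.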

Support vector machines:

Support Vector Machines (SVMs) are a powerful supervised machine learning algorithm used for classification and regression, though they’re most commonly applied to classification tasks. The core idea is to find the best possible boundary (or hyperplane) that separates data points of different classes with the widest possible margin.

Here is the key concept behind support vector machines.

Imagine you have a dataset with two classes plotted in a 2D space (e.g., cats vs. dogs based on height and weight). An SVM tries to draw a straight line (or hyperplane in higher dimensions) that separates the two groups. But it doesn’t just pick any line—it chooses the one that maximizes the distance (margin) between the line and the nearest points from each class. These nearest points are called support vectors, and they’re critical because they define the boundary.
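
The idea of the margin can be checked numerically. The sketch below does not train an SVM (a real one solves an optimization problem to find the best line); it only compares a few hand-picked candidate lines on made-up 2D data and reports which has the widest margin and which points act as its support vectors.

```python
import math


def margin(w, b, points):
    """Smallest distance from any point to the line w[0]*x + w[1]*y + b = 0,
    or None if the line does not separate the two classes."""
    norm = math.hypot(*w)
    distances = []
    for (x, y), label in points:               # label is +1 or -1
        signed = label * (w[0] * x + w[1] * y + b) / norm
        if signed <= 0:
            return None                        # point is on the wrong side
        distances.append(signed)
    return min(distances)


def support_vectors(w, b, points, tol=1e-9):
    """Points that lie exactly at the margin distance from the line."""
    m = margin(w, b, points)
    norm = math.hypot(*w)
    return [p for p, label in points
            if abs(label * (w[0] * p[0] + w[1] * p[1] + b) / norm - m) < tol]


# Made-up data: two features per point, with class label -1 or +1
points = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
          ((4.0, 4.0), +1), ((5.0, 4.0), +1), ((4.0, 5.0), +1)]

# Hand-picked candidate lines, each given as (w, b)
candidates = {
    "x + y = 4": ((1.0, 1.0), -4.0),
    "x + y = 5.5": ((1.0, 1.0), -5.5),
    "x = 3": ((1.0, 0.0), -3.0),
}
for name, (w, b) in candidates.items():
    print(name, margin(w, b, points))

best = max(candidates, key=lambda name: margin(*candidates[name], points) or 0.0)
print(best, support_vectors(*candidates[best], points))
```

All three lines separate the classes, but "x + y = 5.5" keeps the largest distance to the nearest points, and only three points, (2, 1), (1, 2) and (4, 4), touch that margin. Moving any other point slightly would not change the boundary.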

SVMs shine in tasks like text classification (e.g., spam detection), image recognition, and bioinformatics (e.g., protein classification). They were a go-to algorithm before deep learning took over for complex problems, but they’re still valuable for smaller, structured datasets or when interpretability matters.

In short, SVM is like a perfectionist architect: it builds the most robust boundary it can, balancing precision and flexibility, even if it takes some clever tricks to get there.

Now let’s move on to unsupervised learning algorithm types.

k-means clustering:

K-means is one of the most popular clustering methods used in machine learning. Unlike supervised learning algorithms, it works on unlabeled training data, meaning that the data points do not have a defined classification structure.

While various types of clustering algorithms exist, including exclusive, overlapping, hierarchical and probabilistic, the k-means clustering algorithm is an example of an exclusive or “hard” clustering method. This form of grouping stipulates that a data point can exist in just one cluster. This type of cluster analysis is commonly used in data science for market segmentation, document clustering, image segmentation and image compression. The k-means algorithm is a widely used method in cluster analysis because it is efficient, effective and simple.

K-means is an iterative, centroid-based clustering algorithm that partitions a dataset into groups of similar points based on the distance between each point and the cluster centroids. The centroid, or cluster center, is the mean of all the points within the cluster (the related k-medians variant uses the median instead).
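
The iteration alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. Here is a minimal sketch of that loop. The initialization (simply taking the first k points), the fixed number of iterations, and the customer data are simplifying assumptions; practical implementations use smarter initialization and stop when the centroids no longer move.

```python
import math


def kmeans(points, k, iterations=10):
    """Basic k-means with a naive initialization (the first k points)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up customers: (visits per month, average basket in dollars)
customers = [(1, 20), (9, 95), (2, 25), (8, 100), (1, 30), (10, 90)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

On this data the algorithm separates the occasional, low-spending customers from the frequent, high-spending ones, even though no labels were provided.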

Artificial neural networks:

A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.

Every neural network consists of layers of nodes, or artificial neurons—an input layer, one or more hidden layers, and an output layer. Each node connects to others, and has its own associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
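
The threshold behavior of a node can be sketched directly. In the example below, each node outputs 1 (activated) only if the weighted sum of its inputs exceeds its threshold, and 0 otherwise. The weights, thresholds, and the "go hiking?" decision are hand-picked for illustration; a trained network would learn its weights from data, and modern networks usually use smoother activation functions than a hard threshold.

```python
def layer(inputs, weights, thresholds):
    """Each node fires (outputs 1) only if its weighted input sum exceeds its threshold."""
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(1 if total > threshold else 0)   # 0 means nothing is passed on
    return outputs


def forward(inputs):
    # Input layer: [good_weather, free_time, tired], each 0 or 1.
    # Hidden layer: node 1 fires if the weather is good AND there is free time,
    #               node 2 fires if the person is tired.
    hidden = layer(inputs, weights=[[1, 1, 0], [0, 0, 1]], thresholds=[1.5, 0.5])
    # Output layer: fires if node 1 is active and node 2 is not.
    return layer(hidden, weights=[[1, -1]], thresholds=[0.5])[0]


print(forward([1, 1, 0]))   # 1: good weather, free time, not tired
print(forward([1, 1, 1]))   # 0: tired
print(forward([1, 0, 0]))   # 0: no free time
```

A negative weight, like the -1 on the "tired" node, lets one node suppress the activation of another.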

In this article, we have covered the basics of the field of machine learning.

Vishwa GW