In this beginners’ tutorial, we’ll explain the machine learning algorithm types and some popular algorithms.
Machine learning is a critical skill for data science. According to our analysis, 64% of the Indeed job postings for data scientists require machine learning skills.
Following this guide, you can break into machine learning by understanding:
- What is machine learning, in simple words.
- What are supervised learning, unsupervised learning, and reinforcement learning, the three main types of machine learning algorithm.
- 10 commonly used machine learning algorithms, with some step-by-step example projects.
In the end, you’ll gain an overview of machine learning (ML) and when to use these algorithms when practicing ML.
Let’s get started!
Machine Learning for Beginners: What is machine learning?
In simple words, machine learning is when computers are able to learn and perform certain tasks without being explicitly programmed to do so.
Like the process of humans learning from experience, computers can learn from the “training” dataset provided to them.
The computer applies machine learning algorithms to create mathematical models. After this process, machines can make predictions or decisions on a new dataset.
What are the relationships: machine learning vs. data science vs. AI vs. deep learning?
This is a question that often confuses beginners. Here’s a quick explanation.
Machine learning is a subset of data science, which also contains other data-related processes. It’s also a fundamental concept within Artificial Intelligence (AI). Deep learning, in turn, is a subset of machine learning based on neural networks with “deep” or multiple hidden layers.
What are the different Types of Machine Learning Algorithms?
Based on the problems, we can divide machine learning algorithms into three main types:
- Supervised Learning – learn based on existing labels/target to make better predictions.
- Unsupervised Learning – learn without labels/target to identify insights/clusters.
- Reinforcement Learning – learn based on trials and errors to maximize rewards.
Each of these three machine learning algorithm types also has a breakdown of sub-categories. Here is a chart showing the ML types.

Let’s look at them one by one!
Supervised Learning
Within supervised learning problems, the machines are provided with a labeled training dataset, where there are both input variables (X) and an output variable (y).
The objective of the problem is to find a suitable mapping function f from X to y. Then when new inputs come in, we can apply f to predict the corresponding output.
y = f(X) or Output = f(Input)
The machines learn and improve the function f through iterative optimization. This process involves minimizing the difference between the estimated and the actual output.
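To make the idea of iterative optimization concrete, here is a minimal sketch in plain Python. It assumes a very simple mapping function f(x) = w * x and uses gradient descent on the mean squared error; the data values, learning rate, and number of steps are made up for illustration.

```python
# Toy labeled dataset (hypothetical values): y is roughly 2 * x.
X = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]


def mean_squared_error(w, X, y):
    """Average squared difference between the estimates w * x and the actual y."""
    return sum((w * xi - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def fit(X, y, learning_rate=0.01, n_steps=500):
    """Learn w in f(x) = w * x by repeatedly stepping against the error gradient."""
    w = 0.0  # start from an arbitrary guess
    for _ in range(n_steps):
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * xi - yi) * xi for xi, yi in zip(X, y)) / len(X)
        w -= learning_rate * gradient
    return w


w = fit(X, y)
print("learned w:", round(w, 2))
print("error before:", mean_squared_error(0.0, X, y))
print("error after:", mean_squared_error(w, X, y))
```

Each step nudges w in the direction that shrinks the gap between the estimated and the actual output, which is the "learn and improve" loop described above.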
The process is called “supervised” learning since it is “supervised” by the human-labeled output variable.
We can divide supervised learning into regression and classification based on whether the output variable/target is numerical or categorical.
Categorical data are divided into categories, such as gender (male/female) or competition levels (low/medium/high), while numerical data are represented by numbers, such as body weight or the number of dogs.
Classification
When the target is a categorical variable, we use classification.
Based on the training observations with known labeled categories, classification is the problem of predicting the category a new observation belongs to. It is about learning the patterns among observations based on experience.

Below are some examples of classification applications:
- Classify customers of banks into different credit profiles based on existing customers’ status.
  – Input variables include income, age, gender, location, expenses, etc.
  – Output variable: whether a customer defaults on the payment to banks.
- Identify fraud among bank transactions based on fraudulent historical tags.
  – Input variables include transaction amount, date/time, customer history, merchants, etc.
  – Output variable: whether a transaction is a fraud or not.
- Apply sentiment analysis on Yelp review data based on historical sentiment labels.
  – Input variables include the text of reviews.
  – Output variable: whether the review is positive, negative, or neutral.
Related article: How to do Sentiment Analysis with Deep Learning (LSTM Keras)
A tutorial showing an example of sentiment analysis: learn how to build a deep learning model to classify the Yelp review data in Python step-by-step.
Regression
When the output variable is numerical, we have a regression problem.
It is about estimating the relationship between the target (dependent variable) and at least one input variable (independent variable/feature). It is often used to predict or forecast based on experience.

Below are some examples of regression problems:
- Predict housing prices based on historical sales.
  – Input variables may include the size and age of the property, number of bathrooms, property tax, etc.
  – Output variable: the price of the house.
- Time series forecasting.
  – Input variables are the values of the time series (the historical data).
  – Output variable: the value of the time series at a later time.
- Customer spending based on a sample of existing customers.
  – Input variables can include customer demographics, income, etc.
  – Output variable: how much a customer spends.
Unsupervised Learning
When we only have input variables (X) but no corresponding output variable, the training dataset is unlabeled. The machine can still extract structures within the dataset, such as grouping or clustering of data points.
This becomes an unsupervised learning problem. You can also think of it as “data mining” without any existing knowledge/labels.
Association Rule Learning
Association rules are used to analyze transactions to find interesting relationships among different variables. It is mostly used in market basket analysis.
What other products are likely to be purchased when a customer buys both a laptop and a mouse?

From the algorithm, we may find a rule that there’s a 90% probability the customer will also buy a laptop cover.
We can then design marketing strategies to make it convenient for the customers to shop. If it’s a physical store, we can place these items closer together. If it’s an online store, we could recommend a laptop cover to a customer who has both a laptop and a mouse in the shopping cart.
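As a rough illustration of where such a probability comes from, the sketch below computes the "confidence" of the rule {laptop, mouse} → {laptop cover} from a handful of baskets. The transactions are hypothetical, and real association rule algorithms (such as Apriori) search over many candidate rules rather than checking a single one.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"laptop", "mouse", "laptop cover"},
    {"laptop", "mouse", "laptop cover", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "keyboard"},
    {"mouse", "laptop cover"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of buying `consequent` given `antecedent` was bought."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_confidence = confidence({"laptop", "mouse"}, {"laptop cover"}, transactions)
print(f"confidence: {rule_confidence:.2f}")
```

In this toy data, two of the three baskets with both a laptop and a mouse also contain a laptop cover, so the confidence is about 0.67.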
Clustering
Clustering is the unsupervised counterpart of classification. When the training dataset doesn’t have labeled categories, we use clustering. The algorithm helps group observations into categories based on some measure of inherent similarity or distance.
In other words, clustering is about separating a set of data objects into clusters. The observations in the same group are more like each other than the observations in different groups.

It is used to recognize patterns among clusters in different fields such as biology, marketing, etc.
For example, we can segment customers into groups such that customers in the same group behave similarly.
Then we can target specific groups with customized marketing campaigns to generate more sales.
Dimensionality Reduction
This unsupervised problem is about reducing the number of input variables while retaining most information in the dataset.

There are several advantages of reducing the features:
- less processing time/storage space is needed.
- the potential of removing multi-collinearity.
- easier to visualize the data.
- avoid the curse of dimensionality.
For example, say we have a dataset with 100 features/input variables.
If we can reduce 100 to 5 while retaining 90% of the original dataset’s valuable information, it’ll be easier to perform other tasks. For example, we can use the reduced dataset for clustering without losing much accuracy.
Reinforcement Learning
In reinforcement learning, there is no fixed training dataset. Instead, we have an environment with a set of rules, and we want the machine/agent to maximize the rewards within this environment.
We want the machines to learn by trial and error, which is a common setup in games.

In this case, the machines are not bound by any experience, and they learn based on each trial’s feedback. This gives the machines the potential to become better than humans. That’s how AlphaGo beat our best human players!
With feedback/labels from the environment, reinforcement learning is somewhat similar to supervised learning.
Although unsupervised learning and reinforcement learning sound more advanced, supervised learning is more common and should be the main focus of the study.
After learning the types of machine learning problems, let’s look at 10 commonly used algorithms.
10 Popular Machine Learning Algorithms
In this section, we’ll introduce some of the most practical machine learning algorithms. It is hard to classify them according to the above categories since one algorithm could be used for multiple machine learning types.
Linear Regression
Linear Regression is one of the fundamental algorithms that every data analyst or scientist should know. It is often the first predictive model that machine learning beginners learn.
It is called “linear” regression since the relationship between the output (y) and the input variables (X) is assumed to be linear. Or we can say, the mapping functions f are linear predictor functions.
This algorithm is widely used in both the industry and academia since it’s simple and interpretable with well-studied theories.
The goal is to fit the linear model while minimizing a cost function, which measures how well the linear equation fits the training data. The most used cost function is Mean Squared Error (MSE).
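For a single input variable, the line that minimizes the MSE has a well-known closed-form solution. The sketch below applies it in plain Python; the house sizes and prices are hypothetical numbers chosen only to show the mechanics.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) minimizing MSE for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(x, y, slope, intercept):
    """Mean squared error of the line on the given data."""
    return sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical data: house size (in 100 sq ft) vs. price (in $10k).
sizes = [10, 15, 20, 25, 30]
prices = [25, 33, 45, 52, 65]

slope, intercept = fit_simple_linear_regression(sizes, prices)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("MSE:", round(mse(sizes, prices, slope, intercept), 3))
```

Any other slope or intercept gives a higher MSE on this data, which is what "fitting" the model means here.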
Further Reading: Linear Regression in Machine Learning: Practical Python Tutorial
Logistic Regression
Logistic regression is an algorithm used for classification problems that gives the probability of a particular class. The class could be binary such as fraud/legit or multiple classes such as win/tie/loss.
As with linear regression, the name comes from the model: it is called “logistic” regression since a logistic function is used to turn a linear combination of the input variables into a probability.

It is often the first predictive classification model that machine learning beginners learn. Logistic regression is also widely used in both industry and academia due to its simplicity and interpretability.
Like linear regression, logistic regression is fit by minimizing a cost function, which is usually cross entropy.
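Here is a small sketch of the two pieces mentioned above for the binary case: the logistic (sigmoid) function that produces a probability, and the cross entropy that scores those probabilities against the labels. The inputs, labels, and weights are hypothetical and hand-picked, not learned.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, weight, bias):
    """Probability of the positive class for a single input variable x."""
    return sigmoid(weight * x + bias)


def cross_entropy(y_true, y_prob):
    """Average binary cross entropy; lower means a better fit to the labels."""
    total = sum(
        yt * math.log(p) + (1 - yt) * math.log(1 - p)
        for yt, p in zip(y_true, y_prob)
    )
    return -total / len(y_true)


# Hypothetical inputs and labels (1 = fraud, 0 = legit).
inputs = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]

good_fit = [predict_probability(x, weight=2.0, bias=0.0) for x in inputs]
poor_fit = [predict_probability(x, weight=-2.0, bias=0.0) for x in inputs]

print("cross entropy (good weights):", round(cross_entropy(labels, good_fit), 3))
print("cross entropy (poor weights):", round(cross_entropy(labels, poor_fit), 3))
```

Training a logistic regression amounts to searching for the weight and bias that make this cross entropy as small as possible.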
Further Reading: Logistic Regression for Machine Learning: complete Tutorial
Lasso and Ridge Regression
Next, let’s look at linear/logistic regression’s variation – Lasso and Ridge regression. They are linear or logistic regression with added penalty terms/regularization in the cost function.
Regularization is used to reduce model complexity to deal with overfitting and multicollinearity problems. With the penalty term, the model should provide better predictions on new data inputs.
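Lasso and Ridge regressions differ in the penalty term they add: Lasso penalizes the sum of the absolute values of the coefficients (L1), while Ridge penalizes the sum of their squares (L2).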
Note: the same regularization techniques can be applied to other machine learning algorithms as well.
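The sketch below shows how a penalty term can be added to the MSE cost of a linear model. It assumes the common textbook form, where `alpha` (a name chosen here for the penalty strength) multiplies the sum of absolute coefficients for Lasso and the sum of squared coefficients for Ridge; libraries may scale these terms slightly differently.

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def lasso_cost(y_true, y_pred, coefficients, alpha):
    """MSE plus an L1 penalty: alpha * sum of absolute coefficient values."""
    return mse(y_true, y_pred) + alpha * sum(abs(c) for c in coefficients)


def ridge_cost(y_true, y_pred, coefficients, alpha):
    """MSE plus an L2 penalty: alpha * sum of squared coefficient values."""
    return mse(y_true, y_pred) + alpha * sum(c ** 2 for c in coefficients)


# Hypothetical example: predictions are perfect, so only the penalty remains.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 3.0]
coefficients = [3.0, -0.5]

print("lasso cost:", lasso_cost(y_true, y_pred, coefficients, alpha=0.1))
print("ridge cost:", ridge_cost(y_true, y_pred, coefficients, alpha=0.1))
```

Because large coefficients now increase the cost, the fitted model is pushed toward smaller coefficients, which is how the penalty reduces model complexity.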
Related article: How to Improve Sports Betting Odds — Step by Step Guide in Python
Sports betting could be more than using your gut feeling. This guide shows you the step by step ridge regression to sports bet smarter using Python.
Decision Tree
Decision tree learning is an algorithm that consists of nodes in a tree-like structure. Within each internal node, there is a decision function to determine the next path to take. For each observation, the prediction of the output or decision is made at a terminal node/leaf.

This algorithm can be used for both regression and classification. It’s both easy to visualize and non-parametric, which makes the model easy to interpret and implement.
Unlike many other models, decision trees don’t need the input data to be scaled, which makes them easier to use.
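To show how a prediction travels from the root to a leaf, here is a tiny hand-built tree in plain Python. The features, thresholds, and labels are hypothetical; a real decision tree algorithm would learn them from the training data.

```python
# A hand-built tree: internal nodes hold a decision, leaves hold a prediction.
tree = {
    "feature": "income",
    "threshold": 40_000,
    "left": {  # income <= 40,000
        "feature": "expenses",
        "threshold": 2_000,
        "left": {"prediction": "no default"},  # expenses <= 2,000
        "right": {"prediction": "default"},  # expenses > 2,000
    },
    "right": {"prediction": "no default"},  # income > 40,000
}


def predict(node, observation):
    """Follow the decisions from the root until a terminal node/leaf is reached."""
    while "prediction" not in node:
        if observation[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["prediction"]


customer = {"income": 35_000, "expenses": 2_500}
print(predict(tree, customer))
```

The thresholds are compared against the raw values of each feature, which is why the inputs can stay in their original units.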
Further Reading: Decision Tree Model in Machine Learning: Practical Tutorial with Python
Random Forest
Random forest consists of an ensemble of decision trees. It is one of the most popular and powerful machine learning algorithms.
Multiple decision trees are trained independently within the algorithm, with each one randomly distinct from the others. Each of the trees predicts a data point. Then these predictions are aggregated together using specific methods (e.g., average) to form a final prediction.
Usually, the final prediction is more accurate than that of any individual tree. But the complexity of the model also makes it harder to interpret compared to decision trees.
As with decision trees, we don’t need to scale the input data.
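The two ingredients described above can be sketched without building full trees: drawing a random (bootstrap) sample so each tree sees slightly different data, and aggregating the trees’ predictions. The function names and example predictions below are hypothetical, and the tree training itself is left out.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, seed):
    """Sample rows with replacement so each tree trains on a different dataset."""
    rng = random.Random(seed)
    return rng.choices(rows, k=len(rows))


def aggregate_regression(tree_predictions):
    """Final prediction for a numerical target: the average over all trees."""
    return mean(tree_predictions)


def aggregate_classification(tree_predictions):
    """Final prediction for a categorical target: the majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


rows = [1, 2, 3, 4, 5]
print(bootstrap_sample(rows, seed=0))

# Hypothetical predictions from three trees for one data point.
print(aggregate_regression([10.0, 12.0, 14.0]))
print(aggregate_classification(["fraud", "legit", "fraud"]))
```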
Further Reading: Unlocking Random Forest in Machine Learning
Gradient Boosting
Gradient boosting is also an ensemble of models. It is a boosting method that combines weak prediction models into a single strong one in an iterative way.
This ensembling method often involves decision trees. But unlike random forests, the trees in gradient tree boosting are not trained independently, but rather sequentially. The training target for the next tree depends on the errors of the trees before it in the sequence. The trees are trained to minimize a cost function, with Mean Squared Error (MSE) and cross entropy being the most popular.
Gradient tree boosting is often more accurate than random forests. It has been one of the most successful algorithms in Kaggle competitions with structured datasets.
One disadvantage of this method is its difficulty in interpretation.
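Here is a minimal sketch of the sequential idea for regression with the MSE cost, using one-split trees (“stumps”) as the weak models. Each new stump is fit to the residuals left by the ensemble so far. The data, number of trees, and learning rate are hypothetical, and libraries such as XGBoost add many refinements on top of this.

```python
def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gradient_boosting(x, y, n_trees=20, learning_rate=0.5):
    """Start from the mean, then add stumps one by one, each correcting the last."""
    predictions = [sum(y) / len(y)] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        predictions = [
            pi + learning_rate * stump_predict(stump, xi)
            for xi, pi in zip(x, predictions)
        ]
    return predictions


def mse(y_true, y_pred):
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Hypothetical one-feature dataset.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]

baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
final_mse = mse(y, fit_gradient_boosting(x, y))
print("MSE with the mean only:", round(baseline_mse, 3))
print("MSE after boosting:", round(final_mse, 3))
```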
Related article: Hyperparameter Tuning with Python: XGBoost Step-by-Step Guide
Gradient boosting has many hyperparameters, which makes it harder to tune. Check out this practical guide to improve your model’s performance and learn how to use this machine learning technique with an XGBoost example.
Neural Networks
A neural network is an algorithm inspired by the biological neural networks within animal brains.
Within the networks, a collection of connected units or nodes (neurons) is aggregated into layers. Each layer may perform a different transformation on its inputs.

Each node computes a linear combination of its inputs, much like a linear/logistic regression, and then applies a non-linear activation function. Stacking many such nodes and layers lets the network model complex, non-linear relationships between the input and the output.
The objective of the algorithm is to find the equation that best fits the training data. As with linear/logistic regression, how well the model fits is measured by a cost function.
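The sketch below runs one input through a tiny network with a single hidden layer, to show how layers of nodes transform their inputs. The weights and biases are hypothetical, hand-picked values; in practice they are learned by minimizing the cost function.

```python
import math


def relu(z):
    """A common non-linear activation: negative values become 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes the output into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Each node: weighted sum of the inputs plus a bias, then the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs):
    """Two inputs -> hidden layer with two nodes -> one output node."""
    hidden = dense_layer(inputs, [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
    output = dense_layer(hidden, [[1.0, -2.0]], [0.0], sigmoid)
    return output[0]


print(forward([2.0, 1.0]))
```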
There are different popular neural network models, such as multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN). CNNs are widely used for computer vision problems, while LSTMs are often used for sequence data, such as text in natural language processing (NLP) problems.
Related articles:
How to do Sentiment Analysis with Deep Learning (LSTM Keras)
A tutorial showing an example of sentiment analysis: learn how to build a deep learning model to classify the reviews data in Python step-by-step.
3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras
A machine learning time series analysis example with Python. See how to transform the dataset and fit LSTM with the TensorFlow Keras model.
Hyperparameter Tuning with Python: Keras Step-by-Step Guide
Neural networks have many hyperparameters, which makes them harder to tune. This is a practical guide to Hyperparameter Tuning with Keras and Tensorflow in Python. Read on to implement this machine learning technique to improve your model’s performance.
K-Means Clustering
k-means clustering is usually the first unsupervised learning algorithm for machine learning beginners to know.
The algorithm assigns each observation into one of the k clusters, with k being the parameter chosen by us. The algorithm aims to find the clusters that result in the lowest within-cluster sum of squares.
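A common way to search for such clusters is to alternate between two steps: assign each observation to its nearest cluster center, then move each center to the mean of its observations. Below is a minimal sketch for 2D points. It assumes a simplified initialization (the first k points are the starting centers), and the data is hypothetical.

```python
def k_means(points, k, n_iterations=10):
    """Alternate between assigning points to centers and updating the centers."""
    centers = [points[i] for i in range(k)]  # simplified initialization
    clusters = []
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            (
                sum(p[0] for p in cluster) / len(cluster),
                sum(p[1] for p in cluster) / len(cluster),
            )
            if cluster
            else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical 2D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, clusters = k_means(points, k=2)
print(centers)
print(clusters)
```

Libraries usually pick the starting centers more carefully (for example at random, with several restarts), since the result can depend on the initialization.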
Principal Component Analysis (PCA)
Principal component analysis (PCA) is the most popular dimensionality reduction algorithm.
Given a dataset with N input variables/features, we can apply PCA and transform it into a dataset with n features, where n < N.
The algorithm uses a set of n linear functions called principal components and “displays” the data in terms of these functions.
If there is a lot of collinearity/correlation among the features, we would be able to reduce the dimensionality of the dataset without losing much information.
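To illustrate this with the smallest possible case, the sketch below reduces two highly correlated features to one. For 2D data, the direction of the first principal component can be computed directly from the variances and the covariance, so no linear algebra library is needed. The data points are hypothetical.

```python
import math


def first_principal_component(points):
    """Return the unit direction of largest variance, the mean, and total variance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Orientation of the main axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    component = (math.cos(angle), math.sin(angle))
    return component, (mean_x, mean_y), var_x + var_y


def project(points, component, mean):
    """Express each centered point as a single number along the component."""
    return [
        (p[0] - mean[0]) * component[0] + (p[1] - mean[1]) * component[1]
        for p in points
    ]


# Hypothetical dataset with two highly correlated features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

component, mean, total_variance = first_principal_component(points)
reduced = project(points, component, mean)
retained = (sum(v ** 2 for v in reduced) / len(reduced)) / total_variance
print("reduced dataset:", [round(v, 2) for v in reduced])
print("share of variance retained:", round(retained, 4))
```

Here N = 2 and n = 1: a single new feature keeps almost all of the variance of the original two.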
Isolation Forests
Isolation forest is commonly used for anomaly or outlier detection on unlabeled data.
All we need to do is to specify a proportion of outliers in the dataset, and the algorithm will separate the data into two groups: normal and outliers.
The algorithm uses a similar idea to random forests: an ensemble of trees that are trained and work together to make better predictions. The main difference is that the isolation forest uses isolation trees, which are easier to create than decision trees.
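The intuition behind isolation trees is that outliers can be separated from the rest of the data with fewer random splits than normal points. The sketch below demonstrates this on a single feature with hypothetical transaction amounts. It is a simplification: real isolation forests also subsample the data, pick random features, and convert the depths into an anomaly score.

```python
import random


def isolation_depth(value, data, rng, max_depth=20):
    """Count the random splits needed until `value` is alone in its partition."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        low, high = min(data), max(data)
        if low == high:
            break
        split = rng.uniform(low, high)
        # Keep only the side of the split that contains `value`.
        data = [d for d in data if (d < split) == (value < split)]
        depth += 1
    return depth


def average_depth(value, data, n_trees=200, seed=0):
    """Average the isolation depth over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(value, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical transaction amounts with one obvious outlier.
amounts = [20, 22, 23, 25, 26, 27, 30, 500]

print("average depth for 500:", average_depth(500, amounts))
print("average depth for 25:", average_depth(25, amounts))
```

The outlier 500 is usually isolated by the very first split, while 25 needs several splits to be separated from its neighbors.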
Related article: How to apply Unsupervised Anomaly Detection on bank transactions
This is a practical example of unsupervised learning of anomaly (outlier) detection. Learn how to apply the algorithms with a step-by-step guide in Python.
You’ve made it! Hope you got a good overview of machine learning and its algorithm types.
With a good foundation, keep learning and dig deeper!