In this tutorial, you’ll learn and apply popular Automated Machine Learning (AutoML) tools in Python.
Applying machine learning in real life can be complicated. The process can be time-consuming and resource-intensive, and it is especially challenging for beginners. To simplify this task, we can largely automate the process with free Python AutoML packages.
By following this guide, you’ll learn:
- What is AutoML, and how to use it in Python?
- How to use popular and general Python AutoML libraries:
  - H2O
  - TPOT
  - PyCaret
  - AutoGluon
Throughout the guide, you’ll use a time series dataset as an example to try each AutoML tool to find well-performing model pipelines in Python.
If you are interested in learning AutoML and seeing which tool is best for your needs, this practical tutorial will get you started.
Let’s jump in!
How to start using AutoML in Python
What is AutoML in Python?
First of all, why AutoML?
Applying machine learning to solve real-world problems is not easy. It involves many steps to reach a production-ready model, which can take a lot of effort even for industry experts. So it is challenging, if not impossible, for machine learning beginners.
Luckily, the demand for machine learning has been increasing dramatically. This has driven efforts to automate the process and make ML simpler and more approachable. And that's what AutoML is for.
Automated Machine Learning (AutoML) is the process of automating machine learning workflows. In an ideal situation, we, as the users, only need to provide a dataset. The AutoML tool should automatically produce good-performing model pipelines for us.

So AutoML should handle tasks like:
- data preprocessing
- algorithm selection
- hyperparameter tuning
- model training
With Python being one of the most common data science languages, there are quite a few AutoML Python libraries that we can use. We've reviewed the popular AutoML Python packages and want to introduce 4 easy-to-use and relatively up-to-date ones:
- H2O
- TPOT
- PyCaret
- AutoGluon
These Python AutoML tools can help you produce high-performing machine learning models with less thinking and coding. They are useful not only for machine learning beginners but also for experienced data scientists.
It could be exciting to just start throwing data into them, but please read the tips below first.
Before using AutoML tools in Python
Even though automating the entire machine learning process sounds attractive and promising, the existing AutoML tools are still limited and require human intervention. So before using these AutoML packages, please make sure you've learned the basics of the following:
- Python: you still need to know basic Python. It is highly recommended to conduct basic data cleaning before feeding the data into AutoML tools
- machine learning: you still need to have the basic knowledge to run AutoML tools properly and understand the results
Also, we strongly recommend following the tips below to set up AutoML tools in Python:
- follow the installation guide: the AutoML packages often rely on other tools, which could be more complicated to set up than standard Python libraries. It is better to follow their official installation guide
- create virtual environments: the AutoML packages could also be based on different versions of Python and tools. To avoid conflicts, it is better to set up a separate environment for each AutoML tool
All right! I also want to give another tip about expectations for the AutoML tools. You need to budget a long training time when using them. This is because they often consider many choices of preprocessing steps, machine learning algorithms, ensembling methods, and so on. So they need to run for a long enough time (even hours to days) to optimize the results. However, we'll set a maximum run time for each tool within this tutorial to shorten your learning time.
I know you can’t wait to try the AutoML tools. Next, let’s quickly look at our example dataset and preprocess it.
Preprocess the example dataset
We’ll use the individual household electric power consumption dataset. It is a time-series recording of a household’s electric power usage between 2006 and 2010.
As mentioned earlier, before using the AutoML packages in Python, I recommend cleaning the data yourself. So below is the process of data cleaning and preprocessing.
Further learning: if you have trouble understanding the code below, check out our course Python for Data Analysis with projects. This course shows how to use Python for basic analysis, essential before applying AutoML.
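The exact cleaning code isn't reproduced here, but a minimal sketch of one possible approach looks like this. The file name, the hourly resampling of `Global_active_power` into the `electricity_usage` target, and the 80/20 chronological split are assumptions; adapt them to how you prepared the data.

```python
import pandas as pd

# Load the raw UCI file (assumed file name; values are ';'-separated and '?' marks missing data).
df = pd.read_csv('household_power_consumption.txt', sep=';',
                 na_values='?', low_memory=False)
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], dayfirst=True)

# Resample the minute-level Global_active_power to hourly usage (assumed definition of the target).
hourly = (df.set_index('datetime')['Global_active_power']
            .resample('H').mean()
            .dropna()
            .rename('electricity_usage')
            .to_frame())

# Create the 8 hourly lag features and the month feature.
for lag in range(1, 9):
    hourly[f'electricity_usage_{lag}hr_lag'] = hourly['electricity_usage'].shift(lag)
hourly['month'] = hourly.index.month
hourly = hourly.dropna()

# Chronological train/test split (assumed 80/20).
split = int(len(hourly) * 0.8)
df_train, df_test = hourly.iloc[:split], hourly.iloc[split:]
```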
In the end, we have the training and test sets: `df_train` and `df_test`. The target is `electricity_usage`, while the 9 features are below:
- electricity_usage_1hr_lag
- electricity_usage_2hr_lag
- electricity_usage_3hr_lag
- electricity_usage_4hr_lag
- electricity_usage_5hr_lag
- electricity_usage_6hr_lag
- electricity_usage_7hr_lag
- electricity_usage_8hr_lag
- month
We’ll use the previous 8 hours of electricity usage and the month of the year to predict the household’s electricity usage.
Now we are ready to feed the datasets into AutoML tools!
H2O

Intro
H2O is an open-source, in-memory, distributed, scalable and fast machine learning platform. Its core code is written in Java. But we can use it in other languages like Python, Scala, and R. The tool currently supports both supervised and unsupervised learning problems.
As mentioned earlier, please follow the installation guide since the process is not as straightforward as the standard Python libraries.
Train with AutoML
To use H2O in Python, we first initialize a connection between our Python and an H2O local server.
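In code, this is just a couple of lines (a sketch):

```python
import h2o

# Start a local H2O server (or connect to one that is already running).
h2o.init()
```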
If the connection is successful, we can see a summary of the cluster status like below. I’ve only printed part of the summary since it’s long.
    Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
    Attempting to start a local H2O server...
    ; OpenJDK 64-Bit Server VM (build 17.0.1+12-39, mixed mode, sharing)
    Starting server from C:\Users\liann\anaconda3\Lib\site-packages\h2o\backend\bin\h2o.jar
    Ice root: C:\Users\liann\AppData\Local\Temp\tmp0t1e8bbx
    JVM stdout: C:\Users\liann\AppData\Local\Temp\tmp0t1e8bbx\h2o_liann_started_from_python.out
    JVM stderr: C:\Users\liann\AppData\Local\Temp\tmp0t1e8bbx\h2o_liann_started_from_python.err
    Server is running at http://127.0.0.1:54321
    Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O uses its own data objects. So instead of using `pandas` DataFrames, we need to convert them to `H2OFrame`, a 2D array of uniformly-typed columns. It is similar to the pandas DataFrame in many ways. You can read more about it here. Then we also identify the target and features.
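A sketch of the conversion and of defining the target/feature names (`df_train` and `df_test` come from the preprocessing step above):

```python
# Convert the pandas DataFrames into H2OFrames.
train_h2o = h2o.H2OFrame(df_train)
test_h2o = h2o.H2OFrame(df_test)

# Identify the target and the features.
y = 'electricity_usage'
x = [col for col in train_h2o.columns if col != y]
```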
Next, we can use an `H2OAutoML` object to automate our supervised machine learning model training. This object trains several models, which are cross-validated by default.
We've set a couple of parameters in the argument (see the sketch after this list):
- `sort_metric='mse'`: set MSE (mean squared error) as the metric to sort the model performance by
- `max_runtime_secs=5*60`: specify 5 minutes as the maximum time the process will run for
- `seed=666`: set a seed for reproducibility. However, because we've set `max_runtime_secs`, this cannot guarantee the same results after each run. You can read more about it here
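Putting that together, the training call could look like this (a sketch; `x`, `y`, and `train_h2o` are defined above):

```python
from h2o.automl import H2OAutoML

aml = H2OAutoML(
    sort_metric='mse',      # rank models by mean squared error
    max_runtime_secs=5*60,  # stop the search after 5 minutes
    seed=666,               # for (approximate) reproducibility
)
aml.train(x=x, y=y, training_frame=train_h2o)
```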
Within the AutoML progress note, you’ll notice it says, “XGBoost is not available; skipping it”. This is because I’m running this in a Windows environment, and XGBoost is not supported on Windows with H2O. You can read more about the limitation here.
Below the progress bar, you’ll see the Model Details about the best-performing model trained in this session.
    AutoML progress: |
    10:05:12.611: AutoML: XGBoost is not available; skipping it.
    10:05:12.624: Step 'best_of_family_xgboost' not defined in provider 'StackedEnsemble': skipping it.
    10:05:12.624: Step 'all_xgboost' not defined in provider 'StackedEnsemble': skipping it.
    ███████████████████████████████████████████████████████████████| (done) 100%

    Model Details
    =============
    H2OStackedEnsembleEstimator : Stacked Ensemble
    Model Key: StackedEnsemble_AllModels_3_AutoML_1_20220302_100512
    No model summary for this model

    ModelMetricsRegressionGLM: stackedensemble
    ** Reported on train data. **
    MSE: 0.25770256452650037
    RMSE: 0.5076441317758932
    MAE: 0.34986002383931386
    RMSLE: 0.21355570733121418
    R^2: 0.6653029342713153
    Mean Residual Deviance: 0.25770256452650037
    Null degrees of freedom: 10047
    Residual degrees of freedom: 10031
    Null deviance: 7737.841923940112
    Residual deviance: 2589.395368362276
    AIC: 14926.41110344668

    ModelMetricsRegressionGLM: stackedensemble
    ** Reported on cross-validation data. **
    MSE: 0.35857040562993847
    RMSE: 0.5988074862841466
    MAE: 0.4082713452285525
    RMSLE: 0.24890166892926036
    R^2: 0.5493550817323876
    Mean Residual Deviance: 0.35857040562993847
    Null degrees of freedom: 34742
    Residual degrees of freedom: 34726
    Null deviance: 27644.918328643114
    Residual deviance: 12457.811602800952
    AIC: 62998.89118626651
Print best performing model(s)
We can print the leaderboard if we want to compare the top-performing models.
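For example (a sketch; `get_leaderboard` with `extra_columns='ALL'` adds the timing and algorithm columns shown in the table below, while `aml.leaderboard` alone shows just the core metrics):

```python
from h2o.automl import get_leaderboard

# Leaderboard with extra columns such as training time and algorithm name.
lb = get_leaderboard(aml, extra_columns='ALL')
lb.head(rows=10)
```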
This returns an `H2OFrame` storing the top models and their metrics.
| model_id | mse | mean_residual_deviance | rmse | mae | rmsle | training_time_ms | predict_time_per_row_ms | algo |
|---|---|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_3_AutoML_1_20220302_100512 | 0.35857 | 0.35857 | 0.598807 | 0.408271 | 0.248902 | 1201 | 0.075261 | StackedEnsemble |
| StackedEnsemble_AllModels_4_AutoML_1_20220302_100512 | 0.35865 | 0.35865 | 0.598874 | 0.408348 | 0.248916 | 879 | 0.07141 | StackedEnsemble |
| StackedEnsemble_AllModels_2_AutoML_1_20220302_100512 | 0.358774 | 0.358774 | 0.598977 | 0.40858 | 0.248953 | 429 | 0.046865 | StackedEnsemble |
| StackedEnsemble_BestOfFamily_3_AutoML_1_20220302_100512 | 0.358876 | 0.358876 | 0.599062 | 0.408973 | 0.249059 | 409 | 0.029082 | StackedEnsemble |
| StackedEnsemble_AllModels_1_AutoML_1_20220302_100512 | 0.358881 | 0.358881 | 0.599067 | 0.409069 | 0.248949 | 280 | 0.024984 | StackedEnsemble |
| StackedEnsemble_BestOfFamily_2_AutoML_1_20220302_100512 | 0.359382 | 0.359382 | 0.599485 | 0.409906 | 0.249145 | 269 | 0.017429 | StackedEnsemble |
| StackedEnsemble_BestOfFamily_1_AutoML_1_20220302_100512 | 0.360317 | 0.360317 | 0.600264 | 0.411528 | 0.249621 | 579 | 0.007317 | StackedEnsemble |
| GBM_1_AutoML_1_20220302_100512 | 0.360923 | 0.360923 | 0.600769 | 0.412772 | 0.250035 | 782 | 0.007483 | GBM |
| GBM_2_AutoML_1_20220302_100512 | 0.36299 | 0.36299 | 0.602487 | 0.414703 | 0.25077 | 460 | 0.005719 | GBM |
| GBM_grid_1_AutoML_1_20220302_100512_model_17 | 0.364255 | 0.364255 | 0.603535 | 0.41623 | 0.251649 | 410 | 0.005731 | GBM |
Calculate metrics
We can also calculate the MSE on the holdout test dataset `df_test`. Please note that this is different from the results above, since those were calculated on the training and cross-validation data.
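One way to do this (a sketch, not necessarily the exact code behind the number below) is to score the leading model on the test `H2OFrame`:

```python
# Evaluate the best model (the "leader") on the holdout test set.
perf = aml.leader.model_performance(test_h2o)
perf.mse()
```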
0.29635466797374066
Plot predicted and actual data comparisons
Lastly, we can also plot and compare the predicted and actual electricity usage for the test set.
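A minimal plotting sketch with `matplotlib` (assuming `df_test` keeps its datetime index, as in the preprocessing sketch):

```python
import matplotlib.pyplot as plt

# Predict on the test set and bring the result back into pandas.
pred = aml.leader.predict(test_h2o).as_data_frame()

plt.figure(figsize=(12, 4))
plt.plot(df_test.index, df_test['electricity_usage'], label='actual')
plt.plot(df_test.index, pred['predict'].to_numpy(), label='predicted')
plt.ylabel('electricity_usage')
plt.legend()
plt.show()
```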
You can see how closely the prediction follows the actual target.

TPOT

Intro
TPOT (Tree-based Pipeline Optimization Tool) is a Python automated machine learning tool built on top of the popular machine learning package `scikit-learn`. It automates the process, including feature engineering, model selection, and parameter optimization. The tool uses a technique called genetic programming, which applies operations similar to natural genetic processes to evolve programs.
Please follow its official installation guide to set it up.
Train with AutoML
TPOT is designed to be as similar to `scikit-learn` as possible, so you may find it easier to use if you are familiar with `scikit-learn`.
First, we'll create an instance of the class `TPOTRegressor` since ours is a regression problem. Within the argument, we've set some parameters (see the sketch after this list):
- `generations=10` and `population_size=10`: these two parameters are related to the genetic programming. In general, the higher these numbers, the better TPOT can work. But we've set them lower than their default values of 100 to simplify the process
- `verbosity=2`: this determines how much information TPOT prints out while it's running. 2 means it will print more information as well as provide a progress bar
- `scoring='neg_mean_squared_error'`: this is the function used to evaluate the quality of a pipeline. We'll use the negative mean squared error (negative MSE)
- `max_time_mins=5`: limit the optimization time of TPOT to 5 minutes
- `random_state=666`: set the seed of the pseudo-random number generator for reproducibility
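Putting that together (a sketch):

```python
from tpot import TPOTRegressor

tpot = TPOTRegressor(
    generations=10,
    population_size=10,
    verbosity=2,
    scoring='neg_mean_squared_error',
    max_time_mins=5,
    random_state=666,
)
```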
Then, we’ll separate the features and target of the training set.
Now we are ready to feed the data into TPOT to optimize a machine learning pipeline. The `fit` function uses genetic programming with cross-validation to find the optimal pipeline.
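A sketch of these two steps (`df_train` comes from the preprocessing above):

```python
# Separate the features and the target of the training set.
y_train = df_train['electricity_usage']
X_train = df_train.drop(columns='electricity_usage')

# Search for a good pipeline with genetic programming and cross-validation.
tpot.fit(X_train, y_train)
```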
You should see the progress bar moving, and when the maximum time is reached, it returns the current best pipeline.

Print best performing model(s)
We can already see the best pipeline from the result above. But we can also export the optimized pipeline as Python code.
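The export is a single call (the file name here is just an example):

```python
# Write the best pipeline found by TPOT to a standalone Python script.
tpot.export('tpot_best_pipeline.py')
```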

Calculate metrics
We can also evaluate the pipeline using the test set.
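A sketch of the evaluation:

```python
# Separate the features and the target of the test set, then score the pipeline.
y_test = df_test['electricity_usage']
X_test = df_test.drop(columns='electricity_usage')

tpot.score(X_test, y_test)
```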
We've set the scoring function to the negative MSE, so the MSE is the number below without the minus sign.
-0.2989048361963268
We can verify this by using the `sklearn` code below.
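For example, something like this (reusing `X_test` and `y_test` from the sketch above):

```python
from sklearn.metrics import mean_squared_error

# The positive MSE, computed directly with scikit-learn.
mean_squared_error(y_test, tpot.predict(X_test))
```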
0.2989048361963268
Plot predicted and actual data comparisons
In the end, let’s also plot the comparison of the prediction and actual data of the test set.
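A sketch, analogous to the H2O plot above:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(df_test.index, y_test, label='actual')
plt.plot(df_test.index, tpot.predict(X_test), label='predicted')
plt.ylabel('electricity_usage')
plt.legend()
plt.show()
```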

PyCaret

Intro
PyCaret is a Python library that automates machine learning workflows. It is built on top of other Python machine learning libraries, including `scikit-learn`, `XGBoost`, `LightGBM`, etc. The library mainly targets citizen data scientists who prefer low code, but it also works for other data science users. We can use PyCaret for both supervised and unsupervised learning problems.
Please follow the installation guide to set up PyCaret.
Train with AutoML
We'll train our dataset with the regression module from PyCaret. PyCaret also has a time series module, but it is still in beta at the time of writing, so we are not using it here.
First, we `setup` the training environment and create the transformation pipeline. Within the argument, we've set the parameter `session_id` for reproducibility.
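A sketch of the setup call (666 matches the seed used elsewhere in this tutorial; pass your own value if you prefer):

```python
from pycaret.regression import setup, compare_models

# Initialize the PyCaret regression environment on the training set.
reg_experiment = setup(
    data=df_train,
    target='electricity_usage',
    session_id=666,
)
```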
When the `setup` function is executed, PyCaret automatically infers the data types in the dataset. So you'll see the columns in the dataset printed together with their inferred data types. If they are correct, you can press enter to continue the process. If they are not, you can use the `numeric_features`, `categorical_features`, and `date_features` parameters in the `setup` function to specify them.

Then, the process will begin and print a summary. I won’t show it since it’s long.
Next, we can use the `compare_models` function to train and evaluate model performance based on cross-validation. We've set two parameters in its argument (see the sketch after this list):
- `sort='MSE'`: set MSE (mean squared error) as the sorting criteria of the results
- `budget_time=5`: set the time limit to 5 minutes
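The call itself is a one-liner (a sketch; `budget_time` is in minutes):

```python
# Cross-validate candidate models and return the best one, ranked by MSE.
best_model = compare_models(sort='MSE', budget_time=5)
```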
This prints out a list of models and their scores. I’ve only printed the top 5 to save space.
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| lightgbm | Light Gradient Boosting Machine | 0.4159 | 0.3645 | 0.6036 | 0.5357 | 0.2523 | 0.5605 | 0.2220 |
| catboost | CatBoost Regressor | 0.4158 | 0.3656 | 0.6045 | 0.5343 | 0.2526 | 0.5545 | 3.8820 |
| gbr | Gradient Boosting Regressor | 0.4202 | 0.3675 | 0.6061 | 0.5318 | 0.2540 | 0.5726 | 2.1120 |
| rf | Random Forest Regressor | 0.4232 | 0.3745 | 0.6118 | 0.5229 | 0.2571 | 0.5804 | 4.8620 |
| xgboost | Extreme Gradient Boosting | 0.4262 | 0.3837 | 0.6193 | 0.5112 | 0.2589 | 0.5683 | 2.0300 |
Print best performing model(s)
To look at the best-performing model, we can print it out.
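For example (using the `best_model` object returned by `compare_models` in the sketch above):

```python
# compare_models returned the best model object, so we can simply print it.
print(best_model)
```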
    LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
                  importance_type='split', learning_rate=0.1, max_depth=-1,
                  min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
                  n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
                  random_state=666, reg_alpha=0.0, reg_lambda=0.0, silent='warn',
                  subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
Calculate metrics
To calculate the MSE metric on the test set, we can use the `mean_squared_error` function from `sklearn`.
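A sketch (in PyCaret 2.x the prediction column returned by `predict_model` is named `Label`; newer 3.x releases call it `prediction_label`):

```python
from pycaret.regression import predict_model
from sklearn.metrics import mean_squared_error

# Predict on the holdout test set and compute the MSE.
pred_holdout = predict_model(best_model, data=df_test)
mean_squared_error(pred_holdout['electricity_usage'], pred_holdout['Label'])
```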
0.2978445058103306
Plot predicted and actual data comparisons
And lastly, let’s also plot to compare the actual and predicted data.
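A sketch, reusing the predictions from the previous step:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(df_test.index, df_test['electricity_usage'], label='actual')
plt.plot(df_test.index, pred_holdout['Label'].to_numpy(), label='predicted')
plt.ylabel('electricity_usage')
plt.legend()
plt.show()
```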

AutoGluon

Intro
AutoGluon is an AutoML tool that works not only for tabular data but also for text and images. It focuses on automated stack ensembling, deep learning, etc. It currently appears to cover only supervised learning problems.
Please follow the installation guide to set up AutoGluon.
Train with AutoML
We'll use the `tabular` module for our example. Within the `TabularPredictor`, we set the `eval_metric` to be the mean squared error. Then, we can use the `fit` function to have AutoGluon train models. We also set the `time_limit` to 5 minutes.
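A sketch of the training call (`df_train` from the preprocessing step above):

```python
from autogluon.tabular import TabularPredictor

# Train AutoGluon models on the training set, optimizing mean squared error.
predictor = TabularPredictor(
    label='electricity_usage',
    eval_metric='mean_squared_error',
)
predictor.fit(df_train, time_limit=5*60)
```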
When it's running, you should see the training progress and its summary printed. I won't show it since it's long.
Print best performing model(s)
We can use the `leaderboard` function to see the models and their information.
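For example:

```python
# The leaderboard is returned as a pandas DataFrame.
lb = predictor.leaderboard()
lb.head()
```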
Here are the top 5 models.
| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L2 | -0.348620 | 0.303443 | 287.461536 | 0.002577 | 0.276444 | 2 | True | 12 |
| 1 | LightGBMXT | -0.351087 | 0.010267 | 1.051211 | 0.010267 | 1.051211 | 1 | True | 3 |
| 2 | LightGBM | -0.351415 | 0.006418 | 0.353956 | 0.006418 | 0.353956 | 1 | True | 4 |
| 3 | CatBoost | -0.353631 | 0.005425 | 4.793479 | 0.005425 | 4.793479 | 1 | True | 6 |
| 4 | XGBoost | -0.353716 | 0.009078 | 1.165040 | 0.009078 | 1.165040 | 1 | True | 9 |
Calculate metrics
We can use the `evaluate` function to see the metrics for our test set.
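A sketch:

```python
# Evaluate the predictor on the holdout test set (scores are reported as higher-is-better).
predictor.evaluate(df_test)
```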
You can see the metrics, including the negative mean squared error, being printed.
    Evaluation: mean_squared_error on test data: -0.29892013448684046
    Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
    Evaluations on test data:
    {
        "mean_squared_error": -0.29892013448684046,
        "root_mean_squared_error": -0.546735890981048,
        "mean_absolute_error": -0.41346365122169965,
        "r2": 0.4071766623884153,
        "pearsonr": 0.6835849536261334,
        "median_absolute_error": -0.30548684794108083
    }
Plot predicted and actual data comparisons
And we can also plot to compare the actual and predicted values.
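A sketch, in the same pattern as before:

```python
import matplotlib.pyplot as plt

# Predict on the test set with the trained AutoGluon predictor.
pred = predictor.predict(df_test)

plt.figure(figsize=(12, 4))
plt.plot(df_test.index, df_test['electricity_usage'], label='actual')
plt.plot(df_test.index, pred.to_numpy(), label='predicted')
plt.ylabel('electricity_usage')
plt.legend()
plt.show()
```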

Other AutoML Python Tools
Besides these 4 libraries, there are also other Python AutoML tools. We’ve excluded them from this guide for the following reasons:
- Auto-Sklearn: the package only explicitly supports the Linux operating system
- HyperOpt-Sklearn: the package is less actively updated, based on its GitHub history
- Google or other cloud services: these often cost money. With that said, you can usually try them for free
Please test them out as you need.
In this tutorial, you've learned about popular AutoML tools in Python.
Hope you can try them out to automate your machine learning process.
We’d love to hear from you. Leave a comment for any questions you may have or anything else.