We all enjoy building machine learning or statistical models. But one important step that’s often left out is hyperparameter tuning.
In this article, you’ll see:
- why you should use this machine learning technique.
- how to use it with XGBoost step-by-step with Python.
This article is a companion of the post Hyperparameter Tuning with Python: Keras Step-by-Step Guide. To see an example with Keras, please read the other article.
If you want to improve your model’s performance faster and further, let’s dive right in!
FAQ: What is Hyperparameter Tuning/Optimization, and Why Do It?
What are the hyperparameters anyway?
A hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. (Wikipedia)
For example, neural networks have many hyperparameters, including:
- number of hidden layers
- number of neurons
- learning rate
- activation function
- and optimizer settings
Why do we care about these hyperparameters?
Because these hyperparameters are crucial to the performance, speed, and quality of the machine learning models. Hence we should optimize them.
Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. (Wikipedia)
But these hyperparameters all look complicated, and combining them results in an even harder, higher-dimensional search problem.
How should we choose the values of these hyperparameters?
Often, we choose them based on experience and trial and error. This process is manual and doesn’t guarantee the best result for our models.
What are the better methods to tune the hyperparameters?
We need a systematic method to optimize them.
There are basic techniques such as Grid Search and Random Search, as well as more sophisticated techniques such as Bayesian Optimization and Evolutionary Optimization.
Now let’s see hyperparameter tuning in action step-by-step.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #1: Preprocessing the Data
Within this post, we use the Russian housing dataset from Kaggle. The goal of this project is to predict housing price fluctuations in Russia. We are not going to find the best model for it but will only use it as an example.
Before we start building the model, let’s take a look at it.
To prepare the data df for the modeling demonstration, we only process it lightly by:
- separating the target log(price_doc) (the logarithm of the housing price), as y, from the rest of the numeric features, as X. We only use numeric features in this example.
- imputing the missing values with the median for numeric features.
- splitting X and y further into training and testing datasets.
Now we have a new training dataset dtrain with 90% of the data from the original dataset.
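These steps can be sketched as follows. The helper name prepare_data, the file name train.csv, and the exact split variables are assumptions for illustration, not the article's original code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def prepare_data(df, target="price_doc", test_size=0.1, seed=42):
    """Light preprocessing: log-target, numeric features only,
    median imputation, and a 90/10 train/test split."""
    y = np.log(df[target])  # target: log(price_doc)
    X = df.select_dtypes(include=[np.number]).drop(columns=[target])
    X = X.fillna(X.median())  # median imputation for numeric features
    return train_test_split(X, y, test_size=test_size, random_state=seed)

# Usage (assumes the Kaggle file has been downloaded as train.csv):
# df = pd.read_csv("train.csv")
# X_train, X_test, y_train, y_test = prepare_data(df)
```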
Related article: Data Cleaning in Python: the Ultimate Guide (2020)
In this previous post, we explored data cleaning techniques using this same dataset.
Step #2: Defining the Objective for Optimization
Before starting the tuning process, we must define an objective function for hyperparameter optimization.
We are going to use XGBoost to model the housing price. It is a popular optimized, distributed library that implements machine learning algorithms under the Gradient Boosting framework.
So we create the objective function xgboost_cv_score_ax as below:
- The key inputs p_names include the main hyperparameters of XGBoost that will be tuned.
- RMSE (Root Mean Square Error) is used as the score/loss function that will be minimized for hyperparameter optimization.
- And we also use K-Fold Cross Validation to calculate the score (RMSE) for a given set of hyperparameter values.
For any set of given hyperparameter values, this function returns the mean and standard deviation of the score (RMSE) from the 7-Fold cross-validation.
You can see the details in the Python code below.
Based on the XGBoost documentation, we also set limits for the values of the hyperparameters that will be tuned.
There is NO consensus method for choosing these limits; you need to understand the hyperparameters and make your own decisions.
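For illustration, a search space in Ax's parameter format might look like the following. The exact hyperparameters and bounds here are example choices, not a recommendation:

```python
# Each entry names a hyperparameter and the range it may take during tuning.
parameters = [
    {"name": "learning_rate", "type": "range", "bounds": [0.01, 0.5],
     "log_scale": True},
    {"name": "max_depth", "type": "range", "bounds": [2, 50]},
    {"name": "subsample", "type": "range", "bounds": [0.5, 1.0]},
    {"name": "colsample_bytree", "type": "range", "bounds": [0.5, 1.0]},
    {"name": "min_child_weight", "type": "range", "bounds": [1, 10]},
]
```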
Step #3: Choosing the Package: Ax
In our previous article (What is the Coronavirus Death Rate with Hyperparameter Tuning), we applied hyperparameter tuning using the hyperopt package. Although it is a popular package, we found it clunky to use and lacking in good documentation.
We’ve been looking for other packages and finally found Ax (Adaptive Experimentation Platform).
Ax is a new platform that helps to optimize any kind of experiment, including machine learning, A/B tests, and simulations. It was developed by Facebook and is part of the Facebook Open Source projects now.
We picked it for its:
- built-in feature for saving results to a JSON file or a MySQL database.
- support for dependent parameter constraints. For example, we can set the limits of parameters m and n to 1 < m < 10, 0 < n < 10, and m*n > 10, while most other packages don’t support the m*n > 10 condition.
- good visualization functions.
- decent documentation. It is not all clear, but it is better than that of other packages such as hyperopt.
Ax also has three different APIs (usage modes) for hyperparameter tuning:
- Loop API is the simplest to use, but it doesn’t allow enough customization.
- Service API allows more control, especially over the trials, than the Loop API. This control is valuable since:
– You can schedule a trial to happen at a different time, or even parallel to other trials.
– You can also save progress after each trial, instead of waiting for all trials to finish to see the results.
- Developer API allows the most control but requires the most knowledge, which is not easy since:
– the documentation for the features isn’t all clear.
– the example on the website doesn’t show how you can get the best parameters after the optimization.
– you often have to look through the API to complete the process.
Because of the limitations of Loop API and the lack of clear examples for Developer API, we are going to use the Service API.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #4: Optimizing/Tuning the Hyperparameters
Finally, we can start the optimization process.
With the Service API, we don’t need much knowledge of Ax’s data structures, so we can just follow its sample code to set up the process.
We create the experiment xgboost_experiment with the objective function and hyperparameters list built previously.
After running the above code, you’ll see the logging note below.
As mentioned earlier, there are different methods of searching for the optimal values of hyperparameters. According to the Ax_client documentation for generation_strategy, the approach is chosen intelligently based on the properties of the search space if not set explicitly. As you can see from the note, the Gaussian Process method (Sobol+GPEI), which belongs to Bayesian Optimization, is chosen by Ax_client in this exercise.
Next, let’s run the experiment trials to evaluate the different hyperparameter values.
Since this project is for demonstration only, we only run a small number of trials: 25.
Each trial evaluates a possible combination of hyperparameter values and outputs the score from the xgboost_cv_score_ax function. The Ax_client keeps track of the history of parameters and scores, and makes an intelligent guess at the next, better set of parameters.
As the code runs, you can see the logging note for each trial pop up as well.
Step #5: Printing/Visualizing the Results
Printing the Results
To look at the results as a table, you can use the below code to print them out.
To look at the best values of hyperparameters, we use the below code.
Why is it not the set of values with the minimum score (xgboost_cv), which is trial_index = 10 in the table above?
Because the Ax_client also considers other factors of model performance. Below are the optimal values of the hyperparameters (trial_index = 23).
Visualizing the Results
Besides printing the number, you can also visualize the results.
We can plot the evolution of the xgboost_cv score over iterations. You can see that the score reaches the minimum value after 11 iterations/trials.
Or we can print the contour plots showing pairs of hyperparameters.
For example, we print learning_rate and max_depth in the below plot – the lighter the color, the lower the score (xgboost_cv). You can see that the best values of these two hyperparameters coincide with the printed optimal values (learning_rate = 0.287 and max_depth = 47).
Evaluating the Results
Plus, we can use this model to make predictions on the test dataset. The Python code below prints the RMSE on the test set, 0.489, which is close to the RMSE on the training dataset.
Step #6: Saving the Results – Optional
As mentioned before, Ax also allows us to save the progress to a JSON file.
This is convenient when we want to pause and resume the process at a later time.
That’s it. You did it! Leave a comment if you have any questions.
Read Hyperparameter Tuning with Python: Keras Step-by-Step Guide if you want to see an example with Keras and Tensorflow in Python.