This article is a complete guide to Hyperparameter Tuning.
In this post, you’ll see:
- why you should use this machine learning technique.
- how to use it with Keras (deep learning neural networks) and TensorFlow in Python.
This article is a companion to the post Hyperparameter Tuning with Python: Complete Step-by-Step Guide. To see an example with XGBoost, please read the previous article.
If you want to improve your model’s performance faster and further, let’s get started!
FAQ: What Is Hyperparameter Tuning/Optimization, and Why Use It?
What are the hyperparameters anyway?
A hyperparameter is a parameter whose value is set before the learning process begins.
By contrast, the values of other parameters are derived via training. (Wikipedia)
For example, neural networks have many hyperparameters, including:
- number of hidden layers
- number of neurons
- learning rate
- activation function
- and optimizer settings
Why do we care about these hyperparameters?
Because these hyperparameters are crucial to the performance, speed, and quality of machine learning models. Hence, we should optimize them.
Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. (Wikipedia)
But these hyperparameters all look complicated, and tuning them jointly turns this into an even harder, higher-dimensional problem.
How should we choose the values of these hyperparameters?
Often, we choose them based on our experience and through trial and error. This process is very manual and doesn't guarantee the best results for our models.
What are the better methods to tune the hyperparameters?
We need a systematic method to optimize them.
There are basic techniques such as Grid Search and Random Search, as well as more sophisticated techniques such as Bayesian Optimization and Evolutionary Optimization.
Now let’s see hyperparameter tuning in action step-by-step.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #1: Preprocessing the Data
In this post, we use the Russian housing dataset from Kaggle. The goal of the project is to predict housing price fluctuations in Russia. We are not trying to find the best possible model; we only use this dataset as an example.
Before we start building the model, let’s take a look at it.
To prepare the data df for this modeling demonstration, we process it only lightly by:
- separating the target log(price_doc) (the logarithm of the housing price), as y, from the rest of the numeric features, as X.
We are only going to use numeric features in this example.
- imputing missing values with the median for numeric features.
- splitting X and y into training and testing datasets.
- scaling the features for both training and testing datasets.
Without this scaling, the feature values wouldn't be well suited for training neural networks. A minimal sketch of these preprocessing steps follows.
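This sketch assumes the Kaggle CSV is saved locally as train.csv; the file name and random seed are illustrative, and the original post's code may differ in details:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('train.csv')  # the Russian housing dataset from Kaggle

# Separate the log-transformed target from the numeric features.
y = np.log(df['price_doc'])
X = df.drop(columns=['price_doc']).select_dtypes(include=[np.number])

# Impute missing values with the median for each numeric feature.
X = pd.DataFrame(SimpleImputer(strategy='median').fit_transform(X),
                 columns=X.columns)

# Split into 90% training and 10% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Scale the features, fitting the scaler on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```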
Now we have a new training dataset X_train_scaled containing 90% of the data from the original dataset.
Related article: Data Cleaning in Python: the Ultimate Guide (2020)
In this previous post, we explored data cleaning techniques using this same dataset.
Step #2: Defining the Objective for Optimization
Before starting the tuning process, we must define an objective function for hyperparameter optimization.
We are going to use TensorFlow Keras to model the housing price. Keras is a deep learning API for Python that runs on top of TensorFlow.
First, we need a function get_keras_model that builds the model. It defines a multilayer perceptron (MLP), which is the simplest type of deep neural network. An MLP consists of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer.
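Here is a minimal sketch of what such a builder could look like, using the Keras functional API. The hyperparameter names (num_hidden_layers, neurons_per_layer, dropout_rate, activation) are illustrative, not necessarily the exact names in the original code:

```python
import tensorflow as tf

def get_keras_model(num_hidden_layers, neurons_per_layer,
                    dropout_rate, activation):
    """Build an MLP for regression on the scaled numeric features."""
    inputs = tf.keras.Input(shape=(X_train_scaled.shape[1],))
    x = inputs
    for _ in range(num_hidden_layers):
        x = tf.keras.layers.Dense(neurons_per_layer, activation=activation)(x)
        x = tf.keras.layers.Dropout(dropout_rate)(x)
    # A single linear output unit for the regression target.
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```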
Then based on the model, we create the objective function keras_mlp_cv_score as below:
- The key input parameterization includes the hyperparameters of the MLP that will be tuned.
- MSE (Mean Squared Error) is used as the score/loss function that will be minimized for hyperparameter optimization.
- And we also use Cross-Validation to calculate the score (MSE) for a given set of hyperparameter values.
For any given set of hyperparameter values, this function returns the mean and the standard error of the score (MSE) based on cross-validation, which is the format Ax expects.
You can see the details in the Python code below.
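Here is one way such an objective could be written. It is a sketch under a few assumptions: the additional hyperparameter names (learning_rate, optimizer, batch_size, epochs) are illustrative, and the return value follows the {metric_name: (mean, SEM)} format of Ax's Service API:

```python
import numpy as np
from sklearn.model_selection import KFold

# Map the optimizer hyperparameter (a string) to a Keras optimizer class.
OPTIMIZERS = {'adam': tf.keras.optimizers.Adam,
              'rmsprop': tf.keras.optimizers.RMSprop,
              'sgd': tf.keras.optimizers.SGD}

def keras_mlp_cv_score(parameterization):
    """Return the cross-validated MSE for one set of hyperparameter values."""
    scores = []
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    for train_idx, val_idx in kfold.split(X_train_scaled):
        model = get_keras_model(parameterization['num_hidden_layers'],
                                parameterization['neurons_per_layer'],
                                parameterization['dropout_rate'],
                                parameterization['activation'])
        optimizer = OPTIMIZERS[parameterization['optimizer']](
            learning_rate=parameterization['learning_rate'])
        model.compile(optimizer=optimizer, loss='mse')
        model.fit(X_train_scaled[train_idx], y_train.iloc[train_idx],
                  batch_size=parameterization['batch_size'],
                  epochs=parameterization['epochs'], verbose=0)
        preds = model.predict(X_train_scaled[val_idx], verbose=0).ravel()
        scores.append(np.mean((preds - y_train.iloc[val_idx].values) ** 2))
    # Ax interprets the tuple as (mean, standard error of the mean).
    return {'keras_cv': (np.mean(scores),
                         np.std(scores) / np.sqrt(len(scores)))}
```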
Based on our own experience, we also set the limits for the values of hyperparameters that will be tuned.
There is NO consensus method for choosing these limits; you need to understand the hyperparameters and make judgment calls.
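For illustration only, a search space in Ax's parameter-dictionary format could look like the following; the bounds are our own judgment calls, not canonical values:

```python
parameters = [
    {'name': 'learning_rate', 'type': 'range', 'bounds': [1e-4, 1e-1],
     'log_scale': True},
    {'name': 'num_hidden_layers', 'type': 'range', 'bounds': [1, 5]},
    {'name': 'neurons_per_layer', 'type': 'range', 'bounds': [10, 100]},
    {'name': 'dropout_rate', 'type': 'range', 'bounds': [0.0, 0.5]},
    {'name': 'batch_size', 'type': 'range', 'bounds': [32, 256]},
    {'name': 'epochs', 'type': 'range', 'bounds': [10, 50]},
    {'name': 'activation', 'type': 'choice', 'values': ['relu', 'tanh']},
    {'name': 'optimizer', 'type': 'choice',
     'values': ['adam', 'rmsprop', 'sgd']},
]
```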
Step #3: Choosing the Package: Ax
In our previous article (What is the Coronavirus Death Rate with Hyperparameter Tuning), we applied hyperparameter tuning with the hyperopt package. Although hyperopt is popular, we found it clunky to use, and it lacks good documentation.
We’ve been looking for other packages and finally found Ax (Adaptive Experimentation Platform).
Ax is a new platform that helps to optimize any kind of experiment, including machine learning, A/B tests, and simulations. It was developed by Facebook and is part of the Facebook Open Source projects now.
We picked it for its:
- built-in feature for saving results to a JSON file or a MySQL database.
- support for dependent parameter constraints.
For example, we can constrain parameters m and n to 1 < m < 10, 0 < n < 10, and m*n > 10, while most other packages don't support the m*n > 10 condition.
- good visualization functions.
- decent documentation.
The documentation is not all clear, but it is better than that of other packages such as hyperopt.
Ax also has three different APIs (usage modes) for hyperparameter tuning:
- Loop API is the simplest to use, but it doesn't allow much customization.
- Service API allows more control than the Loop API, especially over the trials. This control is valuable since:
– You can schedule a trial to happen at a different time, or even run it in parallel with other trials.
– You can also save progress after each trial, instead of waiting for all trials to finish to see the results.
- Developer API allows the most control but requires the most knowledge, which is not easy since:
– the documentation for its features isn't entirely clear.
– the example on the website doesn’t show how you can get the best parameters after the optimization.
– you often have to look through the API to complete the process.
Because of the limitations of Loop API and the lack of clear examples for Developer API, we are going to use the Service API.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #4: Optimizing/Tuning the Hyperparameters
Finally, we can start the optimization process.
Within the Service API, we don't need deep knowledge of Ax's data structures, so we can simply follow its sample code to set things up.
We create the experiment keras_experiment with the objective function and hyperparameters list built previously.
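A sketch of this setup, following the Service API sample code (objective_name matches the AxClient.create_experiment signature at the time of writing; newer Ax versions use an objectives argument instead):

```python
from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name='keras_experiment',
    parameters=parameters,      # the search space defined in Step #2
    objective_name='keras_cv',  # the metric returned by keras_mlp_cv_score
    minimize=True,              # we minimize the cross-validated MSE
)
```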
After running the above code, you’ll see the logging note below.
As mentioned earlier, there are different methods of searching for optimal hyperparameter values. According to the AxClient documentation for generation_strategy, the approach is chosen intelligently based on the properties of the search space if not set explicitly. As you can see from the note, Sobol, a quasi-random form of uniform search, is selected by AxClient in this exercise.
Next, let’s run the experiment trials to evaluate the different hyperparameter values.
Since this project is for demonstration only, we run a small number of trials: 25.
Each trial evaluates one combination of hyperparameter values and outputs the score from the keras_mlp_cv_score function. The AxClient keeps track of the history of parameters and scores, and makes an intelligent guess at the next promising set of parameters.
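The trial loop itself is short. This sketch uses the standard get_next_trial/complete_trial pattern from the Service API:

```python
for _ in range(25):
    parameterization, trial_index = ax_client.get_next_trial()
    # Evaluate the objective and report the result back to Ax.
    ax_client.complete_trial(trial_index=trial_index,
                             raw_data=keras_mlp_cv_score(parameterization))
```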
As the code runs, you can see the logging note for each trial pop up as well.
Step #5: Printing/Visualizing the Results
Printing the Results
To look at the results as a table, you can use the code below to print them out.
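A minimal version, using AxClient's built-in get_trials_data_frame method:

```python
# All trials, their hyperparameter values, and scores as a DataFrame.
print(ax_client.get_trials_data_frame())
```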
To look at the best values of the hyperparameters, we use the code below.
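get_best_parameters returns both the winning hyperparameter values and the model's estimate of the objective at that point:

```python
best_parameters, values = ax_client.get_best_parameters()
print(best_parameters)  # the best hyperparameter values found
print(values)           # the estimated objective mean and covariance
```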
Visualizing the Results
Besides printing the number, you can also visualize the results.
We can plot the evolution of the keras_cv score over iterations. You can see that the score reaches the minimum value after 2 iterations/trials.
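One way to produce this plot, following the Ax tutorials (render is designed for notebook environments):

```python
from ax.utils.notebook.plotting import render

# Best objective value found so far, as a function of the iteration.
render(ax_client.get_optimization_trace())
```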
Evaluating the Results
Plus, we can use this model to make predictions on the test dataset.
The Python code below prints out the MSE on the test set, about 0.30, which is close to the MSE from the training dataset.
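A sketch of this evaluation, refitting an MLP on the full training set with the best hyperparameters (the exact refit procedure in the original code may differ):

```python
from sklearn.metrics import mean_squared_error

best_parameters, _ = ax_client.get_best_parameters()

model = get_keras_model(best_parameters['num_hidden_layers'],
                        best_parameters['neurons_per_layer'],
                        best_parameters['dropout_rate'],
                        best_parameters['activation'])
model.compile(optimizer=OPTIMIZERS[best_parameters['optimizer']](
                  learning_rate=best_parameters['learning_rate']),
              loss='mse')
model.fit(X_train_scaled, y_train,
          batch_size=best_parameters['batch_size'],
          epochs=best_parameters['epochs'], verbose=0)

test_pred = model.predict(X_test_scaled, verbose=0).ravel()
print('Test MSE:', mean_squared_error(y_test, test_pred))
```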
Step #6: Saving the Results – Optional
As mentioned before, Ax also allows us to save the process to a JSON file.
This is convenient when we want to pause and resume the process at a later time.
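AxClient provides save_to_json_file and load_from_json_file for this. A sketch (the file name is illustrative):

```python
# Save the full experiment state so the study can be resumed later.
ax_client.save_to_json_file('keras_experiment.json')

# ...later, restore the client and continue where we left off.
restored_client = AxClient.load_from_json_file('keras_experiment.json')
```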
That’s it. You did it! Leave a comment if you have any questions.
Read Hyperparameter Tuning with Python: Complete Step-by-Step Guide if you want to see an example with XGBoost in Python.