Hyperparameter Tuning with Python: Keras Step-by-Step Guide
Why and how to use it, with a Keras example

Lianne & Justin


This article is a complete guide to Hyperparameter Tuning.

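In this post, you’ll see:

  • what hyperparameter tuning is and why it matters
  • how to preprocess the data and define the objective for optimization
  • how to tune the hyperparameters of a Keras model with the Ax package
  • how to print, visualize, evaluate, and save the results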

If you want to improve your model’s performance faster and further, let’s get started!


FAQ: What Is Hyperparameter Tuning/Optimization and Why Do It

What are the hyperparameters anyway?

A hyperparameter is a parameter whose value is set before the learning process begins.

By contrast, the values of other parameters are derived via training.

Wikipedia

For example, neural networks have many hyperparameters, including:

  • number of hidden layers
  • number of neurons
  • learning rate
  • activation function
  • and optimizer settings

Why do we care about these hyperparameters?

Because these hyperparameters are crucial to the performance, speed, and quality of the machine learning models. Hence we should optimize them.

Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss.

Wikipedia

But these hyperparameters all look complicated. Combining them results in a higher-dimensional problem, which is even harder.

How should we choose the values of these hyperparameters?

Often, we choose them based on experience and through trial and error. This process is very manual and doesn’t guarantee the best results for our models.

What are the better methods to tune the hyperparameters?

We need a systematic method to optimize them.

There are basic techniques, such as Grid Search and Random Search, and more sophisticated techniques, such as Bayesian Optimization and Evolutionary Optimization.

We are not covering these approaches in detail here; take a look at Wikipedia or this YouTube video to learn more.
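For a rough feel of the two basic techniques, the sketch below runs a grid search and a random search over a made-up loss with two hyperparameters. The `loss` function, the grid, and the bounds are assumptions for illustration only; in a real project the objective would train and score a model.

```python
import itertools
import random


def loss(learning_rate, num_layers):
    # Toy objective (an assumption): smallest at learning_rate=0.01, num_layers=3.
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 3) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid; return (best_loss, best_params)."""
    names = list(grid)
    candidates = (
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    )
    return min(((loss(**p), p) for p in candidates), key=lambda pair: pair[0])


def random_search(bounds, num_trials, seed=0):
    """Sample parameters at random within the bounds; return (best_loss, best_params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        params = {
            "learning_rate": rng.uniform(*bounds["learning_rate"]),
            "num_layers": rng.randint(*bounds["num_layers"]),  # inclusive bounds
        }
        score = loss(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [1, 2, 3, 4]}
bounds = {"learning_rate": (0.001, 0.1), "num_layers": (1, 4)}

grid_best = grid_search(grid)
random_best = random_search(bounds, num_trials=20)
print("grid search:  ", grid_best)
print("random search:", random_best)
```

Grid search can only return values that appear in the grid, while random search can land anywhere inside the bounds; both get expensive quickly as the number of hyperparameters grows.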

Now let’s see hyperparameter tuning in action step-by-step.

Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning


Step #1: Preprocessing the Data

Within this post, we use the Russian housing dataset from Kaggle. The goal of this project is to predict housing price fluctuations in Russia. We are not going to find the best model for it but will only use it as an example.

Before we start building the model, let’s take a look at it.

[Figure: sample rows of the Kaggle Russian housing dataset]

To prepare the data df for modeling demonstration, we process it only lightly by:

  • separating the target log(price_doc) (the logarithm of the housing price), as y, from the rest of the numeric features, as X.
    We are only going to use numeric features in this example.
  • imputing the missing values with the median for numeric features.
  • splitting X and y further into training and testing datasets.
  • scaling the features for both training and testing datasets.
    Neural networks train poorly on features with very different scales, so we scale them first.

Now we have a new training dataset X_train_scaled with 90% of the data from the original dataset.
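The preprocessing steps above can be sketched roughly as follows. This is not the original code: the `preprocess` function, the use of `StandardScaler`, and the tiny synthetic DataFrame standing in for the Kaggle data are all assumptions. Unlike the order listed above, the sketch computes the medians on the training split only, to avoid leaking test data into the imputation; that is a design choice of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, target="price_doc", test_size=0.1, seed=0):
    """Split off log(target), keep numeric features, impute, split, and scale."""
    y = np.log(df[target])
    X = df.drop(columns=[target]).select_dtypes(include="number")

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Impute missing values with medians learned from the training split.
    medians = X_train.median()
    X_train = X_train.fillna(medians)
    X_test = X_test.fillna(medians)

    # Fit the scaler on the training split, then apply it to both splits.
    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled, y_train.to_numpy(), y_test.to_numpy()


# Synthetic stand-in for the housing data (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "full_sq": rng.uniform(20, 120, size=100),
        "life_sq": rng.uniform(10, 80, size=100),
        "sub_area": ["a"] * 100,  # non-numeric column, dropped by preprocess
        "price_doc": rng.uniform(1e6, 1e7, size=100),
    }
)
df.loc[::7, "life_sq"] = np.nan  # inject some missing values

X_train_scaled, X_test_scaled, y_train, y_test = preprocess(df)
print(X_train_scaled.shape, X_test_scaled.shape)
```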

Related article: Data Cleaning in Python: the Ultimate Guide (2020)
In this previous post, we explored data cleaning techniques using this same dataset.

Step #2: Defining the Objective for Optimization

Before starting the tuning process, we must define an objective function for hyperparameter optimization.

We are going to use TensorFlow Keras to model the housing price. It is a deep learning (neural network) API for Python.

First, we need a function, get_keras_model, that builds the model. This function defines a multilayer perceptron (MLP), which is one of the simplest deep learning neural networks. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer.
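A minimal sketch of what such a function could look like is shown below. The signature of `get_keras_model`, the dropout layer after each hidden layer, and the single linear output are assumptions for illustration, not the original code; every hidden layer gets the same number of neurons.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def get_keras_model(num_features, num_hidden_layers, neurons_per_layer,
                    dropout_rate, activation):
    """Build an uncompiled MLP regressor with identical hidden layers."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(num_features,)))
    for _ in range(num_hidden_layers):
        model.add(layers.Dense(neurons_per_layer, activation=activation))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(1))  # single linear output for regression
    return model


model = get_keras_model(
    num_features=10,
    num_hidden_layers=2,
    neurons_per_layer=32,
    dropout_rate=0.2,
    activation="relu",
)
# The optimizer and learning rate are hyperparameters too; Adam is just an example.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")

# Untrained predictions, only to confirm the input and output shapes.
predictions = model.predict(np.zeros((4, 10)), verbose=0)
print(predictions.shape)
```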

Then based on the model, we create the objective function keras_mlp_cv_score as below:

  • The key input, parameterization, contains the hyperparameters of the MLP that will be tuned:
    – num_hidden_layers
    – neurons_per_layer
    – dropout_rate
    – activation
    – optimizer
    – learning_rate
    – batch_size
  • MSE (Mean Squared Error) is used as the score/loss function that will be minimized for hyperparameter optimization.
  • And we also use Cross-Validation to calculate the score (MSE) for a given set of hyperparameter values.

For any set of given hyperparameter values, this function returns the mean and standard deviation of the score (MSE) based on cross-validation.

You can see the details in the Python code below.
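The original listing is not reproduced here, so the following is only a sketch of the cross-validation part of such an objective. To keep it runnable without TensorFlow, the training step is passed in as a callable (`fit_predict`, a hypothetical name); in the setting of this post it would build, compile, and fit the Keras MLP with the given hyperparameter values. The ridge regression stand-in and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold


def cv_score(parameterization, X, y, fit_predict, n_splits=5, seed=0):
    """Return (mean, std) of the validation MSE across the K folds."""
    fold_mse = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in folds.split(X):
        predictions = fit_predict(
            parameterization, X[train_idx], y[train_idx], X[val_idx]
        )
        fold_mse.append(np.mean((y[val_idx] - predictions) ** 2))
    return float(np.mean(fold_mse)), float(np.std(fold_mse))


def ridge_fit_predict(params, X_train, y_train, X_val):
    # Stand-in for "build, compile and fit the MLP": closed-form ridge regression.
    penalty = params["l2"] * np.eye(X_train.shape[1])
    weights = np.linalg.solve(X_train.T @ X_train + penalty, X_train.T @ y_train)
    return X_val @ weights


# Synthetic regression data (made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

mean_mse, std_mse = cv_score({"l2": 1.0}, X, y, ridge_fit_predict)
print(f"CV MSE: {mean_mse:.4f} +/- {std_mse:.4f}")
```

Passing the training step in as a callable keeps the scoring logic independent of the model, so the same function can score any set of hyperparameter values.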

Based on our own experience, we also set the limits for the values of hyperparameters that will be tuned.

There is NO consensus method of choosing these limits. You need to understand these hyperparameters and make decisions.
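As an illustration, the limits could be written in the list-of-dictionaries form that the Ax Service API (introduced in Step #3) accepts. The hyperparameter names come from the list above, but every bound and choice below is an assumption, not the values used in the original experiment; `check_search_space` is a hypothetical helper.

```python
# Search space in the dictionary format used by the Ax Service API.
# All bounds and choices are illustrative assumptions.
parameters = [
    {"name": "num_hidden_layers", "type": "range", "bounds": [1, 8], "value_type": "int"},
    {"name": "neurons_per_layer", "type": "range", "bounds": [8, 300], "value_type": "int"},
    {"name": "dropout_rate", "type": "range", "bounds": [0.0, 0.8], "value_type": "float"},
    {"name": "activation", "type": "choice", "values": ["tanh", "sigmoid", "relu"]},
    {"name": "optimizer", "type": "choice", "values": ["adam", "rms", "sgd"]},
    {"name": "learning_rate", "type": "range", "bounds": [0.001, 0.5],
     "value_type": "float", "log_scale": True},
    {"name": "batch_size", "type": "range", "bounds": [8, 1024], "value_type": "int"},
]


def check_search_space(parameters):
    """Raise ValueError for an empty range or choice; return the parameter names."""
    for p in parameters:
        if p["type"] == "range":
            low, high = p["bounds"]
            if not low < high:
                raise ValueError(f"{p['name']}: lower bound must be below upper bound")
        elif p["type"] == "choice" and not p["values"]:
            raise ValueError(f"{p['name']}: no values to choose from")
    return [p["name"] for p in parameters]


names = check_search_space(parameters)
print(names)
```

Searching the learning rate on a log scale is a common choice because useful values often span several orders of magnitude.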

Step #3: Choosing the Package: Ax

In our previous article (What is the Coronavirus Death Rate with Hyperparameter Tuning), we applied hyperparameter tuning using the hyperopt package. Although it is a popular package, we found it clunky to use, and it lacks good documentation.

We’ve been looking for other packages and finally found Ax (Adaptive Experimentation Platform).


Ax is a new platform that helps to optimize any kind of experiment, including machine learning, A/B tests, and simulations. It was developed by Facebook and is part of the Facebook Open Source projects now.

We picked it for its:

  • built-in feature that enables saving results to a JSON file or a MySQL database.
  • support for dependent parameter constraints.
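    For example, we can set the limits of parameters m and n to 1 < m < 10 and 0 < n < 10, and also require a linear relationship between them, such as m + n >= 10 (Ax accepts linear constraints of this kind). Most other packages don’t support constraints that tie parameters together.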
  • good visualization function.
  • decent documentation.
    It is not always clear, but it is better than that of other packages such as hyperopt.

Ax also has three different APIs (usage modes) for hyperparameter tuning:

  • Loop API is the simplest to use, but it doesn’t allow much customization.
  • Service API allows more control than the Loop API, especially over the trials. This control is valuable since:
    – You can schedule a trial to happen at a different time, or even in parallel with other trials.
    – You can also save progress after each trial, instead of waiting for all trials to finish to see the results.
  • Developer API allows the most control but requires the most knowledge, which is hard to acquire since:
    – the documentation for the features isn’t always clear.
    – the example on the website doesn’t show how you can get the best parameters after the optimization.
    – you often have to look through the API to complete the process.

Because of the limitations of Loop API and the lack of clear examples for Developer API, we are going to use the Service API.


Step #4: Optimizing/Tuning the Hyperparameters

Finally, we can start the optimization process.

With the Service API, we don’t need much knowledge of Ax’s data structures, so we can just follow its sample code to set things up.

We create the experiment keras_experiment with the objective function and hyperparameters list built previously.

After running the above code, you’ll see the logging note below.

[Figure: Ax log note showing that Sobol search was selected]

As mentioned earlier, there are different methods of searching for optimal hyperparameter values. According to the AxClient documentation for generation_strategy, the method is chosen automatically based on the properties of the search space unless we specify one explicitly. As the note shows, AxClient selected Sobol, a quasi-random search that covers the search space evenly, for this exercise.

Next, let’s run the experiment trials to evaluate the different hyperparameter values.

Since this project is for demonstration only, we run just 25 trials.

Each trial evaluates one combination of hyperparameter values and returns the score from the keras_mlp_cv_score function. The AxClient keeps track of the history of parameters and scores, and makes an intelligent guess at a better set of parameters for the next trial.

As the code runs, you can see the logging note for each trial pop up as well.

Step #5: Printing/Visualizing the Results

Printing the Results

To look at the results as a table, you can use the code below to print them out.

[Figure: table of trial results with hyperparameter values and scores]

To look at the best values of the hyperparameters, we use the code below.
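In Ax, the AxClient methods get_trials_data_frame and get_best_parameters return the results table and the best values. To show what those two outputs amount to without needing a finished experiment, here is a small pandas sketch over made-up trial records; the column names and numbers are assumptions.

```python
import pandas as pd

# Made-up results, one row per trial (not the numbers from this post).
trials = pd.DataFrame(
    [
        {"trial_index": 0, "neurons_per_layer": 64, "learning_rate": 0.010, "keras_cv": 0.41},
        {"trial_index": 1, "neurons_per_layer": 128, "learning_rate": 0.003, "keras_cv": 0.31},
        {"trial_index": 2, "neurons_per_layer": 32, "learning_rate": 0.050, "keras_cv": 0.47},
    ]
)

# Results as a table, best (lowest) score first.
print(trials.sort_values("keras_cv").to_string(index=False))

# Best hyperparameter values: the row with the minimum score.
best_row = trials.loc[trials["keras_cv"].idxmin()]
best_parameters = {
    "neurons_per_layer": int(best_row["neurons_per_layer"]),
    "learning_rate": float(best_row["learning_rate"]),
}
print(best_parameters)
```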

[Figure: printed best hyperparameter values]

Visualizing the Results

Besides printing the numbers, you can also visualize the results.

We can plot the evolution of the keras_cv score over iterations. You can see that the score reaches the minimum value after 2 iterations/trials.
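Such a plot tracks the best score found so far after each trial. In Ax, the AxClient method get_optimization_trace produces it; the sketch below only shows the underlying calculation on made-up scores, chosen so that the minimum is reached at the second trial.

```python
from itertools import accumulate

# Made-up keras_cv score of each trial, in the order the trials ran.
scores = [0.52, 0.31, 0.44, 0.38, 0.35]

# Running minimum: the best score seen up to and including each trial.
best_so_far = list(accumulate(scores, min))

# 1-based number of the trial that first reached the overall minimum.
best_trial = scores.index(min(scores)) + 1

print(best_so_far)
print(best_trial)
```

A flat tail in the running minimum means later trials did not improve on the best score, which is what the plot in this post shows after the second trial.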

[Figure: keras_cv score performance over trials]

Evaluating the Results

Plus, we can use this model to make predictions on the test dataset.

The Python code below prints the MSE on the test dataset, 0.30, which is close to the MSE from the training dataset.
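The calculation itself is short. The sketch below uses made-up targets and predictions (not the model output from this post) to show how the test MSE is computed once the tuned model has produced its predictions.

```python
import numpy as np

y_test = np.array([15.2, 15.9, 16.1, 14.8])       # made-up log prices
predictions = np.array([15.0, 16.3, 16.0, 15.5])  # made-up model output

# Mean squared error and its square root, both in log(price_doc) units.
test_mse = float(np.mean((y_test - predictions) ** 2))
test_rmse = float(np.sqrt(test_mse))
print(f"test MSE: {test_mse:.3f}, RMSE: {test_rmse:.3f}")
```

Since the target is log(price_doc), the error is measured in log units: an MSE of 0.30 corresponds to an RMSE of about 0.55 on the log scale.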

Step #6: Saving the Results – Optional

As mentioned before, Ax also allows us to save the process to a JSON file.

This is convenient when we want to pause and resume the process at a later time.


That’s it. You did it! Leave a comment if you have any questions.

Read Hyperparameter Tuning with Python: Complete Step-by-Step Guide if you want to see an example with XGBoost in Python.


2 thoughts on “Hyperparameter Tuning with Python: Keras Step-by-Step Guide”

  1. Thank you for the tutorial. It looks great.
    The question I have is – when choosing the “num_neurons_per_layer”, is it the same for all hidden layers? How do I know the # neurons per each layer?
    Thanks in advance!

    1. Hi Tarun,

      In this example the num_neurons_per_layer is the same for each layer. If you want to choose the number of neurons for each layer, you’ll need to specify each of them as a hyperparameter. Or if there is a pattern of # of neurons for each layer when you know the initial layer you could work off of that too.

      Let us know if you have any other questions.
      Thanks for reading!
