This article is a complete guide to Hyperparameter Tuning.
In this post, you’ll see:
- why you should use this machine learning technique.
- how to use it with Keras (deep learning neural networks) and TensorFlow in Python.
This article is a companion to the post Hyperparameter Tuning with Python: Complete Step-by-Step Guide. To see an example with XGBoost, please read the previous article.
If you want to improve your model’s performance faster and further, let’s get started!
FAQ: What Is Hyperparameter Tuning/Optimization, and Why Use It?
What are the hyperparameters anyway?
A hyperparameter is a parameter whose value is set before the learning process begins.
By contrast, the values of other parameters are derived via training. (Wikipedia)
For example, neural networks have many hyperparameters, including:
- number of hidden layers
- number of neurons
- learning rate
- activation function
- and optimizer settings
Why do we care about these hyperparameters?
Because these hyperparameters are crucial to the performance, speed, and quality of machine learning models. Hence, we should optimize them.
Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. (Wikipedia)
But these hyperparameters all look complicated, and tuning them jointly turns this into an even harder, higher-dimensional problem.
How should we choose the values of these hyperparameters?
Often, we choose them based on our experience and through trial and error. This process is very manual and doesn't guarantee the best results for our models.
What are the better methods to tune the hyperparameters?
We need a systematic method to optimize them.
There are basic techniques such as Grid Search and Random Search, as well as more sophisticated techniques such as Bayesian Optimization and Evolutionary Optimization.
Now let’s see hyperparameter tuning in action step-by-step.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #1: Preprocessing the Data
In this post, we use the Russian housing dataset from Kaggle. The goal of the project is to predict housing price fluctuations in Russia. We are not trying to find the best possible model; we only use this dataset as an example.
Before we start building the model, let’s take a look at it.
To prepare the data df for this modeling demonstration, we process it only lightly by:
- separating the target log(price_doc) (the logarithm of the housing price), as y, from the rest of the numeric features, as X.
We are only going to use numeric features in this example.
- imputing missing values with the median for numeric features.
- splitting X and y into training and testing datasets.
- scaling the features for both training and testing datasets.
Without this scaling, the feature values wouldn't be well suited for training neural networks. A minimal sketch of these preprocessing steps follows.
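This sketch assumes the Kaggle CSV is saved locally as train.csv; the file name and random seed are illustrative, and the original post's code may differ in details:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('train.csv')  # the Russian housing dataset from Kaggle

# Separate the log-transformed target from the numeric features.
y = np.log(df['price_doc'])
X = df.drop(columns=['price_doc']).select_dtypes(include=[np.number])

# Impute missing values with the median for each numeric feature.
X = pd.DataFrame(SimpleImputer(strategy='median').fit_transform(X),
                 columns=X.columns)

# Split into 90% training and 10% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Scale the features, fitting the scaler on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```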
Now we have a new training dataset X_train_scaled containing 90% of the data from the original dataset.
Related article: Data Cleaning in Python: the Ultimate Guide (2020)
In this previous post, we explored data cleaning techniques using this same dataset.
Step #2: Defining the Objective for Optimization
Before starting the tuning process, we must define an objective function for hyperparameter optimization.
We are going to use TensorFlow Keras to model the housing price. Keras is a deep learning API for Python that runs on top of TensorFlow.
First, we need a function get_keras_model that builds the model. It defines a multilayer perceptron (MLP), which is the simplest type of deep neural network. An MLP consists of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer.
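Here is a minimal sketch of what such a builder could look like, using the Keras functional API. The hyperparameter names (num_hidden_layers, neurons_per_layer, dropout_rate, activation) are illustrative, not necessarily the exact names in the original code:

```python
import tensorflow as tf

def get_keras_model(num_hidden_layers, neurons_per_layer,
                    dropout_rate, activation):
    """Build an MLP for regression on the scaled numeric features."""
    inputs = tf.keras.Input(shape=(X_train_scaled.shape[1],))
    x = inputs
    for _ in range(num_hidden_layers):
        x = tf.keras.layers.Dense(neurons_per_layer, activation=activation)(x)
        x = tf.keras.layers.Dropout(dropout_rate)(x)
    # A single linear output unit for the regression target.
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```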
Then based on the model, we create the objective function keras_mlp_cv_score as below:
- The key input parameterization includes the hyperparameters of the MLP that will be tuned.
- MSE (Mean Squared Error) is used as the score/loss function that will be minimized for hyperparameter optimization.
- And we also use Cross-Validation to calculate the score (MSE) for a given set of hyperparameter values.
For any given set of hyperparameter values, this function returns the mean and the standard error of the score (MSE) based on cross-validation, which is the format Ax expects.
You can see the details in the Python code below.
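Here is one way such an objective could be written. It is a sketch under a few assumptions: the additional hyperparameter names (learning_rate, optimizer, batch_size, epochs) are illustrative, and the return value follows the {metric_name: (mean, SEM)} format of Ax's Service API:

```python
import numpy as np
from sklearn.model_selection import KFold

# Map the optimizer hyperparameter (a string) to a Keras optimizer class.
OPTIMIZERS = {'adam': tf.keras.optimizers.Adam,
              'rmsprop': tf.keras.optimizers.RMSprop,
              'sgd': tf.keras.optimizers.SGD}

def keras_mlp_cv_score(parameterization):
    """Return the cross-validated MSE for one set of hyperparameter values."""
    scores = []
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    for train_idx, val_idx in kfold.split(X_train_scaled):
        model = get_keras_model(parameterization['num_hidden_layers'],
                                parameterization['neurons_per_layer'],
                                parameterization['dropout_rate'],
                                parameterization['activation'])
        optimizer = OPTIMIZERS[parameterization['optimizer']](
            learning_rate=parameterization['learning_rate'])
        model.compile(optimizer=optimizer, loss='mse')
        model.fit(X_train_scaled[train_idx], y_train.iloc[train_idx],
                  batch_size=parameterization['batch_size'],
                  epochs=parameterization['epochs'], verbose=0)
        preds = model.predict(X_train_scaled[val_idx], verbose=0).ravel()
        scores.append(np.mean((preds - y_train.iloc[val_idx].values) ** 2))
    # Ax interprets the tuple as (mean, standard error of the mean).
    return {'keras_cv': (np.mean(scores),
                         np.std(scores) / np.sqrt(len(scores)))}
```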
Based on our own experience, we also set the limits for the values of hyperparameters that will be tuned.
There is NO consensus method for choosing these limits; you need to understand the hyperparameters and make judgment calls.
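For illustration only, a search space in Ax's parameter-dictionary format could look like the following; the bounds are our own judgment calls, not canonical values:

```python
parameters = [
    {'name': 'learning_rate', 'type': 'range', 'bounds': [1e-4, 1e-1],
     'log_scale': True},
    {'name': 'num_hidden_layers', 'type': 'range', 'bounds': [1, 5]},
    {'name': 'neurons_per_layer', 'type': 'range', 'bounds': [10, 100]},
    {'name': 'dropout_rate', 'type': 'range', 'bounds': [0.0, 0.5]},
    {'name': 'batch_size', 'type': 'range', 'bounds': [32, 256]},
    {'name': 'epochs', 'type': 'range', 'bounds': [10, 50]},
    {'name': 'activation', 'type': 'choice', 'values': ['relu', 'tanh']},
    {'name': 'optimizer', 'type': 'choice',
     'values': ['adam', 'rmsprop', 'sgd']},
]
```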
Step #3: Choosing the Package: Ax
In our previous article (What is the Coronavirus Death Rate with Hyperparameter Tuning), we applied hyperparameter tuning with the hyperopt package. Although hyperopt is popular, we found it clunky to use, and it lacks good documentation.
We’ve been looking for other packages and finally found Ax (Adaptive Experimentation Platform).
Ax is a new platform that helps to optimize any kind of experiment, including machine learning, A/B tests, and simulations. It was developed by Facebook and is part of the Facebook Open Source projects now.
We picked it for its:
- built-in feature for saving results to a JSON file or a MySQL database.
- support for dependent parameter constraints.
For example, we can constrain parameters m and n to 1 < m < 10, 0 < n < 10, and m*n > 10, while most other packages don't support the m*n > 10 condition.
- good visualization functions.
- decent documentation.
The documentation is not all clear, but it is better than that of other packages such as hyperopt.
Ax also has three different APIs (usage modes) for hyperparameter tuning:
- Loop API is the simplest to use, but it doesn't allow much customization.
- Service API allows more control than the Loop API, especially over the trials. This control is valuable since:
– You can schedule a trial to happen at a different time, or even run it in parallel with other trials.
– You can also save progress after each trial, instead of waiting for all trials to finish to see the results.
- Developer API allows the most control but requires the most knowledge, which is not easy since:
– the documentation for its features isn't entirely clear.
– the example on the website doesn’t show how you can get the best parameters after the optimization.
– you often have to look through the API to complete the process.
Because of the limitations of Loop API and the lack of clear examples for Developer API, we are going to use the Service API.
Related article: What is the Coronavirus Death Rate with Hyperparameter Tuning
Step #4: Optimizing/Tuning the Hyperparameters
Finally, we can start the optimization process.
Within the Service API, we don't need deep knowledge of Ax's data structures, so we can simply follow its sample code to set things up.
We create the experiment keras_experiment with the objective function and hyperparameters list built previously.
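A sketch of this setup, following the Service API sample code (objective_name matches the AxClient.create_experiment signature at the time of writing; newer Ax versions use an objectives argument instead):

```python
from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name='keras_experiment',
    parameters=parameters,      # the search space defined in Step #2
    objective_name='keras_cv',  # the metric returned by keras_mlp_cv_score
    minimize=True,              # we minimize the cross-validated MSE
)
```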
After running the above code, you’ll see the logging note below.
As mentioned earlier, there are different methods of searching for optimal hyperparameter values. According to the AxClient documentation for generation_strategy, the approach is chosen intelligently based on the properties of the search space if not set explicitly. As you can see from the note, Sobol, a quasi-random form of uniform search, is selected by AxClient in this exercise.
Next, let’s run the experiment trials to evaluate the different hyperparameter values.
Since this project is for demonstration only, we run a small number of trials: 25.
Each trial evaluates one combination of hyperparameter values and outputs the score from the keras_mlp_cv_score function. The AxClient keeps track of the history of parameters and scores, and makes an intelligent guess at the next promising set of parameters.
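The trial loop itself is short. This sketch uses the standard get_next_trial/complete_trial pattern from the Service API:

```python
for _ in range(25):
    parameterization, trial_index = ax_client.get_next_trial()
    # Evaluate the objective and report the result back to Ax.
    ax_client.complete_trial(trial_index=trial_index,
                             raw_data=keras_mlp_cv_score(parameterization))
```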
As the code runs, you can see the logging note for each trial pop up as well.
Step #5: Printing/Visualizing the Results
Printing the Results
To look at the results as a table, you can use the code below to print them out.
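A minimal version, using AxClient's built-in get_trials_data_frame method:

```python
# All trials, their hyperparameter values, and scores as a DataFrame.
print(ax_client.get_trials_data_frame())
```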
To look at the best values of the hyperparameters, we use the code below.
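get_best_parameters returns both the winning hyperparameter values and the model's estimate of the objective at that point:

```python
best_parameters, values = ax_client.get_best_parameters()
print(best_parameters)  # the best hyperparameter values found
print(values)           # the estimated objective mean and covariance
```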
Visualizing the Results
Besides printing the number, you can also visualize the results.
We can plot the evolution of the keras_cv score over iterations. You can see that the score reaches the minimum value after 2 iterations/trials.
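One way to produce this plot, following the Ax tutorials (render is designed for notebook environments):

```python
from ax.utils.notebook.plotting import render

# Best objective value found so far, as a function of the iteration.
render(ax_client.get_optimization_trace())
```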
Evaluating the Results
Plus, we can use this model to make predictions on the test dataset.
The Python code below prints out the MSE on the test set, about 0.30, which is close to the MSE from the training dataset.
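A sketch of this evaluation, refitting an MLP on the full training set with the best hyperparameters (the exact refit procedure in the original code may differ):

```python
from sklearn.metrics import mean_squared_error

best_parameters, _ = ax_client.get_best_parameters()

model = get_keras_model(best_parameters['num_hidden_layers'],
                        best_parameters['neurons_per_layer'],
                        best_parameters['dropout_rate'],
                        best_parameters['activation'])
model.compile(optimizer=OPTIMIZERS[best_parameters['optimizer']](
                  learning_rate=best_parameters['learning_rate']),
              loss='mse')
model.fit(X_train_scaled, y_train,
          batch_size=best_parameters['batch_size'],
          epochs=best_parameters['epochs'], verbose=0)

test_pred = model.predict(X_test_scaled, verbose=0).ravel()
print('Test MSE:', mean_squared_error(y_test, test_pred))
```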
Step #6: Saving the Results – Optional
As mentioned before, Ax also allows us to save the process to a JSON file.
This is convenient when we want to pause and resume the process at a later time.
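AxClient provides save_to_json_file and load_from_json_file for this. A sketch (the file name is illustrative):

```python
# Save the full experiment state so the study can be resumed later.
ax_client.save_to_json_file('keras_experiment.json')

# ...later, restore the client and continue where we left off.
restored_client = AxClient.load_from_json_file('keras_experiment.json')
```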
That’s it. You did it! Leave a comment if you have any questions.
Read Hyperparameter Tuning with Python: Complete Step-by-Step Guide if you want to see an example with XGBoost in Python.