# 3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras - A Practical Example in Python with Useful Tips

#### Lianne & Justin

In this tutorial, we present a deep learning time series analysis example with Python. You’ll see:

• How to preprocess/transform the dataset for time series forecasting.
• How to handle large time series datasets when we have limited computer memory.
• How to fit a Long Short-Term Memory (LSTM) neural network model with TensorFlow Keras.
• And more.

If you want to analyze a large time series dataset with machine learning techniques, you’ll love this guide with practical tips.

Let’s begin now!

The dataset we are using is the Household Electric Power Consumption from Kaggle. It provides measurements of electric power consumption in one household with a one-minute sampling rate.

There are 2,075,259 measurements gathered over a period of 4 years. Different electrical quantities and some sub-metering values are available. But we’ll only focus on three features:

• Date: date in format dd/mm/yyyy
• Time: time in format hh:mm:ss
• Global_active_power: household global minute-averaged active power (in kilowatt)

In this project, we will predict the amount of Global_active_power 10 minutes ahead.

## Step #1: Preprocessing the Dataset for Time Series Analysis

To begin, let’s process the dataset to get ready for time series analysis.

We transform the dataset df by:

• creating the feature date_time in DateTime format by combining Date and Time.
• converting Global_active_power to numeric and removing missing values (1.25%).
• ordering the features by time in the new dataset.
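These steps can be sketched with pandas as follows; the helper name preprocess and the exact dropna/sort details are our assumptions, not necessarily the notebook’s code:

```python
import pandas as pd

def preprocess(df):
    """Sketch of the preprocessing steps; assumes the raw Kaggle columns
    Date (dd/mm/yyyy), Time (hh:mm:ss), and Global_active_power."""
    # Combine Date and Time into a single DateTime feature.
    df['date_time'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                     format='%d/%m/%Y %H:%M:%S')
    # Convert to numeric; missing values (recorded as '?') become NaN and are dropped.
    df['Global_active_power'] = pd.to_numeric(df['Global_active_power'],
                                              errors='coerce')
    df = df.dropna(subset=['Global_active_power'])
    # Order the rows by time and keep only the columns we need.
    df = df.sort_values('date_time').reset_index(drop=True)
    return df[['date_time', 'Global_active_power']]
```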

Now we have the processed dataset df.

Next, we split the dataset into training, validation, and test datasets.

df_test holds the last 7 days of the original dataset. df_val holds the 14 days before the test period. df_train holds the rest of the data.
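A minimal sketch of this chronological split, assuming the date_time column from the preprocessing step (the helper name split_by_days is hypothetical):

```python
import pandas as pd

def split_by_days(df, test_days=7, val_days=14):
    """Split chronologically: the last `test_days` days go to test, the
    `val_days` before that to validation, and the rest to training.
    Assumes df has a date_time column ordered by time."""
    test_cutoff = df['date_time'].max() - pd.Timedelta(days=test_days)
    val_cutoff = test_cutoff - pd.Timedelta(days=val_days)
    df_train = df[df['date_time'] <= val_cutoff]
    df_val = df[(df['date_time'] > val_cutoff) & (df['date_time'] <= test_cutoff)]
    df_test = df[df['date_time'] > test_cutoff]
    return df_train, df_val, df_test
```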

Related article: Time Series Analysis, Visualization & Forecasting with LSTM
But practically, we often want to forecast over a more extended period, which we’ll do in this article.

## Step #2: Transforming the Dataset for TensorFlow Keras

Before we can fit the TensorFlow Keras LSTM, there are still other processes that need to be done.

Let’s deal with them little by little!

### Dividing the Dataset into Smaller Dataframes

As mentioned earlier, we want to forecast the Global_active_power that’s 10 minutes in the future.

The graph below visualizes the problem: using the lagged data (from t-n to t-1) to predict the target (t+10).

It is not efficient to loop through the dataset while training the model. So we transform the dataset so that each row represents the historical data and the target.

In this way, we only need to train the model using each row of the above matrix.

Now here come the challenges:

• How do we convert the dataset to the new structure?
• How do we handle this larger new data structure when our computer memory is limited?

To address both, the function create_ts_files is defined:

• to convert the original dataset to the new dataset above.
• at the same time, to divide the new dataset into smaller files, which is easier to process.

Within this function, we define the following parameters:

• start_index: the earliest time to be included in all the historical data for forecasting.
In this exercise, we want to include history from the very beginning, so we set its default to 0.
• end_index: the latest time to be included in all the historical data for forecasting.
In this exercise, we want to include all the history, so we set its default to None.
• history_length: this is n mentioned earlier, which is the number of timesteps to look back for each forecasting.
• step_size: the stride of the history window.
Global_active_power doesn’t change fast throughout time. So to be more efficient, we can let step_size = 10. In this way, we downsample to use every 10 minutes of data in the past to predict the future amount. We are only looking at t-1, t-11, t-21 until t-n to predict t+10.
• target_step: the number of periods in the future to predict.
As mentioned earlier, we are trying to predict the global_active_power 10 minutes ahead. So this parameter = 10.
• num_rows_per_file: the number of records to put in each file.
This is necessary to divide the large new dataset into smaller files.
• data_folder: the one single folder that will contain all the files.

That’s a lot of complicated parameters!

In the end, just know that this function creates a folder with files.
And each file contains a pandas dataframe that looks like the new dataset in the chart above.
Each of these dataframes has columns:

• y, which is the target to predict. This will be the value at t + target_step (t + 10).
• x_lag{i}, the value at time t + target_step - i (t + 10 - 11, t + 10 - 21, and so on), i.e., the lagged value compared to y.

At the same time, the function also returns the number of lags (len(col_names)-1) in the dataframes. This number will be required when defining the shape for TensorFlow models later.
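A minimal sketch of such a function is below. The exact implementation in the notebook differs; the file-name pattern ts_file{}.pkl and the default chunk size are assumptions:

```python
import os
import numpy as np
import pandas as pd

def create_ts_files(series, start_index=0, end_index=None,
                    history_length=7 * 24 * 60, step_size=10,
                    target_step=10, num_rows_per_file=100_000,
                    data_folder='ts_data'):
    """Turn a 1-D series into rows of (y, lags) and save them in chunked files."""
    os.makedirs(data_folder, exist_ok=True)
    if end_index is None:
        end_index = len(series)

    # Lag offsets are measured from the target y at time t + target_step:
    # column x_lag{i} holds the value at (t + target_step) - i.
    lag_offsets = list(range(target_step + 1,
                             target_step + history_length + 1,
                             step_size))
    col_names = ['y'] + [f'x_lag{i}' for i in lag_offsets]

    # Each row is anchored at the index of its target value y.
    first_target = max(start_index, 0) + max(lag_offsets)
    targets = list(range(first_target, end_index))

    for file_num, chunk_start in enumerate(
            range(0, len(targets), num_rows_per_file)):
        chunk = targets[chunk_start:chunk_start + num_rows_per_file]
        data = np.empty((len(chunk), len(col_names)))
        for r, t_y in enumerate(chunk):
            data[r, 0] = series[t_y]
            data[r, 1:] = [series[t_y - i] for i in lag_offsets]
        pd.DataFrame(data, columns=col_names).to_pickle(
            os.path.join(data_folder, f'ts_file{file_num}.pkl'))

    return len(col_names) - 1  # the number of lags
```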

Before applying the function create_ts_files, we also need to:

• scale the global_active_power to work with Neural Networks.
• define n, the history_length, as 7 days (7*24*60 minutes).
• define step_size within historical data to be 10 minutes.
• set the target_step to be 10, so that we are forecasting the global_active_power 10 minutes after the historical data.
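These settings might look like the following. A min-max scaler (as in scikit-learn's MinMaxScaler) is typically used; here we sketch an equivalent plain-numpy version, fit on the training data only so nothing leaks from validation/test into training:

```python
import numpy as np

def fit_minmax(train_values):
    """Return a scaling function fit on the training values only."""
    lo, hi = train_values.min(), train_values.max()
    return lambda x: (x - lo) / (hi - lo)

# Hypothetical usage with the splits from Step #1:
# scale = fit_minmax(df_train['Global_active_power'].values)
# train_scaled = scale(df_train['Global_active_power'].values)

history_length = 7 * 24 * 60  # n: look back 7 days of minute data = 10080
step_size = 10                # use every 10th minute of the history
target_step = 10              # forecast 10 minutes ahead
```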

After these, we apply the create_ts_files to:

• create 158 files (each including a pandas dataframe) within the folder ts_data.
• return num_timesteps as the number of lags.

As the function runs, it prints the name of every 10th file.

The folder ts_data is around 16 GB, even though we were only using the past 7 days of data to predict. Now you can see why it’s necessary to divide the dataset into smaller dataframes!

### Defining the Time Series Object Class

In this procedure, we create a class TimeSeriesLoader to transform and feed the dataframes into the model.

There are built-in utilities from Keras such as the Sequence class and the tf.data API. But they are not very efficient for this purpose.

Within this class, we define:

• __init__: the initial settings of the object, including:
ts_folder, which will be ts_data that we just created.
filename_format, which is the string format of the file names in the ts_folder.
For example, when the files are ts_file0.pkl, ts_file1.pkl, …, ts_file100.pkl, the format would be ‘ts_file{}.pkl’.
• num_chunks: the total number of files (chunks).
• get_chunk: this method loads the dataframe from one of the files and processes it to be ready for training.
• shuffle_chunks: this method shuffles the order of the chunks that are returned in get_chunk. This is a good practice for modeling.
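A sketch of such a class is below; it assumes each .pkl file in the folder holds a dataframe with a y column followed by the lag columns produced in the previous step:

```python
import os
import numpy as np
import pandas as pd

class TimeSeriesLoader:
    """Feeds the chunked dataframes to the model one file at a time."""

    def __init__(self, ts_folder, filename_format):
        self.ts_folder = ts_folder
        self.filename_format = filename_format  # e.g. 'ts_file{}.pkl'
        self._num_chunks = len(
            [f for f in os.listdir(ts_folder) if f.endswith('.pkl')])
        self._order = np.arange(self._num_chunks)

    def num_chunks(self):
        return self._num_chunks

    def get_chunk(self, idx):
        path = os.path.join(self.ts_folder,
                            self.filename_format.format(self._order[idx]))
        df = pd.read_pickle(path)
        num_records = len(df.index)
        y = df['y'].values
        # Batch-major shape (num_records, num_timesteps, 1) for the LSTM.
        x = df.drop('y', axis=1).values.reshape(num_records, -1, 1)
        return x, y

    def shuffle_chunks(self):
        # Randomize which file get_chunk(i) returns; good practice for training.
        np.random.shuffle(self._order)
```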

The definitions might seem a little confusing. But keep reading, you’ll see this object in action within the next step.

After defining, we apply this TimeSeriesLoader to the ts_data folder.

Now, with the object tss pointing to our dataset, we are finally ready for the LSTM!

## Step #3: Creating the LSTM Model

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.

LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series.

Wikipedia

As mentioned before, we are going to build an LSTM model based on the TensorFlow Keras library.

We all know the importance of hyperparameter tuning from our guide. But in this article, we simply demonstrate the model fitting without tuning.

The procedures are below:

• define the shape of the input dataset:
– num_timesteps, the number of lags in the dataframes we set in Step #2.
– the number of time series as 1, since we are only using the one feature global_active_power.
• define the number of units; 4*units*(units+2) is the number of parameters of the LSTM.
The higher the number, the more parameters in the model.
• define the dropout rate, which is used to prevent overfitting.
• specify the output layer to have a linear activation function.
• define the model.
• define the dropout rate, which is used to prevent overfitting.
• specify the output layer to have a linear activation function.
• define the model.
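Putting these steps together, the model definition might look like the sketch below. We use units=10 so that 4*units*(units+2) = 480, matching the parameter count discussed with the summary; the dropout rate of 0.2 and the adam/mse choices are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_timesteps = 1008  # number of lags from Step #2 (7*24*60 / 10)
units = 10            # gives 4*units*(units+2) = 480 LSTM parameters

# Input shape: (timesteps, number of time series) = (1008, 1).
inputs = layers.Input(shape=(num_timesteps, 1))
x = layers.LSTM(units=units)(inputs)
x = layers.Dropout(rate=0.2)(x)  # dropout rate is an assumption here
outputs = layers.Dense(1, activation='linear')(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mse')
model.summary()
```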

Then we also define the optimization function and the loss function. Again, tuning these hyperparameters to find the best option would be a better practice.

To take a look at the model we just defined before running, we can print out the summary.

You can see that the output shape looks good, which is n / step_size (7*24*60 / 10 = 1008). The number of parameters that need to be trained looks right as well (4*units*(units+2) = 480).

Let’s start modeling!

We train on each chunk in batches, and only run for one epoch. Ideally, you would train neural networks for multiple epochs.
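The chunked training loop can be sketched as a small helper; the name train_on_chunks is hypothetical, model stands for the compiled Keras model, loader follows the TimeSeriesLoader interface from earlier, and the batch size is an assumption:

```python
def train_on_chunks(model, loader, num_epochs=1, batch_size=32):
    """Fit the model one chunk (file) at a time.

    `model` is any object with a Keras-style .fit(X, y, ...) method;
    `loader` exposes shuffle_chunks(), num_chunks(), and get_chunk(i).
    """
    for _ in range(num_epochs):
        loader.shuffle_chunks()  # new chunk order each epoch
        for i in range(loader.num_chunks()):
            X, y = loader.get_chunk(i)
            # One pass over this chunk, in mini-batches.
            model.fit(X, y, batch_size=batch_size, epochs=1, verbose=0)
```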

After fitting the model, we may also evaluate the model performance using the validation dataset.

As with the training dataset, we also create a folder of validation files, which prepares the validation dataset for evaluation.

Besides testing using the validation dataset, we also test against a baseline model that uses only the most recent history point (t + 10 - 11).
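The baseline can be sketched with numpy alone: predict each y with its most recent available lag, x_lag11 (the value at t + 10 - 11 = t - 1), and compare mean squared errors. The helper names are hypothetical:

```python
import numpy as np
import pandas as pd

def mse(y_true, y_pred):
    """Mean Squared Error between two sequences."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def baseline_mse(df):
    """Naive baseline: predict y with the most recent history point x_lag11."""
    return mse(df['y'].values, df['x_lag11'].values)
```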

The detailed Python code is below.

On the validation dataset, the LSTM gives a Mean Squared Error (MSE) of 0.418, while the baseline model has an MSE of 0.428. The LSTM does slightly better than the baseline.

We could do better with hyperparameter tuning and more epochs. Plus, some other essential time series analysis tips such as seasonality would help too.

Related article: Hyperparameter Tuning with Python: Complete Step-by-Step Guide


### 19 thoughts on “3 Steps to Time Series Forecasting: LSTM with TensorFlow Keras - A Practical Example in Python with Useful Tips”

1. Dmitry Vilenchik

Hi, Lianne
What is ‘num_records’ in the last notebook page?

# reshape for input into LSTM. Batch major format.
features_batchmajor = features_arr.reshape(num_records, -1, 1)
it is not defined.
The method ‘get_chunk’ of TimeSeriesLoader class contains the code for ‘num_records’ internal variable.

Can it be defined as
num_records = len(df_val_tc.index)?

Thanks
Dmitry

2. Dear Lianne,
Thank you for the helpful guides. But can you show me how to reduce the dataset? It is so big and time-consuming.

1. Lianne & Justin

Hi David,

You can set the history_length to be a lower number. Or you can set step_size to be a higher number. Either one will make the dataset smaller.

Thanks.

3. Hello Lianne,

How can I print the predicted output? I am a beginner in this field.

Thanks for sharing.

1. Lianne & Justin

Hi Omar, closer to the end of the article, it shows how to get y_pred, the predicted result. You can just call the variable name or print(y_pred).

4. Hello Lianne & Justin,

How can we forecast the future for a panel (longitudinal) dataset? Is it possible for you to upload an example of how to use a TF LSTM to forecast the unknown future for panel datasets?

Thanks!

1. Lianne & Justin

Hi Mu,

You can probably train the LSTM like any other time series, where each sequence is the measurements of an entity. It should be able to predict the next measurements when given a sequence from an entity.

5. # reshape for input into LSTM. Batch major format.
features_batchmajor = np.array(features).reshape(num_records, -1, 1)
I get an error here that in the reshape function, the third argument is expected to be a String. I am still getting my head around how the reshape function works, so please will you help me out here?

1. Hi Ritesh,

We’d need a bit more context around the error that you’re receiving. Because when we run it, we don’t get an error message as you do.

1. No worries. I think it is a PyCharm problem. It shows a preemptive error but it runs well. Any tips on how I can save the learnings so that I won’t start from zero every time?

6. hello,
In the function, I think it is missing something:
ind0 = i*num_rows_per_file + start_index

7. Hello Lianne, thanks for your brilliant guide!

I’m new to neural networks, but got it successfully implemented on my own weather-station dataset.

You are testing using the validation dataset.
Would it be the same procedure for testing the network with df_test?

And if so, shouldn’t the validation data influence the network, while the testing is just for checking?

1. Hi Lukas, thanks for reading our article. That’s true, the validation dataset is used for choosing hyperparameters, while the test set is for checking the performance.

8. Hi,

thank you for the insightful article.

I just want to confirm: when you talk about predicting the next minutes, why do you only set Dense(1) for the output layer?

`outputs = layers.Dense(1, activation='linear')(x)`

1. Hi Yogi,

We put the parameter units=1 because we are predicting a single scalar value, which is the value in the next time period.
