In this tutorial, you’ll learn how to build **ARIMA models for time series prediction**, with an example in **Python**.

ARIMA is one of the fundamental time series forecasting models. It is a general class of models that includes different varieties. We can apply them to describe the autocorrelations in time series data to make predictions.

By following this tutorial, you’ll learn:

- What is ARIMA
- How to build an ARIMA model in Python, step-by-step
- How to automatically fit an ARIMA model in Python
- How to make predictions and evaluate them

If you want to use Python to create ARIMA models to predict your time series, this practical tutorial will get you started.

Let’s jump in!

## What is ARIMA?

ARIMA is a general class of statistical models for time series analysis forecasting. It stands for **A**uto-**R**egressive **I**ntegrated **M**oving **A**verage. When applying ARIMA models, we use a time series’ past values and/or forecast errors to predict its future values.

### Background knowledge

Before looking at the details of ARIMA models, let’s recap some definitions in time series analysis.

A **time series** is a sequence of data points collected in time order. Based on the time series characteristics, we can divide them into different categories, such as univariate vs. multivariate or linear vs. non-linear. In this tutorial, we’ll only look at and assume time series that are *univariate* (one variable), *linear*, and *non-seasonal*, which are great candidates for regular ARIMA models.

Before building ARIMA models, we should check whether the time series is **stationary**. If not, we can use a technique called *differencing *to transform it to be stationary. Don’t worry. ARIMA models have that transformation covered, which you’ll see in our example soon.

But what is a stationary time series? This is a critical concept so let’s elaborate on it.

As the term suggests, a stationary time series has its statistical properties remain constant across time. So it is easier to model and make predictions based on such a series. Below is an example stationary time series plot. You can see that it has no clear trend, and oscillates around a mean of 0 with constant variance.

Next, let’s take a closer look at the ARIMA model.

### Intro to ARIMA

As mentioned earlier, ARIMA is a general class of models. For different time series, we might be able to fit them with any of the combinations of the three components of ARIMA:

**AR**(Auto-Regressive): the time series is linearly regressed on its own past values**I**(Integrated): if not stationary, the time series can be differenced to become stationary, i.e., compute the differences between consecutive observations**MA**(Moving Average): the time series is ‘regressed’ on the past forecast errors

Here is the relationship of these three components in ARIMA: AR and MA are two models that usually work on stationary time series – if the time series is not stationary, we can use the I part to make it stationary.

These might sound a little confusing. But it will become more clear with explanations and our example.

### ARIMA parameters p, d, q

So corresponding to the three components of ARIMA mentioned above, there are three parameters p, d, and q. They take non-negative integer values, indicating which specific ARIMA model is used.

**p**: the number of past values included in the AR model

y_{t} = c + φ_{1}y_{t-1} + φ_{2}y_{t-2} + … + φ_{p}y_{t-p} + ε_{t}

where c is a constant, φ_{1}, …, φ_{p} are parameters, and ε_{t} is white noise

**d**: the number of times the time series is differenced. For example, the first order differencing is calculated as below:

∇y_{t} = y_{t} – y_{t-1}

**q**: the number of past forecast errors included in the MA model, or the size of the moving average window. It’s named the MA model since each y_{t}can be considered as a weighted moving average of the past forecast errors

y_{t} = c + ε_{t} + θ_{1}ε_{t-1} + θ_{2}ε_{t-2} + … + θ_{q}ε_{t-q}

So the ARIMA models are indeed a general class of models, including AR, MA, and ARMA. For example, ARIMA(p, 0, 0) is equivalent to AR(p), ARIMA(0, 0, q) is equivalent to MA(q).

The full model equation of ARIMA(p, d, q) is:

∇y_{t} = c + φ_{1}∇y_{t-1} + … + φ_{p}∇y_{t-p} + ε_{t} + θ_{1}ε_{t-1} + … + θ_{q}ε_{t-q}

where ∇y_{t} is the differenced time series, which could be more than one time differencing

All right! Now you’ve learned the basics of ARIMA models. It’s time to see a real example.

## Step 0: Explore the dataset

We’ll use a time series that records the traffic to our website justintodata.com. Please download the dataset from here. Due to privacy, I have removed the actual date information. Our target is to build an ARIMA model in Python that can predict the next 30 periods of time traffic.

The first thing to do is always **plot the time series and see its pattern**.

<class 'pandas.core.frame.DataFrame'> RangeIndex: 393 entries, 0 to 392 Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 traffic 393 non-null int64 dtypes: int64(1) memory usage: 3.2 KB

You can see that this time series’ variance doesn’t look constant throughout time. So first, let’s transform this series to stabilize its variance. We can take the logarithm to do this.

You can see that the plot is less variant across time compared to the original one.

Now let’s also split the time series into training and test sets. We set the last 30 data points as the test set and the previous data as the training set.

We are now ready to consider how to fit an ARIMA model in Python.

## Step 1: Check for stationarity of time series

As mentioned earlier, we usually apply ARIMA models on stationary time series. This means we must check whether our time series has its statistical properties staying approximately the same throughout time.

There are different ways of checking whether the time series is stationary.

### Method #1: time series plot

Many time series are obviously non-stationary. We can tell by looking at their plots.

From the previous step, we can see that our time series has a strong upward trend, so it is non-stationary.

When your time series is harder to judge, we can use other methods.

### Method #2: ACF plot and PACF plot

These two plots are helpful throughout the process of fitting ARIMA models. So let’s take a closer look at them.

The **ACF **(autocorrelation function) is the correlation of the time series with its lags. It measures the linear relationship between lagged values of the time series. For example, we can measure the autocorrelation of y_{t }and y_{t-k} for different values of k.

ACF is straightforward to measure. But in reality, the relationships among lags are more complicated. For example, we can assume that y_{t} and y_{t-1} are correlated, and y_{t-1} and y_{t-2} are also correlated. Due to their correlations with y_{t-1}, y_{t} and y_{t-2} must also be correlated. How can we measure if there is new information in y_{t-2} to predict y_{t}, besides their relationships with y_{t-1}?

That’s why we need another definition called partial autocorrelations. The **PACF **(partial autocorrelation function) shows the partial correlation of the time series with its lags, after removing the effects of lower-order-lags between them. For example, the partial autocorrelation of y_{t} and y_{t-k} is the correlation that is not explained by their relationships with the lags y_{t-1}, y_{t-2}, …, y_{t-k+1}. It measures the *balance *amount of variance in y_{t-k} to predict the future value of y_{t}.

Let’s use the `statsmodels`

Python package to plot the ACF and PACF of our training set time series.

The ACF plot (left) shows the correlations with the lags are high and positive with very slow decay. While the PACF plot (right) shows the partial autocorrelations have a single spike at lag 1. These are both signs of a trended time series. So our time series is not stationary.

### Method #3: ADF test

Besides plots, we can also check for stationarity with statistical tests like the **ADF **test. The ADF (Augmented Dickey-Fuller) test tests for the null hypothesis that there is a unit root (non-stationary).

p-value: 0.24126116082883903

This result shows a large p-value, which means the test fails to reject the null hypothesis. So the ADF test also suggests that our time series is non-stationary.

With all the evidence above, let’s transform our time series to a stationary one.

### Transform to stationary: differencing

As mentioned earlier, we can use differencing to make a non-stationary time series to be stationary. It helps to stabilize the mean of the time series by removing changes in the levels of the series.

Let’s try it out on the training dataset and plot the new series. Below we are using the current observation minus the previous observation to get the new series. Within the code, the `dropna`

method is applied since the first observation has no previous observation to subtract, so its difference is missing.

You can see that the first difference time series doesn’t show a strong trend anymore. It looks to be more stationary.

We can also look at the ACF and PACF plots of the first difference time series `df_train_diff`

.

Compared to the original series, the ACF plot (left) drops in value more quickly. While the PACF plot (right) also shows a less strong spike at lag 1. These are signs of the series being more stationary.

We can also quantify the result by running the ADF test on the series.

p-value: 0.022059461239126336

The p-value is enough to reject the null hypothesis at a 5% significance level.

So we can conclude that our first difference time series is likely to be stationary!

This is good. If your first difference time series still shows a strong non-stationary pattern, you can try applying the difference a second time, i.e., the difference on the first difference time series. Most of the time, we should not go beyond second-order differences.

We’ve successfully figured out the I part in the ARIMA(p, d, q) model. We need the parameter d to be 1.

## Step 2: Determine ARIMA models parameters p, q

Now it’s time to decide on the other two parameters p and q. They determine the models that we’ll be using from ARIMA.

We can, again, use the ACF and PACF plots. We are going to model based on the first difference time series `df_train_diff`

. So let’s look at the plots of `df_train_diff`

again. Based on these plots, we can select the initial values of p and q.

Here is a rule of thumb for selecting the values of p and q:

- If the PACF plot has a significant spike at lag p, but not beyond; the ACF plot decays more gradually. This may suggest an ARIMA(p, d, 0) model
- If the ACF plot has a significant spike at lag q, but not beyond; the PACF plot decays more gradually. This may suggest an ARIMA(0, d, q) model

This is because the PACF measures the balance variance of the lags; it helps tell us whether we should include such lag within the auto-regressive (AR) models. While the ACF measures the correlations with the lags, it helps judge the moving average (MA) models. Most of the time, we should focus on either the AR or the MA models, not mixed.

For our differenced series, the PACF has a large spike at lag 1, and still shows more minor but significant lags at 2, 4, and 5. In contrast, the ACF shows a more gradual decay. So we can try for ARIMA models with the p parameter being 2, 3, 4, or even 5. Let’s try 2 since we usually prefer a simpler model. So we’ll fit an ARIMA(2, 1, 0) model.

As you can see, it can be hard and highly subjective to select appropriate values for the parameters of ARIMA models. You may try multiple models to find the best one for your need.

## Step 3: Fit the ARIMA model

After deciding the parameters of p, d, and q, we can fit the ARIMA model in Python!

We’ll use the classic Python package `statsmodels`

. Note that its function will apply the difference for us, so we are fitting the model using the original training set `df_train`

.

Next, let’s use this model to make predictions.

## Step 4: Make time series predictions

Before using this model to make time-series predictions, we need to make sure our model has captured adequate information from the data. We can check this by looking at the residuals. If the model is good, its residuals should look like white noise.

We’ll plot the residuals and their density.

The residuals look random in general, and their density looks normally distributed with a mean of around 0.

We can also look at the ACF and PACF plots of the residuals.

The lower lags barely show any significant ‘spikes’.

These show that the residuals are close to white noise. We are ready to forecast with this model ARIMA(2, 1, 0).

Let’s calculate the predictions in Python and plot them together with the actual series, including the test set.

You can see that the forecast follows the previous momentum and shows an upward trend. While in reality, the traffic fell at first before heading back up.

It is challenging to make time-series predictions!

## Optional: Auto-fit the ARIMA model

You’ve done a lot of work! It takes effort to identify a good combination of parameters of ARIMA models.

The good news is that there are Python packages that provide functions to fit ARIMA models automatically. Let’s try the `pmdarima`

Python package. It offers automatic ARIMA modeling based on the `statsmodels`

library that we’ve been using.

So we’ll start from the training set `df_train`

we obtained in step 0. The `auto_arima`

function can help us automate steps 1 to 3 to fit an ARIMA model automatically. It will generate the optimal model based on its criteria. Below we also set two of its parameters to be False so that it will consider all possible non-seasonal models. *Note that when applying the auto ARIMA function, it is still mandatory to perform step 0 to explore the dataset. *

ARIMA(order=(5, 1, 0), scoring_args={}, suppress_warnings=True)

It returns the model ARIMA(5, 1, 0).

We can also print out its summary.

By default, the auto process uses the KPSS unit root test to select the value of parameter d, then uses the AIC information criteria to determine the values of p and q. You can change the methods by setting them within the function `auto_arima`

. Please read more about the function in its documentation here.

As you can see, the auto-fitting process is simple. But based on current technology, the automatic process can’t completely replace our judgment. So please use the auto process cautiously and make the final decision based on your experience.

## Step 5: Evaluate model predictions

Now we have two ARIMA models: ARIMA(2, 1, 0) and the auto-fitted ARIMA(5, 1, 0). Let’s compare and evaluate their predictions.

*Note: before forecasting, we should also check the residuals of the auto-fitted ARIMA model. But in this tutorial, I’ll skip the process.*

First, if we plot both models’ predictions together, you can see that the manually fitted model of ARIMA(2, 1, 0) is closer to the actual test set.

Also, we can use common metrics to evaluate these time series model predictions in Python.

Below we are using the MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and RMSE (Root Mean Squared Error). We calculate these sets of metrics for both of the models.

mae - manual: 0.14514968388076604 mape - manual: 0.017216198515850756 rmse - manual: 0.15051403899106064

mae - auto: 0.21701199293639092 mape - auto: 0.025732742863908743 rmse - auto: 0.2313024737299977

You can see that based on our test dataset, the model ARIMA(2, 1, 0) is better with lower errors.

So the best model picked by the auto process might not give better predictions on the test dataset.

## Other suggestions

Finally, I would like to mention a couple of tips for our time series prediction example.

- We’ve only completed the model assessment on one split of training and test sets. In reality, you want to conduct time series cross-validation to select the best model
- ARIMA models can also accommodate seasonality. Learn about seasonal ARIMA models (SARIMA)
- Certain events would impact the website traffic, e.g., postings of new articles. Those new variables could improve the time series prediction as well

That’s it!

In this tutorial, you’ve successfully built ARIMA time series models and made predictions in Python!

Please leave a comment for any questions you may have or anything else.

## 7 thoughts on “How to build ARIMA models in Python for time series prediction<br /><div style='color:#7A7A7A;font-size: large;font-family:roboto;font-weight:400;'> With an example, step-by-step</div>”

KrisGreat post! Thank you. One question, though.

I reproduced the process with the results almost identical to yours shown above. However, I was curious what will happen if I change the seasonal parameter to the auto_arima function from False to True. I restarted the cluster (so nothing was cached) and ran the process again. I was surprised to see that the evaluation results were identical (all 17 decimal digits) to the prior evaluation results for the parameter set to False.

Why could it be the case? I might suppose there is weak seasonality in the data but I would have expected some differences (for better or worse). Is there some kind of test, which based on observations decides not to include seasonality in the algorithm even when parameter is True?

Lianne & JustinHi Kris, the auto_arima document mentioned: “If the seasonal optional is enabled, auto-ARIMA also seeks to identify the optimal P and Q hyper-parameters after conducting the Canova-Hansen to determine the optimal order of seasonal differencing, D.”

So I guess based on the test, there’s no seasonality in the dataset. That’s why it didn’t add any seasonal parameters. And the dataset plot doesn’t look like there’s seasonality as well.

Hope this helps.

KrisYes, it does. Thank you!

vanetojHi Kris, i m getting negative values after the prediction . how do i avoid that ?

Lianne & JustinHi vanetoj, it’s hard to know for sure, perhaps your training data has negative values in it?

HuyHi, I would like to ask what is the final conclusion, prediction for the next 30 time periods. Since I see in Time series prediction plot comparison between prediction and reality why is there actual traffic available at the same time as prediction? Thank you.

Lianne & JustinHi Huy, when modeling, you usually split existing dataset into training and test sets. You use training to train the model and then use test to see the performance. Then you might apply it to brand new dataset (e.g., in this example in the future without actual traffic) to make prediction.