Forecasting Using ARIMA Models in Python

ARIMA (AutoRegressive Integrated Moving Average) is a statistical model for time series forecasting that combines three components: autoregression (AR), integration (I), and moving average (MA).

  • Autoregression (AR): This component models the dependence between an observation and a number of lagged observations, on the idea that past values of a series help predict future values. The order of autoregression, denoted by "p", is the number of lagged observations used as predictors.

  • Integration (I): This component handles non-stationarity by differencing the series, which removes trends. The order of integration, denoted by "d", is the number of times the series must be differenced to become stationary. Ordinary differencing does not remove seasonal patterns; those call for seasonal differencing, as in seasonal ARIMA (SARIMA).

  • Moving Average (MA): This component models the dependence between an observation and past forecast errors (residuals). The order of moving average, denoted by "q", is the number of lagged errors used as predictors.

The general form of an ARIMA model is ARIMA(p, d, q), where p, d, and q are the orders of autoregression, integration, and moving average, respectively. To use an ARIMA model for forecasting, one must first determine the values of p, d, and q that best fit the data.
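To make the role of d concrete, here is a minimal sketch that uses only NumPy. The two series are made up for illustration: one round of differencing flattens a linear trend, while a quadratic trend needs two rounds.

```python
import numpy as np

t = np.arange(10)
linear = 3.0 * t + 5.0        # illustrative series with a linear trend
quadratic = 0.5 * t**2 + 2.0  # illustrative series with a quadratic trend

# np.diff(x, n=d) applies first differencing d times
linear_d1 = np.diff(linear, n=1)
quadratic_d1 = np.diff(quadratic, n=1)
quadratic_d2 = np.diff(quadratic, n=2)

print("linear, d=1:   ", linear_d1)     # constant, so the trend is gone
print("quadratic, d=1:", quadratic_d1)  # still increasing
print("quadratic, d=2:", quadratic_d2)  # constant after the second difference
```

In practice d is rarely above 2; if a series still looks non-stationary after two differences, a transformation such as a logarithm may be worth trying first.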

ARIMA Forecasting Process

Forecasting with ARIMA follows these key steps:

  • Collecting historical data and transforming it into a time series format

  • Visualizing the data to identify trends, seasonality, or patterns

  • Determining the order of differencing required to make the time series stationary

  • Selecting the optimal ARIMA model parameters (p, d, q)

  • Fitting the model and generating forecasts

  • Evaluating model performance and making adjustments

Basic ARIMA Example with Custom Data

Let's start with a simple example using synthetic sales data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Create synthetic sales data
sales_data = [100, 110, 125, 130, 140, 155, 160, 170, 185, 190, 200, 215]

# Convert to DataFrame
df = pd.DataFrame({'sales': sales_data})
print("Original Sales Data:")
print(df.head())

# Fit ARIMA model
model = ARIMA(df['sales'], order=(1, 1, 1))
model_fit = model.fit()

# Forecast next 6 months
forecast = model_fit.forecast(steps=6)
print("\nForecasted Sales:")
print(forecast.values)

Output:

Original Sales Data:
   sales
0    100
1    110
2    125
3    130
4    140

Forecasted Sales:
[229.16666667 243.33333333 257.5        271.66666667 285.83333333
 300.        ]

Time Series with Trend Analysis

Here's an example showing how ARIMA handles trending data:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings("ignore")

# Create time series with trend and noise
np.random.seed(42)
time_points = 24
trend = np.linspace(50, 100, time_points)
noise = np.random.normal(0, 5, time_points)
ts_data = trend + noise

# Create DataFrame
df = pd.DataFrame({
    'month': range(1, time_points + 1),
    'value': ts_data
})

print("Sample of time series data:")
print(df.head())

# Fit ARIMA(2,1,1): d=1 differences away the linear trend
model = ARIMA(df['value'], order=(2, 1, 1))
model_fit = model.fit()

# Generate forecast for next 8 periods
predictions = model_fit.forecast(steps=8)

print("\nNext 8 forecasted values:")
for i, pred in enumerate(predictions, 1):
    print(f"Period {time_points + i}: {pred:.2f}")

Output:

Sample of time series data:
   month      value
0      1  52.483571
1      2  51.482592
2      3  57.586269
3      4  64.136888
4      5  57.524885

Next 8 forecasted values:
Period 25: 102.87
Period 26: 105.11
Period 27: 107.23
Period 28: 109.25
Period 29: 111.18
Period 30: 113.03
Period 31: 114.82
Period 32: 116.54

Model Performance Evaluation

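It's important to evaluate an ARIMA model before relying on it. AIC measures in-sample fit penalized for model complexity and is useful for comparing candidate models, while error metrics such as MAE and RMSE on held-out data measure prediction accuracy: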

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Create sample data
np.random.seed(123)
data = np.cumsum(np.random.randn(50)) + 100

# Split data into train and test
train_data = data[:40]
test_data = data[40:]

# Fit ARIMA model
model = ARIMA(train_data, order=(1, 1, 1))
model_fit = model.fit()

# Make predictions for test period
predictions = model_fit.forecast(steps=len(test_data))

# Calculate performance metrics
mae = mean_absolute_error(test_data, predictions)
rmse = np.sqrt(mean_squared_error(test_data, predictions))
aic = model_fit.aic

print("Model Performance:")
print(f"AIC: {aic:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")
print(f"Root Mean Square Error: {rmse:.2f}")

print("\nActual vs Predicted (last 5 values):")
for i in range(-5, 0):
    # forecast() returns a NumPy array when the model is fit on an array, so index it directly
    print(f"Actual: {test_data[i]:.2f}, Predicted: {predictions[i]:.2f}")

Output:

Model Performance:
AIC: 104.71
Mean Absolute Error: 1.64
Root Mean Square Error: 2.01

Actual vs Predicted (last 5 values):
Actual: 98.75, Predicted: 99.94
Actual: 99.61, Predicted: 100.11
Actual: 100.44, Predicted: 100.27
Actual: 99.63, Predicted: 100.44
Actual: 98.92, Predicted: 100.61

Key Parameters Selection

Parameter          Description                      Selection Method
p (AR order)       Number of lagged observations    ACF/PACF plots, AIC/BIC
d (Differencing)   Degree of differencing           Stationarity tests
q (MA order)       Number of lagged errors          ACF/PACF plots, AIC/BIC

Conclusion

ARIMA models are powerful tools for time series forecasting that can capture trends and patterns in data. The key to successful ARIMA modeling lies in proper parameter selection and model validation. Always evaluate model performance using appropriate metrics before making business decisions based on forecasts.

Updated on: 2026-03-27T14:48:08+05:30
