Exogenous variables are external factors that provide additional information about the behavior of the target variable in time series forecasting. These variables, which are correlated with the target, can significantly improve predictions. Examples of exogenous variables include weather data, economic indicators, holiday markers, and promotional sales.
TimeGPT allows you to include exogenous variables when generating a forecast. This vignette shows you how to include them. It assumes you have already set up your API key. If you haven’t done this, please read the Get Started vignette first.
For this vignette, we will use the electricity price dataset with exogenous variables included in nixtlar. This dataset contains hourly prices from five different electricity markets, along with two exogenous variables related to the prices and binary variables indicating the day of the week.
df_exo_vars <- nixtlar::electricity_exo_vars
head(df_exo_vars)
#> unique_id ds y Exogenous1 Exogenous2 day_0 day_1 day_2
#> 1 BE 2016-10-22 00:00:00 70.00 49593 57253 0 0 0
#> 2 BE 2016-10-22 01:00:00 37.10 46073 51887 0 0 0
#> 3 BE 2016-10-22 02:00:00 37.10 44927 51896 0 0 0
#> 4 BE 2016-10-22 03:00:00 44.75 44483 48428 0 0 0
#> 5 BE 2016-10-22 04:00:00 37.10 44338 46721 0 0 0
#> 6 BE 2016-10-22 05:00:00 35.61 44504 46303 0 0 0
#> day_3 day_4 day_5 day_6
#> 1 0 0 1 0
#> 2 0 0 1 0
#> 3 0 0 1 0
#> 4 0 0 1 0
#> 5 0 0 1 0
#> 6 0 0 1 0
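The day_0 to day_6 columns look like one-hot indicators of the weekday. The following base R sketch shows one way such columns could be built from the ds timestamps. It assumes a Monday = 0 encoding, which is only inferred from the output above (2016-10-22 was a Saturday and has day_5 = 1). The helper add_day_dummies is hypothetical and is not part of nixtlar.

```r
# Hypothetical helper (not part of nixtlar): one-hot encode the weekday.
# Assumption: Monday = 0, ..., Sunday = 6, inferred from the sample output.
add_day_dummies <- function(df, time_col = "ds") {
  ts <- as.POSIXct(df[[time_col]], tz = "UTC")
  # POSIXlt$wday uses Sunday = 0, so shift it to Monday = 0
  dow <- (as.POSIXlt(ts)$wday + 6) %% 7
  for (k in 0:6) {
    df[[paste0("day_", k)]] <- as.integer(dow == k)
  }
  df
}

# Toy rows: a Saturday, a Sunday, and a Monday
toy <- data.frame(
  unique_id = "BE",
  ds = c("2016-10-22 00:00:00", "2016-10-23 00:00:00", "2016-10-24 00:00:00"),
  y = c(70.00, 37.10, 44.75)
)
toy <- add_day_dummies(toy)
print(toy)
```

Because the future exogenous variables need the same columns, the same helper could be applied to the timestamps of the forecast horizon.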
When using exogenous variables, nixtlar distinguishes between historical and future exogenous variables:
Historical Exogenous Variables: These should be included in the input data immediately following the id_col, ds, and y columns. If your dataset contains additional columns that are not exogenous variables, you must remove them before using any core functions of nixtlar.
Future Exogenous Variables: These correspond to the X_df parameter and should cover the entire forecast horizon. This dataset must include columns with the appropriate timestamps and, if applicable, unique identifiers.
future_exo_vars <- nixtlar::electricity_future_exo_vars
head(future_exo_vars)
#> unique_id ds Exogenous1 Exogenous2 day_0 day_1 day_2 day_3
#> 1 BE 2016-12-31 00:00:00 64108 70318 0 0 0 0
#> 2 BE 2016-12-31 01:00:00 62492 67898 0 0 0 0
#> 3 BE 2016-12-31 02:00:00 61571 68379 0 0 0 0
#> 4 BE 2016-12-31 03:00:00 60381 64972 0 0 0 0
#> 5 BE 2016-12-31 04:00:00 60298 62900 0 0 0 0
#> 6 BE 2016-12-31 05:00:00 60339 62364 0 0 0 0
#> day_4 day_5 day_6
#> 1 0 1 0
#> 2 0 1 0
#> 3 0 1 0
#> 4 0 1 0
#> 5 0 1 0
#> 6 0 1 0
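Before calling the API, it can help to check that the two data frames agree with the requirements described above. The sketch below is a minimal base R check, not something nixtlar provides: check_future_exo is a hypothetical helper, the column names unique_id, ds, and y are assumed defaults, and it further assumes that every future exogenous column should also appear in the historical data. The toy rows are copied from the outputs printed above.

```r
# Hypothetical helper (not part of nixtlar): sanity-check future exogenous data.
check_future_exo <- function(df, X_df, h,
                             id_col = "unique_id", time_col = "ds",
                             target_col = "y") {
  hist_exo <- setdiff(names(df), c(id_col, time_col, target_col))
  fut_exo <- setdiff(names(X_df), c(id_col, time_col))
  # Assumption: future exogenous columns must also exist in the history
  not_in_history <- setdiff(fut_exo, hist_exo)
  # Each series needs at least h future rows to cover the horizon
  rows_per_id <- table(X_df[[id_col]])
  short_ids <- names(rows_per_id)[as.vector(rows_per_id) < h]
  missing_ids <- setdiff(unique(as.character(df[[id_col]])), names(rows_per_id))
  too_short <- union(short_ids, missing_ids)
  list(
    not_in_history = not_in_history,
    too_short = too_short,
    ok = length(not_in_history) == 0 && length(too_short) == 0
  )
}

# Toy data using a few rows from the printed outputs
hist_toy <- data.frame(
  unique_id = "BE",
  ds = c("2016-10-22 00:00:00", "2016-10-22 01:00:00"),
  y = c(70.00, 37.10),
  Exogenous1 = c(49593, 46073)
)
future_toy <- data.frame(
  unique_id = "BE",
  ds = c("2016-12-31 00:00:00", "2016-12-31 01:00:00", "2016-12-31 02:00:00"),
  Exogenous1 = c(64108, 62492, 61571)
)

res_3 <- check_future_exo(hist_toy, future_toy, h = 3)   # three rows cover h = 3
res_24 <- check_future_exo(hist_toy, future_toy, h = 24) # too few rows for h = 24
print(res_3$ok)
print(res_24$too_short)
```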
To generate a forecast with exogenous variables, use the nixtla_client_forecast function as you would for forecasts without them. The only difference is that you must add the future exogenous variables using the X_df argument.
fcst_exo_vars <- nixtlar::nixtla_client_forecast(df_exo_vars, h = 24, X_df = future_exo_vars)
#> Frequency chosen: h
#> Using historical exogenous features: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
#> Using future exogenous features: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
head(fcst_exo_vars)
#> unique_id ds TimeGPT
#> 1 BE 2016-12-31 00:00:00 74.54077
#> 2 BE 2016-12-31 01:00:00 43.34429
#> 3 BE 2016-12-31 02:00:00 44.42921
#> 4 BE 2016-12-31 03:00:00 38.09440
#> 5 BE 2016-12-31 04:00:00 37.38914
#> 6 BE 2016-12-31 05:00:00 39.08574
For comparison, we will also generate a forecast without exogenous variables.
df <- nixtlar::electricity # same dataset but without exogenous variables
fcst <- nixtlar::nixtla_client_forecast(df, h = 24)
#> Frequency chosen: h
head(fcst)
#> unique_id ds TimeGPT
#> 1 BE 2016-12-31 00:00:00 45.19045
#> 2 BE 2016-12-31 01:00:00 43.24445
#> 3 BE 2016-12-31 02:00:00 41.95839
#> 4 BE 2016-12-31 03:00:00 39.79649
#> 5 BE 2016-12-31 04:00:00 39.20454
#> 6 BE 2016-12-31 05:00:00 40.10878
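To see how much the exogenous variables change the forecast, the two outputs can be joined on unique_id and ds. The sketch below does this in base R for the first three rows printed above, typed in by hand so that it runs without an API call. The column suffixes and the diff column are illustrative choices.

```r
# First three forecast rows copied from the outputs above
with_exo <- data.frame(
  unique_id = "BE",
  ds = c("2016-12-31 00:00:00", "2016-12-31 01:00:00", "2016-12-31 02:00:00"),
  TimeGPT = c(74.54077, 43.34429, 44.42921)
)
without_exo <- data.frame(
  unique_id = "BE",
  ds = c("2016-12-31 00:00:00", "2016-12-31 01:00:00", "2016-12-31 02:00:00"),
  TimeGPT = c(45.19045, 43.24445, 41.95839)
)

# Join on series id and timestamp, keeping both forecast columns
cmp <- merge(with_exo, without_exo,
             by = c("unique_id", "ds"),
             suffixes = c("_exo", "_base"))

# Difference attributable to including the exogenous variables
cmp$diff <- cmp$TimeGPT_exo - cmp$TimeGPT_base
print(cmp)
```

With the full outputs, the same merge would apply to fcst_exo_vars and fcst directly. A difference between the two forecasts does not by itself show which one is more accurate; that requires comparing both against held-out actual values.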
nixtlar includes a function to plot the historical data and any output from nixtla_client_forecast, nixtla_client_historic, nixtla_client_anomaly_detection, and nixtla_client_cross_validation. If you have long series, you can use max_insample_length to only plot the last N historical values (the forecast will always be plotted in full).
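The effect of limiting the historical window can be illustrated without the package. The base R sketch below keeps only the last N observations of each series, which is the kind of trimming the max_insample_length description suggests. It is an illustration of the idea, not nixtlar's implementation; keep_last_n is a hypothetical helper and the data are toy values.

```r
# Hypothetical helper (not part of nixtlar): keep the last n rows per series.
keep_last_n <- function(df, n, id_col = "unique_id", time_col = "ds") {
  # Sort by series and time so that tail() returns the most recent rows
  df <- df[order(df[[id_col]], df[[time_col]]), ]
  parts <- lapply(split(df, df[[id_col]]), function(g) utils::tail(g, n))
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Toy data: two series with five hourly observations each
toy_hist <- data.frame(
  unique_id = rep(c("BE", "DE"), each = 5),
  ds = rep(sprintf("2016-12-30 %02d:00:00", 19:23), times = 2),
  y = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

trimmed <- keep_last_n(toy_hist, n = 2)
print(trimmed)
```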