Theta 模型¶

Assimakopoulos & Nikolopoulos (2000) 的 Theta 模型是一种简单的预测方法，它涉及拟合两条 \(\theta\) 线，使用简单指数平滑法预测这些线，然后将来自这两条线的预测结合起来以产生最终预测。该模型按以下步骤实现

测试季节性
如果检测到季节性，则进行季节性调整
通过将 SES 模型拟合到数据来估计 \(\alpha\)，并通过 OLS 估计 \(b_0\)。
预测序列
如果数据已进行季节性调整，则重新进行季节性调整。

季节性测试检查季节性滞后 \(m\) 处的 ACF。如果该滞后明显不同于零，则使用 statsmodels.tsa.seasonal_decompose 对数据进行季节性调整，使用乘法方法（默认）或加法方法。

模型的参数是 \(b_0\) 和 \(\alpha\)，其中 \(b_0\) 是从 OLS 回归中估计的

\[X_t = a_0 + b_0 (t-1) + \epsilon_t\]

而 \(\alpha\) 是 SES 平滑参数，在以下公式中

\[\tilde{X}_t = (1-\alpha) X_t + \alpha \tilde{X}_{t-1}\]

然后预测值为

\[\hat{X}_{T+h|T} = \frac{\theta-1}{\theta} \hat{b}_0 \left[h - 1 + \frac{1}{\hat{\alpha}} - \frac{(1-\hat{\alpha})^T}{\hat{\alpha}} \right] + \tilde{X}_{T+h|T}\]

最终，\(\theta\) 只在决定趋势的衰减程度方面发挥作用。如果 \(\theta\) 非常大，则该模型的预测与带有漂移的积分移动平均预测相同，

\[X_t = X_{t-1} + b_0 + (\alpha-1)\epsilon_{t-1} + \epsilon_t.\]

最后，如果需要，对预测进行重新季节性调整。

该模块基于

Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a decomposition approach to forecasting. International journal of forecasting, 16(4), 521-530.
Hyndman, R. J., & Billah, B. (2003). Unmasking the Theta method. International Journal of Forecasting, 19(2), 287-290.
Fioruci, J. A., Pellegrini, T. R., Louzada, F., & Petropoulos, F. (2015). The optimized theta method. arXiv preprint arXiv:1503.03529.

导入¶

我们从标准的导入集开始，并对默认的 matplotlib 样式进行一些调整。

[1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns

plt.rc("figure", figsize=(16, 8))
plt.rc("font", size=15)
plt.rc("lines", linewidth=3)
sns.set_style("darkgrid")

加载一些数据¶

我们首先查看使用美国数据的房屋开工数据。该系列显然是季节性的，但在同一时期没有明显的趋势。

[2]:

reader = pdr.fred.FredReader(["HOUST"], start="1980-01-01", end="2020-04-01")
data = reader.read()
housing = data.HOUST
housing.index.freq = housing.index.inferred_freq
ax = housing.plot()

../../../_images/examples_notebooks_generated_theta-model_4_0.png

我们拟合指定模型，没有任何选项并进行拟合。摘要显示数据使用乘法方法进行了季节性调整。漂移适度且为负，平滑参数相当低。

[3]:

from statsmodels.tsa.forecasting.theta import ThetaModel

tm = ThetaModel(housing)
res = tm.fit()
print(res.summary())

                              ThetaModel Results
==============================================================================
Dep. Variable:                  HOUST   No. Observations:                  484
Method:                       OLS/SES   Deseasonalized:                   True
Date:                Thu, 03 Oct 2024   Deseas. Method:         Multiplicative
Time:                        15:46:28   Period:                             12
Sample:                    01-01-1980
                         - 04-01-2020
   Parameter Estimates
=========================
           Parameters
-------------------------
b0    -0.9194460961668147
alpha   0.616996789006705
-------------------------

该模型首先也是一个预测方法。使用拟合模型中的 forecast 方法生成预测。下面，我们通过每 2 年预测 2 年的预测值生成刺猬图。

注意：默认的 \(\theta\) 为 2。

[4]:

forecasts = {"housing": housing}
for year in range(1995, 2020, 2):
    sub = housing[: str(year)]
    res = ThetaModel(sub).fit()
    fcast = res.forecast(24)
    forecasts[str(year)] = fcast
forecasts = pd.DataFrame(forecasts)
ax = forecasts["1995":].plot(legend=False)
children = ax.get_children()
children[0].set_linewidth(4)
children[0].set_alpha(0.3)
children[0].set_color("#000000")
ax.set_title("Housing Starts")
plt.tight_layout(pad=1.0)

../../../_images/examples_notebooks_generated_theta-model_8_0.png

或者，我们可以拟合数据的对数。在这里，强制季节性调整使用加法方法更有意义，如果需要。我们还使用 MLE 拟合模型参数。该方法拟合 IMA

\[X_t = X_{t-1} + \gamma\epsilon_{t-1} + \epsilon_t\]

其中 \(\hat{\alpha}\) = \(\min(\hat{\gamma}+1, 0.9998)\)，使用 statsmodels.tsa.SARIMAX。参数相似，尽管漂移更接近于零。

[5]:

tm = ThetaModel(np.log(housing), method="additive")
res = tm.fit(use_mle=True)
print(res.summary())

                              ThetaModel Results
==============================================================================
Dep. Variable:                  HOUST   No. Observations:                  484
Method:                           MLE   Deseasonalized:                   True
Date:                Thu, 03 Oct 2024   Deseas. Method:               Additive
Time:                        15:46:30   Period:                             12
Sample:                    01-01-1980
                         - 04-01-2020
     Parameter Estimates
=============================
             Parameters
-----------------------------
b0    -0.00044644118691643226
alpha       0.670610385005854
-----------------------------

预测仅取决于预测趋势分量，

\[\hat{b}_0 \left[h - 1 + \frac{1}{\hat{\alpha}} - \frac{(1-\hat{\alpha})^T}{\hat{\alpha}} \right],\]

SES 的预测（不会随着预测范围的改变而改变），以及季节性因素。这三个分量可以使用 forecast_components 获得。这允许使用上面权重表达式构建使用多个 \(\theta\) 选择的预测。

[6]:

res.forecast_components(12)

[6]:

	趋势	ses	季节性
2020-05-01	-0.000666	6.95726	-0.001252
2020-06-01	-0.001112	6.95726	-0.006891
2020-07-01	-0.001559	6.95726	0.002992
2020-08-01	-0.002005	6.95726	-0.003817
2020-09-01	-0.002451	6.95726	-0.003902
2020-10-01	-0.002898	6.95726	-0.003981
2020-11-01	-0.003344	6.95726	0.008536
2020-12-01	-0.003791	6.95726	-0.000714
2021-01-01	-0.004237	6.95726	0.005239
2021-02-01	-0.004684	6.95726	0.009943
2021-03-01	-0.005130	6.95726	-0.004535
2021-04-01	-0.005577	6.95726	-0.001619

个人消费支出¶

接下来，我们查看个人消费支出。该系列具有明显的季节性分量和漂移。

[7]:

reader = pdr.fred.FredReader(["NA000349Q"], start="1980-01-01", end="2020-04-01")
pce = reader.read()
pce.columns = ["PCE"]
pce.index.freq = "QS-OCT"
_ = pce.plot()

../../../_images/examples_notebooks_generated_theta-model_14_0.png

由于该系列始终为正，因此我们对 \(\ln\) 进行建模。

[8]:

mod = ThetaModel(np.log(pce))
res = mod.fit()
print(res.summary())

                              ThetaModel Results
==============================================================================
Dep. Variable:                    PCE   No. Observations:                  162
Method:                       OLS/SES   Deseasonalized:                   True
Date:                Thu, 03 Oct 2024   Deseas. Method:         Multiplicative
Time:                        15:46:32   Period:                              4
Sample:                    01-01-1980
                         - 04-01-2020
   Parameter Estimates
==========================
           Parameters
--------------------------
b0    0.013035370221488518
alpha   0.9998851279204637
--------------------------

接下来，我们探讨预测值随着 \(\theta\) 变化的差异。当 \(\theta\) 接近 1 时，漂移几乎不存在。随着 \(\theta\) 的增加，漂移变得更加明显。

[9]:

forecasts = pd.DataFrame(
    {
        "ln PCE": np.log(pce.PCE),
        "theta=1.2": res.forecast(12, theta=1.2),
        "theta=2": res.forecast(12),
        "theta=3": res.forecast(12, theta=3),
        "No damping": res.forecast(12, theta=np.inf),
    }
)
_ = forecasts.tail(36).plot()
plt.title("Forecasts of ln PCE")
plt.tight_layout(pad=1.0)

../../../_images/examples_notebooks_generated_theta-model_18_0.png

最后，plot_predict 可用于可视化预测和预测区间，这些区间是假设 IMA 为真而构建的。

[10]:

ax = res.plot_predict(24, theta=2)

../../../_images/examples_notebooks_generated_theta-model_20_0.png

我们通过使用 2 年的非重叠样本生成刺猬图来结束。

[11]:

ln_pce = np.log(pce.PCE)
forecasts = {"ln PCE": ln_pce}
for year in range(1995, 2020, 3):
    sub = ln_pce[: str(year)]
    res = ThetaModel(sub).fit()
    fcast = res.forecast(12)
    forecasts[str(year)] = fcast
forecasts = pd.DataFrame(forecasts)
ax = forecasts["1995":].plot(legend=False)
children = ax.get_children()
children[0].set_linewidth(4)
children[0].set_alpha(0.3)
children[0].set_color("#000000")
ax.set_title("ln PCE")
plt.tight_layout(pad=1.0)

../../../_images/examples_notebooks_generated_theta-model_22_0.png

最后更新：2024 年 10 月 3 日