SARIMAX 和 ARIMA：常见问题解答 (FAQ)¶

此笔记本包含对常见问题的解释。

比较 SARIMAX、ARIMA 和 AutoReg 中的趋势和外生变量
重建 SARIMAX 和 ARIMA 中的残差、拟合值和预测
SARIMAX 和 ARIMA 中的初始残差

比较 `SARIMAX`、`ARIMA` 和 `AutoReg` 中的趋势和外生变量¶

ARIMA 正式上是具有 ARMA 误差的 OLS。具有 ARMA 误差的 OLS 中的基本 AR(1) 被描述为

\[\begin{split}\begin{align} Y_t & = \delta + \epsilon_t \\ \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\ \eta_t & \sim WN(0,\sigma^2) \\ \end{align}\end{split}\]

在大样本中，\(\hat{\delta}\stackrel{p}{\rightarrow} E[Y]\).

SARIMAX 使用不同的表示形式，因此使用 SARIMAX 估计的模型为

\[\begin{split}\begin{align} Y_t & = \phi + \rho Y_{t-1} + \eta_t \\ \eta_t & \sim WN(0,\sigma^2) \\ \end{align}\end{split}\]

这与使用 OLS (AutoReg) 估计模型时使用的表示形式相同。在大样本中，\(\hat{\phi}\stackrel{p}{\rightarrow} E[Y](1-\rho)\).

在下一个单元格中，我们模拟了一个大样本并验证了这些关系在实践中是否成立。

[1]:

%matplotlib inline

[2]:

import numpy as np
import pandas as pd

rng = np.random.default_rng(20210819)
eta = rng.standard_normal(5200)
rho = 0.8
beta = 10
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * epsilon[i - 1] + eta[i]
y = beta + epsilon
y = y[200:]

[3]:

from statsmodels.tsa.api import SARIMAX, AutoReg
from statsmodels.tsa.arima.model import ARIMA

这三个模型在下一个单元格中被指定和估计。AR(0) 作为参考包含在内。AR(0) 在所有三个估计器中是相同的。

[4]:

ar0_res = SARIMAX(y, order=(0, 0, 0), trend="c").fit()
sarimax_res = SARIMAX(y, order=(1, 0, 0), trend="c").fit()
arima_res = ARIMA(y, order=(1, 0, 0), trend="c").fit()
autoreg_res = AutoReg(y, 1, trend="c").fit()

 This problem is unconstrained.
 This problem is unconstrained.

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            2     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.91760D+00    |proj g|=  3.68860D-06

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    2      0      1      0     0     0   3.689D-06   1.918D+00
  F =   1.9175996129577773

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            3     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.41373D+00    |proj g|=  9.51828D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    3      2      5      1     0     0   4.516D-05   1.414D+00
  F =   1.4137311050015484

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

下表包含模型中的估计参数、估计的 AR(1) 系数以及长期均值，该均值要么等于估计参数 (AR(0) 或 ARIMA)，要么取决于截距与 1 减去 AR(1) 参数的比率。

[5]:

intercept = [
    ar0_res.params[0],
    sarimax_res.params[0],
    arima_res.params[0],
    autoreg_res.params[0],
]
rho_hat = [0] + [r.params[1] for r in (sarimax_res, arima_res, autoreg_res)]
long_run = [
    ar0_res.params[0],
    sarimax_res.params[0] / (1 - sarimax_res.params[1]),
    arima_res.params[0],
    autoreg_res.params[0] / (1 - autoreg_res.params[1]),
]
cols = ["AR(0)", "SARIMAX", "ARIMA", "AutoReg"]
pd.DataFrame(
    [intercept, rho_hat, long_run],
    columns=cols,
    index=["delta-or-phi", "rho", "long-run mean"],
)

[5]:

	AR(0)	SARIMAX	ARIMA	AutoReg
delta 或 phi	9.7745	1.985714	9.774498	1.985790
rho	0.0000	0.796846	0.796875	0.796882
长期均值	9.7745	9.774424	9.774498	9.776537

SARIMAX 中趋势和 exog 之间的差异¶

当 SARIMAX 包含 exog 变量时，exog 被视为 OLS 回归量，因此估计的模型为

\[\begin{split}\begin{align} Y_t - X_t \beta & = \delta + \rho (Y_{t-1} - X_{t-1}\beta) + \eta_t \\ \eta_t & \sim WN(0,\sigma^2) \\ \end{align}\end{split}\]

在下一个示例中，我们省略了趋势，而是包含了一列 1，这会产生一个等效的模型，在大样本中，该模型等效于没有外生回归量且 trend="c" 的情况。这里 const 的估计值与使用 ARIMA 估计的值相匹配。发生这种情况是因为 SARIMAX 中的 exog 和 ARIMA 中的趋势都被视为具有 ARMA 误差的线性回归模型。

[6]:

sarimax_exog_res = SARIMAX(y, exog=np.ones_like(y), order=(1, 0, 0), trend="n").fit()
print(sarimax_exog_res.summary())

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            3     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.41373D+00    |proj g|=  1.06920D-04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    3      1      4      1     0     0   4.752D-05   1.414D+00
  F =   1.4137311099487531

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

 This problem is unconstrained.

                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 5000
Model:               SARIMAX(1, 0, 0)   Log Likelihood               -7068.656
Date:                Thu, 03 Oct 2024   AIC                          14143.311
Time:                        15:45:04   BIC                          14162.863
Sample:                             0   HQIC                         14150.164
                               - 5000
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          9.7745      0.069    141.177      0.000       9.639       9.910
ar.L1          0.7969      0.009     93.691      0.000       0.780       0.814
sigma2         0.9894      0.020     49.921      0.000       0.951       1.028
===================================================================================
Ljung-Box (L1) (Q):                   0.42   Jarque-Bera (JB):                 0.08
Prob(Q):                              0.51   Prob(JB):                         0.96
Heteroskedasticity (H):               0.97   Skew:                            -0.01
Prob(H) (two-sided):                  0.47   Kurtosis:                         2.99
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

在 SARIMAX 和 ARIMA 中使用 `exog`¶

虽然 exog 在这两个模型中处理方式相同，但截距仍然不同。下面我们向 y 添加一个外生回归量，然后使用所有三种方法拟合模型。数据生成过程现在是

\[\begin{split}\begin{align} Y_t & = \delta + X_t \beta + \epsilon_t \\ \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\ \eta_t & \sim WN(0,\sigma^2) \\ \end{align}\end{split}\]

[7]:

full_x = rng.standard_normal(eta.shape)
x = full_x[200:]
y += 3 * x

[8]:

sarimax_exog_res = SARIMAX(y, exog=x, order=(1, 0, 0), trend="c").fit()
arima_exog_res = ARIMA(y, exog=x, order=(1, 0, 0), trend="c").fit()

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            4     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.42683D+00    |proj g|=  2.05943D-01

 This problem is unconstrained.

At iterate    5    f=  1.41332D+00    |proj g|=  1.60874D-03

At iterate   10    f=  1.41329D+00    |proj g|=  3.11659D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    4     11     15      1     0     0   1.796D-06   1.413D+00
  F =   1.4132928400115858

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL

检查参数表，我们看到 x1 上的参数估计值是相同的，而 intercept 的估计值仍然不同，因为这些估计器对趋势的处理方式不同。

`SARIMAX`¶

[9]:

def print_params(s):
    from io import StringIO

    return pd.read_csv(StringIO(s.tables[1].as_csv()), index_col=0)

print_params(sarimax_exog_res.summary())

[9]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

intercept	1.9849	0.085	23.484	0.0	1.819	2.151
x1	3.0231	0.011	277.150	0.0	3.002	3.044
ar.L1	0.7969	0.009	93.735	0.0	0.780	0.814
sigma2	0.9886	0.020	49.941	0.0	0.950	1.027

`ARIMA`¶

[10]:

print_params(arima_exog_res.summary())

[10]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

const	9.7741	0.069	141.201	0.0	9.638	9.910
x1	3.0231	0.011	277.140	0.0	3.002	3.044
ar.L1	0.7969	0.009	93.728	0.0	0.780	0.814
sigma2	0.9886	0.020	49.941	0.0	0.950	1.027

`exog` 在 `AutoReg` 中¶

当使用 AutoReg 使用 OLS 估计模型时，模型与 SARIMAX 和 ARIMA 都不同。具有外生变量的 AutoReg 规范为

\[\begin{split}\begin{align} Y_t & = \phi + \rho Y_{t-1} + X_{t}\beta + \eta_t \\ \eta_t & \sim WN(0,\sigma^2) \\ \end{align}\end{split}\]

此规范不等效于 SARIMAX 和 ARIMA 中估计的规范。这里的差异并非微不足道，对同一时间序列进行的简单估计会导致不同的参数值，即使在大样本（以及极限）中也是如此。估计此模型会改变 AR(1) 系数上的参数估计值。

`AutoReg`¶

[11]:

autoreg_exog_res = AutoReg(y, 1, exog=x, trend="c").fit()
print_params(autoreg_exog_res.summary())

[11]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

const	7.9714	0.064	124.525	0.0	7.846	8.097
y.L1	0.1838	0.006	29.890	0.0	0.172	0.196
x1	3.0311	0.021	142.513	0.0	2.989	3.073

主要差异可以通过用滞后算子表示法编写模型来观察。

\[\begin{split}\begin{align} (1-\phi L ) Y_t & = X_{t}\beta + \eta_t \Rightarrow \\ Y_t & = (1-\phi L )^{-1}\left(X_{t}\beta + \eta_t\right) \\ Y_t & = \sum_{i=0}^{\infty} \phi^i \left(X_{t-i}\beta + \eta_{t-i}\right) \end{align}\end{split}\]

这里假设 \(|\phi|<1\)。这里我们看到 \(Y_t\) 取决于 \(X_t\) 和 \(\eta_t\) 的所有滞后值。这不同于 SARIMAX 和 ARIMA 估计的规范，可以看出是

\[\begin{split}\begin{align} Y_t - X_t \beta & = \delta + \rho (Y_{t-1} - X_{t-1}\beta) + \eta_t \\ \left(1-\rho L \right)\left(Y_t - X_t \beta\right) & = \delta + \eta_t \\ Y_t - X_t \beta & = \frac{\delta}{1-\rho} + \left(1-\rho L \right)^{-1}\eta_t \\ Y_t - X_t \beta & = \frac{\delta}{1-\rho} + \sum_{i=0}^\infty \rho^i \eta_{t-i} \\ Y_t & = \frac{\delta}{1-\rho} + X_t \beta + \sum_{i=0}^\infty \rho^i \eta_{t-i} \\ \end{align}\end{split}\]

在此规范中，\(Y_t\) 仅取决于 \(X_t\) 而不取决于其他滞后。

在 AutoReg 中使用正确的 DGP¶

模拟 AutoReg 中估计的过程表明，参数是从真实模型中恢复的。

[12]:

y = beta + eta
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    y[i] = beta * (1 - rho) + rho * y[i - 1] + 3 * full_x[i] + eta[i]
y = y[200:]

具有正确 DGP 的 `AutoReg`¶

[13]:

autoreg_alt_exog_res = AutoReg(y, 1, exog=x, trend="c").fit()
print_params(autoreg_alt_exog_res.summary())

[13]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

const	1.9870	0.030	66.526	0.0	1.928	2.046
y.L1	0.7968	0.003	300.382	0.0	0.792	0.802
x1	3.0263	0.014	217.034	0.0	2.999	3.054

重建 SARIMAX 和 ARIMA 中的残差、拟合值和预测¶

在仅包含自回归项、趋势和外生变量的模型中，一旦模型中的最大滞后长度达到，就可以轻松重建拟合值和预测。在实践中，这意味着在 \((P+D)s+p+d\) 个时期之后。早期的预测和残差更难重建，因为模型构建了对 \(Y_t|Y_{t-1},Y_{t-2},...\) 的最佳预测。当 \(Y\) 的滞后数量小于自回归阶数时，则最佳预测的表达式与模型不同。例如，当预测第一个值 \(Y_1\) 时，没有来自 \(Y\) 历史的信息，因此最佳预测是无条件均值。在 AR(1) 的情况下，第二个预测将遵循模型，因此，当使用 ARIMA 时，预测为

\[Y_2 = \hat{\delta} + \hat{\rho} \left(Y_1 - \hat{\delta}\right)\]

因为 ARIMA 将外生项和趋势项都视为具有 ARMA 误差的回归。

这可以在下一组单元格中看到。

[14]:

arima_res = ARIMA(y, order=(1, 0, 0), trend="c").fit()
print_params(arima_res.summary())

[14]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

const	9.9346	0.222	44.667	0.0	9.499	10.371
ar.L1	0.7957	0.009	92.515	0.0	0.779	0.813
sigma2	10.3015	0.204	50.496	0.0	9.902	10.701

[15]:

arima_res.predict(0, 2)

[15]:

array([ 9.93458658, 10.91088035, 11.80415747])

[16]:

delta_hat, rho_hat = arima_res.params[:2]
delta_hat + rho_hat * (y[0] - delta_hat)

[16]:

np.float64(10.910880346250012)

SARIMAX 处理趋势项的方式不同，因此使用 SARIMAX 估计的模型的一步预测为

\[Y_2 = \hat\delta + \hat\rho Y_1\]

[17]:

sarima_res = SARIMAX(y, order=(1, 0, 0), trend="c").fit()
print_params(sarima_res.summary())

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            3     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.58518D+00    |proj g|=  5.99456D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    3      3      5      1     0     0   3.347D-05   2.585D+00
  F =   2.5851830060985752

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

 This problem is unconstrained.

[17]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

intercept	2.0283	0.097	20.841	0.0	1.838	2.219
ar.L1	0.7959	0.009	92.536	0.0	0.779	0.813
sigma2	10.3007	0.204	50.500	0.0	9.901	10.700

[18]:

sarima_res.predict(0, 2)

[18]:

array([ 9.93588659, 10.91128867, 11.80469658])

[19]:

delta_hat, rho_hat = sarima_res.params[:2]
delta_hat + rho_hat * y[0]

[19]:

np.float64(10.911288670367867)

包含 MA 成分的预测¶

当模型包含 MA 成分时，预测会更加复杂，因为误差永远无法直接观察。预测仍然是 \(Y_t|Y_{t-1},Y_{t-2},...\)，并且当 MA 成分可逆时，最佳预测可以表示为 \(t\) 滞后的 AR 过程。当 \(t\) 很大时，这应该非常接近预测，就好像误差是可观察的。对于短滞后，这可能会有显著的差异。

在下一个单元格中，我们模拟了一个 MA(1) 过程，并拟合了一个 MA 模型。

[20]:

rho = 0.8
beta = 10
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * eta[i - 1] + eta[i]
y = beta + epsilon
y = y[200:]

ma_res = ARIMA(y, order=(0, 0, 1), trend="c").fit()
print_params(ma_res.summary())

[20]:

	coef	std err	z	P>\|z\|	[0.025	0.975]

const	9.9185	0.025	391.129	0.0	9.869	9.968
ma.L1	0.8025	0.009	93.864	0.0	0.786	0.819
sigma2	0.9904	0.020	49.925	0.0	0.951	1.029

我们首先观察样本开始附近对应的预测 y[1]， …， y[5]。

[21]:

ma_res.predict(1, 5)

[21]:

array([ 8.57011015,  9.19907188,  8.96971353,  9.78987115, 11.11984478])

以及产生“直接”预测所需的相应残差

[22]:

ma_res.resid[:5]

[22]:

array([-2.7621904 , -1.12255005, -1.33557621, -0.17206944,  1.5634041 ])

使用模型参数，我们可以使用 MA(1) 规范产生“直接”预测

\[\hat Y_t = \hat\delta + \hat\rho \hat\epsilon_{t-1}\]

我们看到这些对于初始预测来说并不特别接近实际的模型预测，但是差距迅速减少。

[23]:

delta_hat, rho_hat = ma_res.params[:2]
direct = delta_hat + rho_hat * ma_res.resid[:5]
direct

[23]:

array([ 7.70168405,  9.01756049,  8.84659855,  9.7803589 , 11.17314527])

对于第一个来说，差异几乎是一个标准差，但随着索引的增加而下降。

[24]:

ma_res.predict(1, 5) - direct

[24]:

array([ 0.8684261 ,  0.18151139,  0.12311499,  0.00951225, -0.05330049])

接下来，我们查看样本的结尾和最后三个预测。

[25]:

t = y.shape[0]
ma_res.predict(t - 3, t - 1)

[25]:

array([ 9.79692804, 10.51272714, 10.55855562])

[26]:

ma_res.resid[-4:-1]

[26]:

array([-0.15142355,  0.74049384,  0.79759816])

[27]:

direct = delta_hat + rho_hat * ma_res.resid[-4:-1]
direct

[27]:

array([ 9.79692804, 10.51272714, 10.55855562])

“直接”预测是相同的。这是因为短样本的影响在样本末端消失（实际上，在观测值 100 左右时它可以忽略不计，在观测值 160 左右时它在数值上不存在）。

[28]:

ma_res.predict(t - 3, t - 1) - direct

[28]:

array([0., 0., 0.])

同样的原则适用于包含多个滞后或季节项的更复杂模型——一旦达到有效滞后长度，AR 模型中的预测就变得简单，而包含 MA 成分的模型中的预测只有当 MA 滞后多项式的最大根足够小时才会变得简单，这样残差就接近真实残差。

`SARIMAX` 和 `ARIMA` 中的预测差异¶

用于从 SARIMAX 和 ARIMA 模型进行预测的公式在以下一个关键方面有所不同——ARIMA 将所有趋势项（例如，截距或时间趋势）视为外生回归量的一部分。例如，使用 ARIMA 估计的具有截距和线性时间趋势的 AR(1) 模型具有以下规范

\[\begin{split}\begin{align*} Y_t - \delta_0 - \delta_1 t & = \epsilon_t \\ \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \end{align*}\end{split}\]

当使用 SARIMAX 估计相同的模型时，规范为

\[\begin{split}\begin{align*} Y_t & = \epsilon_t \\ \epsilon_t & = \delta_0 + \delta_1 t + \rho \epsilon_{t-1} + \eta_t \end{align*}\end{split}\]

当模型包含外生回归量 \(X_t\) 时，差异更加明显。 ARIMA 规范为

\[\begin{split}\begin{align*} Y_t - \delta_0 - \delta_1 t - X_t \beta & = \epsilon_t \\ \epsilon_t & = \rho \epsilon_{t-1} + \eta_t \\ & = \rho \left(Y_{t-1} - \delta_0 - \delta_1 (t-1) - X_{t-1} \beta\right) + \eta_t \end{align*}\end{split}\]

而 SARIMAX 规范为

\[\begin{split}\begin{align*} Y_t & = X_t \beta + \epsilon_t \\ \epsilon_t & = \delta_0 + \delta_1 t + \rho \epsilon_{t-1} + \eta_t \\ & = \delta_0 + \delta_1 t + \rho \left(Y_{t-1} - X_{t-1}\beta\right) + \eta_t \end{align*}\end{split}\]

这两者之间的关键区别在于，截距和趋势在 ARIMA 中实际上等效于外生回归，而在 SARIMAX 中则更像是标准 ARMA 项。

下一个单元格使用 ARIMA 中的规范模拟具有时间趋势的 ARX，并使用这两个估计器估计参数。

[29]:

rho = 0.8
beta = 2
delta0 = 10
delta1 = 0.5
epsilon = eta.copy()
for i in range(1, eta.shape[0]):
    epsilon[i] = rho * epsilon[i - 1] + eta[i]
t = np.arange(epsilon.shape[0])
y = delta0 + delta1 * t + beta * full_x + epsilon
y = y[200:]

[30]:

start = np.array([110, delta1, beta, rho, 1])
arx_res = ARIMA(y, exog=x, order=(1, 0, 0), trend="ct").fit()
mod = SARIMAX(y, exog=x, order=(1, 0, 0), trend="ct")
start[:2] *= 1 - rho
sarimax_res = mod.fit(start_params=start, method="bfgs")

         Current function value: 1.413691
         Iterations: 43
         Function evaluations: 72
         Gradient evaluations: 62

/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/scipy/optimize/_optimize.py:1291: OptimizeWarning: Desired error not necessarily achieved due to precision loss.
  res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

这两个估计器拟合得很好，尽管对数似然函数存在很小的差异。这是一个数值问题，不应该对预测产生实质性影响。重要的是，两个趋势参数 const 和 x1（不幸的是，它们被命名为时间趋势）在两者之间存在差异。其他参数实际上是相同的。

[31]:

print(arx_res.summary())

                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 5000
Model:                 ARIMA(1, 0, 0)   Log Likelihood               -7069.171
Date:                Thu, 03 Oct 2024   AIC                          14148.343
Time:                        15:45:16   BIC                          14180.928
Sample:                             0   HQIC                         14159.763
                               - 5000
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        109.2112      0.137    796.186      0.000     108.942     109.480
x1             0.5000   4.78e-05   1.05e+04      0.000       0.500       0.500
x2             2.0495      0.011    187.517      0.000       2.028       2.071
ar.L1          0.7965      0.009     93.669      0.000       0.780       0.813
sigma2         0.9897      0.020     49.854      0.000       0.951       1.029
===================================================================================
Ljung-Box (L1) (Q):                   0.33   Jarque-Bera (JB):                 0.15
Prob(Q):                              0.57   Prob(JB):                         0.93
Heteroskedasticity (H):               0.97   Skew:                            -0.01
Prob(H) (two-sided):                  0.53   Kurtosis:                         3.00
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

[32]:

print(sarimax_res.summary())

                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 5000
Model:               SARIMAX(1, 0, 0)   Log Likelihood               -7068.457
Date:                Thu, 03 Oct 2024   AIC                          14146.914
Time:                        15:45:16   BIC                          14179.500
Sample:                             0   HQIC                         14158.335
                               - 5000
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     22.7438      0.929     24.481      0.000      20.923      24.565
drift          0.1019      0.004     23.985      0.000       0.094       0.110
x1             2.0230      0.011    185.290      0.000       2.002       2.044
ar.L1          0.7963      0.008     93.745      0.000       0.780       0.813
sigma2         0.9894      0.020     49.899      0.000       0.951       1.028
===================================================================================
Ljung-Box (L1) (Q):                   0.47   Jarque-Bera (JB):                 0.13
Prob(Q):                              0.49   Prob(JB):                         0.94
Heteroskedasticity (H):               0.97   Skew:                            -0.01
Prob(H) (two-sided):                  0.47   Kurtosis:                         3.00
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

`SARIMAX` 和 `ARIMA` 的初始残差¶

对于最大模型阶数之前的观测值，残差不可靠，不应用于性能评估。通常，在阶数为 \((p,d,q)\times(P,D,Q,s)\) 的 ARIMA 中，行为较差的残差的公式为

\[\max((P+D)s+p+d,Qs+q)\]

我们可以从 ARIMA(1,0,0)(1,0,0,12) 模拟一些数据并检查残差。

[33]:

import numpy as np
import pandas as pd

rho = 0.8
psi = -0.6
beta = 20
epsilon = eta.copy()
for i in range(13, eta.shape[0]):
    epsilon[i] = (
        rho * epsilon[i - 1]
        + psi * epsilon[i - 12]
        - (rho * psi) * epsilon[i - 13]
        + eta[i]
    )
y = beta + epsilon
y = y[200:]

在样本量很大的情况下，参数估计非常接近 DGP 参数。

[34]:

res = ARIMA(y, order=(1, 0, 0), trend="c", seasonal_order=(1, 0, 0, 12)).fit()
print(res.summary())

                                    SARIMAX Results
========================================================================================
Dep. Variable:                                y   No. Observations:                 5000
Model:             ARIMA(1, 0, 0)x(1, 0, 0, 12)   Log Likelihood               -7076.266
Date:                          Thu, 03 Oct 2024   AIC                          14160.532
Time:                                  15:45:19   BIC                          14186.600
Sample:                                       0   HQIC                         14169.668
                                         - 5000
Covariance Type:                            opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         19.8586      0.043    458.609      0.000      19.774      19.943
ar.L1          0.7972      0.008     93.925      0.000       0.781       0.814
ar.S.L12      -0.6044      0.011    -53.280      0.000      -0.627      -0.582
sigma2         0.9914      0.020     49.899      0.000       0.952       1.030
===================================================================================
Ljung-Box (L1) (Q):                   0.50   Jarque-Bera (JB):                 0.11
Prob(Q):                              0.48   Prob(JB):                         0.95
Heteroskedasticity (H):               0.96   Skew:                            -0.01
Prob(H) (two-sided):                  0.40   Kurtosis:                         2.99
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

首先，我们可以通过将初始 13 个残差绘制在模型的实际冲击上进行检查。虽然存在对应关系，但它相当弱，并且相关性远小于 1。

[35]:

import matplotlib.pyplot as plt

plt.rc("figure", figsize=(10, 10))
plt.rc("font", size=14)

_ = plt.scatter(res.resid[:13], eta[200 : 200 + 13])

../../../_images/examples_notebooks_generated_statespace_sarimax_faq_63_0.png

观察接下来的 24 个残差和冲击，我们看到几乎完美的相关性。在样本量很大的情况下，一旦忽略了不太准确的残差，就会出现这种情况。

[36]:

_ = plt.scatter(res.resid[13:37], eta[200 + 13 : 200 + 37])

../../../_images/examples_notebooks_generated_statespace_sarimax_faq_65_0.png

接下来，我们模拟一个 ARIMA(1,1,0)，并包含一个时间趋势。

[37]:

rng = np.random.default_rng(20210819)
eta = rng.standard_normal(5200)
rho = 0.8
beta = 20
epsilon = eta.copy()
for i in range(2, eta.shape[0]):
    epsilon[i] = (1 + rho) * epsilon[i - 1] - rho * epsilon[i - 2] + eta[i]
t = np.arange(epsilon.shape[0])
y = beta + 2 * t + epsilon
y = y[200:]

同样，参数估计非常接近 DGP 参数。

[38]:

res = ARIMA(y, order=(1, 1, 0), trend="t").fit()
print(res.summary())

                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 5000
Model:                 ARIMA(1, 1, 0)   Log Likelihood               -7067.739
Date:                Thu, 03 Oct 2024   AIC                          14141.479
Time:                        15:45:20   BIC                          14161.030
Sample:                             0   HQIC                         14148.331
                               - 5000
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7747      0.069     25.642      0.000       1.639       1.910
ar.L1          0.7968      0.009     93.658      0.000       0.780       0.813
sigma2         0.9896      0.020     49.908      0.000       0.951       1.028
===================================================================================
Ljung-Box (L1) (Q):                   0.43   Jarque-Bera (JB):                 0.09
Prob(Q):                              0.51   Prob(JB):                         0.96
Heteroskedasticity (H):               0.97   Skew:                            -0.01
Prob(H) (two-sided):                  0.47   Kurtosis:                         2.99
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

残差不准确，第一个残差约为 500。其他残差更接近，尽管在此模型中通常应忽略前两个。

[39]:

res.resid[:5]

[39]:

array([ 5.08403002e+02, -1.58904197e+00, -1.54902446e+00,  1.04992617e-01,
        1.33644383e+00])

第一个残差如此之大的原因是，此值的最佳预测是差值的均值，即 1.77。一旦知道了第一个值，第二个值就会在其预测中使用第一个值，并且预测会更接近真实值。

[40]:

res.predict(0, 5)

[40]:

array([  1.77472562, 511.95355128, 510.87392196, 508.85708934,
       509.03356182, 511.85245439])

值得注意的是，结果类包含两个参数，可以帮助理解哪些残差有问题，即 loglikelihood_burn 和 nobs_diffuse。

[41]:

res.loglikelihood_burn, res.nobs_diffuse

[41]:

(1, 0)

最后更新时间：2024 年 10 月 3 日

SARIMAX 和 ARIMA：常见问题解答 (FAQ)¶

比较 SARIMAX、ARIMA 和 AutoReg 中的趋势和外生变量¶

SARIMAX 中趋势和 exog 之间的差异¶

在 SARIMAX 和 ARIMA 中使用 exog¶

SARIMAX¶

ARIMA¶

exog 在 AutoReg 中¶

AutoReg¶