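Regression diagnostics
This example file shows how to use a few of the statsmodels regression diagnostic tests in a real-life context. The Regression Diagnostics page lists further tests and gives more detail on the ones used here.
Note that most of the tests described here return only a tuple of numbers, without any annotation. A full description of the outputs is always included in the docstring and in the online statsmodels documentation. For presentation purposes, the examples below pair each returned value with a short label through the zip(name, test) construct (called via the lzip helper imported in the first code cell) so that the output prints readably.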
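The labelling pattern can be sketched on its own with a stand-in tuple. The numbers and labels below are placeholders chosen for illustration, not the output of any test.

```python
# Hypothetical stand-in for the bare tuple a diagnostic test returns.
test = (1.23, 0.54)
name = ["statistic", "p-value"]

# zip pairs each label with the value at the same position;
# wrapping it in list() materializes the pairs so they can be printed.
labelled = list(zip(name, test))
for label, value in labelled:
    print(f"{label}: {value}")
```

If the label list is shorter than the returned tuple, zip silently drops the extra values, so the labels should be checked against the docstring of the test in question.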
Estimate a regression model
[1]:
%matplotlib inline
[2]:
from statsmodels.compat import lzip
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
import matplotlib.pyplot as plt
# Load data
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/HistData/Guerry.csv"
dat = pd.read_csv(url)
# Fit regression model (using the natural log of one of the regressors)
results = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=dat).fit()
# Inspect the results
print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.348
Model:                            OLS   Adj. R-squared:                  0.333
Method:                 Least Squares   F-statistic:                     22.20
Date:                Thu, 03 Oct 2024   Prob (F-statistic):           1.90e-08
Time:                        16:05:44   Log-Likelihood:                -379.82
No. Observations:                  86   AIC:                             765.6
Df Residuals:                      83   BIC:                             773.0
Df Model:                           2
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
==============================================================================
Omnibus:                        3.713   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
Skew:                          -0.487   Prob(JB):                        0.183
Kurtosis:                       3.003   Cond. No.                         702.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Normality of the residuals
Jarque-Bera test:
[3]:
name = ["Jarque-Bera", "Chi^2 two-tail prob.", "Skew", "Kurtosis"]
test = sms.jarque_bera(results.resid)
lzip(name, test)
[3]:
[('Jarque-Bera', np.float64(3.39360802484318)),
('Chi^2 two-tail prob.', np.float64(0.18326831231663254)),
('Skew', np.float64(-0.4865803431122347)),
('Kurtosis', np.float64(3.003417757881634))]
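The four numbers returned above are tied together by the usual Jarque-Bera formula, JB = n/6 * (S^2 + (K - 3)^2 / 4), with a chi-square reference distribution on 2 degrees of freedom. The following is a minimal standard-library sketch of that relation, not the statsmodels implementation. The helper name is made up for illustration, and the inputs are the skewness, kurtosis and observation count printed in this example.

```python
import math

def jarque_bera_from_moments(nobs, skew, kurtosis):
    """JB statistic and p-value from sample skewness and (non-excess) kurtosis."""
    jb = nobs / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # The chi-square survival function with 2 degrees of freedom
    # has the closed form exp(-x / 2).
    pvalue = math.exp(-jb / 2.0)
    return jb, pvalue

# Values copied from the output above (86 observations).
jb, pvalue = jarque_bera_from_moments(86, -0.4865803431122347, 3.003417757881634)
print(jb, pvalue)
```

Because the kurtosis is very close to 3, nearly all of the statistic comes from the skewness term.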
Omni test:
[4]:
name = ["Chi^2", "Two-tail probability"]
test = sms.omni_normtest(results.resid)
lzip(name, test)
[4]:
[('Chi^2', np.float64(3.7134378115971933)),
('Two-tail probability', np.float64(0.15618424580304735))]
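Influence tests
Once created, an object of class OLSInfluence holds attributes and methods that let users assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by: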
[5]:
from statsmodels.stats.outliers_influence import OLSInfluence
test_class = OLSInfluence(results)
test_class.dfbetas[:5, :]
[5]:
array([[-0.00301154, 0.00290872, 0.00118179],
[-0.06425662, 0.04043093, 0.06281609],
[ 0.01554894, -0.03556038, -0.00905336],
[ 0.17899858, 0.04098207, -0.18062352],
[ 0.29679073, 0.21249207, -0.3213655 ]])
Explore other options by typing dir(test_class).
Useful information on leverage can also be plotted:
[6]:
from statsmodels.graphics.regressionplots import plot_leverage_resid2
fig, ax = plt.subplots(figsize=(8, 6))
fig = plot_leverage_resid2(results, ax=ax)
Other plotting options can be found on the Graphics page.
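To make the notion of leverage concrete, here is a minimal sketch for the special case of one regressor plus an intercept, where the hat value of observation i is 1/n + (x_i - mean(x))^2 / Sxx. The function name and the data are invented for illustration. The model in this example has two regressors, so this is a simplification rather than a reproduction of the plot.

```python
def leverage_simple_regression(x):
    """Hat values h_i for a regression on an intercept and one regressor x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Made-up regressor values; the last point lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverage_simple_regression(x)
print([round(hi, 3) for hi in h])
# Hat values sum to the number of estimated parameters (2 here), up to rounding.
print(sum(h))
```

The isolated point receives almost all of the available leverage, which is the kind of observation the leverage plot is meant to expose.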
Multicollinearity
Condition number:
[7]:
np.linalg.cond(results.model.exog)
[7]:
np.float64(702.1792145490066)
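The condition number is the ratio of the largest to the smallest singular value of the design matrix. As a rough standard-library sketch, limited to a matrix with exactly two columns so the eigenvalues of X'X have a closed form, the helper below (a hypothetical name) shows how an uncentered regressor inflates the number even when there is no collinearity problem in the centered data.

```python
import math

def condition_number_two_columns(X):
    """2-norm condition number of an n x 2 matrix given as a list of [a, b] rows."""
    # Entries of the symmetric 2 x 2 matrix X'X.
    s11 = sum(a * a for a, _ in X)
    s12 = sum(a * b for a, b in X)
    s22 = sum(b * b for _, b in X)
    trace = s11 + s22
    det = s11 * s22 - s12 * s12
    # Larger eigenvalue from the closed form; smaller one from det / lam_max,
    # which avoids subtracting two nearly equal numbers.
    lam_max = (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0))) / 2.0
    lam_min = det / lam_max
    if lam_min <= 0.0:
        return math.inf  # columns are exactly collinear
    # Singular values of X are square roots of the eigenvalues of X'X.
    return math.sqrt(lam_max / lam_min)

# Made-up designs: a constant plus a regressor far from zero, then centered.
raw = [[1.0, 100.0 + i] for i in range(5)]
centered = [[1.0, i - 2.0] for i in range(5)]
cond_raw = condition_number_two_columns(raw)
cond_centered = condition_number_two_columns(centered)
print(cond_raw, cond_centered)
```

A large value can therefore reflect the scaling or location of the regressors as much as genuine linear dependence between them.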
Heteroskedasticity tests
Breusch-Pagan test:
[8]:
name = ["Lagrange multiplier statistic", "p-value", "f-value", "f p-value"]
test = sms.het_breuschpagan(results.resid, results.model.exog)
lzip(name, test)
[8]:
[('Lagrange multiplier statistic', np.float64(4.893213374094005)),
('p-value', np.float64(0.08658690502352002)),
('f-value', np.float64(2.5037159462564618)),
('f p-value', np.float64(0.08794028782672814))]
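The four reported values are linked. Assuming the Lagrange multiplier statistic is n * R^2 from an auxiliary regression of the squared residuals on the regressors (the studentized form of the test), the chi-square p-value and the F-value can be recovered from the LM statistic alone. This is a back-of-the-envelope check using numbers printed in this example, not a reimplementation of the test.

```python
import math

nobs = 86                 # "No. Observations" in the regression summary
k = 2                     # regressors excluding the constant
lm = 4.893213374094005    # Lagrange multiplier statistic reported above

# Assumption: LM = nobs * R^2 of the auxiliary regression.
r2_aux = lm / nobs
# Chi-square survival function with 2 degrees of freedom: exp(-x / 2).
lm_pvalue = math.exp(-lm / 2.0)
# F-statistic for joint significance of the k regressors in that regression.
fvalue = (r2_aux / k) / ((1.0 - r2_aux) / (nobs - k - 1))
print(r2_aux, lm_pvalue, fvalue)
```

Both versions of the test give a p-value near 0.09 here, so the evidence against homoskedasticity is weak at conventional levels.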
Goldfeld-Quandt test:
[9]:
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(results.resid, results.model.exog)
lzip(name, test)
[9]:
[('F statistic', np.float64(1.1002422436378143)),
('p-value', np.float64(0.38202950686925324))]
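The idea behind the statistic is a ratio of residual variances from two subsamples. The sketch below illustrates it for a single regressor with an intercept, splitting the sample in the middle in the order given. The function names and the data are made up, and the library routine offers options (such as the split point and the alternative hypothesis) that are not reproduced here.

```python
def rss_simple_ols(x, y):
    """Residual sum of squares from regressing y on an intercept and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt_f(x, y):
    """F = (RSS_2 / df_2) / (RSS_1 / df_1) for a split at the midpoint."""
    half = len(x) // 2
    rss1 = rss_simple_ols(x[:half], y[:half])
    rss2 = rss_simple_ols(x[half:], y[half:])
    df1 = half - 2                # two estimated parameters per subsample
    df2 = len(x) - half - 2
    return (rss2 / df2) / (rss1 / df1)

# Made-up data: the same disturbance pattern, ten times larger in the second half.
noise = [0.1, -0.1, 0.1, -0.1, 0.1]
x = [float(i) for i in range(10)]
y = [xi + e for xi, e in zip(x[:5], noise)]
y += [xi + 10 * e for xi, e in zip(x[5:], noise)]
f_stat = goldfeld_quandt_f(x, y)
print(f_stat)
```

With disturbances scaled by ten in the second half, the residual variance ratio is about one hundred; a value near one, as in the output above, gives no sign of a variance shift.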
Linearity
Harvey-Collier multiplier test for the null hypothesis that the linear specification is correct:
[10]:
name = ["t value", "p value"]
test = sms.linear_harvey_collier(results)
lzip(name, test)
[10]:
[('t value', np.float64(-1.0796490077759802)),
('p value', np.float64(0.2834639247569222))]
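The test is a t-test of whether recursive residuals (standardized one-step-ahead forecast errors) have mean zero. The following is a simplified sketch for one regressor with an intercept. The function names and the data are invented, and the library routine handles several regressors and differs in details such as how the initial observations are treated.

```python
import math

def recursive_residuals(x, y):
    """Standardized one-step-ahead forecast errors for y = a + b * x."""
    resid = []
    for t in range(2, len(x)):            # two points are needed for the first fit
        xs, ys = x[:t], y[:t]
        xbar = sum(xs) / t
        ybar = sum(ys) / t
        sxx = sum((xi - xbar) ** 2 for xi in xs)
        slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx
        intercept = ybar - slope * xbar
        forecast_error = y[t] - (intercept + slope * x[t])
        # Forecast-error variance factor for a simple regression.
        scale = math.sqrt(1.0 + 1.0 / t + (x[t] - xbar) ** 2 / sxx)
        resid.append(forecast_error / scale)
    return resid

def harvey_collier_t(x, y):
    """One-sample t statistic for the mean of the recursive residuals."""
    w = recursive_residuals(x, y)
    m = len(w)
    mean = sum(w) / m
    var = sum((wi - mean) ** 2 for wi in w) / (m - 1)
    return mean / math.sqrt(var / m)

# Made-up convex data: a straight line fitted to earlier points
# always under-predicts the next one.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
w = recursive_residuals(x, y)
t_stat = harvey_collier_t(x, y)
print(t_stat)
```

For convex data every recursive residual is positive, so the statistic is far from zero; under a correct linear specification the residuals scatter around zero, as the small t value above suggests for the Guerry model.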
Last update: Oct 03, 2024