离散选择模型概述¶
[1]:
import numpy as np
import statsmodels.api as sm
数据¶
从 Spector 和 Mazzeo (1980) 加载数据。示例遵循格林的计量经济学分析第 21 章(第 5 版)。
[2]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
检查数据
[3]:
print(spector_data.exog.head())
print(spector_data.endog.head())
GPA TUCE PSI const
0 2.66 20.0 0.0 1.0
1 2.89 22.0 0.0 1.0
2 3.28 24.0 0.0 1.0
3 2.92 12.0 0.0 1.0
4 4.00 21.0 0.0 1.0
0 0.0
1 0.0
2 0.0
3 0.0
4 1.0
Name: GRADE, dtype: float64
线性概率模型 (OLS)¶
[4]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print("Parameters: ", lpm_res.params[:-1])
Parameters: GPA 0.463852
TUCE 0.010495
PSI 0.378555
dtype: float64
Logit 模型¶
[5]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print("Parameters: ", logit_res.params)
Parameters: GPA 2.826113
TUCE 0.095158
PSI 2.378688
const -13.021347
dtype: float64
边际效应
[6]:
margeff = logit_res.get_margeff()
print(margeff.summary())
Logit Marginal Effects
=====================================
Dep. Variable: GRADE
Method: dydx
At: overall
==============================================================================
dy/dx std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 0.3626 0.109 3.313 0.001 0.148 0.577
TUCE 0.0122 0.018 0.686 0.493 -0.023 0.047
PSI 0.3052 0.092 3.304 0.001 0.124 0.486
==============================================================================
与下面介绍的所有离散数据模型一样,我们可以打印一个不错的结果摘要
[7]:
print(logit_res.summary())
Logit Regression Results
==============================================================================
Dep. Variable: GRADE No. Observations: 32
Model: Logit Df Residuals: 28
Method: MLE Df Model: 3
Date: Thu, 03 Oct 2024 Pseudo R-squ.: 0.3740
Time: 15:45:00 Log-Likelihood: -12.890
converged: True LL-Null: -20.592
Covariance Type: nonrobust LLR p-value: 0.001502
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 2.8261 1.263 2.238 0.025 0.351 5.301
TUCE 0.0952 0.142 0.672 0.501 -0.182 0.373
PSI 2.3787 1.065 2.234 0.025 0.292 4.465
const -13.0213 4.931 -2.641 0.008 -22.687 -3.356
==============================================================================
Probit 模型¶
[8]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print("Parameters: ", probit_res.params)
print("Marginal effects: ")
print(probit_margeff.summary())
Optimization terminated successfully.
Current function value: 0.400588
Iterations 6
Parameters: GPA 1.625810
TUCE 0.051729
PSI 1.426332
const -7.452320
dtype: float64
Marginal effects:
Probit Marginal Effects
=====================================
Dep. Variable: GRADE
Method: dydx
At: overall
==============================================================================
dy/dx std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
GPA 0.3608 0.113 3.182 0.001 0.139 0.583
TUCE 0.0115 0.018 0.624 0.533 -0.025 0.048
PSI 0.3165 0.090 3.508 0.000 0.140 0.493
==============================================================================
多元 Logit¶
从美国国家选举研究中加载数据
[9]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog)
检查数据
[10]:
print(anes_data.exog.head())
print(anes_data.endog.head())
logpopul selfLR age educ income
0 -2.302585 7.0 36.0 3.0 1.0
1 5.247550 3.0 20.0 4.0 1.0
2 3.437208 2.0 24.0 6.0 1.0
3 4.420045 3.0 28.0 6.0 1.0
4 6.461624 5.0 68.0 6.0 1.0
0 6.0
1 1.0
2 1.0
3 1.0
4 0.0
Name: PID, dtype: float64
拟合 MNL 模型
[11]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
Optimization terminated successfully.
Current function value: 1.548647
Iterations 7
0 1 2 3 4 5
const -0.373402 -2.250913 -3.665584 -7.613843 -7.060478 -12.105751
logpopul -0.011536 -0.088751 -0.105967 -0.091557 -0.093285 -0.140881
selfLR 0.297714 0.391669 0.573451 1.278772 1.346962 2.070080
age -0.024945 -0.022898 -0.014851 -0.008681 -0.017904 -0.009433
educ 0.082491 0.181043 -0.007152 0.199828 0.216939 0.321926
income 0.005197 0.047874 0.057575 0.084498 0.080958 0.108894
泊松¶
加载 Rand 数据。请注意,此示例类似于 Cameron 和 Trivedi 的 Microeconometrics
表 20.5,但由于数据中的细微变化,略有不同。
[12]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog
rand_exog = sm.add_constant(rand_exog, prepend=False)
拟合泊松模型
[13]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
Current function value: 3.091609
Iterations 6
Poisson Regression Results
==============================================================================
Dep. Variable: mdvis No. Observations: 20190
Model: Poisson Df Residuals: 20180
Method: MLE Df Model: 9
Date: Thu, 03 Oct 2024 Pseudo R-squ.: 0.06343
Time: 15:45:01 Log-Likelihood: -62420.
converged: True LL-Null: -66647.
Covariance Type: nonrobust LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
lncoins -0.0525 0.003 -18.216 0.000 -0.058 -0.047
idp -0.2471 0.011 -23.272 0.000 -0.268 -0.226
lpi 0.0353 0.002 19.302 0.000 0.032 0.039
fmde -0.0346 0.002 -21.439 0.000 -0.038 -0.031
physlm 0.2717 0.012 22.200 0.000 0.248 0.296
disea 0.0339 0.001 60.098 0.000 0.033 0.035
hlthg -0.0126 0.009 -1.366 0.172 -0.031 0.005
hlthf 0.0541 0.015 3.531 0.000 0.024 0.084
hlthp 0.2061 0.026 7.843 0.000 0.155 0.258
const 0.7004 0.011 62.741 0.000 0.678 0.722
==============================================================================
负二项¶
负二项模型给出略有不同的结果。
[14]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
NegativeBinomial Regression Results
==============================================================================
Dep. Variable: mdvis No. Observations: 20190
Model: NegativeBinomial Df Residuals: 20180
Method: MLE Df Model: 9
Date: Thu, 03 Oct 2024 Pseudo R-squ.: 0.01845
Time: 15:45:01 Log-Likelihood: -43384.
converged: False LL-Null: -44199.
Covariance Type: nonrobust LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
lncoins -0.0579 0.006 -9.515 0.000 -0.070 -0.046
idp -0.2678 0.023 -11.802 0.000 -0.312 -0.223
lpi 0.0412 0.004 9.938 0.000 0.033 0.049
fmde -0.0381 0.003 -11.216 0.000 -0.045 -0.031
physlm 0.2691 0.030 8.985 0.000 0.210 0.328
disea 0.0382 0.001 26.080 0.000 0.035 0.041
hlthg -0.0441 0.020 -2.201 0.028 -0.083 -0.005
hlthf 0.0173 0.036 0.478 0.632 -0.054 0.088
hlthp 0.1782 0.074 2.399 0.016 0.033 0.324
const 0.6635 0.025 26.786 0.000 0.615 0.712
alpha 1.2930 0.019 69.477 0.000 1.256 1.329
==============================================================================
替代求解器¶
拟合离散数据 MLE 模型的默认方法是牛顿-拉夫森。您可以使用其他求解器,方法是使用 method
参数
[15]:
mlogit_res = mlogit_mod.fit(method="bfgs", maxiter=250)
print(mlogit_res.summary())
Optimization terminated successfully.
Current function value: 1.548647
Iterations: 111
Function evaluations: 117
Gradient evaluations: 117
MNLogit Regression Results
==============================================================================
Dep. Variable: PID No. Observations: 944
Model: MNLogit Df Residuals: 908
Method: MLE Df Model: 30
Date: Thu, 03 Oct 2024 Pseudo R-squ.: 0.1648
Time: 15:45:02 Log-Likelihood: -1461.9
converged: True LL-Null: -1750.3
Covariance Type: nonrobust LLR p-value: 1.822e-102
==============================================================================
PID=1 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.3734 0.630 -0.593 0.553 -1.608 0.861
logpopul -0.0115 0.034 -0.337 0.736 -0.079 0.056
selfLR 0.2977 0.094 3.180 0.001 0.114 0.481
age -0.0249 0.007 -3.823 0.000 -0.038 -0.012
educ 0.0825 0.074 1.121 0.262 -0.062 0.227
income 0.0052 0.018 0.295 0.768 -0.029 0.040
------------------------------------------------------------------------------
PID=2 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -2.2509 0.763 -2.949 0.003 -3.747 -0.755
logpopul -0.0888 0.039 -2.266 0.023 -0.166 -0.012
selfLR 0.3917 0.108 3.619 0.000 0.180 0.604
age -0.0229 0.008 -2.893 0.004 -0.038 -0.007
educ 0.1810 0.085 2.123 0.034 0.014 0.348
income 0.0479 0.022 2.149 0.032 0.004 0.092
------------------------------------------------------------------------------
PID=3 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -3.6656 1.157 -3.169 0.002 -5.932 -1.399
logpopul -0.1060 0.057 -1.858 0.063 -0.218 0.006
selfLR 0.5734 0.159 3.617 0.000 0.263 0.884
age -0.0149 0.011 -1.311 0.190 -0.037 0.007
educ -0.0072 0.126 -0.057 0.955 -0.255 0.240
income 0.0576 0.034 1.713 0.087 -0.008 0.123
------------------------------------------------------------------------------
PID=4 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -7.6139 0.958 -7.951 0.000 -9.491 -5.737
logpopul -0.0916 0.044 -2.091 0.037 -0.177 -0.006
selfLR 1.2788 0.129 9.921 0.000 1.026 1.531
age -0.0087 0.008 -1.031 0.302 -0.025 0.008
educ 0.1998 0.094 2.123 0.034 0.015 0.384
income 0.0845 0.026 3.226 0.001 0.033 0.136
------------------------------------------------------------------------------
PID=5 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -7.0605 0.844 -8.362 0.000 -8.715 -5.406
logpopul -0.0933 0.039 -2.371 0.018 -0.170 -0.016
selfLR 1.3470 0.117 11.494 0.000 1.117 1.577
age -0.0179 0.008 -2.352 0.019 -0.033 -0.003
educ 0.2169 0.085 2.552 0.011 0.050 0.384
income 0.0810 0.023 3.524 0.000 0.036 0.126
------------------------------------------------------------------------------
PID=6 coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -12.1058 1.060 -11.421 0.000 -14.183 -10.028
logpopul -0.1409 0.042 -3.343 0.001 -0.223 -0.058
selfLR 2.0701 0.143 14.435 0.000 1.789 2.351
age -0.0094 0.008 -1.160 0.246 -0.025 0.007
educ 0.3219 0.091 3.534 0.000 0.143 0.500
income 0.1089 0.025 4.304 0.000 0.059 0.158
==============================================================================
最后更新:2024 年 10 月 3 日