Discrete Choice Models Overview

[1]:
import numpy as np
import statsmodels.api as sm

Data

Load data from Spector and Mazzeo (1980). The examples follow Greene's Econometric Analysis, Chapter 21 (5th Edition).

[2]:
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

Inspect the data:

[3]:
print(spector_data.exog.head())
print(spector_data.endog.head())
    GPA  TUCE  PSI  const
0  2.66  20.0  0.0    1.0
1  2.89  22.0  0.0    1.0
2  3.28  24.0  0.0    1.0
3  2.92  12.0  0.0    1.0
4  4.00  21.0  0.0    1.0
0    0.0
1    0.0
2    0.0
3    0.0
4    1.0
Name: GRADE, dtype: float64

Linear Probability Model (OLS)

[4]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
print("Parameters: ", lpm_res.params[:-1])
Parameters:  GPA     0.463852
TUCE    0.010495
PSI     0.378555
dtype: float64
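
The linear probability model is not constrained to produce fitted values inside [0, 1]. As a quick illustrative check (not part of the original example; lpm_fitted is just a local name), the OLS fitted values can be inspected directly:

# Fitted "probabilities" from the linear probability model are not bounded,
# so they can fall below 0 or above 1
lpm_fitted = lpm_res.fittedvalues
print("min fitted:", lpm_fitted.min(), "max fitted:", lpm_fitted.max())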

Logit Model

[5]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)
print("Parameters: ", logit_res.params)
Parameters:  GPA       2.826113
TUCE      0.095158
PSI       2.378688
const   -13.021347
dtype: float64

Marginal Effects

[6]:
margeff = logit_res.get_margeff()
print(margeff.summary())
        Logit Marginal Effects
=====================================
Dep. Variable:                  GRADE
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
GPA            0.3626      0.109      3.313      0.001       0.148       0.577
TUCE           0.0122      0.018      0.686      0.493      -0.023       0.047
PSI            0.3052      0.092      3.304      0.001       0.124       0.486
==============================================================================
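
By default, get_margeff averages the marginal effects over all observations (At: overall). Effects evaluated at the means of the regressors are also available via the at argument; a minimal sketch (margeff_at_mean is an illustrative name):

# Marginal effects evaluated at the sample means of the explanatory variables
margeff_at_mean = logit_res.get_margeff(at="mean")
print(margeff_at_mean.summary())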

As with all the discrete data models presented below, we can print a nice summary of results:

[7]:
print(logit_res.summary())
                           Logit Regression Results
==============================================================================
Dep. Variable:                  GRADE   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Thu, 03 Oct 2024   Pseudo R-squ.:                  0.3740
Time:                        15:45:00   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
Covariance Type:            nonrobust   LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
GPA            2.8261      1.263      2.238      0.025       0.351       5.301
TUCE           0.0952      0.142      0.672      0.501      -0.182       0.373
PSI            2.3787      1.065      2.234      0.025       0.292       4.465
const        -13.0213      4.931     -2.641      0.008     -22.687      -3.356
==============================================================================
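
Predicted probabilities are available through the results object's predict method. A short usage sketch (in-sample prediction; pred_logit is an illustrative name, output not shown):

# In-sample predicted probabilities P(GRADE = 1 | x) from the fitted logit model
pred_logit = logit_res.predict(spector_data.exog)
print(pred_logit[:5])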

Probit Model

[8]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print("Parameters: ", probit_res.params)
print("Marginal effects: ")
print(probit_margeff.summary())
Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  GPA      1.625810
TUCE     0.051729
PSI      1.426332
const   -7.452320
dtype: float64
Marginal effects:
       Probit Marginal Effects
=====================================
Dep. Variable:                  GRADE
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
GPA            0.3608      0.113      3.182      0.001       0.139       0.583
TUCE           0.0115      0.018      0.624      0.533      -0.025       0.048
PSI            0.3165      0.090      3.508      0.000       0.140       0.493
==============================================================================
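
Logit and probit coefficients are on different scales, but the fitted probabilities are typically very close. A quick comparison sketch, assuming the logit and probit results fitted above (diff is an illustrative name):

# Largest absolute difference between logit and probit in-sample probabilities
diff = np.abs(logit_res.predict(spector_data.exog) - probit_res.predict(spector_data.exog))
print("max |P_logit - P_probit|:", float(diff.max()))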

Multinomial Logit

Load data from the American National Election Studies:

[9]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog)

Inspect the data:

[10]:
print(anes_data.exog.head())
print(anes_data.endog.head())
   logpopul  selfLR   age  educ  income
0 -2.302585     7.0  36.0   3.0     1.0
1  5.247550     3.0  20.0   4.0     1.0
2  3.437208     2.0  24.0   6.0     1.0
3  4.420045     3.0  28.0   6.0     1.0
4  6.461624     5.0  68.0   6.0     1.0
0    6.0
1    1.0
2    1.0
3    1.0
4    0.0
Name: PID, dtype: float64

Fit the MNL model:

[11]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
                 0         1         2         3         4          5
const    -0.373402 -2.250913 -3.665584 -7.613843 -7.060478 -12.105751
logpopul -0.011536 -0.088751 -0.105967 -0.091557 -0.093285  -0.140881
selfLR    0.297714  0.391669  0.573451  1.278772  1.346962   2.070080
age      -0.024945 -0.022898 -0.014851 -0.008681 -0.017904  -0.009433
educ      0.082491  0.181043 -0.007152  0.199828  0.216939   0.321926
income    0.005197  0.047874  0.057575  0.084498  0.080958   0.108894
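
Calling predict on an MNLogit result returns one probability column per outcome category (here the seven PID categories), and each row sums to one. A small sanity-check sketch (mnl_probs is an illustrative name, output not shown):

# Predicted category probabilities, one column per PID outcome; rows sum to 1
mnl_probs = mlogit_res.predict(anes_exog)
print(np.asarray(mnl_probs).sum(axis=1)[:5])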

Poisson

Load the Rand data. Note that this example is similar to Table 20.5 in Cameron and Trivedi's Microeconometrics, but it is slightly different because of minor changes in the data.

[12]:
rand_data = sm.datasets.randhie.load()
rand_exog = rand_data.exog
rand_exog = sm.add_constant(rand_exog, prepend=False)

Fit the Poisson model:

[13]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 6
                          Poisson Regression Results
==============================================================================
Dep. Variable:                  mdvis   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Thu, 03 Oct 2024   Pseudo R-squ.:                 0.06343
Time:                        15:45:01   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
lncoins       -0.0525      0.003    -18.216      0.000      -0.058      -0.047
idp           -0.2471      0.011    -23.272      0.000      -0.268      -0.226
lpi            0.0353      0.002     19.302      0.000       0.032       0.039
fmde          -0.0346      0.002    -21.439      0.000      -0.038      -0.031
physlm         0.2717      0.012     22.200      0.000       0.248       0.296
disea          0.0339      0.001     60.098      0.000       0.033       0.035
hlthg         -0.0126      0.009     -1.366      0.172      -0.031       0.005
hlthf          0.0541      0.015      3.531      0.000       0.024       0.084
hlthp          0.2061      0.026      7.843      0.000       0.155       0.258
const          0.7004      0.011     62.741      0.000       0.678       0.722
==============================================================================
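
The Poisson model restricts the conditional variance to equal the conditional mean. As a rough, unconditional check on the count outcome (mdvis), which motivates the negative binomial model below:

# Unconditional mean vs. variance of the count outcome; a variance far above
# the mean points to overdispersion relative to the Poisson assumption
print("mean of mdvis:    ", rand_data.endog.mean())
print("variance of mdvis:", rand_data.endog.var())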

Negative Binomial

The negative binomial model gives slightly different results.

[14]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
                     NegativeBinomial Regression Results
==============================================================================
Dep. Variable:                  mdvis   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Thu, 03 Oct 2024   Pseudo R-squ.:                 0.01845
Time:                        15:45:01   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
Covariance Type:            nonrobust   LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
lncoins       -0.0579      0.006     -9.515      0.000      -0.070      -0.046
idp           -0.2678      0.023    -11.802      0.000      -0.312      -0.223
lpi            0.0412      0.004      9.938      0.000       0.033       0.049
fmde          -0.0381      0.003    -11.216      0.000      -0.045      -0.031
physlm         0.2691      0.030      8.985      0.000       0.210       0.328
disea          0.0382      0.001     26.080      0.000       0.035       0.041
hlthg         -0.0441      0.020     -2.201      0.028      -0.083      -0.005
hlthf          0.0173      0.036      0.478      0.632      -0.054       0.088
hlthp          0.1782      0.074      2.399      0.016       0.033       0.324
const          0.6635      0.025     26.786      0.000       0.615       0.712
alpha          1.2930      0.019     69.477      0.000       1.256       1.329
==============================================================================
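
Note the ConvergenceWarning above: the default optimizer stops before fully converging here, so increasing maxiter or switching the method argument (see the next section) is worth trying. To compare the two count models, the information criteria stored on the results objects can be used; a minimal sketch (output not shown):

# Lower AIC/BIC indicates a better penalized fit; the negative binomial adds
# the dispersion parameter alpha relative to the Poisson model
print("Poisson AIC:", poisson_res.aic, " BIC:", poisson_res.bic)
print("NegBin  AIC:", res_nbin.aic, " BIC:", res_nbin.bic)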

Alternative solvers

The default method for fitting discrete data MLE models is Newton-Raphson. You can use other solvers via the method argument:

[15]:
mlogit_res = mlogit_mod.fit(method="bfgs", maxiter=250)
print(mlogit_res.summary())
Optimization terminated successfully.
         Current function value: 1.548647
         Iterations: 111
         Function evaluations: 117
         Gradient evaluations: 117
                          MNLogit Regression Results
==============================================================================
Dep. Variable:                    PID   No. Observations:                  944
Model:                        MNLogit   Df Residuals:                      908
Method:                           MLE   Df Model:                           30
Date:                Thu, 03 Oct 2024   Pseudo R-squ.:                  0.1648
Time:                        15:45:02   Log-Likelihood:                -1461.9
converged:                       True   LL-Null:                       -1750.3
Covariance Type:            nonrobust   LLR p-value:                1.822e-102
==============================================================================
     PID=1       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3734      0.630     -0.593      0.553      -1.608       0.861
logpopul      -0.0115      0.034     -0.337      0.736      -0.079       0.056
selfLR         0.2977      0.094      3.180      0.001       0.114       0.481
age           -0.0249      0.007     -3.823      0.000      -0.038      -0.012
educ           0.0825      0.074      1.121      0.262      -0.062       0.227
income         0.0052      0.018      0.295      0.768      -0.029       0.040
------------------------------------------------------------------------------
     PID=2       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -2.2509      0.763     -2.949      0.003      -3.747      -0.755
logpopul      -0.0888      0.039     -2.266      0.023      -0.166      -0.012
selfLR         0.3917      0.108      3.619      0.000       0.180       0.604
age           -0.0229      0.008     -2.893      0.004      -0.038      -0.007
educ           0.1810      0.085      2.123      0.034       0.014       0.348
income         0.0479      0.022      2.149      0.032       0.004       0.092
------------------------------------------------------------------------------
     PID=3       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -3.6656      1.157     -3.169      0.002      -5.932      -1.399
logpopul      -0.1060      0.057     -1.858      0.063      -0.218       0.006
selfLR         0.5734      0.159      3.617      0.000       0.263       0.884
age           -0.0149      0.011     -1.311      0.190      -0.037       0.007
educ          -0.0072      0.126     -0.057      0.955      -0.255       0.240
income         0.0576      0.034      1.713      0.087      -0.008       0.123
------------------------------------------------------------------------------
     PID=4       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -7.6139      0.958     -7.951      0.000      -9.491      -5.737
logpopul      -0.0916      0.044     -2.091      0.037      -0.177      -0.006
selfLR         1.2788      0.129      9.921      0.000       1.026       1.531
age           -0.0087      0.008     -1.031      0.302      -0.025       0.008
educ           0.1998      0.094      2.123      0.034       0.015       0.384
income         0.0845      0.026      3.226      0.001       0.033       0.136
------------------------------------------------------------------------------
     PID=5       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -7.0605      0.844     -8.362      0.000      -8.715      -5.406
logpopul      -0.0933      0.039     -2.371      0.018      -0.170      -0.016
selfLR         1.3470      0.117     11.494      0.000       1.117       1.577
age           -0.0179      0.008     -2.352      0.019      -0.033      -0.003
educ           0.2169      0.085      2.552      0.011       0.050       0.384
income         0.0810      0.023      3.524      0.000       0.036       0.126
------------------------------------------------------------------------------
     PID=6       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -12.1058      1.060    -11.421      0.000     -14.183     -10.028
logpopul      -0.1409      0.042     -3.343      0.001      -0.223      -0.058
selfLR         2.0701      0.143     14.435      0.000       1.789       2.351
age           -0.0094      0.008     -1.160      0.246      -0.025       0.007
educ           0.3219      0.091      3.534      0.000       0.143       0.500
income         0.1089      0.025      4.304      0.000       0.059       0.158
==============================================================================
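
Other optimizers accepted by fit can be used in the same way; for example, with the limited-memory BFGS solver (a sketch; mlogit_res_lbfgs is an illustrative name and the output is not shown):

# Refit the multinomial logit with L-BFGS instead of BFGS
mlogit_res_lbfgs = mlogit_mod.fit(method="lbfgs", maxiter=250, disp=0)
print(mlogit_res_lbfgs.params.iloc[:2])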

Last update: Oct 03, 2024