Distributed Estimation

This notebook walks through several examples of how to use distributed_estimation. We import the DistributedModel class and define generators that split exog and endog into partitions.

[1]:

import numpy as np
from scipy.stats.distributions import norm
from statsmodels.base.distributed_estimation import DistributedModel


def _exog_gen(exog, partitions):
    """partitions exog data"""

    n_exog = exog.shape[0]
    n_part = np.ceil(n_exog / partitions)

    ii = 0
    while ii < n_exog:
        jj = int(min(ii + n_part, n_exog))
        yield exog[ii:jj, :]
        ii += int(n_part)


def _endog_gen(endog, partitions):
    """partitions endog data"""

    n_endog = endog.shape[0]
    n_part = np.ceil(n_endog / partitions)

    ii = 0
    while ii < n_endog:
        jj = int(min(ii + n_part, n_endog))
        yield endog[ii:jj]
        ii += int(n_part)
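
As a cross-check on the two generators above, the same row-wise split can be sketched with np.array_split. This is an alternative, not what the notebook uses, and the helper name split_rows is hypothetical. One difference worth noting: np.array_split always returns exactly the requested number of pieces, whereas the ceiling-based step above can yield fewer chunks when the row count is small relative to the number of partitions (for example, 10 rows and 6 partitions give a step of 2 and therefore only 5 chunks).

```python
import numpy as np


def split_rows(data, partitions):
    """Hypothetical helper: split `data` along axis 0 into `partitions` pieces."""
    return np.array_split(data, partitions, axis=0)


X_demo = np.arange(20.0).reshape(10, 2)  # 10 rows, 2 columns
y_demo = np.arange(10.0)

exog_chunks = split_rows(X_demo, 4)
endog_chunks = split_rows(y_demo, 4)

# Chunk sizes differ by at most one row, and the pieces tile the original data.
chunk_sizes = [chunk.shape[0] for chunk in exog_chunks]
print(chunk_sizes)  # [3, 3, 2, 2]
```

Because both arrays are split with the same rule, zipping endog_chunks with exog_chunks gives matching (endog, exog) pairs for each partition.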

Next, we generate some random data to serve as an example.

[2]:

X = np.random.normal(size=(1000, 25))
beta = np.random.normal(size=25)
beta *= np.random.randint(0, 2, size=25)
y = norm.rvs(loc=X.dot(beta))
m = 5
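
The random draws above change on every run. A reproducible variant can be sketched with a seeded np.random.default_rng; the seed value and the _demo names are assumptions for illustration. The sketch also makes the sparsity explicit: multiplying by a 0/1 mask sets the masked coefficients to exactly zero, and norm.rvs(loc=X.dot(beta)) amounts to adding standard normal noise to the linear predictor.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, chosen only for reproducibility

X_demo = rng.normal(size=(1000, 25))
mask = rng.integers(0, 2, size=25)      # 0/1 indicator per coefficient
beta_demo = rng.normal(size=25) * mask  # masked entries are exactly zero
y_demo = X_demo @ beta_demo + rng.normal(size=1000)  # unit-variance Gaussian noise

print("nonzero coefficients:", np.count_nonzero(beta_demo), "of", beta_demo.size)
```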

This is the most basic fit. It uses all of the defaults: OLS as the model class and the debiasing procedure.

[3]:

debiased_OLS_mod = DistributedModel(m)
debiased_OLS_fit = debiased_OLS_mod.fit(
    zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={"alpha": 0.2}
)
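
To see what the fit returns, the sketch below repeats the default fit on freshly simulated data and reads the combined estimate from the params attribute of the result. The data, the seed, and the use of np.array_split to build the partitions are assumptions made to keep the block self-contained; they stand in for the generators defined earlier.

```python
import numpy as np
from statsmodels.base.distributed_estimation import DistributedModel

rng = np.random.default_rng(0)  # arbitrary seed
n_obs, n_feat, m_demo = 1000, 25, 5

X_demo = rng.normal(size=(n_obs, n_feat))
beta_demo = rng.normal(size=n_feat) * rng.integers(0, 2, size=n_feat)
y_demo = X_demo @ beta_demo + rng.normal(size=n_obs)

# Each element of the iterable is an (endog, exog) pair for one partition.
partitions_demo = zip(
    np.array_split(y_demo, m_demo), np.array_split(X_demo, m_demo)
)

fit_demo = DistributedModel(m_demo).fit(partitions_demo, fit_kwds={"alpha": 0.2})
params_demo = np.asarray(fit_demo.params)

print(params_demo.shape)
print(
    "largest absolute deviation from the true coefficients:",
    np.max(np.abs(params_demo - beta_demo)),
)
```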

Next, we run a slightly more involved example that uses the GLM model class.

[4]:

from statsmodels.genmod.generalized_linear_model import GLM
from statsmodels.genmod.families import Gaussian

debiased_GLM_mod = DistributedModel(
    m, model_class=GLM, init_kwds={"family": Gaussian()}
)
debiased_GLM_fit = debiased_GLM_mod.fit(
    zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={"alpha": 0.2}
)

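We can also change the estimation_method and the join_method. The following example shows how this works in the standard OLS case. Here we use the naive averaging approach instead of the debiasing procedure.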

[5]:

from statsmodels.base.distributed_estimation import _est_regularized_naive, _join_naive


naive_OLS_reg_mod = DistributedModel(
    m, estimation_method=_est_regularized_naive, join_method=_join_naive
)
naive_OLS_reg_params = naive_OLS_reg_mod.fit(
    zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={"alpha": 0.2}
)

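Finally, we can also change the results_class that is used. The following example shows how this works in a simple case with an unregularized model and naive averaging.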

[6]:

from statsmodels.base.distributed_estimation import (
    _est_unregularized_naive,
    DistributedResults,
)


naive_OLS_unreg_mod = DistributedModel(
    m,
    estimation_method=_est_unregularized_naive,
    join_method=_join_naive,
    results_class=DistributedResults,
)
naive_OLS_unreg_params = naive_OLS_unreg_mod.fit(
    zip(_endog_gen(y, m), _exog_gen(X, m)), fit_kwds={"alpha": 0.2}
)
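
The idea behind naive averaging in the unregularized case can be sketched without statsmodels: fit ordinary least squares separately on each partition and average the coefficient vectors. This illustrates the concept only and is not the library's internal code; the data, the seed, and all _demo names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n_obs, n_feat, m_demo = 1000, 5, 5

X_demo = rng.normal(size=(n_obs, n_feat))
beta_demo = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y_demo = X_demo @ beta_demo + rng.normal(size=n_obs)

# Least-squares fit on each partition separately.
per_partition = [
    np.linalg.lstsq(X_part, y_part, rcond=None)[0]
    for y_part, X_part in zip(
        np.array_split(y_demo, m_demo), np.array_split(X_demo, m_demo)
    )
]

# Naive averaging: the combined estimate is the mean of the per-partition fits.
beta_avg = np.mean(per_partition, axis=0)
print(np.round(beta_avg, 2))
```

With 200 rows per partition and only 5 features, each per-partition fit is already close to beta_demo, so the average lands close to it as well.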

Last update: Oct 03, 2024