Commit 80c731f

update treatment assignment logic in DGP functions: update probability computation and overlap adjustments
1 parent 6ba4c30 commit 80c731f

2 files changed: 62 additions, 11 deletions

doubleml/did/datasets/dgp_did_CS2021.py

Lines changed: 34 additions & 7 deletions
@@ -105,11 +105,35 @@ def make_did_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, time_typ

     6. Treatment assignment:

-       For non-experimental settings (DGP 1-4), the probability of being in treatment group :math:`g` is:
+       For non-experimental settings (DGP 1-4), the probability of being in treatment group :math:`g` is computed as follows:

-       .. math::
+       - Compute group-specific logits for each observation:
+
+         .. math::
+
+            \\text{logit}_{i,g} = f_{ps,g}(W_{ps})
+
+         The logits are clipped to the range [-2.5, 2.5] for numerical stability.
+
+       - Convert logits to uncapped probabilities via softmax:
+
+         .. math::
+
+            p^{\\text{uncapped}}_{i,g} = \\frac{\\exp(\\text{logit}_{i,g})}{\\sum_{g'} \\exp(\\text{logit}_{i,g'})}
+
+       - Clip uncapped probabilities to the range [0.05, 0.95]:
+
+         .. math::
+
+            p^{\\text{clipped}}_{i,g} = \\min(\\max(p^{\\text{uncapped}}_{i,g}, 0.05), 0.95)
+
+       - Renormalize clipped probabilities so they sum to 1 for each observation:
+
+         .. math::
+
+            p_{i,g} = \\frac{p^{\\text{clipped}}_{i,g}}{\\sum_{g'} p^{\\text{clipped}}_{i,g'}}

-          P(G_i = g) = \\frac{\\exp(f_{ps,g}(W_{ps}))}{\\sum_{g'} \\exp(f_{ps,g'}(W_{ps}))}
+       - Assign each observation to a treatment group by sampling from the categorical distribution defined by :math:`p_{i,g}`.

        For experimental settings (DGP 5-6), each treatment group (including never-treated) has equal probability:

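The four documented steps can be traced on a small hand-made input. The following is a minimal sketch, not the library's code: `assignment_probabilities` is a hypothetical helper, and the `raw_logits` array is an invented stand-in for the output of `_f_ps_groups`. The bounds 2.5, 0.05 and 0.95 are taken from the docstring above.

```python
import numpy as np


def assignment_probabilities(raw_logits, logit_bound=2.5, p_min=0.05, p_max=0.95):
    """Hypothetical helper mirroring the documented steps (not part of doubleml)."""
    # Step 1: clip the logits for numerical stability.
    logits = np.clip(raw_logits, -logit_bound, logit_bound)
    # Step 2: row-wise softmax gives the uncapped probabilities.
    exp_logits = np.exp(logits)
    p_uncapped = exp_logits / exp_logits.sum(axis=1, keepdims=True)
    # Step 3: clip the probabilities to enforce overlap.
    p_clipped = np.clip(p_uncapped, p_min, p_max)
    # Step 4: renormalize so each row sums to 1 again.
    return p_clipped / p_clipped.sum(axis=1, keepdims=True)


# Invented input: one balanced observation and one extreme observation.
raw_logits = np.array([
    [0.0, 0.0, 0.0],
    [4.0, -4.0, -4.0],
])
p = assignment_probabilities(raw_logits)
print(p)
```

For the extreme row, the clipped probabilities are (0.95, 0.05, 0.05), which sum to 1.05. After renormalization they become roughly (0.905, 0.048, 0.048), so the final probabilities can end up slightly outside the nominal [0.05, 0.95] range.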
@@ -159,7 +183,7 @@ def make_did_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, time_typ
     `dim_x` (int, default=4):
         Dimension of feature vectors.

-    `xi` (float, default=0.9):
+    `xi` (float, default=0.75):
         Scale parameter for the propensity score function.

     `n_periods` (int, default=5):
@@ -188,7 +212,7 @@ def make_did_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, time_typ

     c = kwargs.get("c", 0.0)
     dim_x = kwargs.get("dim_x", 4)
-    xi = kwargs.get("xi", 0.9)
+    xi = kwargs.get("xi", 0.75)
     n_periods = kwargs.get("n_periods", 5)
     anticipation_periods = kwargs.get("anticipation_periods", 0)
     n_pre_treat_periods = kwargs.get("n_pre_treat_periods", 2)
@@ -228,8 +252,11 @@ def make_did_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, time_typ
         p = np.ones(n_treatment_groups) / n_treatment_groups
         d_index = np.random.choice(n_treatment_groups, size=n_obs, p=p)
     else:
-        unnormalized_p = np.exp(_f_ps_groups(features_ps, xi, n_groups=n_treatment_groups))
-        p = unnormalized_p / unnormalized_p.sum(1, keepdims=True)
+        logits = np.clip(_f_ps_groups(features_ps, xi, n_groups=n_treatment_groups), a_min=-2.5, a_max=2.5)
+        unnormalized_p = np.exp(logits)
+        p_uncapped = unnormalized_p / unnormalized_p.sum(1, keepdims=True)
+        p_clipped = np.clip(p_uncapped, a_min=0.05, a_max=0.95)
+        p = p_clipped / p_clipped.sum(1, keepdims=True)
         d_index = np.array([np.random.choice(n_treatment_groups, p=p_row) for p_row in p])

     # fixed effects (shape (n_obs, n_time_periods))
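The unchanged `d_index` line calls `np.random.choice` once per observation. One possible vectorized alternative is inverse-CDF sampling over the rows of `p`, sketched below. This is an illustration and not what the commit does: `sample_categorical_rows` is a hypothetical helper, and it draws from a `numpy.random.Generator` rather than the global `np.random` state, so it would not reproduce the same random draws as the existing code.

```python
import numpy as np


def sample_categorical_rows(p, rng):
    """Draw one category index per row of p (hypothetical helper, not in doubleml).

    Each row of p is assumed to be a probability vector that sums to 1.
    """
    cdf = np.cumsum(p, axis=1)
    # Guard against floating-point rounding in the last cumulative sum.
    cdf[:, -1] = 1.0
    # One uniform draw in [0, 1) per row, shaped for broadcasting.
    u = rng.random((p.shape[0], 1))
    # The sampled category is the first column whose cumulative probability exceeds u.
    return (u < cdf).argmax(axis=1)


rng = np.random.default_rng(0)

# Degenerate rows make the result deterministic: each row picks its only possible category.
p_onehot = np.eye(3)
d_onehot = sample_categorical_rows(p_onehot, rng)

# Invented probabilities, repeated for many observations to check the frequencies.
p_demo = np.tile(np.array([0.2, 0.3, 0.5]), (10_000, 1))
d_index = sample_categorical_rows(p_demo, rng)
freq = np.bincount(d_index, minlength=3) / d_index.size
print(d_onehot, freq)
```

Whether the speed-up matters depends on `n_obs`. For the default of 1000 observations the per-row loop is unlikely to be a bottleneck.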

doubleml/did/datasets/dgp_did_cs_CS2021.py

Lines changed: 28 additions & 4 deletions
@@ -85,11 +85,35 @@ def make_did_cs_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, lambd

     6. Treatment assignment:

-       For non-experimental settings (DGP 1-4), the probability of being in treatment group :math:`g` is:
+       For non-experimental settings (DGP 1-4), the probability of being in treatment group :math:`g` is computed as follows:

-       .. math::
+       - Compute group-specific logits for each observation:
+
+         .. math::
+
+            \\text{logit}_{i,g} = f_{ps,g}(W_{ps})
+
+         The logits are clipped to the range [-2.5, 2.5] for numerical stability.
+
+       - Convert logits to uncapped probabilities via softmax:
+
+         .. math::
+
+            p^{\\text{uncapped}}_{i,g} = \\frac{\\exp(\\text{logit}_{i,g})}{\\sum_{g'} \\exp(\\text{logit}_{i,g'})}
+
+       - Clip uncapped probabilities to the range [0.05, 0.95]:
+
+         .. math::
+
+            p^{\\text{clipped}}_{i,g} = \\min(\\max(p^{\\text{uncapped}}_{i,g}, 0.05), 0.95)
+
+       - Renormalize clipped probabilities so they sum to 1 for each observation:
+
+         .. math::
+
+            p_{i,g} = \\frac{p^{\\text{clipped}}_{i,g}}{\\sum_{g'} p^{\\text{clipped}}_{i,g'}}

-          P(G_i = g) = \\frac{\\exp(f_{ps,g}(W_{ps}))}{\\sum_{g'} \\exp(f_{ps,g'}(W_{ps}))}
+       - Assign each observation to a treatment group by sampling from the categorical distribution defined by :math:`p_{i,g}`.

        For experimental settings (DGP 5-6), each treatment group (including never-treated) has equal probability:

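The logit clipping on its own already bounds the softmax output, which helps show what the later probability clipping adds. A short derivation follows. The symbols $G$ (number of treatment groups) and $c$ (logit bound, 2.5 in the docstring) are introduced here for illustration and do not appear in the source.

```latex
% Assumes every clipped logit lies in [-c, c] and there are G treatment groups.
% The softmax is smallest when logit_{i,g} = -c and all other logits equal c,
% and largest in the mirrored case.
\begin{align*}
p^{\text{uncapped}}_{i,g}
  &= \frac{\exp(\text{logit}_{i,g})}{\sum_{g'} \exp(\text{logit}_{i,g'})}
  \ge \frac{e^{-c}}{e^{-c} + (G-1)\,e^{c}}
  = \frac{1}{1 + (G-1)\,e^{2c}}, \\
p^{\text{uncapped}}_{i,g}
  &\le \frac{e^{c}}{e^{c} + (G-1)\,e^{-c}}
  = \frac{1}{1 + (G-1)\,e^{-2c}}.
\end{align*}
```

As an illustrative case, with $c = 2.5$ and $G = 3$ the bounds are roughly 0.0034 and 0.987. The logit clip alone therefore does not guarantee the [0.05, 0.95] range, and the subsequent probability clip is the binding step for extreme observations.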
@@ -148,7 +172,7 @@ def make_did_cs_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, lambd
     `dim_x` (int, default=4):
         Dimension of feature vectors.

-    `xi` (float, default=0.9):
+    `xi` (float, default=0.5):
         Scale parameter for the propensity score function.

     `n_periods` (int, default=5):
