Calculate the propensity scores and ATT inverse probability weights for participants from internal and external datasets. Supply only the relevant treatment arms from each dataset (e.g., only the control arm from each dataset if creating a hybrid control arm).
calc_prop_scr(internal_df, external_df, id_col, model, ...)
internal_df: Internal dataset with one row per subject and all the variables needed to run the model
external_df: External dataset with one row per subject and all the variables needed to run the model
id_col: Name of the column used to identify each subject. The column name must be the same in both datasets
model: Model used to calculate propensity scores, given as a one-sided formula of covariates (e.g., ~ cov1 + cov2)
...: Optional arguments
A prop_scr_obj object containing the internal and external data together with the propensity score and inverse probability weight calculated for each subject.
For the subset of participants in the external and internal studies whose covariate distributions we want to balance (e.g., external control and internal control participants if constructing a hybrid control arm), we define a study-inclusion propensity score for each participant as
$$e(x_i) = P(S_i = 1 \mid x_i),$$
where \(x_i\) denotes a vector of baseline covariates for the \(i\)th participant and \(S_i\) denotes the indicator that the participant is enrolled in the internal trial (\(S_i = 1\) if internal, \(S_i = 0\) if external). The estimated propensity score \(\hat{e}(x_i)\) is obtained using logistic regression.
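As a rough illustration of the estimation step described above, the following sketch fits a logistic regression of study membership on baseline covariates using base R. The simulated data, the column names (internal, cov1, cov2), and the sample sizes are assumptions made for this example; it is not the internal implementation of calc_prop_scr().

```r
# Hypothetical pooled data: internal = 1 for the internal trial (S_i = 1),
# internal = 0 for the external study (S_i = 0). All values are simulated.
set.seed(123)
n_int <- 100
n_ext <- 150
pooled <- data.frame(
  internal = rep(c(1, 0), times = c(n_int, n_ext)),
  cov1 = c(rnorm(n_int, mean = 60, sd = 8), rnorm(n_ext, mean = 65, sd = 8)),
  cov2 = c(rbinom(n_int, 1, 0.4), rbinom(n_ext, 1, 0.5))
)

# Logistic regression of study membership on the baseline covariates
ps_fit <- glm(internal ~ cov1 + cov2, family = binomial, data = pooled)

# Fitted probabilities are the estimated propensity scores e-hat(x_i)
pooled$prop_scr <- predict(ps_fit, type = "response")

head(pooled)
```

Each fitted value is an estimate of the probability that a participant with those covariates belongs to the internal trial.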
An ATT inverse probability weight is calculated for each individual as
$$\hat{a}_{0i} = \frac{\hat{e}(x_i)}{\hat{P}(S_i = s_i \mid x_i)} = s_i + (1 - s_i) \frac{\hat{e}(x_i)}{1 - \hat{e}(x_i)}.$$
In a weighted estimator, data from participants in the external study are given a weight of \(\hat{e}(x_i) / (1 - \hat{e}(x_i))\), whereas data from participants in the internal trial are given a weight of 1.
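The weight formula above can be sketched in a few lines of R. The helper name att_weight() and the input values are hypothetical and chosen only to make the arithmetic easy to follow; the sketch is not taken from the package source.

```r
# ATT inverse probability weight:
#   1 for internal participants (s_i = 1),
#   e-hat / (1 - e-hat) for external participants (s_i = 0).
# att_weight() is a hypothetical helper written for this illustration.
att_weight <- function(prop_scr, internal) {
  stopifnot(length(prop_scr) == length(internal),
            all(prop_scr > 0 & prop_scr < 1))
  ifelse(internal, 1, prop_scr / (1 - prop_scr))
}

e_hat <- c(0.2, 0.5, 0.8, 0.5)          # estimated propensity scores
s     <- c(FALSE, FALSE, FALSE, TRUE)   # TRUE = internal trial participant

w <- att_weight(e_hat, s)
w  # 0.25, 1, 4 for the external participants; 1 for the internal participant
```

External participants who resemble the internal population (propensity score near 1) receive large weights, while those who look unlike it (propensity score near 0) are down-weighted.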
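# calc_prop_scr() is called the same way for datasets with continuous or binary outcomes
library(beastt)
library(dplyr)
# Continuous outcome: internal control arm (trt == 0) and external controls
calc_prop_scr(internal_df = filter(int_norm_df, trt == 0),
              external_df = ex_norm_df,
              id_col = subjid,
              model = ~ cov1 + cov2 + cov3 + cov4)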
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> • cov1 + cov2 + cov3 + cov4
#>
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                0.175                        0.212
#>  2      2 FALSE                0.219                        0.281
#>  3      3 FALSE                0.497                        0.990
#>  4      4 FALSE                0.257                        0.347
#>  5      5 FALSE                0.257                        0.347
#>  6      6 FALSE                0.425                        0.740
#>  7      7 FALSE                0.328                        0.489
#>  8      8 FALSE                0.0833                       0.0908
#>  9      9 FALSE                0.165                        0.198
#> 10     10 FALSE                0.196                        0.244
#> # ℹ 140 more rows
#>
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.670    0.154
#> 2 cov2          0.229    0.0905
#> 3 cov3          0.0677   0.148
#> 4 cov4          0.252    0.0413
# Binary outcome: internal control arm (trt == 0) and external controls
calc_prop_scr(internal_df = filter(int_binary_df, trt == 0),
              external_df = ex_binary_df,
              id_col = subjid,
              model = ~ cov1 + cov2 + cov3 + cov4)
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> • cov1 + cov2 + cov3 + cov4
#>
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                 0.342                        0.520
#>  2      2 FALSE                 0.312                        0.453
#>  3      3 FALSE                 0.221                        0.284
#>  4      4 FALSE                 0.461                        0.854
#>  5      5 FALSE                 0.531                        1.13
#>  6      6 FALSE                 0.444                        0.798
#>  7      7 FALSE                 0.424                        0.735
#>  8      8 FALSE                 0.254                        0.340
#>  9      9 FALSE                 0.334                        0.501
#> 10     10 FALSE                 0.242                        0.319
#> # ℹ 140 more rows
#>
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.323  0.0365
#> 2 cov2          0.192  0.000289
#> 3 cov3          0.0173 0.0132
#> 4 cov4          0.226  0.00715
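
The "Absolute Standardized Mean Difference" tables above report covariate balance before (diff_unadj) and after (diff_adj) weighting. The sketch below shows one common way such a quantity can be computed. The helper abs_smd(), the choice to scale by the internal-group standard deviation, and the toy data are all assumptions for illustration; the package may use a different definition.

```r
# Absolute standardized mean difference between internal participants and
# (optionally weighted) external participants.
# Assumption of this sketch: the difference is scaled by the internal-group SD.
abs_smd <- function(x, internal, w = rep(1, length(x))) {
  mean_int <- weighted.mean(x[internal], w[internal])
  mean_ext <- weighted.mean(x[!internal], w[!internal])
  abs(mean_int - mean_ext) / sd(x[internal])
}

# Toy covariate values: three internal and three external participants
x        <- c(1, 2, 3, 2, 4, 6)
internal <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
# Hypothetical ATT weights: 1 for internal, e / (1 - e) for external
w        <- c(1, 1, 1, 1.5, 0.5, 0)

smd_unadj <- abs_smd(x, internal)      # unweighted comparison
smd_adj   <- abs_smd(x, internal, w)   # weighted comparison
c(unadjusted = smd_unadj, adjusted = smd_adj)
```

In this toy example the weights shift the external mean toward the internal mean, so the adjusted difference is smaller than the unadjusted one.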