Calculate the propensity scores and ATT (average treatment effect on the treated) inverse probability weights for participants from internal and external datasets. Only the relevant treatment arms from each dataset should be read in (e.g., only the control arm from each dataset if creating a hybrid control arm).

calc_prop_scr(internal_df, external_df, id_col, model, ...)

Arguments

internal_df

Internal dataset with one row per subject and all the variables needed to run the model

external_df

External dataset with one row per subject and all the variables needed to run the model

id_col

Name of the column used to identify each subject. The column name must be the same in both datasets.

model

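Model used to calculate propensity scores, given as a one-sided formula of baseline covariates (e.g., ~ cov1 + cov2 + cov3 + cov4, as in the Examples)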

...

Optional arguments

Value

A prop_scr_obj object containing the internal and external data, along with the propensity score and ATT inverse probability weight calculated for each subject.

Details

For the subset of participants in both the external and internal studies for whom we want to balance the covariate distributions (e.g., external control and internal control participants if constructing a hybrid control arm), we define a study-inclusion propensity score for each participant as

$$e(x_i) = P(S_i = 1 \mid x_i),$$

where \(x_i\) denotes a vector of baseline covariates for the \(i\)th participant and \(S_i\) denotes the indicator that the participant is enrolled in the internal trial (\(S_i = 1\) if internal, \(S_i = 0\) if external). The estimated propensity score \(\hat{e}(x_i)\) is obtained using logistic regression.
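
As a rough illustration of the estimation step described above, the sketch below fits a logistic regression of study membership on baseline covariates using base R. The simulated data and the names `dat`, `internal`, `cov1`, `cov2`, and `prop_scr` are assumptions made for this example only; this is not the package's internal code.

```r
# Hypothetical simulated data; all column names are illustrative only
set.seed(123)
n_int <- 60   # internal participants (S_i = 1)
n_ext <- 90   # external participants (S_i = 0)
dat <- data.frame(
  internal = c(rep(1, n_int), rep(0, n_ext)),
  cov1 = c(rnorm(n_int, mean = 0.5), rnorm(n_ext, mean = 0)),
  cov2 = rbinom(n_int + n_ext, size = 1, prob = 0.4)
)

# Logistic regression of the study indicator on the baseline covariates
ps_fit <- glm(internal ~ cov1 + cov2, data = dat, family = binomial())

# Estimated propensity scores: fitted probabilities P(S_i = 1 | x_i)
dat$prop_scr <- predict(ps_fit, type = "response")
head(dat)
```

Each fitted value lies strictly between 0 and 1 and estimates the probability that a participant with those covariates belongs to the internal trial.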

An ATT inverse probability weight is calculated for each individual as

$$\hat{a}_{0i} = \frac{\hat{e}(x_i)}{\hat{P}(S_i = s_i \mid x_i)} = s_i + (1 - s_i) \frac{\hat{e}(x_i)}{1 - \hat{e}(x_i)}.$$

In a weighted estimator, data from participants in the external study are given a weight of \(\hat{e}(x_i) / (1 - \hat{e}(x_i))\), whereas data from participants in the internal trial are given a weight of 1.
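
The weight formula can be checked by hand with a small helper. The function `att_weight()` below is hypothetical and written only for this illustration; it is not exported by the package. The propensity scores passed to it are the rounded values printed in the first three rows of the continuous example in the Examples section.

```r
# Hypothetical helper (not part of the package): ATT inverse probability weight
att_weight <- function(prop_scr, internal) {
  s <- as.numeric(internal)               # S_i: 1 if internal, 0 if external
  s + (1 - s) * prop_scr / (1 - prop_scr) # 1 for internal, odds e/(1 - e) for external
}

# External participants: the weight is the odds of the propensity score
att_weight(prop_scr = c(0.175, 0.219, 0.497), internal = c(FALSE, FALSE, FALSE))

# Internal participants always receive a weight of 1
att_weight(prop_scr = 0.6, internal = TRUE)
```

For the first score, 0.175 / 0.825 is approximately 0.212, in line with the printed weight. Small discrepancies for the other rows are to be expected because the printed propensity scores are rounded.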

Examples

# This can be used for both continuous and binary data
library(dplyr)
# Continuous
calc_prop_scr(internal_df = filter(int_norm_df, trt == 0),
              external_df = ex_norm_df,
              id_col = subjid,
              model = ~ cov1 + cov2 + cov3 + cov4)
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#>  cov1 + cov2 + cov3 + cov4
#> 
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                0.175                        0.212 
#>  2      2 FALSE                0.219                        0.281 
#>  3      3 FALSE                0.497                        0.990 
#>  4      4 FALSE                0.257                        0.347 
#>  5      5 FALSE                0.257                        0.347 
#>  6      6 FALSE                0.425                        0.740 
#>  7      7 FALSE                0.328                        0.489 
#>  8      8 FALSE                0.0833                       0.0908
#>  9      9 FALSE                0.165                        0.198 
#> 10     10 FALSE                0.196                        0.244 
#> # ℹ 140 more rows
#> 
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.670    0.154 
#> 2 cov2          0.229    0.0905
#> 3 cov3          0.0677   0.148 
#> 4 cov4          0.252    0.0413
# Binary
calc_prop_scr(internal_df = filter(int_binary_df, trt == 0),
              external_df = ex_binary_df,
              id_col = subjid,
              model = ~ cov1 + cov2 + cov3 + cov4)
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#>  cov1 + cov2 + cov3 + cov4
#> 
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                 0.342                        0.520
#>  2      2 FALSE                 0.312                        0.453
#>  3      3 FALSE                 0.221                        0.284
#>  4      4 FALSE                 0.461                        0.854
#>  5      5 FALSE                 0.531                        1.13 
#>  6      6 FALSE                 0.444                        0.798
#>  7      7 FALSE                 0.424                        0.735
#>  8      8 FALSE                 0.254                        0.340
#>  9      9 FALSE                 0.334                        0.501
#> 10     10 FALSE                 0.242                        0.319
#> # ℹ 140 more rows
#> 
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.323  0.0365  
#> 2 cov2          0.192  0.000289
#> 3 cov3          0.0173 0.0132  
#> 4 cov4          0.226  0.00715