In this example, we are going to walk through how to use Bayesian dynamic borrowing (BDB) with the inclusion of inverse probability weighting to balance baseline covariate distributions between external and internal datasets. This particular example considers a hypothetical trial with a cross sectional normal endpoint and a known standard deviation (SD) in each treatment arm (external control arm and both internal arms), and our objective is to use BDB with IPWs to construct a posterior distribution for the control mean θC\theta_C. We will use simulated internal and external datasets from the package where each dataset has a normally distributed response variable and four baseline covariates which we will balance.

The external control dataset has a sample size of 150 participants, and the distributions of the four covariates are as follows: - Covariate 1: normal with a mean and standard deviation of approximately 50 and 10, respectively - Covariate 2: binary (0 vs. 1) with approximately 20% of participants with level 1 - Covariate 3: binary (0 vs. 1) with approximately 60% of participants with level 1 - Covariate 4: binary (0 vs. 1) with approximately 30% of participants with level 1

The internal dataset has 120 participants with 60 participants in each of the control arm and the active treatment arms. The covariate distributions of each arm are as follows: - Covariate 1: normal with a mean and standard deviation of approximately 55 and 8, respectively - Covariate 2: binary (0 vs. 1) with approximately 30% of participants with level 1 - Covariate 3: binary (0 vs. 1) with approximately 50% of participants with level 1 - Covariate 4: binary (0 vs. 1) with approximately 30% of participants with level 1

We assume the standard deviations of both the external and internal response data are known and equal to 0.15.

library(tibble)
library(distributional)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
set.seed(1234)

summary(int_norm_df)
#>      subjid            cov1            cov2             cov3       
#>  Min.   :  1.00   Min.   :33.00   Min.   :0.0000   Min.   :0.0000  
#>  1st Qu.: 30.75   1st Qu.:49.00   1st Qu.:0.0000   1st Qu.:0.0000  
#>  Median : 60.50   Median :55.00   Median :0.0000   Median :1.0000  
#>  Mean   : 60.50   Mean   :54.39   Mean   :0.2417   Mean   :0.5333  
#>  3rd Qu.: 90.25   3rd Qu.:60.00   3rd Qu.:0.0000   3rd Qu.:1.0000  
#>  Max.   :120.00   Max.   :74.00   Max.   :1.0000   Max.   :1.0000  
#>       cov4          trt            y           
#>  Min.   :0.0   Min.   :0.0   Min.   :-0.09789  
#>  1st Qu.:0.0   1st Qu.:0.0   1st Qu.: 0.48495  
#>  Median :0.0   Median :0.5   Median : 0.69407  
#>  Mean   :0.3   Mean   :0.5   Mean   : 0.68539  
#>  3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.: 0.89461  
#>  Max.   :1.0   Max.   :1.0   Max.   : 1.34632
summary(ex_norm_df)
#>      subjid            cov1            cov2             cov3    
#>  Min.   :  1.00   Min.   :27.00   Min.   :0.0000   Min.   :0.0  
#>  1st Qu.: 38.25   1st Qu.:44.00   1st Qu.:0.0000   1st Qu.:0.0  
#>  Median : 75.50   Median :49.00   Median :0.0000   Median :1.0  
#>  Mean   : 75.50   Mean   :49.75   Mean   :0.1867   Mean   :0.6  
#>  3rd Qu.:112.75   3rd Qu.:56.00   3rd Qu.:0.0000   3rd Qu.:1.0  
#>  Max.   :150.00   Max.   :72.00   Max.   :1.0000   Max.   :1.0  
#>       cov4              y          
#>  Min.   :0.0000   Min.   :-0.2337  
#>  1st Qu.:0.0000   1st Qu.: 0.3187  
#>  Median :0.0000   Median : 0.5490  
#>  Mean   :0.3467   Mean   : 0.5460  
#>  3rd Qu.:1.0000   3rd Qu.: 0.7580  
#>  Max.   :1.0000   Max.   : 1.3146
sd_external_control <- 0.15
sd_internal_control <- 0.15
n_external <- nrow(ex_norm_df)

With the covariate data from both the external and internal datasets, we can calculate the propensity scores and ATT inverse probability weights (IPWs) for the internal and external controls using the calc_prop_scr function. This creates a propensity score object which we can use for calculating an inverse probability weighted power prior in the next step.

Note: when reading external and internal datasets into calc_prop_scr, be sure to include only the treatment arms across which you want to balance the covariate distributions. In this example, we want to balance the covariate distributions of the external control arm to be similar to those of the internal control arm, so we will exclude the internal active treatment arm data from this function.

ps_model <- ~ cov1 + cov2 + cov3 + cov4
ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), 
                        external_df = ex_norm_df, 
                        id_col = subjid,
                        model = ps_model)

ps_obj
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#>  cov1 + cov2 + cov3 + cov4
#> 
#> ── Propensity Scores and Weights ───────────────────────────────────────────────
#> # A tibble: 150 × 4
#>    subjid Internal `Propensity Score` `Inverse Probability Weight`
#>     <int> <lgl>                 <dbl>                        <dbl>
#>  1      1 FALSE                0.175                        0.212 
#>  2      2 FALSE                0.219                        0.281 
#>  3      3 FALSE                0.497                        0.990 
#>  4      4 FALSE                0.257                        0.347 
#>  5      5 FALSE                0.257                        0.347 
#>  6      6 FALSE                0.425                        0.740 
#>  7      7 FALSE                0.328                        0.489 
#>  8      8 FALSE                0.0833                       0.0908
#>  9      9 FALSE                0.165                        0.198 
#> 10     10 FALSE                0.196                        0.244 
#> # ℹ 140 more rows
#> 
#> ── Absolute Standardized Mean Difference ───────────────────────────────────────
#> # A tibble: 4 × 3
#>   covariate diff_unadj diff_adj
#>   <chr>          <dbl>    <dbl>
#> 1 cov1          0.670    0.154 
#> 2 cov2          0.229    0.0905
#> 3 cov3          0.0677   0.148 
#> 4 cov4          0.252    0.0413

In order to check the suitability of the external data, we can create a variety of diagnostic plots. The first plot we might want is a histogram of the overlapping propensity score distributions from both datasets. To get this, we use the prop_scr_hist function. This function takes in the propensity score object made in the previous step, and we can optionally supply the variable we want to look at (either the propensity score or the IPW). By default, it will plot the propensity scores. Additionally, we can look at the densities rather than histograms by using the prop_scr_dens function. When looking at the IPWs with either the histogram or the density functions, it is important to note that only the IPWs for external control participants will be shown because the ATT IPWs of all internal control participants are equal to 1.

prop_scr_dens(ps_obj, variable = "ipw")

The final plot we might want to look at is a love plot to visualize the absolute standard mean differences (both unadjusted and adjusted by the IPWs) of the covariates between the internal and external data. To do this, we use the prop_scr_love function. Like the previous function, the only required parameter for this function is the propensity score object, but we can also provide a location along the x-axis for a vertical reference line.

prop_scr_love(ps_obj, reference_line = 0.1)

Now that we are happy with our propensity score object, we can use it to calculate a normal inverse probability weighted power prior for θC\theta_C. To calculate the power prior, we need to supply:

  • weighted object, the propensity score we created above

  • response variable, in this case yy

  • initial prior, in the form of a normal distributional object

  • SD of the external control response data, assumed known

The prior and the external control SD are optional. If no prior is provided, an improper uniform prior will be used for the initial prior; i.e., π(θC)1\pi(\theta_C) \propto 1. If no external control SD or initial prior are specified (i.e., both the prior and external_sd arguments set as NULL), then a non-standardized tt power prior will be created (not covered in this vignette). In this example, we define the initial prior to be a vague normal distribution with a mean 50 and SD 10.

Once we have a power prior, we might want to plot it. To do that, we use the plot_dist function.

pwr_prior <- calc_power_prior_norm(ps_obj,
                                   response = y,
                                   prior = dist_normal(50, 10), 
                                   external_sd = sd_external_control)
plot_dist(pwr_prior)

Now that we have a normal power prior, we can calculate the posterior distribution for θC\theta_C using the calc_post_norm function. By defining our prior to be a normal distribution and by assuming the SD of the internal response data to be known, the resulting posterior distribution will also be normal (the case when the SD is unknown is not covered in this vignette).

Note: if reading internal data directly into calc_post_norm instead of a propensity score object, be sure to include only the treatment arm of interest (e.g., the internal control arm if creating a posterior distribution for θC\theta_C).

post <- calc_post_norm(ps_obj,
                       response = y, 
                       prior = pwr_prior,
                       internal_sd = sd_internal_control)
plot_dist(post)

If we want to robustify our power prior for θC\theta_C, we can add a vague component to the power prior distribution we previously created to construct a robust mixture prior which we can then pass to calc_post_norm. In general, we can define our prior to be a mixture distribution with an arbitrary number of normal and/or tt components, in which case the resulting posterior distribution will also be a mixture of normal components. If any component of the prior is a tt distribution, the prior will be approximated with the mixture of two normal distributions.

mixed_prior <- robustify_norm(pwr_prior, n_external)

post_mixed <- calc_post_norm(ps_obj,
                             response = y, 
                             prior= mixed_prior,
                             internal_sd = sd_internal_control)
plot_dist("Control Posterior" = post, 
          "Mixed Posterior" = post_mixed)

Lastly, we can also create a posterior distribution for the mean of the active treatment arm by reading the internal data directly into the calc_post_norm function and assuming the SD of the internal active treatment arm to be equal to 0.15. In this case, we assume a vague normal prior with mean 50 and SD 10.

As noted earlier, be sure to read in only the data for the internal active treatment arm while excluding the internal control data.

sd_internal_treated <- 0.15
post_treated <- calc_post_norm(internal_data = filter(int_norm_df, trt == 1),
                               response = y,
                               prior = dist_normal(50, 10),
                               internal_sd = sd_internal_treated)

plot_dist("Control Posterior" = post, 
          "Mixed Posterior" = post_mixed,
          "Treatment Posterior" = post_treated)