Univariate variable selection: analyse_univariate

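Screens each covariate individually against an outcome within every bootstrap replicate, retaining those whose univariate test is significant at a chosen level.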

Usage

analyse_univariate(
  x,
  response = NULL,
  vars = NULL,
  level = 0.05,
  family = stats::gaussian(),
  method = "Wald"
)

Arguments

x: the output of the bootstrap_data() function.
response: a character string naming the outcome variable.
vars: a character vector naming the covariates (features) to be screened.
level: a numeric scalar giving the significance threshold applied to each univariate test. Default is 0.05.
family: the error distribution and link function passed to the glm() call inside analyse_univariate(). Default is stats::gaussian().
method: a character string naming the test to be used. Currently only the Wald chi-squared test ("Wald") is implemented.
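For reference, the standard Wald chi-squared statistic is shown below. We assume, based on the Details, that the test is applied jointly to all k non-intercept coefficients that a covariate contributes to its single-covariate glm() fit (k = 1 for a numeric covariate, k = m - 1 for a factor with m levels under treatment contrasts); the exact terms tested by the package are not stated here. Under this reading, a covariate is retained when p < level.

$$
W = \hat{\boldsymbol{\beta}}^{\top}\,\widehat{\mathbf{V}}^{-1}\,\hat{\boldsymbol{\beta}}
\;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
\quad \text{under } H_0\colon \boldsymbol{\beta} = \mathbf{0},
\qquad p = \Pr\left(\chi^2_{k} \ge W\right)
$$

Here \hat{\boldsymbol{\beta}} holds the k estimated coefficients for the covariate and \widehat{\mathbf{V}} is their estimated covariance matrix.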

Value

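A nested tibble with one row per bootstrap replicate: the replicate index (boot_rep), the replicate data (data), and a list-column (vars) holding the covariates retained at the chosen level.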

Details

A number of analysis wrappers have been created to illustrate the challenge of final variable selection. They can be chained in whatever sequence the user chooses, so that combinations of selection steps are applied within each bootstrap replicate.

Although discouraged (Moons et al., 2012), univariate selection remains common, particularly as a way to pre-screen many covariates before a multivariable fit, possibly followed by further feature selection. This function is a simple wrapper around aod::wald.test(), which implements the Wald chi-squared test under an assumption of asymptotic (multivariate) normality.
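To make the screening step concrete, the following is a minimal base-R sketch of a univariate Wald screen on a single data set. It is not the package's implementation: wald_screen_sketch() is a hypothetical name, it computes the Wald statistic directly from coef() and vcov() instead of calling aod::wald.test(), and it assumes that all non-intercept coefficients of each single-covariate glm() fit are tested jointly.

```r
# Hypothetical helper, NOT part of the package: a minimal univariate Wald screen.
# For each covariate, fit glm(response ~ covariate) and jointly test all of the
# covariate's non-intercept coefficients with a Wald chi-squared test.
# Assumes no aliased (NA) coefficients in any single-covariate fit.
wald_screen_sketch <- function(data, response, vars, level = 0.05,
                               family = stats::gaussian()) {
  p_values <- vapply(vars, function(v) {
    fit <- stats::glm(stats::reformulate(v, response = response),
                      data = data, family = family)
    beta <- stats::coef(fit)[-1]                  # drop the intercept
    V <- stats::vcov(fit)[-1, -1, drop = FALSE]   # covariance of those coefficients
    W <- drop(t(beta) %*% solve(V) %*% beta)      # Wald chi-squared statistic
    stats::pchisq(W, df = length(beta), lower.tail = FALSE)
  }, numeric(1))
  vars[p_values < level]                          # covariates passing the threshold
}
```

For example, wald_screen_sketch(mtcars, "am", c("wt", "hp"), family = stats::binomial()) returns the subset of c("wt", "hp") significant at 0.05. Applied to each replicate's data, a screen of this kind yields the sort of character vector stored in the vars column.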

References

Moons, K. G. M., Kengne, A. P., Woodward, M., Royston, P., Vergouwe, Y., Altman, D. G., & Grobbee, D. E. (2012). Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart, 98(9), 683–690. https://doi.org/10.1136/heartjnl-2011-301246

Examples

data(iswr_stroke)
iswr_stroke %>%
  bootstrap_data(10, seed = 1234) %>%
  analyse_univariate(response = "dead12",
                     vars = c("Gender", "Age", "Diagnosis", "Coma",
                              "Diabetes", "MI", "Hypertension"),
                     family = "binomial",
                     level = 0.05)
#> # A tibble: 10 × 3
#>    boot_rep data               vars     
#>       <int> <list>             <list>   
#>  1        1 <tibble [814 × 8]> <chr [3]>
#>  2        2 <tibble [814 × 8]> <chr [5]>
#>  3        3 <tibble [814 × 8]> <chr [5]>
#>  4        4 <tibble [814 × 8]> <chr [7]>
#>  5        5 <tibble [814 × 8]> <chr [5]>
#>  6        6 <tibble [814 × 8]> <chr [3]>
#>  7        7 <tibble [814 × 8]> <chr [5]>
#>  8        8 <tibble [814 × 8]> <chr [2]>
#>  9        9 <tibble [814 × 8]> <chr [4]>
#> 10       10 <tibble [814 × 8]> <chr [4]>
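Because the number of retained covariates varies across replicates, a natural follow-up is to count how often each covariate is selected. The sketch below is a hypothetical helper, not a package function; it assumes only that the vars column is a list of character vectors, as printed in the output above.

```r
# Hypothetical helper, NOT part of the package: proportion of bootstrap
# replicates in which each covariate was retained, sorted from most to least
# frequently selected. `vars_list` is assumed to be the `vars` list-column.
selection_frequency <- function(vars_list) {
  counts <- table(as.character(unlist(vars_list, use.names = FALSE)))
  sort(counts / length(vars_list), decreasing = TRUE)
}
```

For example, selection_frequency(res$vars), where res holds the result of the example pipeline, would report the share of the 10 replicates in which each of the seven covariates passed the 0.05 threshold.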