Introduction to multibias

Multibias makes it easy to simultaneously adjust for multiple biases in causal inference research. This document walks you through the key steps to performing the analysis.

1. Identify the biases in your data

All epidemiological biases can be generalized into four main groups:

Uncontrolled confounding (uc)
Exposure misclassificaiton (em)
Outcome misclassificaiton (om)
Selection bias (sel)

Multibias is capable of handling bias adjustment for most of the combinations of the above four biases. Choose the adjust() function corresponding to the biases of interest using the appropriate prefix. For example, adjust_uc_sel() adjusts for uncontrolled confounding and selection bias.

For purposes of demonstration, multibias includes datasets with different bias combinations. For a given bias or biases, it includes the biased data (e.g., df_uc_sel: missing a confounder and not including the un-selected subjects) and the source used to derive the biased data (e.g., df_uc_sel_source: including data on the missing confounder and subjects who were not selected).

Each dataset has variables defined similarly:

X = binary exposure
Y = binary outcome
C1-C3 = binary confounders

2. Create data_observed

Each adjust() function in multibias requires inputting your observed, biased data as a data_observed object. Here you specify the dataframe and identify the key variables in the data.

df_observed <- data_observed(
  df_uc_sel,
  exposure = "X",
  outcome = "Y",
  confounders = c("C1", "C2", "C3")
)

3. Obtain source for bias adjustment

Next, you need to have some information that can be used to quantify the assumed bias or biases. There are two options here.

Option 1: Bias parameters

One option is to list the assumed parameters for the corresponding bias equations. The necessary bias equations are provided in the documentation of each adjust() function. These equations can be derived using the methods outlined in the article here. In our example, we have bias parameters to predict the missing binary confounder U (as the vector u_coefs) and study selection indicator S (as the vector s_coefs).

u_coefs <- c(-0.19, 0.61, 0.72, -0.09, 0.10, -0.15)
s_coefs <- c(-0.01, 0.92, 0.94)

Option 2: Validation data

A second option is to specify a validation data source as a data_validation object. In order to adjust for a given bias, the validation data must have the corresonding missing data. In the example here, the validation data has additional columns for the missing binary confounder U and indicator S for whether a given individual was selected into the study.

df_validation <- data_validation(
  df_uc_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = c("C1", "C2", "C3", "U"),
  selection = "S"
)

4. Run the bias adjustment

Finally, we can run the multi-bias adjustment!

Single run

adjust_uc_sel(
  data_observed = df_observed,
  u_model_coefs = u_coefs,
  s_model_coefs = s_coefs
)
#> $estimate
#> [1] 2.001141
#> 
#> $ci
#> [1] 1.948436 2.055271

adjust_uc_sel(
  data_observed = df_observed,
  data_validation = df_validation
)
#> $estimate
#> [1] 2.009484
#> 
#> $ci
#> [1] 1.956566 2.063833

We observe that the bias-adjusted odds ratio of the effect of X on Y is approximately 2. This effect can be compared to the effect observed in the biased data to understand whether systematic error is bringing the observed effect closer to the null or further from the null.

Bootstrapping

Multibias performs bias adjustment via a combination of imputation and/or regression weighting. When imputation is involved, there will be some inherent randomness in the sampling. A single run, therefore, will not produce an exactly consistent result unless a seed is specified. Bootstrapping is thus recommended to quantify the random error. Computational performance can be improved via parallelization.

n <- nrow(df_uc_sel)
est <- vector()
nreps <- 100

for (i in 1:nreps) {
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  df_observed <- data_observed(
    df_bootstrap,
    exposure = "X",
    outcome = "Y",
    confounders = c("C1", "C2", "C3")
  )
  results <- adjust_uc_sel(
    df_observed,
    df_validation
  )
  est[i] <- results$estimate
}

# odds ratio estimate
round(median(est), 2)
#> [1] 1.99

# confidence interval
round(quantile(est, c(.025, .975)), 2)
#>  2.5% 97.5% 
#>  1.94  2.04

When adjusting via bias parameters, the user can alternatively provide each bias parameter value as a probability distribution. The resulting confidence interval from bootstrapping can then quantify uncertainty in both the random error and the systematic error. Check out the vignette article “Multibias Validation” for a demonstration of this approach.