Multibias makes it easy to simultaneously adjust for multiple biases in causal inference research. This document walks you through the key steps of the analysis.
All epidemiological biases can be generalized into four main groups:
Multibias can adjust for most combinations of the above four biases. Choose the adjust() function whose name contains the abbreviations for the biases of interest. For example, adjust_uc_sel() adjusts for uncontrolled confounding and selection bias.
For purposes of demonstration, multibias includes datasets with different bias combinations. For a given bias or biases, it includes the biased data (e.g., df_uc_sel: missing a confounder and not including the un-selected subjects) and the source used to derive the biased data (e.g., df_uc_sel_source: including data on the missing confounder and subjects who were not selected).
Each dataset has variables defined similarly:
X = binary exposure
Y = binary outcome
C1-C3 = binary confounders
Each adjust() function in multibias requires inputting your observed, biased data as a data_observed object. Here you specify the dataframe and identify the key variables in the data.
Next, you need to have some information that can be used to quantify the assumed bias or biases. There are two options here.
One option is to list the assumed parameters for the corresponding bias equations. The necessary bias equations are provided in the documentation of each adjust() function. These equations can be derived using the methods outlined in the article here. In our example, we have bias parameters to predict the missing binary confounder U (as the vector u_coefs) and study selection indicator S (as the vector s_coefs).
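As a rough illustration of where such parameters can come from, the sketch below fits logistic models for U and S in simulated data using base R only. The data frame src, its effect sizes, and the model forms (U regressed on X, Y, and the confounders; S regressed on X and Y) are assumptions made for this example. The documentation of each adjust() function gives the exact bias equations and the parameter order it expects.

```r
set.seed(1234)
n <- 20000

# Simulated stand-in for a source dataset (all values are hypothetical)
C1 <- rbinom(n, 1, 0.5)
C2 <- rbinom(n, 1, 0.3)
C3 <- rbinom(n, 1, 0.7)
U  <- rbinom(n, 1, 0.5)
X  <- rbinom(n, 1, plogis(-2 + 0.5 * C1 + 0.8 * U))
Y  <- rbinom(n, 1, plogis(-2.5 + log(2) * X + 0.5 * C1 + 0.8 * U))
S  <- rbinom(n, 1, plogis(-0.5 + 0.9 * X + 0.9 * Y))
src <- data.frame(X, Y, C1, C2, C3, U, S)

# Assumed bias model for the missing confounder:
# logit P(U = 1) ~ X + Y + C1 + C2 + C3
u_fit <- glm(U ~ X + Y + C1 + C2 + C3, family = binomial, data = src)

# Assumed bias model for selection: logit P(S = 1) ~ X + Y
s_fit <- glm(S ~ X + Y, family = binomial, data = src)

# Coefficient vectors, intercept first
u_coefs <- unname(coef(u_fit))
s_coefs <- unname(coef(s_fit))
```

In practice the parameters may instead come from the literature or expert judgment; fitting them to a source dataset is only one possibility.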
A second option is to specify a validation data source as a data_validation object. In order to adjust for a given bias, the validation data must have the corresponding missing data. In the example here, the validation data has additional columns for the missing binary confounder U and indicator S for whether a given individual was selected into the study.
Finally, we can run the multi-bias adjustment!
adjust_uc_sel(
  data_observed = df_observed,
  u_model_coefs = u_coefs,
  s_model_coefs = s_coefs
)
#> $estimate
#> [1] 2.001141
#>
#> $ci
#> [1] 1.948436 2.055271
or
adjust_uc_sel(
  data_observed = df_observed,
  data_validation = df_validation
)
#> $estimate
#> [1] 2.009484
#>
#> $ci
#> [1] 1.956566 2.063833
We observe that the bias-adjusted odds ratio of the effect of X on Y is approximately 2. This effect can be compared to the effect observed in the biased data to understand whether systematic error is bringing the observed effect closer to the null or further from the null.
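The comparison with the biased estimate can be sketched in base R. The example below does not use the package datasets; it simulates a population with a confounder U and a selection indicator S (all effect sizes are hypothetical, and or_x is a helper defined only for this illustration), then contrasts the odds ratio from the full data with the one obtained when U is unavailable and only selected subjects are analyzed.

```r
set.seed(321)
n <- 50000

# Simulated full population (all effect sizes are hypothetical)
U <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, plogis(-1 + 1.0 * U))
Y <- rbinom(n, 1, plogis(-2 + log(2) * X + 1.0 * U))
S <- rbinom(n, 1, plogis(-1 + 1.0 * X + 1.0 * Y))
full <- data.frame(X, Y, U, S)

# Odds ratio for X from a logistic regression
or_x <- function(formula, data) {
  exp(coef(glm(formula, family = binomial, data = data))[["X"]])
}

# Reference: all subjects, adjusting for U
or_reference <- or_x(Y ~ X + U, full)

# Biased: selected subjects only, with U unavailable
or_biased <- or_x(Y ~ X, full[full$S == 1, ])

round(c(reference = or_reference, biased = or_biased), 2)
```

Placing the two estimates side by side shows whether the systematic error moved the observed effect toward or away from the null in this simulated setting.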
Multibias performs bias adjustment via imputation, regression weighting, or a combination of the two. When imputation is involved, the sampling introduces randomness, so repeated runs will not produce identical results unless a seed is specified. Bootstrapping is thus recommended to quantify the random error. Computational performance can be improved via parallelization.
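To see why a seed matters, the following base R sketch imitates an imputation step by drawing a binary U for each subject from a modelled probability. The data, the coefficient values, and the function impute_u are hypothetical and not part of multibias; the sketch only shows that repeated draws differ unless the seed is fixed.

```r
# Hypothetical observed data and bias-model coefficients (illustration only)
set.seed(42)
df <- data.frame(X = rbinom(1000, 1, 0.4), Y = rbinom(1000, 1, 0.3))
u_coefs_demo <- c(-0.2, 0.6, 0.7)  # intercept, X, Y

# Draw one imputed U per row from the modelled probability P(U = 1 | X, Y)
impute_u <- function(data, coefs) {
  p <- plogis(coefs[1] + coefs[2] * data$X + coefs[3] * data$Y)
  rbinom(nrow(data), 1, p)
}

# Same seed: the imputed values are reproduced exactly
set.seed(1)
u_a <- impute_u(df, u_coefs_demo)
set.seed(1)
u_b <- impute_u(df, u_coefs_demo)
same_seed_identical <- identical(u_a, u_b)

# Different seed: the imputed values change
set.seed(2)
u_c <- impute_u(df, u_coefs_demo)
different_seed_identical <- identical(u_a, u_c)
```

The bootstrap that follows quantifies this run-to-run variation together with the sampling error.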
n <- nrow(df_uc_sel)
nreps <- 100
est <- numeric(nreps)  # one adjusted estimate per bootstrap replicate
for (i in seq_len(nreps)) {
  # resample the observed rows with replacement
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  df_observed <- data_observed(
    df_bootstrap,
    exposure = "X",
    outcome = "Y",
    confounders = c("C1", "C2", "C3")
  )
  results <- adjust_uc_sel(
    data_observed = df_observed,
    data_validation = df_validation
  )
  est[i] <- results$estimate
}
# odds ratio estimate
round(median(est), 2)
#> [1] 1.99
# confidence interval
round(quantile(est, c(.025, .975)), 2)
#> 2.5% 97.5%
#> 1.94 2.04
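The loop above can be parallelized. The sketch below shows one possible pattern using the parallel package that ships with R. To stay self-contained it uses simulated data and a plain glm() odds ratio as a placeholder for the adjust function call; the data, the function one_rep, and the choice of two cores are assumptions for illustration.

```r
library(parallel)

# Hypothetical stand-in data; in practice this would be the observed data
set.seed(100)
n <- 2000
X <- rbinom(n, 1, 0.4)
Y <- rbinom(n, 1, plogis(-1 + log(2) * X))
df <- data.frame(X, Y)

# One bootstrap replicate: resample rows, then estimate the odds ratio.
# The glm() call is a placeholder for the bias-adjustment call.
one_rep <- function(i, data) {
  idx <- sample(seq_len(nrow(data)), nrow(data), replace = TRUE)
  fit <- glm(Y ~ X, family = binomial, data = data[idx, ])
  exp(coef(fit)[["X"]])
}

# mclapply() relies on forking, which is not available on Windows,
# so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
nreps <- 100
est <- unlist(mclapply(seq_len(nreps), one_rep, data = df, mc.cores = cores))

# Point estimate and percentile interval, as in the sequential loop
round(median(est), 2)
round(quantile(est, c(0.025, 0.975)), 2)
```

Each replicate is independent, which is what makes this loop straightforward to distribute across cores. For reproducible random streams across workers, the parallel package documentation describes the "L'Ecuyer-CMRG" generator.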
When adjusting via bias parameters, the user can alternatively provide each bias parameter value as a probability distribution. The resulting confidence interval from bootstrapping can then quantify uncertainty in both the random error and the systematic error. Check out the vignette article “Multibias Validation” for a demonstration of this approach.
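One way this could look is sketched below in base R: each bootstrap replicate receives its own draw of the bias parameters. The means and standard deviations are hypothetical, the normal distribution is an assumption, and this is not necessarily how the “Multibias Validation” vignette implements the approach.

```r
set.seed(2024)
nreps <- 100

# Hypothetical means and standard deviations for three selection-model
# coefficients (intercept, X, Y); these are not values from the package
s_means <- c(0, 0.9, 0.9)
s_sds   <- c(0.05, 0.1, 0.1)

# One row of coefficient draws per bootstrap replicate
s_draws <- t(replicate(nreps, rnorm(length(s_means), mean = s_means, sd = s_sds)))

# Inside the bootstrap loop, replicate i would use s_draws[i, ]
# as its selection-model coefficients
dim(s_draws)
```

Because the parameters vary from replicate to replicate, the spread of the bootstrap estimates then reflects uncertainty in the assumed bias as well as sampling variability.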