Consider a 2-arm RCT (1:1 allocation) with a continuous outcome, a total sample size of 200, and seven baseline covariates, in a superpopulation set-up. The estimand is the (super)population average treatment effect \(\theta = E(Y(1) - Y(0))\). Estimation is performed via a linear regression working model
This estimated variance is about 15% smaller than the usual ANCOVA variance estimate:
```r
var_robin / var_model
[1] 0.8546666
```
Same model, same data, 15% smaller estimated variance.
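For readers who want to poke at this themselves, here is a rough, self-contained sketch (hypothetical code, not the post's) of a trial matching the set-up above: n = 200, 1:1 allocation, seven baseline covariates, an ANCOVA working model. It compares the model-based variance of the treatment coefficient with a hand-rolled HC0 sandwich variance. Note this coefficient-level sandwich is only a crude proxy for the influence-function ATE variance that {RobinCar2} computes, and will generally give a less dramatic ratio than the one above.

```r
# Hypothetical simulation (not the post's actual code).
set.seed(1)
n <- 200
p <- 7
X <- matrix(rnorm(n * p), n, p)        # seven baseline covariates
A <- rep(0:1, each = n / 2)            # 1:1 allocation
Y <- drop(1 + A + X %*% rnorm(p, sd = 0.5) + rnorm(n))

fit <- lm(Y ~ A + X)

# Model-based variance of the treatment coefficient: sigma^2 * (M'M)^{-1}
var_model <- vcov(fit)["A", "A"]

# HC0 sandwich by hand: (M'M)^{-1} M' diag(e^2) M (M'M)^{-1}
M <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(M))
meat <- crossprod(M * e)               # sum of e_i^2 * m_i m_i'
var_hc0 <- (bread %*% meat %*% bread)["A", "A"]

var_hc0 / var_model                    # ratio of the two variance estimates
```

The same simulated data can then be handed to {RobinCar2} to reproduce the kind of comparison shown above.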
So what?
The methods included in {RobinCar2} are powerful and useful. I’m an advocate for using more covariate adjustment in the primary analysis of RCTs. I’m especially excited about the methods for covariate adjustment involving time-to-event outcomes. They need to be used in the appropriate settings, however.
It’s easy to say that the appropriate settings are those where \(p\) is not too large and \(n\) is not too small. I chose this example to sit somewhere on the boundary of what I would instinctively consider reasonable, yet it still leads to a dramatic difference between the model-based and influence-function approaches.
I’m slowly building an understanding of what’s driving this difference. It’s a combination of several factors: conditional vs unconditional inference, variance inflation factors (see Senn et al., 2024), and a degrees-of-freedom correction. In large(ish) samples each of these factors might not seem to make a huge difference in isolation, but together they can add up to a big one.
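To get a rough sense of scale for the degrees-of-freedom piece alone: the model-based ANCOVA estimate divides the residual sum of squares by \(n - k\) rather than \(n\), where here \(k = 9\) (intercept, treatment indicator, seven covariates).

```r
# Degrees-of-freedom inflation of the model-based residual variance:
# dividing by n - k instead of n scales the estimated sigma^2 up by n / (n - k).
n <- 200
k <- 9             # intercept + treatment indicator + 7 covariates
n / (n - k)        # ~1.047, i.e. roughly a 5% difference on its own
```

So the degrees-of-freedom correction alone accounts for only about a third of the gap seen above; the remainder comes from the other factors.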