What's the difference between fixed, random, independent, type I, type II, (non)prognostic, (non)ignorable, and (non)informative censoring?
I already feel exhausted after writing the title. I’m not really going to explain all these differences, sorry. I’m not capable. And the terms are not used in a consistent way anyway. Two references that I have found helpful are
- Lecture https://myweb.uiowa.edu/pbreheny/7210/f15/notes/9-3.pdf by Patrick Breheny
- Paper https://doi.org/10.2307/2529941 by Stephen Lagakos
The crucial question when thinking about censoring mechanisms is whether or not it is valid to use the same likelihood function as you would have used had the censoring times been fixed in advance. If it is not, standard survival analysis techniques (KM, Cox, standard parametric models, …) are no longer valid.
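For concreteness, here is the usual form of that likelihood, written as a sketch with notation that is assumed here rather than taken from the references: \(f(\cdot;\theta)\) and \(S(\cdot;\theta)\) are the density and survival function of the true survival time under a parametric model with parameter \(\theta\), \(t_i\) is the observed time for patient \(i\), and \(\delta_i\) is the event indicator (1 for an event, 0 for a censored observation).

\[
L(\theta) = \prod_{i=1}^{n} f(t_i;\theta)^{\delta_i} \, S(t_i;\theta)^{1-\delta_i}
\]

Each observed event contributes its density, and each censored observation contributes only the probability of surviving beyond \(t_i\).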
Thinking directly about likelihood functions may be a bit too abstract. For me, in the context of clinical trials, the criterion I find most helpful is:
“A censored observation at time \(t\) provides only the information that the true survival time exceeds \(t\)…the event of censoring provides no prognostic information about subsequent survival time.”
That was a quote from the Lagakos paper. He refers to this condition as “non-prognostic censoring.”
If this condition holds, then it is valid to use the same likelihood as if the censoring times had been fixed. (One could easily think of situations where it might not hold, e.g. dropout due to an adverse event.)
This condition is also a bit more general than what he defines as “independent censoring”.
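To make the adverse-event type of dropout concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration: exponential survival times, a hypothetical mechanism where 30% of patients withdraw at 80% of their true survival time (so censoring is strongly prognostic), and the simple "events divided by follow-up time" estimate of an exponential rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_rate = 0.05  # assumed constant hazard of the event

# "True" survival times from an exponential distribution (an assumption).
true_time = rng.exponential(scale=1 / true_rate, size=n)

# Hypothetical prognostic dropout: 30% of patients deteriorate and withdraw
# after 80% of their true survival time, i.e. shortly before the event.
drops_out = rng.random(n) < 0.3
censor_time = np.where(drops_out, 0.8 * true_time, np.inf)

observed_time = np.minimum(true_time, censor_time)
event = (true_time < censor_time).astype(int)

# Standard exponential estimate that treats censoring as if it were fixed:
# number of events divided by total follow-up time.
naive_rate = event.sum() / observed_time.sum()
print(f"true rate {true_rate:.3f}, naive estimate {naive_rate:.3f}")
```

In this setup the naive estimate should come out clearly below the true rate, because the patients who drop out are treated as if they had the same prospects as everyone else still at risk, when in fact their event was imminent.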
“Independent” (or possibly “random”) censoring
Consider a trial where everyone has a “true” survival time \(\tilde{T}_i\) (which is a random variable), everyone has a censoring time \(C_i\) (which is also a random variable), and all of these random variables are independent of each other. Furthermore, the distribution of \(\tilde{T}_i\) is assumed to be “functionally independent” of the distribution of \(C_i\) (I think this basically means that the distributions do not share any parameters). What we observe is \(T_i = \min(\tilde{T}_i, C_i)\) and \(\delta_i = I\{\tilde{T}_i < C_i\}\).
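The setup above can be sketched as a small simulation. The function name `simulate_random_censoring`, the exponential distributions, and the rate values are all assumptions chosen for illustration; the only parts taken from the text are the independent draws and the definitions of \(T_i\) and \(\delta_i\).

```python
import numpy as np


def simulate_random_censoring(n, event_rate=0.1, censor_rate=0.05, seed=0):
    """Draw independent survival and censoring times; return what is observed."""
    rng = np.random.default_rng(seed)
    # Independent draws with separate parameters, so the two distributions
    # share nothing (the "functionally independent" assumption).
    true_time = rng.exponential(scale=1 / event_rate, size=n)
    censor_time = rng.exponential(scale=1 / censor_rate, size=n)
    observed_time = np.minimum(true_time, censor_time)   # T_i
    event = (true_time < censor_time).astype(int)        # delta_i
    return observed_time, event


observed_time, event = simulate_random_censoring(n=1000)

# Exponential estimate from the standard censored likelihood:
# number of events divided by total follow-up time.
rate_hat = event.sum() / observed_time.sum()
print(f"estimated event rate: {rate_hat:.3f}")
```

Because the censoring here is independent of survival, the estimate should land close to the assumed `event_rate`.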
This is how Lagakos defines “independent censoring”, although he does say “also referred to as random censorship”. In his lecture notes, Breheny calls this a “random censorship model”.
An example would be a clinical trial where patients enter the study at a time point that is random, and the study ends at a pre-fixed calendar time. This particular case is sometimes referred to as type I censoring.
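As a rough sketch of this example, the snippet below simulates random entry times with a fixed calendar end date. The accrual period, study length, and exponential survival distribution are illustrative assumptions, not values from any real trial.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
accrual_period = 12.0  # months over which patients are recruited (assumed)
study_end = 36.0       # pre-fixed calendar time at which the study stops (assumed)

# Random entry times, measured in calendar months from study start.
entry = rng.uniform(0, accrual_period, size=n)

# "True" survival times, measured from each patient's own entry.
true_time = rng.exponential(scale=20.0, size=n)

# Each patient's censoring time is the follow-up available before the study ends.
# It depends only on the entry time, not on the patient's survival time.
censor_time = study_end - entry

observed_time = np.minimum(true_time, censor_time)
event = (true_time < censor_time).astype(int)
print(f"{event.sum()} events, {n - event.sum()} censored at study end")
```

The censoring times vary between patients only because the entry times do, which is what makes them random but unrelated to prognosis in this sketch.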
This situation of random/independent censoring can also be extended such that the random variables are only independent of each other conditional on the value of some covariate \(x\). An example would be if patients recruited earlier in the trial tended to have a better/worse prognosis than patients recruited later in the trial. In this case, there is a covariate (time of recruitment) that could be used to account for this.
“Non-prognostic” (or possibly “independent”) censoring
Consider an event-driven clinical trial that stops after a pre-specified number of events has been observed. This particular case is sometimes called type II censoring. Strictly speaking, \(\tilde{T}_i\) and \(C_i\) are not independent random variables here, because the stopping time (and hence every censoring time) is determined by when the events occur, so this doesn’t fall into the category above. However, it may still be reasonable to assume that the censoring provides no prognostic information about subsequent survival time. Hence Lagakos’ use of “non-prognostic” to capture this important set of situations.
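A minimal sketch of an event-driven design is given below. To keep it short it assumes all patients start follow-up at the same time, which is a simplification; the function name `simulate_event_driven` and the exponential survival times are also assumptions for illustration.

```python
import numpy as np


def simulate_event_driven(n, target_events, mean_survival=20.0, seed=2):
    """Follow n patients from time zero and stop at the target number of events."""
    rng = np.random.default_rng(seed)
    true_time = rng.exponential(scale=mean_survival, size=n)
    # The trial stops at the time of the target_events-th event, so the common
    # censoring time is itself a function of the survival times.
    stop_time = np.sort(true_time)[target_events - 1]
    observed_time = np.minimum(true_time, stop_time)
    # "<=" so that the event occurring exactly at the stopping time is counted.
    event = (true_time <= stop_time).astype(int)
    return observed_time, event, stop_time


observed_time, event, stop_time = simulate_event_driven(n=200, target_events=60)
print(f"stopped at t = {stop_time:.1f} with {event.sum()} events")
```

The censoring time is clearly not independent of the survival times, yet a patient still being alive at `stop_time` says nothing more about that patient than that the survival time exceeds `stop_time`.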
Sometimes, however, the term “independent censoring mechanism” is used in a more general sense to cover any situation where it is valid to use the same likelihood as if the censoring times had been fixed. This is how Breheny uses the term in his lecture notes. “Ignorable” is another word that might be used for this.
Apparently, the set of situations where it is valid to use the same likelihood function as if the censoring times had been fixed is broader than just the situations satisfying the “non-prognostic” condition (which is already broader than “independent/random” censoring in the sense of the previous section). Lagakos provides a technical condition that captures this broader set of situations, and he calls censoring models that satisfy the condition “non-informative censoring models”. Unfortunately, I don’t understand this condition. But he himself says it “is the least easily interpreted condition […] in terms of the physical process, yet it is the most general one mathematically.” I take it from this that I don’t need to worry about it.
Another perspective on “(non)informative” censoring
Interestingly, in his lecture notes, Breheny uses informative/non-informative in a slightly different (I think) sense, by considering time-to-event and time-to-censoring distributions that have a shared parameter. Then, censoring would be “informative” about the parameter of interest even in the “non-prognostic” censoring situation above. In this case, it would still be “valid” (though inefficient) to use the same likelihood as if the censoring times had been fixed (see also this Stack Exchange post). However, I’ve gone past the limits of my knowledge here. As I can’t really imagine something like this in the context I work in, I’m not going to worry about it either.
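One way to see the shared-parameter point, as a sketch under the random censorship model: the notation here is an assumption and is not taken from Breheny's notes. Let \(f\) and \(S\) be the density and survival function of the survival time, \(g\) and \(G\) the density and survival function of the censoring time, \(t_i\) the observed time, and \(\delta_i\) the event indicator. The full likelihood of the observed data then factorises as

\[
L_{\text{full}}(\theta) =
\underbrace{\prod_{i=1}^{n} f(t_i;\theta)^{\delta_i} \, S(t_i;\theta)^{1-\delta_i}}_{\text{standard likelihood}}
\times
\underbrace{\prod_{i=1}^{n} g(t_i;\theta)^{1-\delta_i} \, G(t_i;\theta)^{\delta_i}}_{\text{censoring part}}
\]

If \(g\) and \(G\) do not involve \(\theta\), the second factor is constant in \(\theta\) and can be dropped without loss. If they do share \(\theta\), dropping it throws away some information about \(\theta\), which is the sense in which the standard likelihood remains usable but is no longer fully efficient.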
References
- Lecture https://myweb.uiowa.edu/pbreheny/7210/f15/notes/9-3.pdf by Patrick Breheny
- Paper https://doi.org/10.2307/2529941 by Stephen Lagakos