flexsurv
I’m going to write about some of my favourite R packages. I’ll start with flexsurv (https://github.com/chjackson/flexsurv-dev) by Chris Jackson, which can be used to fit all kinds of parametric models to survival data. It can do a lot, but I’ll pick out just two things I like about it:
- Fit a standard survival model, but where it’s slightly easier to work out what the parameters mean.
- Fit a proportional hazards model, which is a lot like a Cox model, but where you also model the baseline hazard using a spline.
1. Consistent parameter values
As mentioned in the tutorial (https://www.jstatsoft.org/article/view/v070i08), for simple models flexsurvreg acts as a wrapper for survival::survreg, but where the parameters in the output match those of dweibull.
With survival::survreg I would do, e.g.:
dat_ovarian <- survival::ovarian
dat_ovarian$rx <- factor(dat_ovarian$rx)

library(survival)
fit_survreg <- survreg(Surv(futime, fustat) ~ rx,
                       dist = "weibull",
                       data = dat_ovarian)
summary(fit_survreg)
##
## Call:
## survreg(formula = Surv(futime, fustat) ~ rx, data = dat_ovarian,
## dist = "weibull")
## Value Std. Error z p
## (Intercept) 6.825 0.344 19.84 <2e-16
## rx2 0.559 0.529 1.06 0.29
## Log(scale) -0.121 0.251 -0.48 0.63
##
## Scale= 0.886
##
## Weibull distribution
## Loglik(model)= -97.4 Loglik(intercept only)= -98
## Chisq= 1.18 on 1 degrees of freedom, p= 0.28
## Number of Newton-Raphson Iterations: 5
## n= 26
Then I would look around online for an explanation of the output, e.g. (https://stats.stackexchange.com/questions/159044/weibull-survival-model-in-r). There is also an explanation in ?survreg.
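To make that concrete, here is my reading of the mapping (based on ?survreg and ?dweibull, so worth double-checking): dweibull's shape is the reciprocal of survreg's scale, and dweibull's scale for the reference group is exp(intercept). Plugging in the rounded estimates printed above:

```r
# Converting the survreg output above to dweibull parameters.
# My understanding (from ?survreg): dweibull shape = 1 / survreg scale,
# and dweibull scale (reference group) = exp(intercept).
survreg_scale <- 0.886   # "Scale" from summary(fit_survreg)
intercept     <- 6.825   # "(Intercept)" from summary(fit_survreg)

dweibull_shape <- 1 / survreg_scale   # approx 1.129
dweibull_scale <- exp(intercept)      # approx 920
```

These match the shape and scale that flexsurv reports directly below.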
On the other hand, using flexsurv:
library(flexsurv)
fit_flexsurvreg <- flexsurvreg(Surv(futime, fustat) ~ rx,
                               dist = "weibull",
                               data = dat_ovarian)
fit_flexsurvreg
## Call:
## flexsurvreg(formula = Surv(futime, fustat) ~ rx, data = dat_ovarian,
## dist = "weibull")
##
## Estimates:
## data mean est L95% U95% se exp(est)
## shape NA 1.129 0.690 1.848 0.284 NA
## scale NA 920.128 468.868 1805.704 316.508 NA
## rx2 0.500 0.559 -0.478 1.597 0.529 1.749
## L95% U95%
## shape NA NA
## scale NA NA
## rx2 0.620 4.936
##
## N = 26, Events: 12, Censored: 14
## Total time at risk: 15588
## Log-likelihood = -97.36415, df = 3
## AIC = 200.7283
The parameters shape and scale correspond to dweibull. So I don’t have to think any further? Not quite: I still have to work out what the estimate of rx2 is doing. I might look at exp(est) = 1.749 and somehow expect this to be a hazard ratio. It’s not. It’s a multiplicative effect on the scale parameter. So when rx = 1 the scale is 920.1, and when rx = 2 the scale is 920.1 * 1.749. The hazard ratio (treatment 2 vs treatment 1) is
\[\begin{align} \frac{h_2(x)}{h_1(x)} & = \left( \frac{b_1}{b_2} \right)^a \\ & = \left( \frac{920.1}{920.1 \times 1.749} \right)^{1.129}\\ & = 0.53 \end{align} \]
where \(a\) is the common shape parameter, and \(b_1\) and \(b_2\) are the scale parameters.
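The same arithmetic in R, plugging in the rounded estimates from the output above (so the result is approximate):

```r
# Hazard ratio for treatment 2 vs 1 under a shared Weibull shape:
# h2(x) / h1(x) = (b1 / b2)^a, with b2 = b1 * 1.749.
a  <- 1.129            # shape estimate
b1 <- 920.1            # scale when rx = 1
b2 <- 920.1 * 1.749    # scale when rx = 2
(b1 / b2)^a            # approx 0.53
```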
Once I had started writing this post, I realized that it’s actually not straightforward to make inference on the hazard ratio using flexsurv. For working out variances/covariances, the survreg parameterization is indeed better. I looked around for other R packages in this space, and found SurvRegCensCov, which can do this conversion automatically for you:
library(SurvRegCensCov)
ConvertWeibull(fit_survreg)$HR
## HR LB UB
## rx2 0.5318051 0.1683444 1.679989
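Incidentally, this number can be reproduced by hand from the survreg output: for a Weibull AFT model, the log-hazard ratio is minus the AFT coefficient divided by survreg’s scale (equivalently, times the shape). A quick check with the rounded estimates printed earlier (approximate, since the inputs are rounded):

```r
# HR = exp(-coef / survreg_scale), using rounded values
# from summary(fit_survreg) above.
coef_rx2      <- 0.559
survreg_scale <- 0.886
exp(-coef_rx2 / survreg_scale)   # approx 0.532, matching ConvertWeibull
```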
For completeness, using flexsurv, the log-hazard ratio is
\[\begin{align} \log\left( \frac{h_2(x)}{h_1(x) }\right) & = a\left\lbrace \log(b_1) - \log(b_2) \right\rbrace \\ & = -\exp(\log(a)) \times \left\lbrace \log(b_2) - \log(b_1) \right\rbrace \end{align}\]
I can extract the terms \(\alpha:=\log(a)\) and \(\beta:=\log(b_2) - \log(b_1)\) from the fit_flexsurvreg object, as well as their (co)variance.
alpha <- fit_flexsurvreg$res.t["shape", "est"]
beta <- fit_flexsurvreg$res.t["rx2", "est"]
cov_alpha_beta <- vcov(fit_flexsurvreg)[c("shape", "rx2"), c("shape", "rx2")]
Then work out the variance of the log-hazard ratio using the delta method.
\[\text{var}\left\lbrace -\beta\exp(\alpha) \right\rbrace = (-\beta\exp(\alpha), -\exp(\alpha)) \text{Cov}(\alpha, \beta) \left( \begin{array}{c} -\beta\exp(\alpha) \\ -\exp(\alpha) \end{array}\right)\]
grad <- matrix(c(-beta * exp(alpha),
                 -exp(alpha)),
               ncol = 1)
var_lhr <- t(grad) %*% cov_alpha_beta %*% grad
se_lhr <- sqrt(var_lhr)
se_lhr
## [,1]
## [1,] 0.5868807
And to get a 95% confidence interval for the hazard ratio…
log_hr <- -exp(alpha) * beta
log_hr_upr <- log_hr + qnorm(0.975) * se_lhr
log_hr_lwr <- log_hr - qnorm(0.975) * se_lhr
data.frame(HR = exp(log_hr),
           LB = exp(log_hr_lwr),
           UB = exp(log_hr_upr))
## HR LB UB
## 1 0.5318051 0.1683444 1.679988
2. Splines
The second thing I really like about flexsurv is the proportional hazards model with a spline for the baseline hazard. I’ll explore this in another post.