How to calculate the log-rank statistic
Suppose we have the following data.
df <- dplyr::tibble(patient_id = as.character(1:12),
treatment = rep(c("C", "E"), each = 6),
survival_time = survival::Surv(time = c(2,6,8,11,17,24,7,9,13,22,23,25),
event = c(1,1,1,1,1,0,1,1,1,0,0,0)))
knitr::kable(df)
patient_id | treatment | survival_time |
---|---|---|
1 | C | 2 |
2 | C | 6 |
3 | C | 8 |
4 | C | 11 |
5 | C | 17 |
6 | C | 24+ |
7 | E | 7 |
8 | E | 9 |
9 | E | 13 |
10 | E | 22+ |
11 | E | 23+ |
12 | E | 25+ |
Let’s arrange the data in increasing order of survival time.
df_ordered <- dplyr::arrange(df, survival_time[,"time"])
knitr::kable(df_ordered)
patient_id | treatment | survival_time |
---|---|---|
1 | C | 2 |
2 | C | 6 |
7 | E | 7 |
3 | C | 8 |
8 | E | 9 |
4 | C | 11 |
9 | E | 13 |
5 | C | 17 |
10 | E | 22+ |
11 | E | 23+ |
6 | C | 24+ |
12 | E | 25+ |
Method 1: scores
The first step is to estimate the survival probability from the pooled data using the Nelson-Aalen estimator. Then, to get the log-rank scores, we add 1 to the logarithm of the pooled survival estimate for all observed events. For censored events we do not add 1.
pooled_fit <- survival::survfit(survival_time ~ 1, data = df_ordered)
df_logrank <- dplyr::mutate(df_ordered,
pooled_s = exp(-cumsum(pooled_fit$n.event / pooled_fit$n.risk)),
logrank_score = log(pooled_s) + survival_time[,"status"])
knitr::kable(df_logrank, digits = 3)
patient_id | treatment | survival_time | pooled_s | logrank_score |
---|---|---|---|---|
1 | C | 2 | 0.920 | 0.917 |
2 | C | 6 | 0.840 | 0.826 |
7 | E | 7 | 0.760 | 0.726 |
3 | C | 8 | 0.680 | 0.615 |
8 | E | 9 | 0.600 | 0.490 |
4 | C | 11 | 0.520 | 0.347 |
9 | E | 13 | 0.440 | 0.180 |
5 | C | 17 | 0.361 | -0.020 |
10 | E | 22+ | 0.361 | -1.020 |
11 | E | 23+ | 0.361 | -1.020 |
6 | C | 24+ | 0.361 | -1.020 |
12 | E | 25+ | 0.361 | -1.020 |
To calculate the logrank score statistic we add up all the scores on the control arm.
logrank_score <- dplyr::summarise(dplyr::group_by(df_logrank, treatment), sum(logrank_score))
logrank_score
## # A tibble: 2 x 2
## treatment `sum(logrank_score)`
## <chr> <dbl>
## 1 C 1.66
## 2 E -1.66
Method 2: observed - expected
The score statistic is equivalent to the sum of “observed” - “expected” events at each event time. This is implemented in the survival package.
logrank_fit <- survival::survdiff(survival_time ~ treatment, data = df)
rbind(logrank_fit$n, logrank_fit$obs - logrank_fit$exp)[2,]
## treatment=C treatment=E
## 1.664105 -1.664105