Longitudinal hurdle models 2

In a previous post I fit a longitudinal hurdle model using the brms package.
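For context, the call that produced the fit looked roughly like this (a sketch reconstructed from the summary output below; dat is the simulated dataset from that post):

fit_hurdle <- brm(bf(y ~ time + (1 | q | id),
                     hu ~ time + (1 | q | id)),
                  family = hurdle_lognormal(),
                  data = dat,
                  chains = 4, iter = 4000, warmup = 2000)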

library(brms)
summary(fit_hurdle)
##  Family: hurdle_lognormal 
##   Links: mu = identity; sigma = identity; hu = logit 
## Formula: y ~ time + (1 | q | id) 
##          hu ~ time + (1 | q | id)
##    Data: dat (Number of observations: 800) 
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                   1.96      0.16     1.67     2.30 1.00
## sd(hu_Intercept)                2.46      0.30     1.94     3.10 1.00
## cor(Intercept,hu_Intercept)    -0.90      0.04    -0.97    -0.80 1.00
##                             Bulk_ESS Tail_ESS
## sd(Intercept)                   1475     2467
## sd(hu_Intercept)                3265     4837
## cor(Intercept,hu_Intercept)     1949     3570
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept        0.07      0.21    -0.34     0.48 1.01      770     1487
## hu_Intercept    -2.73      0.35    -3.44    -2.07 1.00     1484     3476
## time2           -0.33      0.08    -0.48    -0.18 1.00    11586     5606
## hu_time2         1.27      0.24     0.81     1.73 1.00    11668     5524
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.94      0.03     0.88     1.00 1.00     8823     5708
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

I’d now like to do some inference on the model, combining its zero and non-zero parts.

The model is: for observation \(j\) from patient \(i\),

\[Y_{i,j} = Z_{i,j}Y^*_{i,j},\]

where \(Z_{i,j} \in \lbrace 0, 1 \rbrace\) indicates a non-zero response, with

\[\text{logit}\left\lbrace P(Z_{i,j} = 0) \right\rbrace = x_{i,j}^T\gamma + u_i,\]

\[\log (Y^*_{i,j}) \sim N(x_{i,j}^T\beta + v_i, ~\sigma^2), \]

\[\left( \begin{array}{c} u_i \\ v_i \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} \sigma_u^2 & \rho \sigma_u\sigma_v \\ \rho \sigma_u \sigma_v & \sigma_v ^ 2\end{array}\right)\right),\] where \(x_{i,j}^T = (1, t_{i,j})\). Also let \(\theta = (\gamma, \beta, \sigma, \sigma_u, \sigma_v, \rho)\).

How to use fitted.brmsfit

Having obtained posterior samples from \((\theta, u_1,\ldots,u_n,v_1,\ldots,v_n)\), we might want to look at samples from:

\[\begin{align} h(u_i, v_i, \theta; x) & = E(Y \mid u_i, v_i, \theta; x) \\ &= \left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_i) \right\rbrace \exp(x^T\beta + v_i + \frac{\sigma^2}{2}).\end{align}\]

This is the patient-specific expectation of \(Y\), conditional on the random effects: the probability of a non-zero response multiplied by the mean of the lognormal part. If patient \(i\) is already in the model, then we can just take samples directly from the posterior. This can be achieved with the fitted method:

fitted(fit_hurdle, 
       newdata = data.frame(id = 1, time = "2"), 
       robust = TRUE)
##       Estimate Est.Error      Q2.5     Q97.5
## [1,] 0.4034682  0.150417 0.1796645 0.8248473

will give the median and (2.5, 97.5) quantiles of \(h(u_1, v_1, \theta ; x)\) at timepoint “2”.
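As a sanity check, the same quantity can be assembled by hand from the posterior draws; this mirrors the manual computation for a new patient further down. (The r_id column names below assume brms's default naming for group-level draws.)

ps <- posterior_samples(fit_hurdle)

### random intercepts for patient 1: u (zero part), v (lognormal part)
u1 <- ps[, "r_id__hu[1,Intercept]"]
v1 <- ps[, "r_id[1,Intercept]"]

### draws of the linear predictors at time "2", excluding random effects
xi <- qlogis(fitted(fit_hurdle, newdata = data.frame(time = "2"),
                    re_formula = NA, dpar = "hu", summary = FALSE))
eta <- fitted(fit_hurdle, newdata = data.frame(time = "2"),
              re_formula = NA, dpar = "mu", summary = FALSE)

h1 <- (1 - plogis(xi + u1)) * exp(eta + v1 + ps[, "sigma"]^2 / 2)
round(quantile(h1, probs = c(0.025, 0.5, 0.975)), 3)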

Alternatively, we might be more interested in \(h(0, 0, \theta ; x)\), which in some sense is the patient-specific expectation of \(Y\) for an “average” patient with random effects fixed at zero. We can get this by setting re_formula = NA:

fitted(fit_hurdle, 
       newdata = data.frame(time = "2"), 
       robust = TRUE, 
       re_formula = NA)
##       Estimate Est.Error      Q2.5    Q97.5
## [1,] 0.9615294 0.2401395 0.5689127 1.569304
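Again, this can be reproduced by hand: with the random effects set to zero, the expression reduces to \(\left\lbrace 1 - \text{logit}^{-1}(x^T\gamma)\right\rbrace \exp(x^T\beta + \sigma^2/2)\), computed here from the draws extracted in the check above:

h00 <- (1 - plogis(xi)) * exp(eta + ps[, "sigma"]^2 / 2)
round(quantile(h00, probs = c(0.025, 0.5, 0.975)), 3)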

Or, we might be more interested in \(h(u_{i'}, v_{i'}, \theta ; x)\) for a new patient \(i'\). Now we need to generate samples from the posterior distribution

\[f(u_{i'}, v_{i'}, \theta \mid \mathbf{y}) = f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) f(\theta \mid \mathbf{y}).\]

Since a new patient contributes no data of their own, \(f(u_{i'}, v_{i'} \mid \theta, \mathbf{y})\) is simply the bivariate normal random-effects distribution. We can therefore go through our posterior samples \(\theta^{(k)}\) for \(k = 1,\ldots,K\), each time simulating

\[\left( \begin{array}{c} u_{i'}^{(k)} \\ v_{i'}^{(k)} \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} (\sigma^{(k)}_u)^2 & \rho^{(k)} \sigma^{(k)}_u\sigma^{(k)}_v \\ \rho^{(k)} \sigma^{(k)}_u \sigma^{(k)}_v & (\sigma^{(k)}_v)^2\end{array}\right)\right).\]

The way I expected this to be done is

fitted(fit_hurdle, newdata = data.frame(id = NA, time = "2"), 
       allow_new_levels = TRUE,
       sample_new_levels = "gaussian",
       robust = TRUE)
##       Estimate Est.Error         Q2.5    Q97.5
## [1,] 0.9095811  1.315207 0.0009475099 58.14658

and this is indeed what’s happening, as can be seen by going through the steps manually:

ps <- posterior_samples(fit_hurdle)

sigma_error <- ps[,"sigma"]                    ### residual sd of the lognormal part
sigma_u <- ps[,"sd_id__hu_Intercept"]          ### sd of u_i (zero part)
sigma_v <- ps[,"sd_id__Intercept"]             ### sd of v_i (lognormal part)
rho <- ps[,"cor_id__Intercept__hu_Intercept"]  ### cor(u_i, v_i)

n_mcmc <- length(rho)

x <- data.frame(id = NA, time = "2")

### simulate u_i' from its marginal distribution
u <- rnorm(n_mcmc, sd = sigma_u)
### simulate v_i' conditional on u_i', to include the correlation
v <- rnorm(n_mcmc, 
           mean = u * sigma_v / sigma_u * rho,
           sd = sqrt((1 - rho^2) * sigma_v ^ 2))


### extract draws from xi = x*gamma
xi <- qlogis(fitted(fit_hurdle, 
                    newdata = x, 
                    re_formula = NA, 
                    dpar = "hu",
                    summary = FALSE))

### extract draws from eta = x*beta
eta <- fitted(fit_hurdle, 
              newdata = x, 
              re_formula = NA, 
              dpar = "mu",
              summary = FALSE)

ey <- (1 - plogis(xi + u)) * exp(eta + v + sigma_error ^ 2 / 2)

round(quantile(ey, probs = c(0.025, 0.5, 0.975)), 3)
##   2.5%    50%  97.5% 
##  0.001  0.927 52.369

Unconditional expectation

Instead of a patient-specific expectation, conditional on random effects, we might be more interested in targeting an overall expectation (for a new patient) where we integrate out the random effects. In other words, we want to take samples from

\[\begin{align} g(\theta; x) & = E(Y \mid \theta; x) \\ & = \int E(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}E(Y \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)}) \right\rbrace \exp(x^T\beta + v_{i'}^{(l)} + \frac{\sigma^2}{2}).\end{align}\]

That is, I take \(g(\theta^{(k)}; x)\) for \(k = 1,\ldots,K\). At each \(k\), for \(l = 1,\ldots,L\), I take independent draws

\[\left( \begin{array}{c} u_{i'}^{(l)} \\ v_{i'}^{(l)} \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} (\sigma^{(k)}_u)^2 & \rho^{(k)} \sigma^{(k)}_u\sigma^{(k)}_v \\ \rho^{(k)} \sigma^{(k)}_u \sigma^{(k)}_v & (\sigma^{(k)}_v)^2\end{array}\right)\right)\]

to perform the inner Monte Carlo integration (probably not the best method for this 2-d example). I don’t think it’s possible to do this with brms, so I’ve written my own code (which only works for this specific model).
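The nested computation is straightforward to sketch by reusing the objects extracted in the manual steps above. This is a minimal illustration of the algorithm, not the hurlong implementation itself, with L = 1000 inner draws to match the nsims argument below:

### for each posterior draw k, average over L fresh draws of (u, v)
L <- 1000
g <- numeric(n_mcmc)
for (k in seq_len(n_mcmc)) {
  u_l <- rnorm(L, sd = sigma_u[k])
  v_l <- rnorm(L,
               mean = u_l * sigma_v[k] / sigma_u[k] * rho[k],
               sd = sqrt((1 - rho[k]^2) * sigma_v[k]^2))
  ### inner Monte Carlo average of E(Y | u, v, theta^(k); x)
  g[k] <- mean((1 - plogis(xi[k] + u_l)) *
                 exp(eta[k] + v_l + sigma_error[k]^2 / 2))
}
round(quantile(g, probs = c(0.025, 0.5, 0.975)), 3)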

devtools::install_github("dominicmagirr/hurlong")
hurlong::marg_mean_q(newdata = x, 
                     nsims = 1000, 
                     fit = fit_hurdle)
##   id time     2.5%      50%    97.5%
## 1 NA    2 3.845286 7.511835 19.58231