Longitudinal hurdle models 2
In a previous post I fit a longitudinal hurdle model using the brms
package.
library(brms)
summary(fit_hurdle)
## Family: hurdle_lognormal
## Links: mu = identity; sigma = identity; hu = logit
## Formula: y ~ time + (1 | q | id)
## hu ~ time + (1 | q | id)
## Data: dat (Number of observations: 800)
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup samples = 8000
##
## Group-Level Effects:
## ~id (Number of levels: 100)
## Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept) 1.96 0.16 1.67 2.30 1.00
## sd(hu_Intercept) 2.46 0.30 1.94 3.10 1.00
## cor(Intercept,hu_Intercept) -0.90 0.04 -0.97 -0.80 1.00
## Bulk_ESS Tail_ESS
## sd(Intercept) 1475 2467
## sd(hu_Intercept) 3265 4837
## cor(Intercept,hu_Intercept) 1949 3570
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.07 0.21 -0.34 0.48 1.01 770 1487
## hu_Intercept -2.73 0.35 -3.44 -2.07 1.00 1484 3476
## time2 -0.33 0.08 -0.48 -0.18 1.00 11586 5606
## hu_time2 1.27 0.24 0.81 1.73 1.00 11668 5524
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.94 0.03 0.88 1.00 1.00 8823 5708
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
I’d now like to do some inference on the model, combining its zero and non-zero parts.
The model is: for observation \(j\) from patient \(i\),
\[Y_{i,j} = Z_{i,j}Y^*_{i,j},\] \[\text{logit}\left\lbrace P(Z_{i,j} = 0) \right\rbrace = x_{i,j}^T\gamma + u_i,\]
\[\log (Y^*_{i,j}) \sim N(x_{i,j}^T\beta + v_i, ~\sigma^2), \]
\[\left( \begin{array}{c} u_i \\ v_i \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} \sigma_u^2 & \rho \sigma_u\sigma_v \\ \rho \sigma_u \sigma_v & \sigma_v ^ 2\end{array}\right)\right),\] where \(x_{i,j}^T = (1, t_{i,j})\). Also let \(\theta = (\gamma, \beta, \sigma, \sigma_u, \sigma_v, \rho)\).
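To make the model concrete, here is a minimal sketch of simulating data from it. The parameter values are made up for illustration (roughly matching the posterior summaries above) and the original post's simulation may well differ; `time` is kept numeric here rather than a factor.

```r
set.seed(1)
n_id <- 100                               # patients
sigma_u <- 2.5; sigma_v <- 2; rho <- -0.9 # random-effect SDs and correlation
sigma <- 1                                # lognormal residual SD
gamma <- c(-2.7, 1.3)                     # hurdle part: intercept, time effect
beta  <- c(0.1, -0.3)                     # lognormal part: intercept, time effect

## correlated random effects: u_i (hurdle) and v_i (lognormal)
u <- rnorm(n_id, sd = sigma_u)
v <- rnorm(n_id,
           mean = u * rho * sigma_v / sigma_u,
           sd   = sqrt(1 - rho^2) * sigma_v)

## 100 patients x 4 replicates x 2 timepoints = 800 observations
dat <- expand.grid(id = seq_len(n_id), rep = 1:4, time = c(0, 1))
x <- cbind(1, dat$time)

p_zero <- plogis(x %*% gamma + u[dat$id])               # P(Z = 0)
z      <- rbinom(nrow(dat), 1, 1 - p_zero)              # Z = 1 -> non-zero
ystar  <- exp(rnorm(nrow(dat), x %*% beta + v[dat$id], sigma))
dat$y  <- z * ystar                                     # Y = Z * Y*
```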
How to use fitted.brmsfit
Having obtained posterior samples from \((\theta, u_1,\ldots,u_n,v_1,\ldots,v_n)\), we might want to look at samples from:
\[\begin{align} h(u_i, v_i, \theta; x) & = E(Y \mid u_i, v_i, \theta; x) \\ &= \left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_i) \right\rbrace \exp(x^T\beta + v_i + \frac{\sigma^2}{2}).\end{align}\]
This is some kind of patient-specific expectation of \(Y\), conditional on the random effects. If patient \(i\) is already in the model, then we can just take samples directly from the posterior. This can be achieved with the fitted
method:
fitted(fit_hurdle,
newdata = data.frame(id = 1, time = "2"),
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.4034682 0.150417 0.1796645 0.8248473
will give the posterior median, the MAD (the Est.Error column, since robust = TRUE), and the (2.5, 97.5) percentiles of \(h(u_1, v_1, \theta ; x)\) at timepoint “2”.
Alternatively, we might be more interested in \(h(0, 0, \theta ; x)\), which in some sense is the patient-specific expectation of \(Y\) for an “average” patient with random effects fixed at zero. We can get this by setting re_formula = NA:
fitted(fit_hurdle,
newdata = data.frame(time = "2"),
robust = TRUE,
re_formula = NA)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.9615294 0.2401395 0.5689127 1.569304
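As a quick sanity check, \(h(0, 0, \theta; x)\) can be computed by hand from the posterior summaries above. This is only a sketch: plugging posterior means into \(h\) is not the same as summarising the posterior of \(h\), so the numbers agree only roughly.

```r
## h(u, v, theta; x) = {1 - plogis(x'gamma + u)} * exp(x'beta + v + sigma^2/2)
h_cond <- function(xi, eta, u, v, sigma) {
  (1 - plogis(xi + u)) * exp(eta + v + sigma^2 / 2)
}

## posterior means at time "2", random effects fixed at zero:
h_cond(xi  = -2.73 + 1.27,  # hu_Intercept + hu_time2
       eta =  0.07 - 0.33,  # Intercept + time2
       u = 0, v = 0, sigma = 0.94)
## roughly 0.97
```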
Or, we might be more interested in \(h(u_{i'}, v_{i'}, \theta ; x)\) for a new patient \(i'\). Now we need to generate samples from the posterior distribution
\[f(u_{i'}, v_{i'}, \theta \mid \mathbf{y}) = f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) f(\theta \mid \mathbf{y}).\]
We can do this by going through our posterior samples \(\theta^{(k)}\) for \(k = 1,\ldots,K\) and each time simulating
\[\left( \begin{array}{c} u_{i'}^{(k)} \\ v_{i'}^{(k)} \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} (\sigma^{(k)}_u)^2 & \rho^{(k)} \sigma^{(k)}_u\sigma^{(k)}_v \\ \rho^{(k)} \sigma^{(k)}_u \sigma^{(k)}_v & (\sigma^{(k)}_v)^2\end{array}\right)\right).\]
The way I expected this to be done is
fitted(fit_hurdle, newdata = data.frame(id = NA, time = "2"),
allow_new_levels = TRUE,
sample_new_levels = "gaussian",
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.9095811 1.315207 0.0009475099 58.14658
and this is indeed what’s happening, as can be seen by going through the steps manually:
ps <- posterior_samples(fit_hurdle)
sigma_error <- ps[,"sigma"]
sigma_u <- ps[,"sd_id__hu_Intercept"]
sigma_v <- ps[,"sd_id__Intercept"]
rho <- ps[,"cor_id__Intercept__hu_Intercept"]
n_mcmc <- length(rho)
x <- data.frame(id = NA, time = "2")
### simulate u_i' and v_i'
u <- rnorm(n_mcmc, sd = sigma_u)
### include correlation
v <- rnorm(n_mcmc,
mean = u * sigma_v / sigma_u * rho,
sd = sqrt((1 - rho^2) * sigma_v ^ 2))
### extract draws from xi = x*gamma
xi <- qlogis(fitted(fit_hurdle,
newdata = x,
re_formula = NA,
dpar = "hu",
summary = FALSE))
### extract draws from eta = x*beta
eta <- fitted(fit_hurdle,
newdata = x,
re_formula = NA,
dpar = "mu",
summary = FALSE)
ey <- (1 - plogis(xi + u)) * exp(eta + v + sigma_error ^ 2 / 2)
round(quantile(ey, probs = c(0.025, 0.5, 0.975)), 3)
## 2.5% 50% 97.5%
## 0.001 0.927 52.369
Unconditional expectation
Instead of a patient-specific expectation, conditional on random effects, we might be more interested in targeting an overall expectation (for a new patient) where we integrate out the random effects. In other words, we want to take samples from
\[\begin{align} g(\theta; x) & = E(Y \mid \theta; x) \\ & = \int E(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}E(Y \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)}) \right\rbrace \exp(x^T\beta + v_{i'}^{(l)} + \frac{\sigma^2}{2}).\end{align}\]
That is, I take \(g(\theta^{(k)}; x)\) for \(k = 1,\ldots,K\). At each \(k\), for \(l = 1,\ldots,L\), I take independent draws
\[\left( \begin{array}{c} u_{i'}^{(l)} \\ v_{i'}^{(l)} \\ \end{array} \right) \sim N\left(\left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right), \left( \begin{array}{c c} (\sigma^{(k)}_u)^2 & \rho^{(k)} \sigma^{(k)}_u\sigma^{(k)}_v \\ \rho^{(k)} \sigma^{(k)}_u \sigma^{(k)}_v & (\sigma^{(k)}_v)^2\end{array}\right)\right)\]
to perform the inner Monte Carlo integration (probably not the best method for this 2-d example). I don’t think it’s possible to do this with brms, so I’ve written my own code (which only works for this specific model).
devtools::install_github("dominicmagirr/hurlong")
hurlong::marg_mean_q(newdata = x,
nsims = 1000,
fit = fit_hurdle)
## id time 2.5% 50% 97.5%
## 1 NA 2 3.845286 7.511835 19.58231
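The nested Monte Carlo step can also be sketched in plain R. The parameter draws below are simulated stand-ins loosely centred on the posterior summaries above, not real posterior samples, and this is an illustration of the general recipe rather than of hurlong’s internals.

```r
## For each posterior draw k, average h over L fresh draws of (u, v).
set.seed(2)
K <- 200    # number of (stand-in) posterior draws
L <- 1000   # inner Monte Carlo size
xi      <- rnorm(K, -1.46, 0.30)  # draws of x'gamma at time "2"
eta     <- rnorm(K, -0.26, 0.20)  # draws of x'beta
sigma   <- rnorm(K,  0.94, 0.03)
sigma_u <- rnorm(K,  2.46, 0.30)
sigma_v <- rnorm(K,  1.96, 0.16)
rho     <- pmin(pmax(rnorm(K, -0.90, 0.04), -0.99), 0.99)

g <- vapply(seq_len(K), function(k) {
  ## inner draws of (u, v) given theta^(k), via the conditional decomposition
  u <- rnorm(L, sd = sigma_u[k])
  v <- rnorm(L,
             mean = u * rho[k] * sigma_v[k] / sigma_u[k],
             sd   = sqrt(1 - rho[k]^2) * sigma_v[k])
  mean((1 - plogis(xi[k] + u)) * exp(eta[k] + v + sigma[k]^2 / 2))
}, numeric(1))

quantile(g, probs = c(0.025, 0.5, 0.975))
```

Each element of `g` is one draw of \(g(\theta^{(k)}; x)\); the quantiles summarise its posterior, analogous to the marg_mean_q output.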