Longitudinal hurdle models 3
In the last post on longitudinal hurdle models, I had just taken samples from the marginal mean
\[\begin{align} g(\theta; x) & = E(Y \mid \theta; x) \\ & = \int E(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}E(Y \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)}) \right\rbrace \exp(x^T\beta + v_{i'}^{(l)} + \frac{\sigma^2}{2}).\end{align}\]
One of the issues with lognormal data is that it is highly skewed, so the mean can be very large. In a small sample, the sample mean can change a lot based on just 1 or 2 large observations. For this reason I would like to sample from other summary measures of \(Y\).
Samples from \(p(Y \leq k \mid \theta; x)\)
This is quite similar to taking samples from the marginal mean.
\[\begin{align} r(\theta; x, k) & = p(Y < k \mid \theta; x) \\ & = \int p(Y < k \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}p(Y < k \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\ \left[ \pi^{(l)}(x) + \left\lbrace 1 - \pi^{(l)}(x)\right\rbrace \Phi \left\lbrace \frac{\log(k) - x^T\beta - v_{i'}^{(l)}}{\sigma} \right\rbrace \right].\end{align}\]
where \(\pi^{(l)}(x):= \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)})\).
Again, I don’t know of any functions for doing this, so I built my own.
devtools::install_github("dominicmagirr/hurlong")
library(brms)
x <- data.frame(id = NA, time = "2")
hurlong::marg_pyk_q(k = 0.5,
newdata = x,
nsims = 1000,
fit = fit_hurdle)
## id time 2.5% 50% 97.5%
## 1 NA 2 0.421836 0.5018633 0.5792536
Samples from \(\text{median}(Y \mid \theta; x)\)
Finally, I am interested in samples from the marginal median.
\[\begin{align} s(\theta; x) & = \text{median}(Y \mid \theta; x) \\ & = \int \text{median}(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}\text{median}(Y\mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x) \end{align}\]
To evaluate \(\text{median}(Y\mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\) at each \(l\), I do a search for \(m^{(l)}\) such that
\[p(Y < m^{(l)} \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x) = 0.5.\]
Again, I’ve made a function that can do this (for this specific longitudinal hurdle model).
hurlong::marg_med(newdata = x,
nsims = 1000,
ks = exp(seq(log(0.01),
log(100),
length.out = 15)), # where to evaluate p(Y<k)
fit = fit_hurdle)
## id time 2.5% 50% 97.5%
## 1 NA 2 0.2332964 0.4894882 0.8856152
Predictions
To round off this series on longitudinal hurdle models, I want to show how to simulate draws (and find quantiles) from the posterior predictive distribution for a new observation (\(\tilde{Y}\)). Firstly for a patient \(i\) already in the data set, where we draw from
\[ f(\tilde{y} \mid \mathbf{y} ; x) = f(\tilde{y} \mid u_i, v_i, \theta, \mathbf{y} ; x)f(u_i, v_i, \theta \mid \mathbf{y}) \]
library(brms)
predict(fit_hurdle,
newdata = data.frame(id = 1, time = "2"),
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.2448615 0.3224153 0 2.145083
and, secondly, for a new patient \(i'\), where we draw from
\[ f(\tilde{y} \mid \mathbf{y} ; x) = f(\tilde{y} \mid u_{i'}, v_{i'}, \theta, \mathbf{y} ; x)f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) f(\theta \mid \mathbf{y}) \]
predict(fit_hurdle,
newdata = data.frame(id = NA, time = "2"),
allow_new_levels = TRUE,
sample_new_levels = "gaussian",
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.4614108 0.6840876 0 52.57643