Longitudinal hurdle models 3

In the last post on longitudinal hurdle models, I had just taken samples from the marginal mean

\[\begin{align} g(\theta; x) & = E(Y \mid \theta; x) \\ & = \int E(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}E(Y \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\left\lbrace 1 - \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)}) \right\rbrace \exp(x^T\beta + v_{i'}^{(l)} + \frac{\sigma^2}{2}).\end{align}\]

One of the issues with lognormal data is that it is highly skewed, so the mean can be very large. In a small sample, the sample mean can change a lot based on just 1 or 2 large observations. For this reason I would like to sample from other summary measures of \(Y\).

Samples from \(p(Y \leq k \mid \theta; x)\)

This is quite similar to taking samples from the marginal mean.

\[\begin{align} r(\theta; x, k) & = p(Y < k \mid \theta; x) \\ & = \int p(Y < k \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}p(Y < k \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\\ &= L^{-1}\sum_{l = 1}^{L}\ \left[ \pi^{(l)}(x) + \left\lbrace 1 - \pi^{(l)}(x)\right\rbrace \Phi \left\lbrace \frac{\log(k) - x^T\beta - v_{i'}^{(l)}}{\sigma} \right\rbrace \right].\end{align}\]

where \(\pi^{(l)}(x):= \text{logit}^{-1} ( x^T \gamma + u_{i'}^{(l)})\).

Again, I don’t know of any functions for doing this, so I built my own.

devtools::install_github("dominicmagirr/hurlong")
library(brms)
x <- data.frame(id = NA, time = "2")
hurlong::marg_pyk_q(k = 0.5,
                    newdata = x, 
                    nsims = 1000, 
                    fit = fit_hurdle)
##   id time     2.5%       50%     97.5%
## 1 NA    2 0.421836 0.5018633 0.5792536

Samples from \(\text{median}(Y \mid \theta; x)\)

Finally, I am interested in samples from the marginal median.

\[\begin{align} s(\theta; x) & = \text{median}(Y \mid \theta; x) \\ & = \int \text{median}(Y \mid u_{i'}, v_{i'}, \theta; x) f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) du_{i'}dv_{i'} \\ &\approx L^{-1}\sum_{l = 1}^{L}\text{median}(Y\mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x) \end{align}\]

To evaluate \(\text{median}(Y\mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x)\) at each \(l\), I do a search for \(m^{(l)}\) such that

\[p(Y < m^{(l)} \mid u_{i'}^{(l)}, v_{i'}^{(l)}, \theta; x) = 0.5.\]

Again, I’ve made a function that can do this (for this specific longitudinal hurdle model).

hurlong::marg_med(newdata = x, 
                  nsims = 1000, 
                  ks = exp(seq(log(0.01), 
                               log(100), 
                               length.out = 15)), # where to evaluate p(Y<k)
                  fit = fit_hurdle)
##   id time      2.5%       50%     97.5%
## 1 NA    2 0.2332964 0.4894882 0.8856152

Predictions

To round off this series on longitudinal hurdle models, I want to show how to simulate draws (and find quantiles) from the posterior predictive distribution for a new observation (\(\tilde{Y}\)). Firstly for a patient \(i\) already in the data set, where we draw from

\[ f(\tilde{y} \mid \mathbf{y} ; x) = f(\tilde{y} \mid u_i, v_i, \theta, \mathbf{y} ; x)f(u_i, v_i, \theta \mid \mathbf{y}) \]

library(brms)
predict(fit_hurdle,
        newdata = data.frame(id = 1, time = "2"),
        robust = TRUE)
##       Estimate Est.Error Q2.5    Q97.5
## [1,] 0.2448615 0.3224153    0 2.145083

and, secondly, for a new patient \(i'\), where we draw from

\[ f(\tilde{y} \mid \mathbf{y} ; x) = f(\tilde{y} \mid u_{i'}, v_{i'}, \theta, \mathbf{y} ; x)f(u_{i'}, v_{i'} \mid \theta, \mathbf{y}) f(\theta \mid \mathbf{y}) \]

predict(fit_hurdle,
        newdata = data.frame(id = NA, time = "2"),
        allow_new_levels = TRUE,
        sample_new_levels = "gaussian",
        robust = TRUE)
##       Estimate Est.Error Q2.5    Q97.5
## [1,] 0.4614108 0.6840876    0 52.57643
Avatar
Dominic Magirr
Medical Statistician

Interested in the design, analysis and interpretation of clinical trials.

Related