Longitudinal hurdle models 3
In the last post on longitudinal hurdle models, I had just taken samples from the marginal mean
g(θ;x)=E(Y∣θ;x)=∫E(Y∣ui′,vi′,θ;x)f(ui′,vi′∣θ,y)dui′dvi′≈L−1L∑l=1E(Y∣u(l)i′,v(l)i′,θ;x)=L−1L∑l=1{1−logit−1(xTγ+u(l)i′)}exp(xTβ+v(l)i′+σ22).
One of the issues with lognormal data is that it is highly skewed, so the mean can be very large. In a small sample, the sample mean can change a lot based on just 1 or 2 large observations. For this reason I would like to sample from other summary measures of Y.
Samples from p(Y≤k∣θ;x)
This is quite similar to taking samples from the marginal mean.
r(θ;x,k)=p(Y<k∣θ;x)=∫p(Y<k∣ui′,vi′,θ;x)f(ui′,vi′∣θ,y)dui′dvi′≈L−1L∑l=1p(Y<k∣u(l)i′,v(l)i′,θ;x)=L−1L∑l=1 [π(l)(x)+{1−π(l)(x)}Φ{log(k)−xTβ−v(l)i′σ}].
where π(l)(x):=logit−1(xTγ+u(l)i′).
Again, I don’t know of any functions for doing this, so I built my own.
devtools::install_github("dominicmagirr/hurlong")
library(brms)
x <- data.frame(id = NA, time = "2")
hurlong::marg_pyk_q(k = 0.5,
newdata = x,
nsims = 1000,
fit = fit_hurdle)
## id time 2.5% 50% 97.5%
## 1 NA 2 0.421836 0.5018633 0.5792536
Samples from median(Y∣θ;x)
Finally, I am interested in samples from the marginal median.
s(θ;x)=median(Y∣θ;x)=∫median(Y∣ui′,vi′,θ;x)f(ui′,vi′∣θ,y)dui′dvi′≈L−1L∑l=1median(Y∣u(l)i′,v(l)i′,θ;x)
To evaluate median(Y∣u(l)i′,v(l)i′,θ;x) at each l, I do a search for m(l) such that
p(Y<m(l)∣u(l)i′,v(l)i′,θ;x)=0.5.
Again, I’ve made a function that can do this (for this specific longitudinal hurdle model).
hurlong::marg_med(newdata = x,
nsims = 1000,
ks = exp(seq(log(0.01),
log(100),
length.out = 15)), # where to evaluate p(Y<k)
fit = fit_hurdle)
## id time 2.5% 50% 97.5%
## 1 NA 2 0.2332964 0.4894882 0.8856152
Predictions
To round off this series on longitudinal hurdle models, I want to show how to simulate draws (and find quantiles) from the posterior predictive distribution for a new observation (˜Y). Firstly for a patient i already in the data set, where we draw from
f(˜y∣y;x)=f(˜y∣ui,vi,θ,y;x)f(ui,vi,θ∣y)
library(brms)
predict(fit_hurdle,
newdata = data.frame(id = 1, time = "2"),
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.2448615 0.3224153 0 2.145083
and, secondly, for a new patient i′, where we draw from
f(˜y∣y;x)=f(˜y∣ui′,vi′,θ,y;x)f(ui′,vi′∣θ,y)f(θ∣y)
predict(fit_hurdle,
newdata = data.frame(id = NA, time = "2"),
allow_new_levels = TRUE,
sample_new_levels = "gaussian",
robust = TRUE)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.4614108 0.6840876 0 52.57643