# Generate PLN data with covariates (the covariates are drawn independently of the counts)
rPLN_covar <- function(n) {
  sim <- list()
  sim$Abundance <- rPLN(n)
  sim$Covariate <- mvnfast::rmvn(n, rep(0, 4), diag(4))
  prepare_data(sim$Abundance, sim$Covariate)
}

To better understand the PLN distribution, and also the impact of the VEM algorithm on the inference of the model, a naive idea is to estimate the parameters directly from the empirical expectation and covariance matrix of the counts.
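Recall the entry-wise moment identities of the PLN model, which form the system we want to invert (here \(\mu_Y = \mathbb{E}[Y_i]\), \(\boldsymbol{\Sigma}_Y = \mathbb{V}[Y_i]\), \(\Delta_Y = \mathrm{diag}\,\mu_Y\), \(J\) is the all-ones matrix, and \(\exp\) and \(\log\) are applied entry-wise): \[ \mu_Y = \exp \left( \mu + \tfrac12 \mathrm{diag}\,\boldsymbol{\Sigma}\right), \qquad \boldsymbol{\Sigma}_Y = \Delta_Y + \Delta_Y \left( e^{\boldsymbol{\Sigma}} - J \right) \Delta_Y. \]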
We can easily solve this system and find the unknown parameters: \[ \boldsymbol{\Sigma}= \log \left( J + \Delta_Y^{-1} \left( \boldsymbol{\Sigma}_Y - \Delta_Y \right) \Delta_Y^{-1} \right), \qquad \mu = \log \mu_Y - \frac{1}{2} \mathrm{diag}\,\boldsymbol{\Sigma}, \] then estimate \(\boldsymbol{\Sigma}_Y\) and \(\mu_Y\) by their empirical counterparts, the sample covariance and the sample mean of the counts.
Two problems arise, which we discuss after implementing these estimators.
# Fit PLN model parameters via moment estimation
PLN_moments_fit <- function(data = trichoptera) {
  d <- data$Abundance
  # Inverse of the empirical means, i.e. an estimate of 1 / mu_Y
  # (an alternative based on E[1 / (1 + Y)] and lambda_inverse, defined below, is left commented out)
  # mu_yi <- sapply(sapply(1 / (1 + d), mean), lambda_inverse)
  mu_yi <- 1 / colMeans(d)
  delta_yi <- diag(mu_yi) # Delta_Y^{-1}
  sigma_y <- cov(d) # empirical covariance Sigma_Y
  # A possible fix when the over-dispersion condition fails:
  # diag(sigma_y) <- apply(rbind(1.1 / mu_yi, diag(sigma_y)), 2, max)
  ones <- matrix(1, ncol(d), ncol(d)) # all-ones matrix J
  m <- ones + delta_yi %*% sigma_y %*% delta_yi - delta_yi
  sigma <- log(m) # entry-wise log
  sigma[is.na(sigma)] <- 0 # negative entries of m (over-dispersion condition violated)
  mu <- -log(mu_yi) - 0.5 * diag(sigma) # mu = log(mu_Y) - diag(Sigma) / 2
  list(Sigma = sigma, mu = mu)
}
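A quick usage sketch on the trichoptera data set shipped with PLNmodels, assuming, as the default argument suggests, that it has first been passed through prepare_data():

library(PLNmodels)
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
moments_params <- PLN_moments_fit(trichoptera)
round(moments_params$mu, 2)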
# Quick visual check of the density of a Poisson(lambda) sample (requires ggplot2)
poisson_inverse <- function(n = 10, lambda = 0) {
  x <- data.frame(value = rpois(n, lambda))
  ggplot(x, aes(value)) +
    geom_density()
}
# Numerical inverse of a monotone function f on [lower, upper], via uniroot()
inverse_fun <- function(f, lower = -100, upper = 100) {
  function(y) {
    uniroot(function(x) f(x) - y,
      lower = lower,
      upper = upper
    )$root
  }
}
lambda_inverse <- inverse_fun(function(x) x * (1 - exp(-1 / x)), 0, 1000)

Indeed, if \(Y \sim \mathcal{P}(\lambda)\), then \(\mathbb{E}[1 / (1 + Y)] = (1 - e^{-\lambda}) / \lambda = x (1 - e^{-1/x})\) with \(x = 1/\lambda\), so lambda_inverse recovers \(1/\lambda\) from the empirical mean of \(1 / (1 + Y)\) (the commented-out estimator of mu_yi above).

The moment estimators perform badly in comparison to the VEM estimators, especially as there is no reason for the empirical moments to satisfy the over-dispersion condition, in which case some entries of \(J + \Delta_Y^{-1} \left( \boldsymbol{\Sigma}_Y - \Delta_Y \right) \Delta_Y^{-1}\) are negative and their entry-wise log is undefined.
In order to obtain better results, we could try to remove the zeroes in the counts, then estimate the Poisson parameters, before adding the correct number of zeroes.
However, the moment estimators could still provide an alternative initialization for the VEM algorithm.
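As a quick illustration, we can compare the two estimators on one simulated data set. This is only a sketch: it assumes PLNmodels is attached, reuses the rPLN_covar() and PLN_moments_fit() helpers defined above, and reads the fitted covariance through PLNmodels' sigma() accessor.

library(PLNmodels)

sim <- rPLN_covar(49)
vem_fit <- PLN(Abundance ~ 1, data = sim) # VEM estimates
mom_fit <- PLN_moments_fit(sim)           # moment estimates

# Discrepancy between the two covariance estimates (Frobenius norm)
norm(sigma(vem_fit) - mom_fit$Sigma, "F")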
We now consider an autoregressive extension of PLN, in which the latent layer follows an AR process: \[ (Z_1, \dots, Z_n) \sim \mathrm{SAR}(\mu, \boldsymbol{\Sigma}, \boldsymbol{\Phi}) \] and \[ Y_{ij} \mid Z_i \overset{\perp}{\sim} \mathcal{P}( \exp Z_{ij}), \quad 1 \leq j \leq p, \ 1 \leq i \leq n. \]
If \(\boldsymbol{\Phi}= \mathbf0\), then we retrieve the classical PLN model.
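To fix ideas, here is a minimal simulator for this model. It is only a sketch: the text does not specify whether \(\boldsymbol{\Sigma}\) is the innovation or the stationary covariance, so we assume below that \(Z_i \mid Z_{i-1} \sim \mathcal{N}(\mu + \boldsymbol{\Phi}(Z_{i-1} - \mu), \boldsymbol{\Sigma})\), with \(Z_1 \sim \mathcal{N}(\mu, \boldsymbol{\Sigma})\) and a mean \(\mu\) constant over sites; rPLN_AR is a hypothetical helper name.

# Sketch of a PLN-AR simulator, under the assumptions stated above
rPLN_AR <- function(n, mu, Sigma, Phi) {
  p <- length(mu)
  Z <- matrix(0, n, p)
  Z[1, ] <- mu + drop(mvnfast::rmvn(1, rep(0, p), Sigma))
  for (i in 2:n) {
    # Latent AR(1) transition: Z_i | Z_{i-1} ~ N(mu + Phi (Z_{i-1} - mu), Sigma)
    Z[i, ] <- mu + drop(Phi %*% (Z[i - 1, ] - mu)) + drop(mvnfast::rmvn(1, rep(0, p), Sigma))
  }
  # Conditionally independent Poisson counts: Y_ij | Z_i ~ P(exp(Z_ij))
  Y <- matrix(rpois(n * p, exp(as.vector(Z))), n, p)
  list(Abundance = Y, Z = Z)
}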
Let us take the same approach as for non-AR PLN and choose the set \(\mathcal{Q}\) of Gaussian variational distributions. This means we accept the simplifying hypothesis that, under the variational distribution, the \(Z_i \sim Q_{\psi_i}, 1 \leq i \leq n\), are mutually independent, and hope that this still yields accurate results.
If this fails, there are other approaches, like choosing an AR structure for the variational variables.
Let us denote \(\theta = (\mu, \boldsymbol{\Sigma}, \boldsymbol{\Phi})\) and \(\psi = (m_1, \dots, m_n, \mathbf{S}_1, \dots, \mathbf{S}_n)\) such that \(Q_\psi \in \mathcal{Q}\).
Also, to be more general we place ourselves in regression, with \(\mathbf{B}= (\beta_1 | \dots | \beta_p) \in \mathbb{R}^{d \times p}\) and \(\mu_i= \mathbf{B}^\top X_i, 1 \leq i \leq n\) (for the non-conditional version, it suffices to take \(\mu_1 = \dots = \mu_n = \mu\)). Then: \[ \begin{aligned} J ( \theta, \psi; \mathbf{Y}) &= \mathbb{E}_\psi [ \ell_\theta ( \mathbf{Y}| \mathbf{Z}) + \ell_\theta (\mathbf{Z}) - \ell_\psi (\mathbf{Z}) | \mathbf{Y}] \\ &= \sum_{i=1}^n \mathbb{E}_{\psi_i} [ \ell ( \psi; Y_i | Z_i ) | Y_i ] + \mathbb{E}_{\psi_{i-1}, \psi_i} [ \ell (\theta, \psi_{i-1}, \psi_i; Z_{i-1}) ] - \mathbb{E}_\psi [ \ell (\psi; \mathbf{Z}) ] \\ J ( \theta, \psi; \mathbf{Y}) &= \sum_{i=1}^{n} J_i ( \theta, \psi_{i-1}, \psi_i; Y_i) \end{aligned} \]
where: \[ J_i ( \theta, \psi_{i-1}, \psi_i; Y_i) = \sum_{j=1}^{p} \mathbb{E}_{\psi_i^j} \left[ \left. \ell ( Y_i^j | Z_i^j ) \right| Y_i^j \right] - \mathbb{E}_{\psi_{i-1}} \left[ \mathrm{KL}\,\left( \left. \mathcal{N}_p (m_i, \mathbf{S}_i) \right\| \mathcal{N}_p ( \boldsymbol{\Phi}(Z_{i-1}-\mu_{i-1}) + \mu_i, \boldsymbol{\Sigma}_i ) \right) \right] \]
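For reference, the inner KL term has the usual closed form between Gaussians, and the remaining expectation over \(Z_{i-1} \sim \mathcal{N}_p(m_{i-1}, \mathbf{S}_{i-1})\) then only involves a quadratic form in \(Z_{i-1}\): \[ \mathrm{KL}\,\left( \mathcal{N}_p (m, \mathbf{S}) \,\middle\|\, \mathcal{N}_p ( \mu, \boldsymbol{\Sigma}) \right) = \frac12 \left[ \langle \boldsymbol{\Sigma}^{-1}, \mathbf{S} \rangle + (\mu - m)^\top \boldsymbol{\Sigma}^{-1} (\mu - m) - p + \log \frac{|\boldsymbol{\Sigma}|}{|\mathbf{S}|} \right]. \]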
Finally, we compute the ELBO:
Proposition 2.1 For \(1 \leq i \leq n\): \[ \begin{aligned} J_i ( \theta, \psi_{i-1}, \psi_i; Y_i) &= - \left\| \exp \left( m_i + \tfrac{s_i^2}2 \right) \right\|_1 + \langle m_i, Y_i \rangle - \| \log (Y_i!) \|_1 \\ &- \frac12 \| \mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1}) \|_{\boldsymbol{\Sigma}_i}^2 + \frac12 \log \left| \boldsymbol{\Omega}_i \mathbf{S}_i \right| - \frac12 \langle \boldsymbol{\Omega}_i, \mathbf{S}_i + \boldsymbol{\Phi}\mathbf{S}_{i-1} \boldsymbol{\Phi}^\top \rangle + \frac{p}2 \end{aligned} \] where \(s_i^2\) denotes the diagonal of \(\mathbf{S}_i\) and \(\boldsymbol{\Omega}_i = \boldsymbol{\Sigma}_i^{-1}\).
Remark. Fortunately, when we force \(\boldsymbol{\Phi}=0\), we retrieve the non-AR ELBO.
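As a sanity check of the proposition, here is a direct R transcription of \(J_i\) for \(i \geq 2\). It is only a sketch under explicit assumptions: \(\boldsymbol{\Omega}_i = \boldsymbol{\Sigma}_i^{-1}\) is passed as Omega, \(\mathbf{S}_i\) is diagonal with diagonal \(s_i^2\), \(\| x \|_{\boldsymbol{\Sigma}_i}^2\) is read as \(x^\top \boldsymbol{\Omega}_i x\), and all function and argument names are placeholders.

# Per-site ELBO term J_i (for i >= 2) from Proposition 2.1, under the assumptions above
elbo_term_i <- function(Yi, mi, si2, mi_prev, si2_prev, mui, mui_prev, Phi, Omega) {
  p <- length(Yi)
  Si <- diag(si2, p)
  Si_prev <- diag(si2_prev, p)
  r <- mui - mi - drop(Phi %*% (mui_prev - mi_prev)) # residual of the AR mean
  -sum(exp(mi + si2 / 2)) + sum(mi * Yi) - sum(lfactorial(Yi)) - # Poisson part
    0.5 * drop(t(r) %*% Omega %*% r) +                           # quadratic (AR) part
    0.5 * as.numeric(determinant(Omega %*% Si)$modulus) -        # 1/2 log |Omega_i S_i|
    0.5 * sum(Omega * (Si + Phi %*% Si_prev %*% t(Phi))) +       # trace term
    p / 2
}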
The PLNmodels package is written in R, with two backends, one in torch, the other in C++ using nlopt.
There are multiple possibilities for the initialization. Either we start with a VE-step, which requires initial values for \(\beta\), \(\boldsymbol{\Sigma}\) and \(\boldsymbol{\Phi}\), or we start with an M-step and initialise \(m_1, \dots, m_n, \mathbf{S}_1, \dots, \mathbf{S}_n\) and \(\beta\) using a Poisson regression, which is the current choice for the initialization of the non-AR VEM algorithm.
For \(\boldsymbol{\Phi}\), we could also use AR estimation techniques, after estimating \(\mathbf{Z}\) with non-AR VEM or Poisson regression.
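For instance, given any rough estimate of the latent positions (the variational means of a non-AR PLN fit, or residuals from a Poisson regression), a least-squares VAR(1) fit gives a candidate \(\boldsymbol{\Phi}\). The sketch below assumes such an \(n \times p\) matrix is available; Zhat and init_Phi are placeholder names.

# Least-squares VAR(1) estimate of Phi from an n x p matrix Zhat of estimated latent positions
init_Phi <- function(Zhat) {
  Zc <- scale(Zhat, center = TRUE, scale = FALSE) # centre each column
  Z_now <- Zc[-1, , drop = FALSE]                 # rows 2, ..., n
  Z_prev <- Zc[-nrow(Zc), , drop = FALSE]         # rows 1, ..., n - 1
  # Solve Z_now ~ Z_prev %*% t(Phi) in the least-squares sense
  t(solve(crossprod(Z_prev), crossprod(Z_prev, Z_now)))
}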
For classical PLN, the zeroes of the objective-function gradients have explicit expressions, so we may hope the same holds for PLN-AR; however, the AR parameter \(\boldsymbol{\Phi}\) complicates the differentiation. Explicit expressions are valuable because they provide closed-form updates for the C++ backend, as well as a profiled ELBO that speeds up the torch backend.
Let us denote \(\widetilde{\mathbf{S}} = \left( s_1 | \dots | s_n \right)^\top\) the matrix whose rows are the diagonals of the \(\mathbf{S}_1, \dots, \mathbf{S}_n\).
\(\mu\): \[ \nabla_{\mu_i} J (\theta, \psi) = ( \boldsymbol{\Phi}\Omega_{i+1} \boldsymbol{\Phi}^\top - \Omega_i ) (\mu_i - m_i) + \Omega_i \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1}) + \boldsymbol{\Phi}^\top \Omega_{i+1} (\mu_{i+1} - m_{i+1}) \]
\(\boldsymbol{\Sigma}\): Let us denote: \[ \mathbf{F}_i = \frac12 \boldsymbol{\Omega}_i \left[ (\mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1})) (\mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1}))^\top + \mathbf{S}_i + \boldsymbol{\Phi}\mathbf{S}_{i-1} \boldsymbol{\Phi}^\top - \boldsymbol{\Sigma}_i \right] \boldsymbol{\Omega}_i \] Then: \[ \nabla_{\boldsymbol{\Sigma}} J (\theta, \psi) = \mathbf{F}_1 + \sum_{i=2}^n \left( \mathbf{F}_i + \boldsymbol{\Phi}^\top \mathbf{F}_i \boldsymbol{\Phi}\right) \]
\(\boldsymbol{\Phi}\): \[ \begin{aligned} \nabla_{\boldsymbol{\Phi}} J (\theta, \psi) = \boldsymbol{\Omega}_\varepsilon \sum_{i=2}^n &\boldsymbol{\Phi}\boldsymbol{\Sigma}- \boldsymbol{\Phi}\mathbf{S}_{i-1} + (\mathbf{S}_i + \boldsymbol{\Phi}\mathbf{S}_{i-1} \boldsymbol{\Phi}^\top) \boldsymbol{\Omega}_\varepsilon \boldsymbol{\Phi}\boldsymbol{\Sigma}\\ &- (\mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1})) (\mu_{i-1} - m_{i-1})^\top \\ &- (\mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1})) (\mu_i - m_i - \boldsymbol{\Phi}(\mu_{i-1} - m_{i-1}))^\top \boldsymbol{\Omega}_\varepsilon \Phi^\top \boldsymbol{\Sigma} \end{aligned} \]
We still need the gradients with respect to the variational parameters, but the zeroes of these gradients are not explicit either, so we will rely on the nlopt optimisation library in the C++ backend.
\(\mathbf{M}\): \[ \begin{aligned} \nabla_{\mathbf{M}} J (\theta, \psi) = &- \exp(\mathbf{M}+ \frac12 \widetilde{\mathbf{S}}) + \mathbf{Y}- e_1^\top \boldsymbol{\Omega}(\mu_1 - \mathbf{M}^\top e_1) \\ &- \boldsymbol{\Omega}_\varepsilon (\boldsymbol{\mu}- \mathbf{M}- \boldsymbol{\Phi}(\boldsymbol{\mu}- \mathbf{M}) \mathbf{N}) + \boldsymbol{\Phi}^\top \boldsymbol{\Omega}_\varepsilon (\boldsymbol{\mu}- \mathbf{M}- \boldsymbol{\Phi}(\boldsymbol{\mu}- \mathbf{M}) \mathbf{N}) \mathbf{N}^\top \end{aligned} \]
where \(\mathbf{N}= \begin{pmatrix} 0 & 1 & \dots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 1 \\ 0 & \dots & \dots & 0 \end{pmatrix}\).
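In code, this shift matrix can be built with a small helper (shift_matrix is a placeholder name):

# n x n upper shift matrix: ones on the first superdiagonal, zeroes elsewhere
shift_matrix <- function(n) {
  N <- matrix(0, n, n)
  N[cbind(seq_len(n - 1), seq(2, n))] <- 1
  N
}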
\(\mathbf{S}_i\): \[ 2 \nabla_{\mathbf{S}_i} J (\theta, \psi) = \mathbf{S}_i^{-1} - \mathrm{diag}\left( \exp \left( m_i + \tfrac12 \mathrm{diag}\,\mathbf{S}_i \right) \right) - (\boldsymbol{\Omega}_i + \boldsymbol{\Phi}^\top \boldsymbol{\Omega}_{i+1} \boldsymbol{\Phi}) \]
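Before wiring these formulas into the backends, they can be validated numerically: the sketch below compares any closed-form gradient with a central finite difference of the ELBO, where fn stands for any scalar function of a parameter vector.

# Central finite-difference gradient, to check the closed-form gradients above
num_grad <- function(fn, x, eps = 1e-6) {
  vapply(seq_along(x), function(k) {
    e <- numeric(length(x))
    e[k] <- eps
    (fn(x + e) - fn(x - e)) / (2 * eps)
  }, numeric(1))
}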
Finally, to assess the quality of the Gaussian variational approximation, we simulate PLN data while keeping the latent layer \(\mathbf{Z}\), and compare the conditional distribution of \(Z\) given a fixed count value with a Gaussian of the same mean and variance.

library(mvnfast)
library(tidyverse)

n <- 5e5
p <- 1
mu <- rep(log(10), p)
Sigma <- diag(log(3), p)
depths <- rowSums(exp(rep(1, n) %o% diag(Sigma) / 2 + mu)) # null offsets
# Simulate PLN counts together with the latent Gaussian layer Z
rPLN_withZ <- function(n = 10, mu = rep(0, ncol(Sigma)),
                       Sigma = diag(1, 5, 5), depths = rep(10000, n)) {
p <- ncol(Sigma)
if (any(is.vector(mu), ncol(mu) == 1)) {
mu <- matrix(rep(mu, n), ncol = p, byrow = TRUE)
}
if (length(depths) != n) {
depths <- rep(depths[1], n)
}
exp_depths <- rowSums(exp(rep(1, n) %o% diag(Sigma) / 2 + mu))
offsets <- log(depths %o% rep(1, p)) - log(exp_depths)
Z <- mu + rmvn(n, rep(0, ncol(Sigma)), as.matrix(Sigma)) +
offsets
Y <- matrix(rpois(n * p, as.vector(exp(Z))), n, p)
dimnames(Y) <- list(paste0("S", 1:n), paste0("Y", 1:p))
dimnames(Z) <- list(paste0("S", 1:n), paste0("Z", 1:p))
attr(Y, "offsets") <- offsets
data.frame(Y, Z)
}
Z_cond <- rPLN_withZ(n, mu, Sigma, depths)
# Most frequent observed count
choice <- Z_cond |>
  group_by(Y1) |>
  summarise(n = n()) |>
  arrange(desc(n))
# Conditional sample of Z1 given that Y1 equals this count, alongside a Gaussian
# sample with the same mean and variance
Z_cond <- Z_cond |>
  filter(Y1 == choice$Y1[1]) |>
  select(Z1) |>
  mutate(Z_norm = rnorm(n(), mean(Z1), sd(Z1))) |>
  pivot_longer(everything())
p0 <- ggplot(Z_cond) +
geom_density(aes(x = value, color = name))
p0