To better understand the PLN distribution, and to assess the impact of the VEM algorithm on the inference of the model, a naive idea is to estimate the parameters from the expectation and covariance matrix of the counts (method of moments).
We can solve this system for the unknown parameters: \[ \boldsymbol{\Sigma}= \log \left( J + \Delta_Y^{-1} \left( \boldsymbol{\Sigma}_Y - \Delta_Y \right) \Delta_Y^{-1} \right) \]\[ \mu = \log \mu_Y - \frac{1}{2} \mathrm{diag}\,\boldsymbol{\Sigma}\] where the logarithm is applied entry-wise, and then estimate \(\boldsymbol{\Sigma}_Y\) and \(\mu_Y\) by their empirical counterparts.
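A minimal R sketch of these moment estimators (taking \(J\) to be the all-ones matrix and \(\Delta_Y = \mathrm{diag}\,\mu_Y\); the function name is illustrative):

```r
# Method-of-moments estimators for (mu, Sigma) from a count matrix Y (n x p).
# Requires the empirical moments to satisfy the over-dispersion condition, so
# that the argument of the entry-wise logarithm stays positive.
pln_moment_estimators <- function(Y) {
  mu_Y    <- colMeans(Y)                  # empirical mean of the counts
  Sigma_Y <- cov(Y)                       # empirical covariance of the counts
  Delta_Y <- diag(mu_Y)
  J       <- matrix(1, ncol(Y), ncol(Y))  # all-ones matrix
  Sigma   <- log(J + solve(Delta_Y) %*% (Sigma_Y - Delta_Y) %*% solve(Delta_Y))
  mu      <- log(mu_Y) - diag(Sigma) / 2
  list(mu = mu, Sigma = Sigma)
}
```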
2.1.1 Theoretical performance analysis
Two problems arise:
The data must empirically satisfy the over-dispersion hypothesis for the logarithm to be well defined.
It seems hard to control the bias introduced by estimating \(\Delta_Y^{-1}\). Using a Taylor series expansion, one can show that it is impossible to estimate the inverse of a Poisson parameter without bias.
The moment estimators perform poorly compared to the VEM estimators, especially as there is no guarantee that the empirical moments satisfy the over-dispersion condition.
To obtain better results, we could try removing the zeroes from the counts, estimating the Poisson parameters, and then adding back the appropriate number of zeroes.
However, the moment estimators could still provide an alternative initialisation for the VEM algorithm.
2.2 The new model
The latent layer is now given an autoregressive structure: \[ (Z_1, \dots, Z_n) \sim \mathrm{SAR}(\mu, \boldsymbol{\Sigma}, \boldsymbol{\Phi}) \] while the counts remain conditionally independent: \[ Y_{ij} \mid Z_i \overset{\perp}{\sim} \mathcal{P}( \exp Z_{ij}), \quad 1 \leq j \leq p, \; 1 \leq i \leq n \]
If \(\boldsymbol{\Phi}= \mathbf0\), then we retrieve the classical PLN model.
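For intuition, a minimal simulation sketch follows; the first-order recursion below is only one possible reading of \(\mathrm{SAR}(\mu, \boldsymbol{\Sigma}, \boldsymbol{\Phi})\), assumed here for illustration:

```r
# Illustrative simulation of a latent-AR PLN model, assuming the recursion
# Z_i - mu = Phi (Z_{i-1} - mu) + eps_i with eps_i ~ N(0, Sigma).
simulate_pln_ar <- function(n, mu, Sigma, Phi) {
  p <- length(mu)
  Z <- matrix(0, n, p)
  Z[1, ] <- MASS::mvrnorm(1, mu, Sigma)              # initial latent state
  for (i in 2:n) {
    eps    <- MASS::mvrnorm(1, rep(0, p), Sigma)
    Z[i, ] <- mu + Phi %*% (Z[i - 1, ] - mu) + eps   # autoregressive latent layer
  }
  Y <- matrix(rpois(n * p, lambda = exp(Z)), n, p)   # conditionally independent counts
  list(Y = Y, Z = Z)
}
```

With \(\boldsymbol{\Phi}= \mathbf0\) the latent draws are i.i.d. \(\mathcal{N}(\mu, \boldsymbol{\Sigma})\), which is consistent with retrieving the classical PLN model.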
2.3 Computing the modified ELBO
Let us take the same approach as for non-AR PLN and choose the set \(\mathcal{Q}\) of Gaussian variational distributions. In other words, we hope to obtain accurate results under the simplifying assumption that the variational variables \(Z_i \sim Q_{\psi_i}\), \(1 \leq i \leq n\), are mutually independent.
If this fails, there are other approaches, like choosing an AR structure for the variational variables.
Let us denote \(\theta = (\mu, \boldsymbol{\Sigma}, \boldsymbol{\Phi})\) and \(\psi = (m_1, \dots, m_n, \mathbf{S}_1, \dots, \mathbf{S}_n)\) such that \(Q_\psi \in \mathcal{Q}\).
Also, to be more general, we place ourselves in the regression setting, with \(\mathbf{B}= (\beta_1 | \dots | \beta_p) \in \mathbb{R}^{d \times p}\) and \(\mu_i= \mathbf{B}^\top X_i, 1 \leq i \leq n\) (for the non-conditional version, it suffices to take \(\mu_1 = \dots = \mu_n = \mu\)). Then: \[
\begin{aligned}
J ( \theta, \psi; \mathbf{Y})
&= \mathbb{E}_\psi [ \ell_\theta ( \mathbf{Y}| \mathbf{Z}) + \ell_\theta (\mathbf{Z}) - \ell_\psi (\mathbf{Z}) | \mathbf{Y}] \\
&= \sum_{i=1}^n \Big( \mathbb{E}_{\psi_i} [ \ell ( Y_i \mid Z_i ) ] + \mathbb{E}_{\psi_{i-1}, \psi_i} [ \ell_\theta ( Z_i \mid Z_{i-1}) ] - \mathbb{E}_{\psi_i} [ \ell_{\psi_i} (Z_i) ] \Big) \\
J ( \theta, \psi; \mathbf{Y}) &= \sum_{i=1}^{n} J_i ( \theta, \psi_{i-1}, \psi_i; Y_i)
\end{aligned}
\]
Remark. Fortunately, when we force \(\boldsymbol{\Phi}=0\), we retrieve the non-AR ELBO.
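As a point of comparison for the case \(\boldsymbol{\Phi}=0\), here is a sketch of the per-sample term of the classical PLN ELBO, written with diagonal variational covariances \(\mathbf{S}_i = \mathrm{diag}(s_i)\) (names and parametrisation are illustrative, not the package's internal API):

```r
# Per-sample term of the classical (non-AR) PLN ELBO, with a diagonal
# variational covariance S_i = diag(s_i).
elbo_term_pln <- function(y, m, s, mu, Sigma) {
  Omega <- solve(Sigma)                                  # precision matrix
  quad  <- drop(t(m - mu) %*% Omega %*% (m - mu))
  sum(y * m - exp(m + s / 2) - lgamma(y + 1)) +          # E_q[ log p(y | Z) ]
    0.5 * (sum(log(s)) - as.numeric(determinant(Sigma)$modulus) -
             quad - sum(diag(Omega) * s) + length(y))    # E_q[ log p(Z) ] + entropy of q
}
```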
2.4 Implementation
The PLNmodels package is written in R, with two backends, one in torch, the other in C++ using nlopt.
2.4.1 Initialisation
There are multiple possibilities. Either we start with a VE-step and initialise \(\beta\), \(\boldsymbol{\Sigma}\) and \(\boldsymbol{\Phi}\) by:
the method of moments
their estimation through non-AR VEM
a non-zero random value (in the case of \(\boldsymbol{\Phi}\))
or we start with an M-step and initialise \(m_1, \dots, m_n, \mathbf{S}_1, \dots, \mathbf{S}_n\) and \(\beta\) using a Poisson regression, which is the current choice for the initialisation of the non-AR VEM algorithm (a sketch is given below).
For \(\boldsymbol{\Phi}\), we could also use AR estimation techniques, after estimating \(\mathbf{Z}\) with non-AR VEM or Poisson regression.
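A minimal sketch of the Poisson-regression initialisation mentioned above (the function and starting values are illustrative; the package's actual initialisation may differ):

```r
# Illustrative initialisation: one Poisson GLM per column of Y, giving starting
# values for the regression coefficients and the variational parameters.
init_poisson_regression <- function(Y, X) {
  B <- vapply(seq_len(ncol(Y)), function(j) {
    coef(glm(Y[, j] ~ X - 1, family = poisson()))        # column-wise Poisson GLM
  }, numeric(ncol(X)))                                   # d x p coefficient matrix
  M <- X %*% B                                           # initial variational means m_i
  S <- matrix(0.1, nrow(Y), ncol(Y))                     # small positive variational variances
  list(B = B, M = M, S = S)
}
```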
2.4.2 M-step
Since the classical PLN objective admits explicit expressions for the zeroes of its gradient, we may hope the same holds for PLN-AR; however, the AR parameter \(\boldsymbol{\Phi}\) complicates the differentiation. Explicit expressions are useful because they provide the closed-form updates for the C++ backend and a profiled ELBO, which speeds up the torch backend.
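For reference, in the classical PLN case the zero of the gradient in \(\boldsymbol{\Sigma}\) is explicit (a standard computation, recalled here in this report's notation, with \(\mu_i = \mathbf{B}^\top X_i\)): \[ \widehat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} \left[ (m_i - \mu_i)(m_i - \mu_i)^\top + \mathbf{S}_i \right] \] Substituting this value back into the objective is what yields a profiled ELBO.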
Let us denote \(\widetilde{\mathbf{S}} = \left( s_1 | \dots | s_n \right)\) the matrix whose columns are the diagonals of \(\mathbf{S}_1, \dots, \mathbf{S}_n\).
In dimension \(1\), the conditional distribution of \(Z\) given \(Y = y\) is approximately Gaussian around \(\log y\). Using a Taylor expansion, we obtain expressions for its expectation and variance in terms of the Lambert \(W\) function:
\(\mu (y) = y - W(e^y) = \log y - \frac{\log y}y + o(1)\)
\(\sigma^2 (y) = (1 + W(e^y))^{-1}\)
In dimension \(p>1\), it is hard to compute the parameters of this Gaussian approximation.
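A small numerical check of the one-dimensional expressions above; lambertW0 (principal branch) comes from the CRAN package lamW, chosen here only for illustration:

```r
library(lamW)

cond_mean <- function(y) y - lambertW0(exp(y))          # mu(y) = y - W(e^y)
cond_var  <- function(y) 1 / (1 + lambertW0(exp(y)))    # sigma^2(y) = (1 + W(e^y))^{-1}

y <- c(1, 5, 20)
cbind(mu         = cond_mean(y),
      asymptotic = log(y) - log(y) / y,                 # log y - (log y) / y
      sigma2     = cond_var(y))
```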