ICS and the discriminant subspace
using the fourth-order covariance operator

Workshop on Invariant Coordinate Selection and Related Methods
University of Helsinki

Doctoral Workshop on Decision Mathematics and Statistics

June 4, 2026

Introduction

From Principal Component Analysis…

Figure 1: Ellipse of \(\Sigma\) (covariance) on a 2D two-group Gaussian mixture. Computed with the icspylab package.

…to Invariant Coordinate Selection

Figure 2: Ellipses of \(\Sigma\) (covariance) and \(\Sigma_4\) (fourth-order covariance) on a 2D two-group Gaussian mixture. Computed with the icspylab package.

Tip

Both scatters differ in shape: \(\Sigma\) is less attracted by the small group than \(\Sigma_4\). ICS jointly diagonalizes the two in order to compare them and to reveal the direction from group 0 to group 1.

Motivation

If \(P\) is a distribution over \(\mathbb R^p\) with expectation \(\mu\), we define

  • the covariance \(\Sigma = \int (x-\mu) (x-\mu)^\top \, \mathrm{d}P(x)\) that we assume is invertible.
  • the fourth-order covariance \(\Sigma_4 = \frac1{p+2} \int (x-\mu)^\top \Sigma^{-1} (x-\mu) (x-\mu) (x-\mu)^\top \, \mathrm{d}P(x)\).

Invariant Coordinate Selection (ICS) relies on eigendecomposition of \(\Sigma^{-1} \Sigma_4\) to reveal interesting projections, understand the structure of \(P\) (clusters, outliers) (Tyler et al. 2009).

Assumption \(P\) is an elliptical location mixture distribution with \(k\) groups
Prior work Becquart et al. (2026)
Objectives 1. How to compute \(\Sigma_4\) explicitly?
2. Which (simple) equation do the eigenvalues of ICS satisfy?
3. Why are some eigenvalues small and other large?
4. When do the eigenvectors of ICS span the discriminant subspace?
5. Which groups are separated by which eigenvectors?
Tools Affine geometry, Barycentric coordinates, Symbolic calculus

Mixture distributions

  • Let \(P = \sum_{i=0}^{k-1} \pi_i P_i\) be a mixture distribution on \(\mathbb R^p\), where the group proportions are \[(\pi_0, \dots, \pi_{k-1})^\top \in \mathcal S^k = \{ \pi \in (\mathbb R_+^*)^k | \sum_{i=0}^{k-1} \pi_i = 1 \}.\]

  • If each \(P_i\) has mean \(\mu_i\) then the mean of \(P\) is \[\mu = \sum_{i=0}^{k-1} \pi_i \mu_i.\]

  • If moreover \(P_0, \dots, P_{k-1}\) share the same within-group covariance matrix \(W\), then the covariance matrix of \(P\) is \[\Sigma = W + B\] where \(B = \sum_{i=0}^{k-1} \pi_i \delta_i \delta_i^\top\) is the between-group covariance matrix that is based on the group directions \[\delta_i = \mu_i - \mu, 0 \leq i \leq k-1.\]

  • How to compute the fourth-order covariance matrix \(\Sigma_4\)?

A general formula by Peña, Prieto, and Viladomat (2010)

Theorem 1 (Peña, Prieto, and Viladomat (2010)) If the \(P_i, 0 \leq i \leq k-1\) are elliptical distributions, then \[(p+2) \Sigma_4 = \alpha \Sigma + \sum_{0 \leq i,j \leq k-1} \beta_{ij} \delta_i \delta_j^\top \tag{1}\] where

  • \(\alpha = \tilde k p + (1 − \tilde k) \sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i + \bar k\)
  • \(\beta_{ij} = \delta_{ij} \gamma \pi_i + (\delta_{ij} \pi_i + \eta \, \pi_i \pi_j) \delta_i^\top \Sigma^{-1} \delta_j, 0 \leq i,j \leq k-1\)
  • \(\gamma = (1 − \tilde k)p − 2\bar k + 4\)
  • \(\eta = \tilde k + \bar k - 6\)
  • \(\tilde k, \bar k\) are some ellipticity parameters.

Moreover, \(\alpha\) is an eigenvalue of \((p+2) \Sigma^{-1} \Sigma_4\) with multiplicity greater than or equal to \(p-r\), where \(r \leq k-1\) is the dimension of the affine space generated by the group means.

But…

Equation 1 is incorrect.

A general and correct formula by Peña, Prieto, and Viladomat (2010)

Theorem 2 (Peña, Prieto, and Viladomat (2010), corrected) If the \(P_i, 0 \leq i \leq k-1\) are elliptical distributions, then \[\begin{aligned} (p+2) \Sigma_4 &= \underbrace{\alpha \Sigma}_{\color{#6a4c93}{B = 0 \text{ (elliptical)}}} \\ &\quad + \underbrace{(\tilde k - 1) \operatorname{tr} (W \Sigma^{-1}) W + (\bar k - 2) W \Sigma^{-1} W}_{\color{#3a7d5c}{B \neq 0 \text{ and } W \neq 0 \text{ (non-Gaussian)}}} \\ &\quad - \underbrace{\operatorname{tr} (B \Sigma^{-1}) B + 2 B \Sigma^{-1} B}_{\color{#2d6a8a}{B \neq 0 \text{ and } W \neq 0}} \\ &\quad + \underbrace{\sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i \delta_i \delta_i^\top}_{\color{#b8860b}{W = 0 \text{ (discrete)}}} \end{aligned}\] where \(\alpha = \tilde k p + (1 − \tilde k) \sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i + \bar k\) and \(\tilde k, \bar k\) are some ellipticity parameters.

Moreover, \(\alpha\) is an eigenvalue of \((p+2) \Sigma^{-1} \Sigma_4\) with multiplicity greater than or equal to \(p-r\), where \(r \leq k-1\) is the dimension of the affine space generated by the group means.

A general and correct formula by Peña, Prieto, and Viladomat (2010), Gaussian case

Corollary 1 (Peña, Prieto, and Viladomat (2010), Gaussian) If the \(P_i, 0 \leq i \leq k-1\) are Gaussian distributions, then \[\begin{aligned} (p+2) \Sigma_4 &= \underbrace{(p+2) \Sigma}_{\color{#6a4c93}{B = 0 \text{ (elliptical)}}} \\ &\quad - \underbrace{\left( \operatorname{tr} (B \Sigma^{-1}) B + 2 B \Sigma^{-1} B \right)}_{\color{#2d6a8a}{B \neq 0 \text{ and } W \neq 0}} \\ &\quad + \underbrace{\sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i \, \delta_i \delta_i^\top}_{\color{#b8860b}{W = 0 \text{ (discrete)}}} \end{aligned}\] Moreover, \(1\) is an eigenvalue of \(\Sigma^{-1} \Sigma_4\) with multiplicity greater than or equal to \(p-r\), where \(r \leq k-1\) is the dimension of the affine space generated by the group means.

But…

This formula relies on \(\Sigma^{-1}\) that is not explicit in the parameters \(W\), \((\mu_0, \dots, \mu_{k-1}), (\pi_1, \dots, \pi_{k-1})\) of \(P\).

The case of a discrete distribution on simplex vertices

The setting

  • From now on we assume that \(P\) is a discrete distribution on simplex vertices, so that

    1. \(W=0\), i.e \(\Sigma = B\): no within-group noise
    2. \(p = k-1\)
    3. \((\mu_0, \dots, \mu_{k-1})\) are in general position, i.e. form an affine basis aka a simplex in \(\mathbb R^{k-1}\).
  • In that case, Mahalanobis inner products between the group directions \(\delta_j \Sigma^{-1} \delta_i\) become tractable, and we prove that the eigenvalues of \(\Sigma^{-1} \Sigma_4\) depend only on group proportions.

An edge case

  • Simulations suggest this is specific to this case: when \(W \neq 0\) or the group means are not in general position, their directions matter.
  • This situation is interesting as a limit case when \(W B^{-1} \rightarrow 0\).

Computing \(\Sigma_4\)

Let us begin by proving a simple formula for \(\Sigma_4\).

Lemma 1 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and with \((\pi_0, \dots, \pi_{k-1})^\top \in \mathcal S^k\). With \(\mu = \sum_{i=0}^{k-1} \pi_i \mu_i\) and \(\delta_i = \mu_i - \mu, 0 \leq i \leq k-1\) \[ \Sigma_4 = \frac1{k+1} \sum_{i=0}^{k-1} (1 - \pi_i) \delta_i \delta_i^\top \]

A rational equation for eigenvalues

Now we can derive a simple rational equation depending only on proportions satisfied by the eigenvalues of \(\Sigma_4 \Sigma^{-1}\) and express their corresponding eigenvectors.

Theorem 3 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and \(\lambda > 0\). Let us denote

  • \(\alpha_i = \frac{1-\pi_i}{\pi_i}\) the odds against group \(i \in \{ 0, \dots, k-1 \}\)
  • \(I_\lambda = \left\{ i \in \{0, \dots, k-1 \} \mid \alpha_i = \lambda \right\}\)
  • \(E_\lambda = \operatorname{ker}(\Sigma_4\Sigma^{-1} - \lambda I_{k-1})\).

Then \(\lambda\) is an eigenvalue of \((k+1) \Sigma_4 \Sigma^{-1}\) if and only if one of the following holds:

  1. \(I_\lambda = \emptyset\) and \[\sum_{i=0}^{k-1} \frac{\pi_i}{\alpha_i - \lambda} = 0 \tag{2}\] In that case, \(\lambda\) is simple and \(E_\lambda = \operatorname{span}(h)\) where \(h = \sum_{i=0}^{k-1} \frac{1}{\alpha_i - \lambda}\,\delta_i\).
  2. \(| I_\lambda | \geq 2\). In that case, \(\lambda\) has multiplicity \(|I_\lambda| - 1\) and \(E_\lambda = \operatorname{span} (\delta_i - \delta_{i_0})_{i \in I_\lambda}\) for any \(i_0 \in I_\lambda\).

Visualising the eigenvalues of \((k+1) \Sigma_4 \Sigma^{-1}\) for \(k=5\)

Figure 3: Graph of \(\displaystyle f(x) = \sum_{i=0}^{k-1} \frac{\pi_i}{\alpha_i - x}\). The eigenvalues of \((k+1) \Sigma_4 \Sigma^{-1}\) (zeroes of \(f\)) are interlaced with the odds against each group (poles of \(f\)).

Computing the characteristic polynomial

We can reunite both cases in Theorem 3 using the characteristic polynomial \[ \chi = \det(t \, I_{k-1} - (k+1) \Sigma^{-1} \Sigma_4) \] which is the unit polynomial whose roots are the eigenvalues of \((k+1) \Sigma^{-1} \Sigma_4\) (with multiplicities).

Corollary 2 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and \(\alpha_i = \frac{1-\pi_i}{\pi_i}\) the odds against group \(i \in \{ 0, \dots, k-1 \}\). Let \(Q = \prod_{i=0}^{k-1} (t - \alpha_i)\) and \(\chi = \det(t \, I_{k-1} - (k+1) \Sigma^{-1} \Sigma_4)\). Then \[ \begin{aligned} \chi &= \frac{Q + Q'}{t+1} = Q(t) \sum_{i=0}^{k-1} \frac{\pi_i}{t - \alpha_i} \\ &= \tfrac1{\prod_{i=0}^{k-1} \pi_i} \sum_{i=0}^k (-1)^{k-i} (t+1)^{i - 2} (t + 1 + i) e_i(\pi_0, \dots, \pi_{k-1}) \end{aligned} \] using the elementary symmetric polynomials \[e_i(\pi_0, \dots, \pi_{k-1}) = \sum_{0\leq j_1<j_2<\cdots<j_i\leq k-1} \pi_{j_1}\pi_{j_2}\dotsm \pi_{j_i}, 0 \leq i \leq k.\]

Application. A criterion for discriminant subspace reconstructability

A criterion for discriminant subspace reconstructability

Theorem 4 (Tyler et al. (2009)) If \(P\) is a mixture of \(k\) elliptical distributions with same within-group covariance, there exists a noise eigenvalue that has multiplicity at least \(p-r\) where \(r \leq k-1\) is the dimension of the affine space generated by the group means. If its multiplicity is minimal, then the remaining eigenvectors span the discriminant subspace.

Step 1: Identifying the noise eigenvalue. For \((\Sigma, (k+1) \Sigma_4)\), Theorem 2 shows that it is \(\alpha\), which corresponds to the proportionality constant when \(B=0\) (elliptical case).

Step 2: Characterising the parameters for which the multiplicity of this eigenvalue is minimal. In the case of a discrete distribution on a simplex, this means that \(k+1\) is not an eigenvalue of \((k+1)\Sigma_4 \Sigma^{-1}\), i.e. \[\chi (k+1) \neq 0\]

  • For \(k=2\), we recover the condition

\[6 \pi_{0} \pi_{1} - 1 \neq 0 \iff \pi_0 \notin \{ \tfrac12 - \tfrac{\sqrt{3}}6, \tfrac12 + \tfrac{\sqrt{3}}6 \} \approx \{ 0.2113, 0.7887 \}.\]

  • For \(k=3\), we obtain the condition

\[40 \pi_{0} \pi_{1} \pi_{2} - 7 \pi_{0} \pi_{1} - 7 \pi_{0} \pi_{2} - 7 \pi_{1} \pi_{2} + 1\neq 0 \]

Example for \(k=3\). Beyond the zero-noise general position case

Appendix

Proof of Lemma 1

Since \(P = \sum_i \pi_i \delta_{\mu_i}\), expanding the definition of \(\Sigma_4\) gives: \[(k+1)\Sigma_4 = \sum_{i=0}^{k-1} \pi_i (\delta_i^\top \Sigma^{-1} \delta_i) \, \delta_i \delta_i^\top.\] It remains to compute the Mahalanobis distances \(\delta_i^\top \Sigma^{-1} \delta_i, 0 \leq i \leq k-1\) between the group means and the global mean. Since \(\Sigma = \sum_{i=0}^{k-1} \pi_i \delta_i \delta_i^\top\) and \(\sum_{i=0}^{k-1} \pi_i \delta_i = 0\), then for any \(0 \leq j \leq k-1\): \[\delta_j = \Sigma \Sigma^{-1} \delta_j = \sum_{i=0}^{k-1} \pi_i \left( \delta_i^\top \Sigma^{-1}\delta_j + 1 \right) \delta_i, \qquad \sum_{i=0}^{k-1} \pi_i \bigl(\delta_i^\top \Sigma^{-1}\delta_j + 1\bigr) = 1.\] By uniqueness of the barycentric coordinates of \(\delta_j\) in the affine basis \((\delta_0, \dots, \delta_{k-1})\), we get \[\pi_j(\delta_j^\top \Sigma^{-1}\delta_j + 1) = 1, \qquad (k+1)\Sigma_4 = \sum_{i=0}^{k-1}(1-\pi_i)\,\delta_i\delta_i^\top.\]

Proof of Theorem 3

Let \(M_\lambda = (k+1) (\Sigma_4 \Sigma^{-1} - \lambda I_{k-1})\), \(E_\lambda = \ker M_\lambda\), \(F_\lambda = \ker(\Sigma^{-1}\Sigma_4 - \lambda I_{k-1}) = \Sigma^{-1} E_\lambda\), \(\beta_i = \alpha_i - (k+1)\lambda\) and \(\pi_\lambda = \frac1{(k+1)\lambda+1}\) for \(0 \leq i \leq k-1\). Note that \(\beta_i = 0 \iff \pi_i = \pi_\lambda\). Let \(v = \sum_{i=0}^{k-1} v_i \delta_i \in \mathbb R^{k-1}\) with \(\sum_{i=0}^{k-1} v_i = 1\) (barycentric coordinates). From the proof of Lemma 1, \(\delta_j^\top \Sigma^{-1} \delta_i = \frac{\delta_{ij} - \pi_j}{\pi_j}\) for \(0 \leq i,j \leq k-1\), so \[(k+1)\Sigma_4\Sigma^{-1}\,\delta_i = \sum_{j=0}^{k-1} (1-\pi_j)\frac{\delta_{ij} - \pi_j}{\pi_j}\,\delta_j = \alpha_i\,\delta_i - \sum_{j=0}^{k-1} \delta_j, \quad M_\lambda v = \sum_{j=0}^{k-1} (\beta_j v_j - 1) \delta_j.\] Since \(\sum_{i=0}^{k-1} \pi_i \delta_i = 0\) we have \[v \in E_\lambda \iff \exists\, t \in \mathbb R, \; \forall\, 0 \leq i \leq k-1, \quad \beta_i v_i = 1 - t\frac{\pi_i}{\pi_\lambda}.\] Set \(h_i = 1/\beta_i\) when \(i \notin I_\lambda\). Since \(\pi_i \beta_i = 1 - \pi_i/\pi_\lambda\), we get \(\pi_i h_i = \pi_\lambda(h_i - \pi_i)\), so \[v \in E_\lambda \iff \exists\, t \in \mathbb R, \quad \begin{cases} 1 - t\frac{\pi_i}{\pi_\lambda} = 0 & \text{for } i \in I_\lambda \\ v_i = (1-t)\,h_i + t\,\pi_i & \text{for } i \notin I_\lambda \end{cases} \qquad (\star)\]

Case a. \(I_\lambda = \emptyset\). Set \(h = \sum_{i=0}^{k-1} h_i \delta_i\). Summing \((\star)\) over \(0 \leq i \leq k-1\) and using \(\sum_{i=0}^{k-1} v_i = 1\), \(\sum_{i=0}^{k-1} \pi_i = 1\) gives \[(1-t)\Bigl(\sum_{i=0}^{k-1} h_i - 1\Bigr) = 0 \qquad (\star\star)\] so \((\star\star) \iff t = 1\) or \(\sum_{i=0}^{k-1} h_i = 1\). If \(t = 1\), then \(v_i = \pi_i\) for \(0 \leq i \leq k-1\), so \(v = \sum_{i=0}^{k-1} \pi_i \delta_i = 0\), which is trivial. Therefore \[E_\lambda \neq \{0\} \iff \sum_{i=0}^{k-1} h_i = 1 \iff \sum_{i=0}^{k-1} \frac{1}{\alpha_i - (k+1)\lambda} = 1\] and then \(E_\lambda = \operatorname{span}(h)\), so \(\lambda\) is simple. We have \(F_\lambda = \operatorname{span}(\Sigma^{-1} h)\) with \(\delta_i^\top \Sigma^{-1} h = (h_i - \pi_i)/\pi_i = h_i/\pi_\lambda\) for \(0 \leq i \leq k-1\).

Case b. \(I_\lambda \neq \emptyset\). Then \((\star)\) gives \(t = 1\), so \[v \in E_\lambda \iff v_i = \pi_i \text{ for } i \notin I_\lambda \text{ and } \sum_{i \in I_\lambda} v_i = 1 - \sum_{i \notin I_\lambda} \pi_i.\] Setting \(c_i = v_i - \pi_i\) for \(i \in I_\lambda\), the constraint becomes \(\sum_{i \in I_\lambda} c_i = 0\) and \(v = \sum_{i \in I_\lambda} c_i\,\delta_i\). Hence \(E_\lambda = \bigl\{\sum_{i \in I_\lambda} c_i\,\delta_i | \sum_{i \in I_\lambda} c_i = 0\bigr\}\), so \(E_\lambda \neq \{0\} \iff |I_\lambda| \geq 2\), and then \(\dim E_\lambda = |I_\lambda| - 1\). Since \(x \in F_\lambda \iff \Sigma x \in E_\lambda\) and the barycentric coordinate of \(\Sigma x\) on \(\delta_i\) is \(\pi_i(\delta_i^\top x + 1)\), we get \(x \in F_\lambda \iff \delta_i^\top x = 0\) for \(i \notin I_\lambda\), i.e. \(F_\lambda = \{\delta_i, i \notin I_\lambda\}^\perp\).

Proof of Corollary 2

Let \(r\) be the number of distinct roots of \(Q\) and \(\operatorname{rad}(Q)\) the unit polynomial of degree \(r\) whose roots are those of \(Q\). Then \(\frac{Q}{\operatorname{rad}(Q)} = \gcd(Q,Q')\) is the unit polynomial of degree \(k-r\) obtained by lowering the multiplicities of the roots of \(Q\) by 1. The Theorem 3 allows us to write \(\chi = \frac{Q}{\operatorname{rad}(Q)} \cdot \frac{H \circ u}{(k+1)^{k-1} u}\) where \(H\) is the unit polynomial of degree \(r-1\) whose roots are the solutions of \(\frac{Q'}{Q} \circ u(\lambda) + 1 = 0\) with multiplicity 1.

Furthermore, we have \(Q + Q' = \gcd(Q,Q') R\) where \(\gcd(Q,R) = 1\) and \(R\) is a unit polynomial of degree \(r\). Since \(Q(0) = (-1)^k \prod_{i=0}^{k-1} \frac1{\pi_i} > 0\) and \(Q'(0) = (-1)^{k-1} \sum_{i=0}^{k-1} \prod_{j \neq i} \frac1{\pi_j} = - \sum_{i=0}^{k-1} \pi_i Q(0) = - Q(0)\), then \(t\) divides \(Q + Q'\), so \(t\) divides \(R\). It suffices to show that \(\frac{R}{t} = H\). Since both polynomials are unital of degree \(r-1\), it suffices to verify that every root of \(H\) is a root of \(\frac{R}{t}\). Now, if \(v\) is a root of \(H\), it satisfies \(Q(v) \neq 0\) and \(\frac{Q'}{Q}(v) + 1 = 0\), so \(Q(v) + Q'(v) = 0\) but \(v \gcd(Q,Q')(v) \neq 0\), so \(\frac{R(v)}v = 0\). The second equality follows from \(Q = \sum_{i=0}^{k-1} (-1)^{k-1-i} t^i e_{k-1-i}(\pi_0^{-1}, \dots, \pi_{k-1}^{-1})\) and from \(e_{k-1-i}(\pi_0^{-1}, \dots, \pi_{k-1}^{-1}) = \frac1{\prod_{i=0}^{k-1} \pi_i} e_i (\pi_0, \dots, \pi_{k-1})\).

References

Becquart, Colombe, Aurore Archimbaud, Anne Ruiz-Gazen, Luka Prlić, and Klaus Nordhausen. 2026. “Invariant Coordinate Selection and Fisher Discriminant Subspace Beyond the Case of Two Groups.” Journal of Multivariate Analysis 211 (January): 105521. https://doi.org/10.1016/j.jmva.2025.105521.
Peña, Daniel, Francisco J. Prieto, and Júlia Viladomat. 2010. “Eigenvectors of a Kurtosis Matrix as Interesting Directions to Reveal Cluster Structure.” Journal of Multivariate Analysis 101 (9): 1995–2007. https://doi.org/10.1016/j.jmva.2010.04.014.
Tyler, David E., Frank Critchley, Lutz Dümbgen, and Hannu Oja. 2009. “Invariant Co-Ordinate Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (3): 549–92. https://doi.org/10.1111/j.1467-9868.2009.00706.x.