ICS and the discriminant subspace
using the fourth-order covariance operator

Workshop on Invariant Coordinate Selection and Related Methods
University of Helsinki
Doctoral Workshop on Decision Mathematics and Statistics

Camille Mondon

camille.mondon@tse-fr.eu

Toulouse School of Economics

Colombe Becquart

colombe.becquart@tse-fr.eu

Toulouse School of Economics

June 4, 2026

Introduction

From Principal Component Analysis…

Figure 1: Ellipse of \(\Sigma\) (covariance) on a 2D two-group Gaussian mixture. Computed with the `icspylab` package.

…to Invariant Coordinate Selection

Figure 2: Ellipses of \(\Sigma\) (covariance) and \(\Sigma_4\) (fourth-order covariance) on a 2D two-group Gaussian mixture. Computed with the `icspylab` package.

Tip

Both scatters differ in shape: \(\Sigma\) is less attracted by the small group than \(\Sigma_4\). ICS jointly diagonalizes the two in order to compare them and to reveal the direction from group 0 to group 1.

Motivation

If \(P\) is a distribution over \(\mathbb R^p\) with expectation \(\mu\), we define

the covariance \(\Sigma = \int (x-\mu) (x-\mu)^\top \, \mathrm{d}P(x)\) that we assume is invertible.
the fourth-order covariance \(\Sigma_4 = \frac1{p+2} \int (x-\mu)^\top \Sigma^{-1} (x-\mu) (x-\mu) (x-\mu)^\top \, \mathrm{d}P(x)\).

Invariant Coordinate Selection (ICS) relies on eigendecomposition of \(\Sigma^{-1} \Sigma_4\) to reveal interesting projections, understand the structure of \(P\) (clusters, outliers) (Tyler et al. 2009).

Assumption	\(P\) is an elliptical location mixture distribution with \(k\) groups
Prior work	Becquart et al. (2026)
Objectives	1. How to compute \(\Sigma_4\) explicitly? 2. Which (simple) equation do the eigenvalues of ICS satisfy? 3. Why are some eigenvalues small and other large? 4. When do the eigenvectors of ICS span the discriminant subspace? 5. Which groups are separated by which eigenvectors?
Tools	Affine geometry, Barycentric coordinates, Symbolic calculus

Mixture distributions

Let \(P = \sum_{i=0}^{k-1} \pi_i P_i\) be a mixture distribution on \(\mathbb R^p\), where the group proportions are \[(\pi_0, \dots, \pi_{k-1})^\top \in \mathcal S^k = \{ \pi \in (\mathbb R_+^*)^k | \sum_{i=0}^{k-1} \pi_i = 1 \}.\]
If each \(P_i\) has mean \(\mu_i\) then the mean of \(P\) is \[\mu = \sum_{i=0}^{k-1} \pi_i \mu_i.\]
If moreover \(P_0, \dots, P_{k-1}\) share the same within-group covariance matrix \(W\), then the covariance matrix of \(P\) is \[\Sigma = W + B\] where \(B = \sum_{i=0}^{k-1} \pi_i \delta_i \delta_i^\top\) is the between-group covariance matrix that is based on the group directions \[\delta_i = \mu_i - \mu, 0 \leq i \leq k-1.\]
How to compute the fourth-order covariance matrix \(\Sigma_4\)?

A general formula by Peña, Prieto, and Viladomat (2010)

Theorem 1 (Peña, Prieto, and Viladomat (2010)) If the \(P_i, 0 \leq i \leq k-1\) are elliptical distributions, then \[(p+2) \Sigma_4 = \alpha \Sigma + \sum_{0 \leq i,j \leq k-1} \beta_{ij} \delta_i \delta_j^\top \tag{1}\] where

\(\alpha = \tilde k p + (1 − \tilde k) \sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i + \bar k\)
\(\beta_{ij} = \delta_{ij} \gamma \pi_i + (\delta_{ij} \pi_i + \eta \, \pi_i \pi_j) \delta_i^\top \Sigma^{-1} \delta_j, 0 \leq i,j \leq k-1\)
\(\gamma = (1 − \tilde k)p − 2\bar k + 4\)
\(\eta = \tilde k + \bar k - 6\)
\(\tilde k, \bar k\) are some ellipticity parameters.

Moreover, \(\alpha\) is an eigenvalue of \((p+2) \Sigma^{-1} \Sigma_4\) with multiplicity greater than or equal to \(p-r\), where \(r \leq k-1\) is the dimension of the affine space generated by the group means.

But…

Equation 1 is incorrect.

A general and correct formula by Peña, Prieto, and Viladomat (2010)

Theorem 2 (Peña, Prieto, and Viladomat (2010), corrected) If the \(P_i, 0 \leq i \leq k-1\) are elliptical distributions, then \[\begin{aligned} (p+2) \Sigma_4 &= \underbrace{\alpha \Sigma}_{\color{#6a4c93}{B = 0 \text{ (elliptical)}}} \\ &\quad + \underbrace{(\tilde k - 1) \operatorname{tr} (W \Sigma^{-1}) W + (\bar k - 2) W \Sigma^{-1} W}_{\color{#3a7d5c}{B \neq 0 \text{ and } W \neq 0 \text{ (non-Gaussian)}}} \\ &\quad - \underbrace{\operatorname{tr} (B \Sigma^{-1}) B + 2 B \Sigma^{-1} B}_{\color{#2d6a8a}{B \neq 0 \text{ and } W \neq 0}} \\ &\quad + \underbrace{\sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i \delta_i \delta_i^\top}_{\color{#b8860b}{W = 0 \text{ (discrete)}}} \end{aligned}\] where \(\alpha = \tilde k p + (1 − \tilde k) \sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i + \bar k\) and \(\tilde k, \bar k\) are some ellipticity parameters.

A general and correct formula by Peña, Prieto, and Viladomat (2010), Gaussian case

Corollary 1 (Peña, Prieto, and Viladomat (2010), Gaussian) If the \(P_i, 0 \leq i \leq k-1\) are Gaussian distributions, then \[\begin{aligned} (p+2) \Sigma_4 &= \underbrace{(p+2) \Sigma}_{\color{#6a4c93}{B = 0 \text{ (elliptical)}}} \\ &\quad - \underbrace{\left( \operatorname{tr} (B \Sigma^{-1}) B + 2 B \Sigma^{-1} B \right)}_{\color{#2d6a8a}{B \neq 0 \text{ and } W \neq 0}} \\ &\quad + \underbrace{\sum_{i=0}^{k-1} \pi_i \delta_i^\top \Sigma^{-1} \delta_i \, \delta_i \delta_i^\top}_{\color{#b8860b}{W = 0 \text{ (discrete)}}} \end{aligned}\] Moreover, \(1\) is an eigenvalue of \(\Sigma^{-1} \Sigma_4\) with multiplicity greater than or equal to \(p-r\), where \(r \leq k-1\) is the dimension of the affine space generated by the group means.

But…

This formula relies on \(\Sigma^{-1}\) that is not explicit in the parameters \(W\), \((\mu_0, \dots, \mu_{k-1}), (\pi_1, \dots, \pi_{k-1})\) of \(P\).

The case of a discrete distribution on simplex vertices

The setting

From now on we assume that \(P\) is a discrete distribution on simplex vertices, so that
1. \(W=0\), i.e \(\Sigma = B\): no within-group noise
2. \(p = k-1\)
3. \((\mu_0, \dots, \mu_{k-1})\) are in general position, i.e. form an affine basis aka a simplex in \(\mathbb R^{k-1}\).
In that case, Mahalanobis inner products between the group directions \(\delta_j \Sigma^{-1} \delta_i\) become tractable, and we prove that the eigenvalues of \(\Sigma^{-1} \Sigma_4\) depend only on group proportions.

An edge case

Simulations suggest this is specific to this case: when \(W \neq 0\) or the group means are not in general position, their directions matter.
This situation is interesting as a limit case when \(W B^{-1} \rightarrow 0\).

Computing \(\Sigma_4\)

Let us begin by proving a simple formula for \(\Sigma_4\).

Lemma 1 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and with \((\pi_0, \dots, \pi_{k-1})^\top \in \mathcal S^k\). With \(\mu = \sum_{i=0}^{k-1} \pi_i \mu_i\) and \(\delta_i = \mu_i - \mu, 0 \leq i \leq k-1\) \[ \Sigma_4 = \frac1{k+1} \sum_{i=0}^{k-1} (1 - \pi_i) \delta_i \delta_i^\top \]

A rational equation for eigenvalues

Now we can derive a simple rational equation depending only on proportions satisfied by the eigenvalues of \(\Sigma_4 \Sigma^{-1}\) and express their corresponding eigenvectors.

Theorem 3 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and \(\lambda > 0\). Let us denote

\(\alpha_i = \frac{1-\pi_i}{\pi_i}\) the odds against group \(i \in \{ 0, \dots, k-1 \}\)
\(I_\lambda = \left\{ i \in \{0, \dots, k-1 \} \mid \alpha_i = \lambda \right\}\)
\(E_\lambda = \operatorname{ker}(\Sigma_4\Sigma^{-1} - \lambda I_{k-1})\).

Then \(\lambda\) is an eigenvalue of \((k+1) \Sigma_4 \Sigma^{-1}\) if and only if one of the following holds:

\(I_\lambda = \emptyset\) and \[\sum_{i=0}^{k-1} \frac{\pi_i}{\alpha_i - \lambda} = 0 \tag{2}\] In that case, \(\lambda\) is simple and \(E_\lambda = \operatorname{span}(h)\) where \(h = \sum_{i=0}^{k-1} \frac{1}{\alpha_i - \lambda}\,\delta_i\).
\(| I_\lambda | \geq 2\). In that case, \(\lambda\) has multiplicity \(|I_\lambda| - 1\) and \(E_\lambda = \operatorname{span} (\delta_i - \delta_{i_0})_{i \in I_\lambda}\) for any \(i_0 \in I_\lambda\).

Visualising the eigenvalues of \((k+1) \Sigma_4 \Sigma^{-1}\) for \(k=5\)

Figure 3: Graph of \(\displaystyle f(x) = \sum_{i=0}^{k-1} \frac{\pi_i}{\alpha_i - x}\). The eigenvalues of \((k+1) \Sigma_4 \Sigma^{-1}\) (zeroes of \(f\)) are interlaced with the odds against each group (poles of \(f\)).

Computing the characteristic polynomial

We can reunite both cases in Theorem 3 using the characteristic polynomial \[ \chi = \det(t \, I_{k-1} - (k+1) \Sigma^{-1} \Sigma_4) \] which is the unit polynomial whose roots are the eigenvalues of \((k+1) \Sigma^{-1} \Sigma_4\) (with multiplicities).

Corollary 2 Let \(P = \sum_{i=0}^{k-1} \pi_i \delta_{\mu_i}\) be a discrete distribution on the simplex \((\mu_0, \dots, \mu_{k-1})\) of \(\mathbb R^{k-1}\) and \(\alpha_i = \frac{1-\pi_i}{\pi_i}\) the odds against group \(i \in \{ 0, \dots, k-1 \}\). Let \(Q = \prod_{i=0}^{k-1} (t - \alpha_i)\) and \(\chi = \det(t \, I_{k-1} - (k+1) \Sigma^{-1} \Sigma_4)\). Then \[ \begin{aligned} \chi &= \frac{Q + Q'}{t+1} = Q(t) \sum_{i=0}^{k-1} \frac{\pi_i}{t - \alpha_i} \\ &= \tfrac1{\prod_{i=0}^{k-1} \pi_i} \sum_{i=0}^k (-1)^{k-i} (t+1)^{i - 2} (t + 1 + i) e_i(\pi_0, \dots, \pi_{k-1}) \end{aligned} \] using the elementary symmetric polynomials \[e_i(\pi_0, \dots, \pi_{k-1}) = \sum_{0\leq j_1<j_2<\cdots<j_i\leq k-1} \pi_{j_1}\pi_{j_2}\dotsm \pi_{j_i}, 0 \leq i \leq k.\]

Application. A criterion for discriminant subspace reconstructability

A criterion for discriminant subspace reconstructability

Theorem 4 (Tyler et al. (2009)) If \(P\) is a mixture of \(k\) elliptical distributions with same within-group covariance, there exists a noise eigenvalue that has multiplicity at least \(p-r\) where \(r \leq k-1\) is the dimension of the affine space generated by the group means. If its multiplicity is minimal, then the remaining eigenvectors span the discriminant subspace.

Step 1: Identifying the noise eigenvalue. For \((\Sigma, (k+1) \Sigma_4)\), Theorem 2 shows that it is \(\alpha\), which corresponds to the proportionality constant when \(B=0\) (elliptical case).

Step 2: Characterising the parameters for which the multiplicity of this eigenvalue is minimal. In the case of a discrete distribution on a simplex, this means that \(k+1\) is not an eigenvalue of \((k+1)\Sigma_4 \Sigma^{-1}\), i.e. \[\chi (k+1) \neq 0\]

For \(k=2\), we recover the condition

\[6 \pi_{0} \pi_{1} - 1 \neq 0 \iff \pi_0 \notin \{ \tfrac12 - \tfrac{\sqrt{3}}6, \tfrac12 + \tfrac{\sqrt{3}}6 \} \approx \{ 0.2113, 0.7887 \}.\]

For \(k=3\), we obtain the condition

\[40 \pi_{0} \pi_{1} \pi_{2} - 7 \pi_{0} \pi_{1} - 7 \pi_{0} \pi_{2} - 7 \pi_{1} \pi_{2} + 1\neq 0 \]

Example for \(k=3\). Beyond the zero-noise general position case

shade sign of det

d3 = require("d3@7")

//
// === Build callable det functions from the sympy-derived strings ===
//
math = {
  const args = "pi1, pi2, m1x, m1y, m2x, m2y, w00, w01, w11";
  return {
    N00:  new Function(args, "return " + N00_src + ";"),
    N01:  new Function(args, "return " + N01_src + ";"),
    N11:  new Function(args, "return " + N11_src + ";"),
    Ddet: new Function(args, "return " + D_src   + ";")
  };
}

//
// === Single d3 panel: centers triangle + zero curve inside ===
//
// Important pattern for buttons + reactive update:
//   We expose a global `window.__icsRender()` from this cell.
//   The buttons cell mutates window.__icsState then calls __icsRender().
//   This avoids the OJS-viewof-event-routing tangle.
//
board = {
  const MU0 = [-0.5, -Math.sqrt(3)/6];
  const DEFAULT_MU1 = [0.5, -Math.sqrt(3)/6];
  const DEFAULT_MU2 = [0.0, Math.sqrt(3)/3];

  // State lives on window so the buttons cell can mutate it directly.
  window.__icsState = window.__icsState || {
    mu1: [...DEFAULT_MU1],
    mu2: [...DEFAULT_MU2],
    covA: 0, covB: 0, covTheta: 0,
    showSign: true
  };
  const s = window.__icsState;
  // Keep DEFAULT_* accessible to the buttons cell.
  window.__icsDefaults = { MU0, DEFAULT_MU1, DEFAULT_MU2,
    DEFAULT_COV: { a: 0.18, b: 0.18, theta: 0 } };

  // --- Set up SVG (idempotent: clear container first in case of re-run) ---
  const container = d3.select("#ics-board");
  container.selectAll("*").remove();

  const W = 720, H = 520;
  const svg = container.append("svg")
    .attr("viewBox", [0, 0, W, H])
        .attr("style", "width: min(75vh, 75vw); display: block; margin: 0 auto; max-width:900px; background:#fff; border:1px solid #d8d4c4; user-select:none;");

  const x = d3.scaleLinear().domain([-1.7, 1.7]).range([60, W - 60]);
  const y = d3.scaleLinear().domain([-1.2, 1.2]).range([H - 30, 30]);

  // Background grid
  for (let v = -1.5; v <= 1.5 + 1e-9; v += 0.5) {
    if (Math.abs(v) < 1e-9) continue;
    svg.append("line").attr("x1", x(v)).attr("x2", x(v))
      .attr("y1", y(-1.15)).attr("y2", y(1.15))
      .attr("stroke", "#e8e4d4").attr("stroke-width", 1);
  }
  for (let v = -1.0; v <= 1.0 + 1e-9; v += 0.5) {
    if (Math.abs(v) < 1e-9) continue;
    svg.append("line").attr("y1", y(v)).attr("y2", y(v))
      .attr("x1", x(-1.65)).attr("x2", x(1.65))
      .attr("stroke", "#e8e4d4").attr("stroke-width", 1);
  }
  svg.append("line").attr("x1", x(-1.65)).attr("x2", x(1.65))
    .attr("y1", y(0)).attr("y2", y(0))
    .attr("stroke", "#1a1a1a").attr("opacity", 0.15);
  svg.append("line").attr("x1", x(0)).attr("x2", x(0))
    .attr("y1", y(-1.15)).attr("y2", y(1.15))
    .attr("stroke", "#1a1a1a").attr("opacity", 0.15);

  // Layers (drawn in order)
  const gShade = svg.append("g");
  const gCurve = svg.append("g");
  const gTriangle = svg.append("g");
  const gEllipses = svg.append("g");
  const gHandles = svg.append("g");

  // --- Math: Σ_W matrix + det evaluator ---
  function sigmaW() {
    const c = Math.cos(s.covTheta), si = Math.sin(s.covTheta);
    return {
      w00: c*c*s.covA**2 + si*si*s.covB**2,
      w01: c*si*(s.covA**2 - s.covB**2),
      w11: si*si*s.covA**2 + c*c*s.covB**2
    };
  }
  function detAt(pi1, pi2) {
    if (pi1 < 0 || pi2 < 0 || pi1 + pi2 > 1) return NaN;
    const m1x = s.mu1[0] - MU0[0], m1y = s.mu1[1] - MU0[1];
    const m2x = s.mu2[0] - MU0[0], m2y = s.mu2[1] - MU0[1];
    const {w00, w01, w11} = sigmaW();
    const D = math.Ddet(pi1, pi2, m1x, m1y, m2x, m2y, w00, w01, w11);
    if (Math.abs(D) < 1e-14) return NaN;
    const n00 = math.N00(pi1, pi2, m1x, m1y, m2x, m2y, w00, w01, w11);
    const n01 = math.N01(pi1, pi2, m1x, m1y, m2x, m2y, w00, w01, w11);
    const n11 = math.N11(pi1, pi2, m1x, m1y, m2x, m2y, w00, w01, w11);
    return n00 * n11 - n01 * n01;
  }
  function bary(pi1, pi2) {
    const pi0 = 1 - pi1 - pi2;
    return [
      pi0*MU0[0] + pi1*s.mu1[0] + pi2*s.mu2[0],
      pi0*MU0[1] + pi1*s.mu1[1] + pi2*s.mu2[1]
    ];
  }

  // --- Render loop: rebuilds shading, curve, triangle, ellipses, handles ---
  function render() {
    const N = 140;
    const F = new Float64Array((N+1)*(N+1));
    const inside = new Uint8Array((N+1)*(N+1));
    for (let j = 0; j <= N; j++) {
      for (let i = 0; i <= N; i++) {
        const pi1 = i/N, pi2 = j/N;
        const idx = j*(N+1)+i;
        if (pi1 + pi2 <= 1.0 + 1e-9) {
          const v = detAt(pi1, pi2);
          F[idx] = v;
          inside[idx] = isNaN(v) ? 0 : 1;
        }
      }
    }

    // Sign shading
    gShade.selectAll("*").remove();
    if (s.showSign) {
      const posPath = [], negPath = [];
      for (let j = 0; j < N; j++) {
        for (let i = 0; i < N; i++) {
          const i00 = j*(N+1)+i, i10 = j*(N+1)+i+1, i01 = (j+1)*(N+1)+i, i11 = (j+1)*(N+1)+i+1;
          if (inside[i00] && inside[i10] && inside[i01]) {
            const avg = (F[i00] + F[i10] + F[i01]) / 3;
            const w0 = bary(i/N, j/N), w1 = bary((i+1)/N, j/N), w2 = bary(i/N, (j+1)/N);
            const seg = `M${x(w0[0])},${y(w0[1])}L${x(w1[0])},${y(w1[1])}L${x(w2[0])},${y(w2[1])}Z`;
            (avg >= 0 ? posPath : negPath).push(seg);
          }
          if (inside[i10] && inside[i11] && inside[i01]) {
            const avg = (F[i10] + F[i11] + F[i01]) / 3;
            const w0 = bary((i+1)/N, j/N), w1 = bary((i+1)/N, (j+1)/N), w2 = bary(i/N, (j+1)/N);
            const seg = `M${x(w0[0])},${y(w0[1])}L${x(w1[0])},${y(w1[1])}L${x(w2[0])},${y(w2[1])}Z`;
            (avg >= 0 ? posPath : negPath).push(seg);
          }
        }
      }
      if (posPath.length) gShade.append("path").attr("d", posPath.join("")).attr("fill", "#bd5d3a").attr("fill-opacity", 0.42);
      if (negPath.length) gShade.append("path").attr("d", negPath.join("")).attr("fill", "#3d6478").attr("fill-opacity", 0.42);
    }

    // Zero curve
    gCurve.selectAll("*").remove();
    const segs = [];
    function interp(pa, va, pb, vb) {
      const t = va / (va - vb);
      return [pa[0] + t*(pb[0]-pa[0]), pa[1] + t*(pb[1]-pa[1])];
    }
    function trace(pa, va, pb, vb, pc, vc) {
      const code = (va>0?1:0) | (vb>0?2:0) | (vc>0?4:0);
      if (code === 0 || code === 7) return;
      let q1, q2;
      if (code === 1 || code === 6) { q1 = interp(pa, va, pb, vb); q2 = interp(pa, va, pc, vc); }
      else if (code === 2 || code === 5) { q1 = interp(pb, vb, pa, va); q2 = interp(pb, vb, pc, vc); }
      else { q1 = interp(pc, vc, pa, va); q2 = interp(pc, vc, pb, vb); }
      segs.push(`M${x(q1[0])},${y(q1[1])}L${x(q2[0])},${y(q2[1])}`);
    }
    for (let j = 0; j < N; j++) {
      for (let i = 0; i < N; i++) {
        const i00 = j*(N+1)+i, i10 = j*(N+1)+i+1, i01 = (j+1)*(N+1)+i, i11 = (j+1)*(N+1)+i+1;
        if (inside[i00] && inside[i10] && inside[i01])
          trace(bary(i/N,j/N), F[i00], bary((i+1)/N,j/N), F[i10], bary(i/N,(j+1)/N), F[i01]);
        if (inside[i10] && inside[i11] && inside[i01])
          trace(bary((i+1)/N,j/N), F[i10], bary((i+1)/N,(j+1)/N), F[i11], bary(i/N,(j+1)/N), F[i01]);
      }
    }
    if (segs.length) gCurve.append("path").attr("d", segs.join("")).attr("fill", "none").attr("stroke", "#1a1a1a").attr("stroke-width", 1.6);

    // Triangle outline
    gTriangle.selectAll("*").remove();
    gTriangle.append("path")
      .attr("d", `M${x(MU0[0])},${y(MU0[1])}L${x(s.mu1[0])},${y(s.mu1[1])}L${x(s.mu2[0])},${y(s.mu2[1])}Z`)
      .attr("fill", "none").attr("stroke", "#1a1a1a").attr("stroke-width", 1.4);

    // Σ_W ellipses at each center
    gEllipses.selectAll("*").remove();
    const scale = x(1) - x(0);
    for (const [cx, cy] of [MU0, s.mu1, s.mu2]) {
      gEllipses.append("ellipse")
        .attr("cx", x(cx)).attr("cy", y(cy))
        .attr("rx", s.covA * scale).attr("ry", s.covB * scale)
        .attr("transform", `rotate(${-s.covTheta*180/Math.PI} ${x(cx)} ${y(cy)})`)
        .attr("fill", "rgba(122,31,31,0.06)").attr("stroke", "#7a1f1f").attr("stroke-width", 1.2);
    }

    // Handles
    const c = Math.cos(s.covTheta), si = Math.sin(s.covTheta);
    const a1 = [MU0[0] + s.covA*c, MU0[1] + s.covA*si];
    const a2 = [MU0[0] - s.covB*si, MU0[1] + s.covB*c];

    gHandles.selectAll("*").remove();
    gHandles.append("circle").attr("cx", x(MU0[0])).attr("cy", y(MU0[1])).attr("r", 4).attr("fill", "#1a1a1a");
    gHandles.append("text").attr("x", x(MU0[0]) + 11).attr("y", y(MU0[1]) - 9)
      .attr("font-family", "serif").attr("font-style", "italic").attr("font-size", 15).text("μ₀");

    const handles = [
      { type: "center", which: 1, pos: s.mu1, label: "μ₁" },
      { type: "center", which: 2, pos: s.mu2, label: "μ₂" },
      { type: "covA", pos: a1 },
      { type: "covB", pos: a2 }
    ];

    const hg = gHandles.selectAll("g.h").data(handles).join("g").attr("class", "h")
      .attr("transform", d => `translate(${x(d.pos[0])}, ${y(d.pos[1])})`)
      .style("cursor", "grab");
    hg.append("circle").attr("r", 18).attr("fill", "transparent");
    hg.filter(d => d.type === "center").append("circle").attr("r", 7)
      .attr("fill", "#fff").attr("stroke", "#1a1a1a").attr("stroke-width", 1.5);
    hg.filter(d => d.type === "covA").append("rect").attr("x", -6).attr("y", -6).attr("width", 12).attr("height", 12)
      .attr("fill", "#fff").attr("stroke", "#7a1f1f").attr("stroke-width", 1.4);
    hg.filter(d => d.type === "covB").append("polygon").attr("points", "0,-7 -6,5 6,5")
      .attr("fill", "#fff").attr("stroke", "#7a1f1f").attr("stroke-width", 1.4);
    hg.filter(d => d.label).append("text").attr("x", 11).attr("y", -9)
      .attr("font-family", "serif").attr("font-style", "italic").attr("font-size", 15).text(d => d.label);

    hg.call(d3.drag()
      .on("drag", (event, d) => {
        const wx = x.invert(event.x), wy = y.invert(event.y);
        if (d.type === "center") {
          const t = d.which === 1 ? s.mu1 : s.mu2;
          t[0] = Math.max(-1.6, Math.min(1.6, wx));
          t[1] = Math.max(-1.15, Math.min(1.15, wy));
        } else if (d.type === "covA") {
          const dx = wx - MU0[0], dy = wy - MU0[1];
          const r = Math.hypot(dx, dy);
          if (r > 0.01) { s.covA = Math.min(1.0, r); s.covTheta = Math.atan2(dy, dx); }
        } else if (d.type === "covB") {
          const dx = wx - MU0[0], dy = wy - MU0[1];
          const perp = -Math.sin(s.covTheta)*dx + Math.cos(s.covTheta)*dy;
          s.covB = Math.min(1.0, Math.max(0.01, Math.abs(perp)));
        }
        render();
      })
    );

    // Readout
    const f = v => v.toFixed(3);
    const w = sigmaW();
    document.getElementById("ics-readout").innerHTML =
      `μ₁ = (${f(s.mu1[0])}, ${f(s.mu1[1])}) · μ₂ = (${f(s.mu2[0])}, ${f(s.mu2[1])}) · ` +
      `Σ_W axes = (${f(s.covA)}, ${f(s.covB)}) at ${(s.covTheta*180/Math.PI).toFixed(1)}°`;
  }

  // Expose the render function globally so the buttons cell can call it.
  window.__icsRender = render;

  // Wire buttons here too (idempotent: replace any previous handlers).
  const D = window.__icsDefaults;
  document.getElementById("b-equil").onclick = () => {
    s.mu1 = [...D.DEFAULT_MU1]; s.mu2 = [...D.DEFAULT_MU2]; render();
  };
  document.getElementById("b-align").onclick = () => {
    s.mu1 = [D.MU0[0] + 1.0, D.MU0[1]];
    s.mu2 = [D.MU0[0] - 0.6, D.MU0[1]]; render();
  };
  document.getElementById("b-circ").onclick = () => {
    s.covA = D.DEFAULT_COV.a; s.covB = D.DEFAULT_COV.b; s.covTheta = D.DEFAULT_COV.theta;
    render();
  };
  document.getElementById("b-zero").onclick = () => {
    s.covA = 0.001; s.covB = 0.001; s.covTheta = 0; render();
  };
  document.getElementById("b-sign").onchange = (e) => {
    s.showSign = e.target.checked; render();
  };

  render();
  return svg.node();
}

Appendix

Proof of Lemma 1

Since \(P = \sum_i \pi_i \delta_{\mu_i}\), expanding the definition of \(\Sigma_4\) gives: \[(k+1)\Sigma_4 = \sum_{i=0}^{k-1} \pi_i (\delta_i^\top \Sigma^{-1} \delta_i) \, \delta_i \delta_i^\top.\] It remains to compute the Mahalanobis distances \(\delta_i^\top \Sigma^{-1} \delta_i, 0 \leq i \leq k-1\) between the group means and the global mean. Since \(\Sigma = \sum_{i=0}^{k-1} \pi_i \delta_i \delta_i^\top\) and \(\sum_{i=0}^{k-1} \pi_i \delta_i = 0\), then for any \(0 \leq j \leq k-1\): \[\delta_j = \Sigma \Sigma^{-1} \delta_j = \sum_{i=0}^{k-1} \pi_i \left( \delta_i^\top \Sigma^{-1}\delta_j + 1 \right) \delta_i, \qquad \sum_{i=0}^{k-1} \pi_i \bigl(\delta_i^\top \Sigma^{-1}\delta_j + 1\bigr) = 1.\] By uniqueness of the barycentric coordinates of \(\delta_j\) in the affine basis \((\delta_0, \dots, \delta_{k-1})\), we get \[\pi_j(\delta_j^\top \Sigma^{-1}\delta_j + 1) = 1, \qquad (k+1)\Sigma_4 = \sum_{i=0}^{k-1}(1-\pi_i)\,\delta_i\delta_i^\top.\]

Proof of Theorem 3

Let \(M_\lambda = (k+1) (\Sigma_4 \Sigma^{-1} - \lambda I_{k-1})\), \(E_\lambda = \ker M_\lambda\), \(F_\lambda = \ker(\Sigma^{-1}\Sigma_4 - \lambda I_{k-1}) = \Sigma^{-1} E_\lambda\), \(\beta_i = \alpha_i - (k+1)\lambda\) and \(\pi_\lambda = \frac1{(k+1)\lambda+1}\) for \(0 \leq i \leq k-1\). Note that \(\beta_i = 0 \iff \pi_i = \pi_\lambda\). Let \(v = \sum_{i=0}^{k-1} v_i \delta_i \in \mathbb R^{k-1}\) with \(\sum_{i=0}^{k-1} v_i = 1\) (barycentric coordinates). From the proof of Lemma 1, \(\delta_j^\top \Sigma^{-1} \delta_i = \frac{\delta_{ij} - \pi_j}{\pi_j}\) for \(0 \leq i,j \leq k-1\), so \[(k+1)\Sigma_4\Sigma^{-1}\,\delta_i = \sum_{j=0}^{k-1} (1-\pi_j)\frac{\delta_{ij} - \pi_j}{\pi_j}\,\delta_j = \alpha_i\,\delta_i - \sum_{j=0}^{k-1} \delta_j, \quad M_\lambda v = \sum_{j=0}^{k-1} (\beta_j v_j - 1) \delta_j.\] Since \(\sum_{i=0}^{k-1} \pi_i \delta_i = 0\) we have \[v \in E_\lambda \iff \exists\, t \in \mathbb R, \; \forall\, 0 \leq i \leq k-1, \quad \beta_i v_i = 1 - t\frac{\pi_i}{\pi_\lambda}.\] Set \(h_i = 1/\beta_i\) when \(i \notin I_\lambda\). Since \(\pi_i \beta_i = 1 - \pi_i/\pi_\lambda\), we get \(\pi_i h_i = \pi_\lambda(h_i - \pi_i)\), so \[v \in E_\lambda \iff \exists\, t \in \mathbb R, \quad \begin{cases} 1 - t\frac{\pi_i}{\pi_\lambda} = 0 & \text{for } i \in I_\lambda \\ v_i = (1-t)\,h_i + t\,\pi_i & \text{for } i \notin I_\lambda \end{cases} \qquad (\star)\]

Case a. \(I_\lambda = \emptyset\). Set \(h = \sum_{i=0}^{k-1} h_i \delta_i\). Summing \((\star)\) over \(0 \leq i \leq k-1\) and using \(\sum_{i=0}^{k-1} v_i = 1\), \(\sum_{i=0}^{k-1} \pi_i = 1\) gives \[(1-t)\Bigl(\sum_{i=0}^{k-1} h_i - 1\Bigr) = 0 \qquad (\star\star)\] so \((\star\star) \iff t = 1\) or \(\sum_{i=0}^{k-1} h_i = 1\). If \(t = 1\), then \(v_i = \pi_i\) for \(0 \leq i \leq k-1\), so \(v = \sum_{i=0}^{k-1} \pi_i \delta_i = 0\), which is trivial. Therefore \[E_\lambda \neq \{0\} \iff \sum_{i=0}^{k-1} h_i = 1 \iff \sum_{i=0}^{k-1} \frac{1}{\alpha_i - (k+1)\lambda} = 1\] and then \(E_\lambda = \operatorname{span}(h)\), so \(\lambda\) is simple. We have \(F_\lambda = \operatorname{span}(\Sigma^{-1} h)\) with \(\delta_i^\top \Sigma^{-1} h = (h_i - \pi_i)/\pi_i = h_i/\pi_\lambda\) for \(0 \leq i \leq k-1\).

Case b. \(I_\lambda \neq \emptyset\). Then \((\star)\) gives \(t = 1\), so \[v \in E_\lambda \iff v_i = \pi_i \text{ for } i \notin I_\lambda \text{ and } \sum_{i \in I_\lambda} v_i = 1 - \sum_{i \notin I_\lambda} \pi_i.\] Setting \(c_i = v_i - \pi_i\) for \(i \in I_\lambda\), the constraint becomes \(\sum_{i \in I_\lambda} c_i = 0\) and \(v = \sum_{i \in I_\lambda} c_i\,\delta_i\). Hence \(E_\lambda = \bigl\{\sum_{i \in I_\lambda} c_i\,\delta_i | \sum_{i \in I_\lambda} c_i = 0\bigr\}\), so \(E_\lambda \neq \{0\} \iff |I_\lambda| \geq 2\), and then \(\dim E_\lambda = |I_\lambda| - 1\). Since \(x \in F_\lambda \iff \Sigma x \in E_\lambda\) and the barycentric coordinate of \(\Sigma x\) on \(\delta_i\) is \(\pi_i(\delta_i^\top x + 1)\), we get \(x \in F_\lambda \iff \delta_i^\top x = 0\) for \(i \notin I_\lambda\), i.e. \(F_\lambda = \{\delta_i, i \notin I_\lambda\}^\perp\).

Proof of Corollary 2

Let \(r\) be the number of distinct roots of \(Q\) and \(\operatorname{rad}(Q)\) the unit polynomial of degree \(r\) whose roots are those of \(Q\). Then \(\frac{Q}{\operatorname{rad}(Q)} = \gcd(Q,Q')\) is the unit polynomial of degree \(k-r\) obtained by lowering the multiplicities of the roots of \(Q\) by 1. The Theorem 3 allows us to write \(\chi = \frac{Q}{\operatorname{rad}(Q)} \cdot \frac{H \circ u}{(k+1)^{k-1} u}\) where \(H\) is the unit polynomial of degree \(r-1\) whose roots are the solutions of \(\frac{Q'}{Q} \circ u(\lambda) + 1 = 0\) with multiplicity 1.

Furthermore, we have \(Q + Q' = \gcd(Q,Q') R\) where \(\gcd(Q,R) = 1\) and \(R\) is a unit polynomial of degree \(r\). Since \(Q(0) = (-1)^k \prod_{i=0}^{k-1} \frac1{\pi_i} > 0\) and \(Q'(0) = (-1)^{k-1} \sum_{i=0}^{k-1} \prod_{j \neq i} \frac1{\pi_j} = - \sum_{i=0}^{k-1} \pi_i Q(0) = - Q(0)\), then \(t\) divides \(Q + Q'\), so \(t\) divides \(R\). It suffices to show that \(\frac{R}{t} = H\). Since both polynomials are unital of degree \(r-1\), it suffices to verify that every root of \(H\) is a root of \(\frac{R}{t}\). Now, if \(v\) is a root of \(H\), it satisfies \(Q(v) \neq 0\) and \(\frac{Q'}{Q}(v) + 1 = 0\), so \(Q(v) + Q'(v) = 0\) but \(v \gcd(Q,Q')(v) \neq 0\), so \(\frac{R(v)}v = 0\). The second equality follows from \(Q = \sum_{i=0}^{k-1} (-1)^{k-1-i} t^i e_{k-1-i}(\pi_0^{-1}, \dots, \pi_{k-1}^{-1})\) and from \(e_{k-1-i}(\pi_0^{-1}, \dots, \pi_{k-1}^{-1}) = \frac1{\prod_{i=0}^{k-1} \pi_i} e_i (\pi_0, \dots, \pi_{k-1})\).

References

Becquart, Colombe, Aurore Archimbaud, Anne Ruiz-Gazen, Luka Prlić, and Klaus Nordhausen. 2026. “Invariant Coordinate Selection and Fisher Discriminant Subspace Beyond the Case of Two Groups.” Journal of Multivariate Analysis 211 (January): 105521. https://doi.org/10.1016/j.jmva.2025.105521.

Peña, Daniel, Francisco J. Prieto, and Júlia Viladomat. 2010. “Eigenvectors of a Kurtosis Matrix as Interesting Directions to Reveal Cluster Structure.” Journal of Multivariate Analysis 101 (9): 1995–2007. https://doi.org/10.1016/j.jmva.2010.04.014.

Tyler, David E., Frank Critchley, Lutz Dümbgen, and Hannu Oja. 2009. “Invariant Co-Ordinate Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (3): 549–92. https://doi.org/10.1111/j.1467-9868.2009.00706.x.

ICS and the discriminant subspace using the fourth-order covariance operator

Introduction

From Principal Component Analysis…

…to Invariant Coordinate Selection

Motivation

Mixture distributions

A general formula by Peña, Prieto, and Viladomat (2010)

A general and correct formula by Peña, Prieto, and Viladomat (2010)

A general and correct formula by Peña, Prieto, and Viladomat (2010), Gaussian case

The case of a discrete distribution on simplex vertices

The setting

Computing \(\Sigma_4\)

A rational equation for eigenvalues

Visualising the eigenvalues of \((k+1) \Sigma_4 \Sigma^{-1}\) for \(k=5\)

Computing the characteristic polynomial

Application. A criterion for discriminant subspace reconstructability

A criterion for discriminant subspace reconstructability

Example for \(k=3\). Beyond the zero-noise general position case

Appendix

Proof of Lemma 1

Proof of Theorem 3

Proof of Corollary 2

References

ICS and the discriminant subspace
using the fourth-order covariance operator