Coordinate-free ICS for distributional and functional data

Workshop on Invariant Coordinate Selection and Related Methods
University of Helsinki

May 7, 2025

Outline

  1. Temperature distributions in Finland
  2. Coordinate-free ICS (Invariant Coordinate Selection)
  3. Functional outlier detection using coordinate-free ICS
  4. Comparison with other methods

How to study climate change in Finland?

From temperature data…

Code
library(dda)
library(sf)
library(tidyverse)

load("data/finland/finclim.RData")
load("data/finland/finregions.RData")
t0 <- as.Date("2024-01-01")

finclim_sample <- finclim |>
  filter(time %in% c(t0, t0 + 121, t0 + 244)) |>
  filter(!is.na(tday))

ggplot(finregions) +
  geom_sf(alpha = 0) +
  geom_sf(data = finclim_sample, aes(color = tday)) +
  scale_color_viridis_c() +
  facet_wrap(vars(time)) +
  scale_x_continuous(breaks = seq(from = -180, to = 180, by = 4)) +
  labs(color = "Temperature (deg. Celsius)")
Figure 1: Daily average temperature per station in Finland on several days of 2024. Total size: 190 stations \(\times\) 35 years \(\times\) 365 days > 2 million observations. Source: Weather observations by the Finnish Meteorological Institute.

How to apply ICS to this distributional dataset?

ICS, as defined by Tyler et al. (2009), requires:

  1. observations from a random vector \(X\) in the space \(\mathbb R^p\);
  2. estimating two scatter matrices \(S_1[X]\) and \(S_2 [X]\);
  3. estimating eigenvectors \(H = (h_1 | \dots | h_p)\) and eigenvalues \((\lambda_1, \dots, \lambda_p)\) from their joint diagonalisation;
  4. computing invariant coordinates \((z_1, \dots, z_p)\).

What if \(X\) is a functional (or distributional) object?

Idea 1: back to a coordinate vector

  • Find a way to send the functional object \(X\) to some space \(\mathbb R^p\), using either

    1. its \(y\)-values on a discretisation of the \(x\)-axis; or

    2. the coordinates of an approximation in a basis of a space \(E\) of functions. How to choose the space? Usually some finite dimensional subspace of \(L^2 (a,b)\). How to choose the approximation? This process is called smoothing. Note: (a) is actually a particular case of (b) for a certain choice of basis.

  • Apply ICS to the coordinate vector that was obtained.

  • Problems:

    • How to choose the basis? The output of ICS (eigenelements, ICs) might depend on it, but why should it?
    • How to link the output of ICS to the original functional object \(X\)? In particular, the eigenvectors might be distorted in the chosen basis is not orthonormal.

Idea 2

  • Still find an approximation of \(X\) in a finite-dimensional functional space \(E\).

  • But define ICS directly on the space \(E\) in a coordinate-free manner:

    • Dependence on choice of units \(\rightarrow\) Affine invariance.
    • Dependence on choice of basis \(\rightarrow\) Coordinate freedom.
Multivariate ICS Coordinate-free ICS
the space \(\mathbb R^p\) a \(p\)-dimensional Euclidean space \(E\)
a random vector \(X \in \mathbb R^p\) a random object \(X \in E\)
two scatter matrices \(S_1[X]\) and \(S_2[X]\) two scatter operators (linear maps) \(S_1[X]\) and \(S_2[X]\)
a matrix of eigenvectors \(H\) an eigenbasis \(H = (h_1, \dots, h_p)\) of \(E\)
eigenvalues \((\lambda_1, \dots, \lambda_p)\) unchanged
invariant coordinates unchanged

Observations from a random object in a Euclidean space

Theory

  • Let \((E, \langle \cdot, \cdot \rangle)\) be a \(p\)-dimensional Euclidean space (abstract vector space with an inner product). Why a Euclidean space? In order to define scatter operators, an inner product is required to have a notion of symmetry and positiveness.

  • Let \(X\) be an \(E\)-valued random object, from which we observe a data set \(D = (x_1, \dots, x_n)\).

Finland application

Code
load("data/finland/fint.RData")

fint_trend |>
  plot_funs(tday) +
  ylim(NA, 0.1) +
  labs(x = "Temperature (deg. Celsius)", y = "Density")
Figure 3: The data set \(D\) of trends of temperature change in Finland in a 11-dimensional space \(E\) of compositional splines.

Two scatter operators

Theory

  • We need to find two scatter operators \(S_1[X]\) and \(S_2[X]\) that are symmetric positive-definite linear operators on \(E\).

  • The \(w\)-weighted covariance operator is defined by \[\begin{multline*}\langle \operatorname{Cov}_w [X] x, y \rangle \\ = \mathbb E [ w^2(\| \operatorname{Cov}[X]^{-1/2} (X - \mathbb E X) \|) \\ \langle X - \mathbb EX, x \rangle \langle X - \mathbb EX, y \rangle ] \end{multline*}\] where \((x,y) \in E^2\) and \(w : \mathbb R^+ \rightarrow \mathbb R\) is a measurable function.

  • It is an affine equivariant scatter operator and its definition is completely coordinate-free: it exists on every Euclidean space.

  • Example: \(S_1 = \operatorname{Cov}\) for \(w=1\) and \(S_2 = \operatorname{Cov}_4\) for \(w = \operatorname{id}\).

Finland application

Code
library(ICS)
load("data/finland/fint.RData")

fint_eval <- fint_trend |>
  mutate(tday = as.fd(tday)) |>
  eval_funs(tday, n = 100) |>
  st_drop_geometry() |>
  select(x, y, fmisid) |>
  pivot_wider(names_from = fmisid, values_from = y) |>
  column_to_rownames("x") |>
  t()

fint_cov <- cov(fint_eval)
eps <- 1e-8 * matrix(rnorm(ncol(fint_eval) * nrow(fint_eval), sd = max(abs(fint_eval))),
  nrow = nrow(fint_eval)
)
fint_cov4 <- cov4(fint_eval + eps)

fint_trend_scatters <- crossing(t_1 = colnames(fint_eval), t_2 = colnames(fint_eval)) |>
  rowwise() |>
  mutate(Cov = fint_cov[t_1, t_2], Cov_4 = fint_cov4[t_1, t_2]) |>
  pivot_longer(c(Cov, Cov_4), names_to = "scatter", values_to = "z") |>
  mutate(t_1 = as.numeric(t_1), t_2 = as.numeric(t_2))

tday_ics <- dda::ICS(fint_trend$tday)
fint_basis <- fint_trend$tday[[1]]$basis

changemat <- to_zbsplines(basis = fint_basis, inv = TRUE)
gram <- t(changemat) %*% fda::inprod(fint_basis, fint_basis) %*% changemat
tday_cov_zb <- cov(t(to_zbsplines(c(fint_trend$tday))))
tday_cov4_zb <- cov4(t(to_zbsplines(c(fint_trend$tday))))

# changemat_inv <- to_zbsplines(basis = fint_basis)
# tday_cov_b <- t(changemat_inv) %*% tday_cov_zb %*% changemat_inv
# tday_cov4_b <- t(changemat_inv) %*% tday_cov4_zb %*% changemat_inv

rangeval <- fint_trend$tday[[1]]$basis$rangeval
tval <- seq(rangeval[1], rangeval[2], length.out = 100)

zbsp <- fd(to_zbsplines(inv = TRUE, coefs = diag(11), basis = fint_basis), fint_basis)
zbsp_tval <- eval.fd(tval, zbsp)
tday_cov_grid <- zbsp_tval %*% tday_cov_zb %*% t(zbsp_tval)
tday_cov4_grid <- zbsp_tval %*% tday_cov4_zb %*% t(zbsp_tval)
dimnames(tday_cov_grid) <- list(tval, tval)
dimnames(tday_cov4_grid) <- list(tval, tval)

fint_trend_scatters_2 <- reshape2::melt(tday_cov_grid) |>
  rename(t_1 = Var1, t_2 = Var2, Cov = value) |>
  mutate(Cov_4 = reshape2::melt(tday_cov4_grid)$value) |>
  pivot_longer(c(Cov, Cov_4), names_to = "scatter", values_to = "z")

ggplot(fint_trend_scatters_2) +
  stat_contour_filled(aes(t_1, t_2, z = z)) +
  facet_wrap(vars(scatter), nrow = 2) +
  labs(x = "Temperature (deg. celsius)", y = "Temperature (deg. celsius)", z = "Value")
Figure 4: Empirical estimates of the scatter operators \(\operatorname{Cov} [X]\) and \(\operatorname{Cov}_4 [X]\) for the data set on trends of temperature change in Finland.

Coordinate-free ICS

Invariant Coordinate Selection

Theory

Definition 1 (ICS problem)  

  • \(\operatorname{ICS} (X, \color{#0072b2}{S_1},\color{#d55e00}{S_2})\): in \((E, \langle \cdot, \color{#0072b2}{S_1}[X] \cdot \rangle)\) Euclidean space of dimension \(p\), find an orthonormal basis \(\color{#009e73}{H} = (h_1, \dots, h_p)\) diagonalising the symmetric operator \(\color{#0072b2}{S_1}[X]^{-1} \color{#d55e00}{S_2}[X]\), i.e:

    \[\left\{ \begin{aligned} \langle \color{#0072b2}{S_1} [X] h_i, h_j \rangle &= \delta_{ij} \\ \langle \color{#d55e00}{S_2}[X] h_i, h_j \rangle &= \lambda_i \delta_{ij} \end{aligned} \right. \text{ for all }1 \leq i,j \leq p\] where \(\lambda_1 \geq \dots \geq \lambda_p \geq 0\) are called generalised kurtosis.

  • \(z_i = \langle X - \mathbb EX, h_i \rangle, 1 \leq i \leq p\) are called invariant coordinates.

Finland application

ICS eigenelements

Code
library(dda)

tday_ics <- dda::ICS(fint_trend$tday)

tday_ics_eigen <- tibble(
  index = factor(names(tday_ics$gen_kurtosis),
    levels = names(tday_ics$gen_kurtosis)
  ),
  gen_kurtosis = tday_ics$gen_kurtosis,
  H_dual = as.list(tday_ics$H_dual)
)

tday_ics_eigen |>
  ggplot(aes(index, gen_kurtosis, group = 1)) +
  geom_line(alpha = 0.5) +
  geom_point(aes(color = index), size = 3) +
  labs(x = "", y = "ICS eigenvalues") +
  guides(color = "none")

h_dual_shift <- c(1 * tday_ics$H_dual)
h_dual_shift$coefs <- sweep(h_dual_shift$coefs, 1, -c(mean(fint_trend$tday))$coefs)
tday_ics_eigen |>
  mutate(H_dual_shift = as.list(h_dual_shift)) |>
  plot_funs(H_dual_shift, color = index) +
  geom_function(fun = function(x) eval(c(mean(fint_trend$tday)), x), color = "black", linetype = "dotted") +
  facet_wrap(vars(index)) +
  labs(x = "Temperature (deg. Celsius)", y = "Density")
Figure 5: Scree plot of ICS eigenvalues.
Figure 6: Mean trend of temperature change, shifted by each ICS eigendensity.

Proposition 1 (Reconstructing formula) \(X = \mathbb EX + \sum_{i=1}^p z_i h^*_i\) where \(\color{#009e73}{H^*} = (h^*_i)_{1 \leq i \leq p} = (\color{#0072b2}{S_1} [X] h_i)_{1 \leq i \leq p}\) is the dual basis of \(\color{#009e73}{H}\).

Invariant coordinates

Code
fint_trend |>
  cbind(as_tibble(tday_ics$scores)) |>
  ggplot(aes(IC.1, IC.2,
    label = name, color = region, shape = NA
  )) +
  geom_point() +
  geom_text() +
  coord_fixed() +
  labs(color = "") +
  guides(color = "none", shape = "none")

fint_trend |>
  cbind(as_tibble(tday_ics$scores)) |>
  ggplot(aes(IC.10, IC.11,
    label = name, color = region, shape = NA
  )) +
  geom_point() +
  geom_text() +
  coord_fixed() +
  guides(alpha = "none", shape = "none")
Figure 7: First two invariant coordinates coloured by region.
Figure 8: Last two invariant coordinates coloured by region.

Implementation

Let

  • \(B\) be any basis of \(E\);
  • \(G_B = (\langle b_i, b_j \rangle)_{1 \leq i,j \leq p}\) its Gram matrix;
  • \([\cdot]_B: E \rightarrow \mathbb R^p\) the coordinates in basis \(B\).

Corollary 1 (of Proposition 2) For any basis \(H\) of \(E\), these are equivalent:

0. \(H\) solves \(\operatorname{ICS} (X, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\) over \(E\)
1. \(\color{#d55e00}{G_B^{1/2}} [H]_B\) solves \(\operatorname{ICS} (\color{#d55e00}{G_B^{1/2}} [X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\) over \(\mathbb R^p\)
2. \([H]_B\) solves \(\operatorname{ICS} (\color{#d55e00}{G_B} [X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\) over \(\mathbb R^p\)
3. \(\color{#d55e00}{G_B} [H]_B\) solves \(\operatorname{ICS} ([X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\) over \(\mathbb R^p\)

Functional outlier detection using coordinate-free ICS

Theory

Based on ICSOutlier (Archimbaud, Nordhausen, and Ruiz-Gazen 2018a).

  1. Compute the invariant coordinates.
  2. Select \(k\) components. Using the scree plot or D’Agostino normality tests, etc. Now compute the ICS distances: \[\sum_{j=1}^k |z_j|^2\]
  3. Choose a cut-off \(\tau\). Based on Monte Carlo simulations to compute quantiles (Archimbaud, Nordhausen, and Ruiz-Gazen 2018b). We can avoid this step by drawing ROC curves instead.

Finland application

Code
tday_out <- dda::ICS_outlier(fint_trend$tday, index = 1:2)

fint_trend <- fint_trend |>
  mutate(outlying = tday_out$outliers == 1)

fint_trend |>
  filter(outlying) |>
  plot_funs(tday, color = region) +
  ylim(NA, 0.1) +
  geom_function(fun = function(x) eval(c(mean(fint_trend$tday)), x), color = "black", linetype = "dotted") +
  labs(x = "Temperature (deg. Celsius)", y = "Density") +
  guides(color = "none")

ggplot(finregions) +
  geom_sf(alpha = 0) +
  geom_sf(data = filter(fint_trend, outlying), aes(color = region)) +
  scale_x_continuous(breaks = seq(from = -180, to = 180, by = 4)) +
  labs(color = "Region")
Figure 9: Outlying trends in temperature change in the Finland data set.
Figure 10: Ouytling stations on a map of Finland.

Comparison with other methods

Data generating process with outlier contamination

Figure 11: Three schemes to generate density data with outliers.

ROC curves

Code
load("data/ICS_comparison.RData")

# Aggregate results (averaging over simulations)
comparison_avg <- comparison |>
  separate_wider_delim(Method, "_", names = c("Method", "Transf")) |>
  group_by(Scheme, Transf, Method, PP) |>
  summarize(
    n = n(),
    TPR_avg = mean(TPR),
    TPR_conf = qnorm(0.975) * sd(TPR) / sqrt(n()),
    FPR_avg = mean(FPR),
    FPR_conf = qnorm(0.975) * sd(FPR) / sqrt(n()),
    .groups = "drop"
  )

ggplot(comparison_avg, aes(FPR_avg, TPR_avg, color = Scheme, group = Scheme)) +
  geom_line() +
  geom_ribbon(aes(
    ymin = pmax(0, TPR_avg - TPR_conf),
    ymax = pmin(1, TPR_avg + TPR_conf),
    xmin = pmax(0, FPR_avg - FPR_conf),
    xmax = pmin(1, FPR_avg + FPR_conf),
    fill = Scheme
  ), alpha = 0.1, linetype = "dotted") +
  geom_function(fun = identity, linetype = "dotted") +
  facet_wrap(vars(paste(Method, Transf, sep = "_")), nrow = 2) +
  coord_fixed() +
  labs(
    x = "False Positive Rate (FPR)",
    y = "True Positive Rate (TPR)",
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Figure 12: Comparing ICS with other distributional outlier detection methods through simulated ROC curves for several outlier generating schemes.

Area under the curve (AUC)

Code
comparison_auc <- comparison_avg |>
  group_by(Scheme, Method, Transf) |>
  summarise(AUC = integrate(approxfun(FPR_avg, TPR_avg), 0, 1)$value)

comparison_auc_summary <- comparison_auc |>
  group_by(Method, Transf) |>
  summarise(AUC_avg = mean(AUC)) |>
  arrange(desc(AUC_avg))

ggplot(comparison_auc_summary, aes(reorder(paste(Method, Transf, sep = "\n"), -AUC_avg), AUC_avg, fill = Method)) +
  geom_col() + coord_cartesian(ylim=c(0.5, NA)) +
  labs(x = "")
Figure 13: Area under the ROC curve for each outlier detection method.

Thank you for listening!

For more details: (Mondon et al. 2024).

Any questions?

References

Archimbaud A., Boulfani F., Gendre X., Nordhausen K., Ruiz-Gazen A. and Virta J. 2022. ICS for multivariate functional anomaly detection with applications to predictive maintenance and quality control.” Econometrics and Statistics, March.
Archimbaud A., Nordhausen K. and Ruiz-Gazen A. 2018a. ICSOutlier: Unsupervised Outlier Detection for Low- Dimensional Contamination Structure.” The R Journal 10 (1) : 234–50.
———. 2018b. ICS for multivariate outlier detection with application to quality control.” Computational Statistics & Data Analysis 128 (December) : 184–99.
Li B., Van Bever G., Oja H., Sabolová R. and Critchley F. 2021. Functional independent component analysis : An extension of fourth-order blind identification.”
Machalová J., Talská R., Hron K. and Gába A. 2021. Compositional splines for representation of density functions.” Computational Statistics 36 (2) : 1031–64.
Mondon C., Trinh H. T., Ruiz-Gazen A. and Thomas-Agnan C. 2024. ICS for complex data with application to outlier detection for density data objects.” TSE Working Paper.
Silverman B. W. 1982. On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method.” The Annals of Statistics 10 (3) : 795–810.
Tyler D. E., Critchley F., Dümbgen L. and Oja H. 2009. Invariant co-ordinate selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (3) : 549–92.
Van Den Boogaart K. G., Egozcue J. J. and Pawlowsky-Glahn V. 2014. Bayes Hilbert Spaces.” Australian & New Zealand Journal of Statistics 56 (2) : 171–94.
Virta J., Li B., Nordhausen K. and Oja H. 2020. Independent component analysis for multivariate functional data.” Journal of Multivariate Analysis 176 : 104568.

Appendix

Scatter operators

  • \((E, \langle \cdot, \cdot \rangle)\) Euclidean space of dimension \(p\).

  • \(X \in \mathcal M (\Omega, E)\) random variable over \(E\).

  • \(\mathcal S^{+} (E)\) space of non-negative symmetric operators over \(E\).

  • \(\mathcal {GS}^{+} (E)\) space of positive symmetric operators over \(E\).

  • \(S: \mathcal A \subseteq \mathcal M (\Omega, E) \rightarrow \mathcal S^{+} (E)\) scatter operator over \(\mathcal A\) if:

    • Invariance by equality in distribution: \[\forall (X,Y) \in \mathcal A^2, X \sim Y \Rightarrow S[X] = S[Y] \]
    • It is affine equivariant if: \[\forall A \in \mathcal{GL} (E), \forall b \in E, S[AX+b] = A S[X] A^*\]

Proof of Proposition 1

Proof.

Let us decompose \(S_1[X]^{-1} (X - \mathbb EX)\) over the basis \(H\), which is orthonormal in \((E, \langle \cdot, S_1[X] \cdot \rangle)\): \[\begin{aligned} S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p \langle S_1[X]^{-1} (X - \mathbb EX), S_1[X] h_i \rangle h_i \\ &= \sum_{i=1}^p \langle X - \mathbb EX, h_i \rangle h_i \\ S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p z_i h_i \end{aligned}\]

The dual basis \(H^*\) of \(H\) is the one that satisfies \(\langle h_i, h^*_j \rangle = \delta_{ij}\) for all \(1 \leq i,j \leq p\) and we know from the definition of ICS that this holds for \((S_1[X] h_i)_{1 \leq i \leq p}\).

Proof of Corollary 1

Proof. In ICS, the first step is to centre the data so wlog: \(\mathbb E [X] = 0\).
\(E\) is isometric to \(\mathbb R^p\) via the linear isomorphism: \[\phi_B: x \mapsto G_B^{1/2} [x]_B\] Then for any \((x,y) \in E^2\): \[ \begin{aligned} \langle \operatorname{Cov}_w [X] x, y \rangle_E &= \mathbb E [ w^2(X) \langle X, x \rangle_E \langle X, y \rangle_E ] \\ &= \mathbb E [ w^2([X]_B) \langle G_B^{1/2} [X]_B, G_B^{1/2} [x]_B \rangle \langle G_B^{1/2} [X]_B, G_B^{1/2} [y]_B \rangle ] \\ &= \langle \operatorname{Cov}_w (G_B^{1/2} [X]_B) G_B^{1/2} [x]_B, G_B^{1/2} [y]_B \rangle \\ &= \langle \operatorname{Cov}_w (G_B [X]_B) [x]_B, [y]_B \rangle \\ &= \langle \operatorname{Cov}_w ([X]_B) G_B [x]_B, G_B [y]_B \rangle \end{aligned} \]

Bayes space and CB-splines

Source: Van Den Boogaart, Egozcue, and Pawlowsky-Glahn (2014) and Machalová et al. (2021).

  • \(P = \frac1{\lambda(I)} \lambda\) uniform probability measure over \(I=[a,b]\).
  • \(L^2_0 (P) = \{ f \in L^2 (P) | \int_I f \mathrm dP = 0 \}\) is a separable Hilbert space.
  • Centred log-ratio transform \(\operatorname{clr}: p \in \mathbb R^I \mapsto \log(p) - \int_I \log(p) \mathrm dP\) (if it exists)
  • Bayes space \(B^2 (P) = \{ \operatorname{clr}^{-1} (\{f\}), f \in L^2_0 (I) \}\) inherits the separable Hilbert space structure of \(L^2_0 (P)\). Each equivalence class contains a density function defined almost everywhere.
  • B-splines basis \(B=(B_i)_{-q \leq i \leq k}\) of \(\mathcal S_{q+1}^\Delta (I) \subset L^2 (P)\), the subspace of polynomial splines on \(I\) of order \(q+1\) and with knots \(\Delta\).
  • ZB-splines basis \(Z=(Z_i)_{-q+1 \leq i \leq k-1} = \left(\frac{\mathrm d B_i}{\mathrm dx}\right)_{-q+1 \leq i \leq k-1}\) of \(\mathcal Z_q^\Delta (I) = \mathcal S_q^\Delta (I) \cap L^2_0 (P)\)
  • CB-splines basis \(C = (C_i)_{-q+1 \leq i \leq k-1} = (\operatorname{clr}^{-1} (Z_i))_{-q+1 \leq i \leq k-1}\) of \(\mathcal C_q^\Delta (I)\) which is a Euclidean subspace of \(B^2 (P)\) of dimension \(p=k+q-1\).