Figure 1: Daily average temperature per station in Finland on several days of 2024. Total size: 190 stations \(\times\) 35 years \(\times\) 365 days > 2 million observations. Source: Weather observations by the Finnish Meteorological Institute.
Figure 2: Yearly trend of daily average temperature between 1990 and 2024 for the stations in 3 regions of Finland. Computed in 2 steps: (1) maximum penalised likelihood density estimation (Silverman 1982) per station and per year; the smooth densities are compositional splines and belong to the Bayes Hilbert space \(B^2([a,b])\) (Van Den Boogaart, Egozcue, and Pawlowsky-Glahn 2014) \(\rightarrow\) 190 \(\times\) 35 functional observations. (2) Regression of the density function on year per station \(\rightarrow\) 190 functional trends of temperature change.
What if \(X\) is a functional (or distributional) object?
Idea 1: back to a coordinate vector
Find a way to send the functional object \(X\) to some space \(\mathbb R^p\), using either
its \(y\)-values on a discretisation of the \(x\)-axis; or
the coordinates of an approximation in a basis of a space \(E\) of functions. How to choose the space? Usually some finite-dimensional subspace of \(L^2(a,b)\). How to choose the approximation? This process is called smoothing. Note that the first option is actually a particular case of the second, for a certain choice of basis.
Apply ICS to the coordinate vector that was obtained.
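As an illustration, here is a minimal sketch of Idea 1 in R, assuming curves observed on a common grid `tgrid` and stored as rows of a matrix `ymat` (both names hypothetical), using the `fda` package for smoothing and the `ICS` package for the multivariate step:

```r
library(fda)  # B-spline smoothing
library(ICS)  # multivariate invariant coordinate selection

# Option (a), discretisation: each curve is represented directly by
# its vector of y-values, i.e. a row of ymat.
# Option (b), smoothing: project each curve onto a B-spline basis and
# keep the coordinate vectors instead.
basis <- create.bspline.basis(rangeval = range(tgrid), nbasis = 11)
coefs <- t(smooth.basis(tgrid, t(ymat), basis)$fd$coefs)  # n x 11

# Apply multivariate ICS to the coordinate vectors.
res <- ics(coefs, S1 = cov, S2 = cov4)
```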
Problems:
How to choose the basis? The output of ICS (eigenelements, ICs) might depend on it, but why should it?
How to link the output of ICS to the original functional object \(X\)? In particular, the eigenvectors might be distorted if the chosen basis is not orthonormal.
Idea 2
Still find an approximation of \(X\) in a finite-dimensional functional space \(E\).
But define ICS directly on the space \(E\) in a coordinate-free manner:
Dependence on choice of units \(\rightarrow\) Affine invariance.
Dependence on choice of basis \(\rightarrow\) Coordinate freedom.
| Multivariate ICS | Coordinate-free ICS |
|---|---|
| the space \(\mathbb R^p\) | a \(p\)-dimensional Euclidean space \(E\) |
| a random vector \(X \in \mathbb R^p\) | a random object \(X \in E\) |
| two scatter matrices \(S_1[X]\) and \(S_2[X]\) | two scatter operators (linear maps) \(S_1[X]\) and \(S_2[X]\) |
| a matrix of eigenvectors \(H\) | an eigenbasis \(H = (h_1, \dots, h_p)\) of \(E\) |
| eigenvalues \((\lambda_1, \dots, \lambda_p)\) | unchanged |
| invariant coordinates | unchanged |
Observations from a random object in a Euclidean space
Theory
Let \((E, \langle \cdot, \cdot \rangle)\) be a \(p\)-dimensional Euclidean space (an abstract vector space equipped with an inner product). Why a Euclidean space? Because defining scatter operators requires an inner product, in order to have notions of symmetry and positive-definiteness.
Let \(X\) be an \(E\)-valued random object, from which we observe a data set \(D = (x_1, \dots, x_n)\).
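In computations, the inner product of \(E\) is carried by the Gram matrix of whatever basis is used to store the data. A minimal sketch (the names `inner`, `gram` are hypothetical):

```r
# Inner product on a p-dimensional Euclidean space E, expressed on
# coordinate vectors: <x, y>_E = [x]' G [y], where G is the symmetric
# positive-definite Gram matrix of the chosen basis of E.
inner  <- function(x, y, gram) as.numeric(t(x) %*% gram %*% y)
norm_E <- function(x, gram) sqrt(inner(x, x, gram))
```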
Finland application
Code
load("data/finland/fint.RData")fint_trend |>plot_funs(tday) +ylim(NA, 0.1) +labs(x ="Temperature (deg. Celsius)", y ="Density")
Figure 3: The data set \(D\) of trends of temperature change in Finland in an 11-dimensional space \(E\) of compositional splines.
Two scatter operators
Theory
We need to find two scatter operators \(S_1[X]\) and \(S_2[X]\) that are symmetric positive-definite linear operators on \(E\).
The \(w\)-weighted covariance operator is defined by \[
\langle \operatorname{Cov}_w[X] x, y \rangle = \mathbb E \left[ w^2\!\left( \left\| \operatorname{Cov}[X]^{-1/2} (X - \mathbb EX) \right\| \right) \langle X - \mathbb EX, x \rangle \, \langle X - \mathbb EX, y \rangle \right]
\] where \((x,y) \in E^2\) and \(w : \mathbb R^+ \rightarrow \mathbb R\) is a measurable function.
It is an affine equivariant scatter operator and its definition is completely coordinate-free: it exists on every Euclidean space.
Example: \(S_1 = \operatorname{Cov}\) for \(w = 1\) and \(S_2 = \operatorname{Cov}_4\) for \(w = \operatorname{id}\).
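For intuition, a minimal sketch of the empirical version in R, assuming the rows of a (hypothetical) matrix `Z` hold the coordinates of the observations in an orthonormal basis of \(E\), so that \(\langle \cdot, \cdot \rangle\) reduces to the usual dot product:

```r
# Empirical w-weighted covariance of coordinate vectors (rows of Z).
cov_w <- function(Z, w = function(r) 1) {
  Zc <- sweep(Z, 2, colMeans(Z))               # centre the data
  S  <- crossprod(Zc) / nrow(Zc)               # empirical Cov[X]
  r  <- sqrt(rowSums((Zc %*% solve(S)) * Zc))  # Mahalanobis norms
  crossprod(Zc * w(r)) / nrow(Zc)              # reweighted covariance
}

S1 <- cov_w(Z)                 # w = 1  gives Cov
S2 <- cov_w(Z, w = identity)   # w = id gives Cov_4
```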
Figure 4: Empirical estimates of the scatter operators \(\operatorname{Cov} [X]\) and \(\operatorname{Cov}_4 [X]\) for the data set on trends of temperature change in Finland.
Coordinate-free ICS
Setting: \((E, \langle \cdot, \cdot \rangle)\) is a Euclidean space of dimension \(p\). Examples: \(\mathbb R^p\), square matrices, functions (polynomials, splines).
Figure 6: Mean trend of temperature change, shifted by each ICS eigendensity.
Proposition 1 (Reconstruction formula). \(X = \mathbb EX + \sum_{i=1}^p z_i h^*_i\), where \(\color{#009e73}{H^*} = (h^*_i)_{1 \leq i \leq p} = (\color{#0072b2}{S_1}[X] h_i)_{1 \leq i \leq p}\) is the dual basis of \(\color{#009e73}{H}\).
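In coordinates with respect to an orthonormal basis of \(E\), Proposition 1 can be checked numerically; a sketch, where `S1`, `H`, `mu` and `z` are hypothetical names for the first scatter, the eigenbasis (as columns), the mean, and the invariant coordinates of one observation:

```r
# Dual basis H* = (S1 h_i); its columns pair with those of H so that
# <h_i, h*_j> = delta_ij.
Hstar <- S1 %*% H

# Reconstruction formula: x = mu + sum_i z_i h*_i.
x_rec <- mu + as.numeric(Hstar %*% z)
```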
Figure 7: First two invariant coordinates coloured by region.
Figure 8: Last two invariant coordinates coloured by region.
A link between ICS problems
Proposition 2 Let \((E, \langle \cdot, \cdot \rangle_E) \overset{\varphi}{\rightarrow} (F, \langle \cdot, \cdot \rangle_F)\) be an isometry between two Euclidean spaces of dimension \(p\) and \(w_1, w_2 : \mathbb R^+ \rightarrow \mathbb R\) two measurable functions. For any integrable \(E\)-valued random object \(X\), the equality \[
\operatorname{Cov}_{w_\ell}^F [\varphi(X)] = \varphi \circ \operatorname{Cov}_{w_\ell}^E [X] \circ \varphi^{-1}
\] holds for \(\ell \in \{ 1, 2 \}\), as well as the equivalence between the following assertions, for any basis \(H = (h_1, \dots, h_p)\) of \(E\), and any finite non-increasing real sequence \(\Lambda = (\lambda_1 \geq \ldots \geq \lambda_p)\):
\((H, \Lambda)\) solves \(\operatorname{ICS} (X, \operatorname{Cov}_{w_1}^E, \operatorname{Cov}_{w_2}^E)\) in the space \(E\).
\((\varphi(H), \Lambda)\) solves \(\operatorname{ICS} (\varphi(X), \operatorname{Cov}_{w_1}^F, \operatorname{Cov}_{w_2}^F)\) in the space \(F\).
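In practice, Proposition 2 is what lets us solve coordinate-free ICS with standard multivariate software: map the data to \(\mathbb R^p\) through the isometry \(\varphi_B : x \mapsto G_B^{1/2}[x]_B\) (see the proof of Proposition 2 below), run ICS there, and map the eigenbasis back. A sketch, with hypothetical names `C` (basis coordinates of the data, in rows), `G` (Gram matrix of the basis \(B\)) and `h` (an eigenvector found in \(\mathbb R^p\)):

```r
# Symmetric square root of the Gram matrix G.
eig   <- eigen(G, symmetric = TRUE)
Ghalf <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)

# Rows of Z are the images phi_B(x_i) in R^p.
Z <- C %*% Ghalf

# Solve the ICS problem in R^p ...
res <- ICS::ics(Z, S1 = cov, S2 = cov4)

# ... then map an eigenvector h back to B-coordinates in E.
h_B <- solve(Ghalf, h)
```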
Select \(k\) components, using the scree plot, D’Agostino normality tests, etc. Then compute the ICS distances: \[\sum_{j=1}^k |z_j|^2\]
Choose a cut-off \(\tau\), based on Monte Carlo simulations to compute quantiles (Archimbaud, Nordhausen, and Ruiz-Gazen 2018b). This step can be avoided by drawing ROC curves instead.
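A minimal sketch of these two steps in R, assuming `Z` holds the invariant coordinates (columns ordered by decreasing eigenvalue) and `tau` comes from the Monte Carlo step:

```r
# ICS distances from the first k invariant coordinates.
k <- 2
ics_dist <- rowSums(Z[, 1:k, drop = FALSE]^2)

# Flag observations beyond the cut-off tau.
outlying <- ics_dist > tau
```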
Finland application
Code
```r
tday_out <- dda::ICS_outlier(fint_trend$tday, index = 1:2)

fint_trend <- fint_trend |>
  mutate(outlying = tday_out$outliers == 1)

fint_trend |>
  filter(outlying) |>
  plot_funs(tday, color = region) +
  ylim(NA, 0.1) +
  geom_function(
    fun = function(x) eval(c(mean(fint_trend$tday)), x),
    color = "black", linetype = "dotted"
  ) +
  labs(x = "Temperature (deg. Celsius)", y = "Density") +
  guides(color = "none")

ggplot(finregions) +
  geom_sf(alpha = 0) +
  geom_sf(data = filter(fint_trend, outlying), aes(color = region)) +
  scale_x_continuous(breaks = seq(from = -180, to = 180, by = 4)) +
  labs(color = "Region")
```
Figure 9: Outlying trends in temperature change in the Finland data set.
Figure 10: Outlying stations on a map of Finland.
Comparison with other methods
Data generating process with outlier contamination
Figure 11: Three schemes to generate density data with outliers.
Tyler, D. E., Critchley, F., Dümbgen, L., and Oja, H. 2009. “Invariant Co-ordinate Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (3): 549–92.
Van Den Boogaart, K. G., Egozcue, J. J., and Pawlowsky-Glahn, V. 2014. “Bayes Hilbert Spaces.” Australian & New Zealand Journal of Statistics 56 (2): 171–94.
Let us decompose \(S_1[X]^{-1} (X - \mathbb EX)\) over the basis \(H\), which is orthonormal in \((E, \langle \cdot, S_1[X] \cdot \rangle)\):
\[\begin{aligned}
S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p \langle S_1[X]^{-1} (X - \mathbb EX), S_1[X] h_i \rangle h_i \\
&= \sum_{i=1}^p \langle X - \mathbb EX, h_i \rangle h_i \\
S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p z_i h_i
\end{aligned}\]
The dual basis \(H^*\) of \(H\) is the one that satisfies \(\langle h_i, h^*_j \rangle = \delta_{ij}\) for all \(1 \leq i,j \leq p\) and we know from the definition of ICS that this holds for \((S_1[X] h_i)_{1 \leq i \leq p}\).
Proof. In ICS, the first step is to centre the data, so without loss of generality \(\mathbb E[X] = 0\). \(E\) is isometric to \(\mathbb R^p\) via the linear isomorphism \[\phi_B: x \mapsto G_B^{1/2} [x]_B,\] where \([x]_B\) denotes the coordinates of \(x\) in a basis \(B\) of \(E\) and \(G_B\) is the Gram matrix of \(B\). Then for any \((x,y) \in E^2\): \[
\begin{aligned}
\langle \operatorname{Cov}_w [X] x, y \rangle_E
&= \mathbb E [ w^2(X) \langle X, x \rangle_E \langle X, y \rangle_E ] \\
&= \mathbb E [ w^2([X]_B) \langle G_B^{1/2} [X]_B, G_B^{1/2} [x]_B \rangle \langle G_B^{1/2} [X]_B, G_B^{1/2} [y]_B \rangle ] \\
&= \langle \operatorname{Cov}_w (G_B^{1/2} [X]_B) G_B^{1/2} [x]_B, G_B^{1/2} [y]_B \rangle \\
&= \langle \operatorname{Cov}_w (G_B [X]_B) [x]_B, [y]_B \rangle \\
&= \langle \operatorname{Cov}_w ([X]_B) G_B [x]_B, G_B [y]_B \rangle
\end{aligned}
\]
Bayes space and CB-splines
Source: Van Den Boogaart, Egozcue, and Pawlowsky-Glahn (2014) and Machalová et al. (2021).
\(P = \frac{1}{\lambda(I)} \lambda\) is the uniform probability measure over \(I = [a,b]\).
\(L^2_0(P) = \{ f \in L^2(P) \mid \int_I f \,\mathrm dP = 0 \}\) is a separable Hilbert space.
Centred log-ratio transform: \(\operatorname{clr}: p \in \mathbb R^I \mapsto \log(p) - \int_I \log(p) \,\mathrm dP\) (when it exists); see the grid-based sketch after this list.
Bayes space: \(B^2(P) = \{ \operatorname{clr}^{-1}(\{f\}) : f \in L^2_0(P) \}\) inherits the separable Hilbert space structure of \(L^2_0(P)\). Each equivalence class contains a density function defined almost everywhere.
B-spline basis: \(B = (B_i)_{-q \leq i \leq k}\) of \(\mathcal S_{q+1}^\Delta(I) \subset L^2(P)\), the subspace of polynomial splines on \(I\) of order \(q+1\) with knots \(\Delta\).
ZB-spline basis: \(Z = (Z_i)_{-q+1 \leq i \leq k-1} = \left(\frac{\mathrm d B_i}{\mathrm dx}\right)_{-q+1 \leq i \leq k-1}\) of \(\mathcal Z_q^\Delta(I) = \mathcal S_q^\Delta(I) \cap L^2_0(P)\).
CB-spline basis: \(C = (C_i)_{-q+1 \leq i \leq k-1} = (\operatorname{clr}^{-1}(Z_i))_{-q+1 \leq i \leq k-1}\) of \(\mathcal C_q^\Delta(I)\), a Euclidean subspace of \(B^2(P)\) of dimension \(p = k+q-1\).
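To make the clr transform concrete, here is a minimal sketch on a grid, assuming densities are evaluated on an equally spaced grid over \(I\) so that integrals with respect to the uniform \(P\) reduce to means (the names `clr_grid`, `clr_inv_grid` are hypothetical):

```r
# Centred log-ratio transform of a positive density p evaluated on an
# equally spaced grid: clr(p) = log(p) - integral of log(p) dP, where
# the integral with respect to the uniform P is just the mean.
clr_grid <- function(p) log(p) - mean(log(p))

# Inverse: exponentiate and renormalise to the representative density
# that integrates to 1 over I (dx = grid spacing).
clr_inv_grid <- function(f, dx) {
  p <- exp(f)
  p / sum(p * dx)
}
```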