Coordinate-free ICS for distributional and functional data

Doctoral Workshop on Decision Mathematics and Statistics

Camille Mondon

camille.mondon@tse-fr.eu

Huong Thi Trinh

trinhthihuong@tmu.edu.vn

Anne M. Ruiz

anne.ruiz-gazen@tse-fr.eu

Christine Thomas-Agnan

christine.thomas@tse-fr.eu

June 19, 2025

Outline

So that’s our theoretical aim. In this presentation, in order to make it more interactive, I will also study the example of temperature data, and our aim will be to detect outliers in a data set of density functions that capture the temperature change in Occitanie in the last 60 years. Today we focus on Occitanie, although in the article we studied Vietnam. Now, let us look at the workflow we propose. First, there is a preprocessing step from raw daily maximum temperature measurements to annual temperature distributions, for each station. I won’t get into details here due to lack of time, but it is a particular technique of density estimation. Then a second preprocessing step to transform this time series of temperature distribution into what we call a trend slope density, using density-on-scalar regression. Again, I won’t get into details, but I will give you the intuition to interpret these density curves.

Theoretical aim: Generalising ICS to distributional and functional data.
Case study aim: Detecting meteorological stations with an outlying evolution of their annual temperature distribution.

Figure 1: Workflow diagram of distributional outlier detection using ICS.

Preprocessing

From temperature data…

Source: Daily weather observations in Occitanie by Météo France.

Code

library(dda)
library(sf)
library(tidyverse)
set.seed(9)

load("data/france/occitanie.RData")
load("data/france/ocdepts.RData")

occlim <- unnest(occlim)

selected_times <- sample(unique(occlim$time), 3)

occlim_sample <- occlim |>
  filter(time %in% selected_times)

ggplot(ocdepts) +
  geom_sf(alpha = 0) +
  geom_sf(data = occlim_sample, aes(color = tx)) +
  scale_color_viridis_c() +
  facet_wrap(vars(time)) +
  labs(color = "Temperature")

Figure 2: Daily maximum temperature per meteorological station in Occitanie between 1950 and 2025. 3 days chosen at random are extracted from the data set.

Total size: 300 stations \(\times\) 65 years \(\times\) 365 days > 4 million observations.

…through distributional time series…

Code

oct |>
  filter(name == "TOULOUSE-BLAGNAC") |>
  plot_funs(tx, color = year) +
  facet_wrap(vars(10 * year %/% 10)) +
  labs(x = "Temperature", y = "Density")

Figure 3: Annual maximum temperature densities for the TOULOUSE-BLAGNAC station, grouped by decade.

…to trend slope densities

Interpretation: using odds ratios defined by Maier et al. (2025).

Code

oct_slope <- oct_slope |>
  mutate(tx = as_dd(tx)) |>
  st_join(ocdepts, suffix = c("", "_dept")) |>
  rename(dept = name_dept) |>
  select(id, name, dept, tx)

c(mean(oct_slope$tx)) |>
  plot() +
  geom_hline(
    yintercept = 1 / diff(oct_slope$tx[[1]]$basis$rangeval),
    linetype = "dashed"
  ) +
  geom_vline(
    xintercept = 15,
    linetype = "dashed"
  ) +
  labs(x = "Temperature", y = "Density")

Figure 4: Mean trend slope density across the 300 stations of Occitanie.

Data set of trend slope densities

Case study

Code

oct_slope |>
  plot_funs(tx) +
  labs(x = "Temperature", y = "Density")

Figure 5: Data set \(D\) of trend slope densities of temperature change for the 300 stations in Occitanie.

Theory

Density data

11-dimensional space \(\mathcal C_q^\Delta (a,b)\) of compositional splines.
Subspace of the Bayes Hilbert space \((B^2 (a,b), \oplus, \odot, \langle \cdot, \cdot \rangle)\) (Van Den Boogaart, Egozcue, and Pawlowsky-Glahn 2014).
Inner product \[\langle f, g \rangle_{B^2 (a,b)} = \frac1{2(b-a)} \int_a^b \int_a^b \frac{\ln f(x)}{\ln f(y)} \frac{\ln g(x)}{\ln g(y)} dxdy\]

General framework

\((E, \langle \cdot, \cdot \rangle)\): \(p\)-dimensional Euclidean space.
Why a Euclidean space? To define scatter operators, an inner product is required to have a notion of symmetry and positiveness.
Let \(X\) be an \(E\)-valued random object, from which we observe a sample \(D = (X_1, \dots, X_n)\).

Scatter operators

Case study

Code

library(ICS)

oct_eval <- oct_slope |>
  mutate(tx = as.fd(tx)) |>
  eval_funs(tx, n = 100) |>
  st_drop_geometry() |>
  select(x, y, id) |>
  pivot_wider(names_from = id, values_from = y) |>
  column_to_rownames("x") |>
  t()

oct_cov <- cov(oct_eval)
eps <- 1e-8 * matrix(rnorm(ncol(oct_eval) * nrow(oct_eval), sd = max(abs(oct_eval))),
  nrow = nrow(oct_eval)
)
oct_cov4 <- cov4(oct_eval + eps)

oct_slope_scatters <- crossing(t_1 = colnames(oct_eval), t_2 = colnames(oct_eval)) |>
  rowwise() |>
  mutate(Cov = oct_cov[t_1, t_2], Cov_4 = oct_cov4[t_1, t_2]) |>
  pivot_longer(c(Cov, Cov_4), names_to = "scatter", values_to = "z") |>
  mutate(t_1 = as.numeric(t_1), t_2 = as.numeric(t_2))

tx_ics <- dda::ICS(oct_slope$tx)
oct_basis <- oct_slope$tx[[1]]$basis

changemat <- to_zbsplines(basis = oct_basis, inv = TRUE)
gram <- t(changemat) %*% fda::inprod(oct_basis, oct_basis) %*% changemat
tx_cov_zb <- cov(t(to_zbsplines(c(oct_slope$tx))))
tx_cov4_zb <- cov4(t(to_zbsplines(c(oct_slope$tx))))

# changemat_inv <- to_zbsplines(basis = oct_basis)
# tx_cov_b <- t(changemat_inv) %*% tx_cov_zb %*% changemat_inv
# tx_cov4_b <- t(changemat_inv) %*% tx_cov4_zb %*% changemat_inv

rangeval <- oct_slope$tx[[1]]$basis$rangeval
tval <- seq(rangeval[1], rangeval[2], length.out = 100)

zbsp <- fd(to_zbsplines(inv = TRUE, coefs = diag(11), basis = oct_basis), oct_basis)
zbsp_tval <- eval.fd(tval, zbsp)
tx_cov_grid <- zbsp_tval %*% tx_cov_zb %*% t(zbsp_tval)
tx_cov4_grid <- zbsp_tval %*% tx_cov4_zb %*% t(zbsp_tval)
dimnames(tx_cov_grid) <- list(tval, tval)
dimnames(tx_cov4_grid) <- list(tval, tval)

oct_slope_scatters_2 <- reshape2::melt(tx_cov_grid) |>
  rename(t_1 = Var1, t_2 = Var2, Cov = value) |>
  mutate(Cov_4 = reshape2::melt(tx_cov4_grid)$value) |>
  pivot_longer(c(Cov, Cov_4), names_to = "scatter", values_to = "z")

ggplot(oct_slope_scatters_2) +
  stat_contour_filled(aes(t_1, t_2, z = z)) +
  facet_wrap(vars(scatter), nrow = 2) +
  labs(x = "Temperature", y = "Temperature", fill = "Value")

Figure 6: Empirical estimates of the scatter operators \(\color{#0072b2}{\operatorname{Cov}} [X]\) and \(\color{#d55e00}{\operatorname{Cov}_4} [X]\) for the data set of slope densities of temperature change in Occitanie.

Theory

\(w\)-weighted covariance operator \[\begin{multline*}\langle \operatorname{Cov}_w [X] x, y \rangle \\ = \mathbb E [ w^2(\| \operatorname{Cov}[X]^{-1/2} (X - \mathbb E X) \|) \\ \langle X - \mathbb EX, x \rangle \langle X - \mathbb EX, y \rangle ] \end{multline*}\] for \((x,y) \in E\).
Examples.
1. \(w = 1\) defines \(\color{#0072b2}{\operatorname{Cov}}\);
2. \(w = \operatorname{id}\) defines \(\color{#d55e00}{\operatorname{Cov}_4}\).
Affine equivariant and defined coordinate-freely: no reference to any basis, exist on every Euclidean space.

Coordinate-free ICS

ICS problem

Theory

Definition 1 (ICS problem) \(\operatorname{ICS} (X, \color{#0072b2}{S_1},\color{#d55e00}{S_2})\): joint diagonalisation of \(\color{#0072b2}{S_1}[X]\) and \(\color{#d55e00}{S_2}[X]\), i.e:

\[\left\{ \begin{aligned} \langle \color{#0072b2}{S_1} [X] h_j, h_{j'} \rangle &= \delta_{jj'} \\ \langle \color{#d55e00}{S_2}[X] h_j, h_{j'} \rangle &= \lambda_j \delta_{jj'} \end{aligned} \right. \] for all \(1 \leq j,j' \leq p\) where

\(\lambda_1 \geq \dots \geq \lambda_p \geq 0\) are called ICS eigenvalues.
\((h_1, \dots, h_p)\) is an ICS eigenbasis.

Solutions exist provided that \(\color{#0072b2}{S_1} [X]\) is invertible.

Case study

Code

library(dda)

tx_ics <- dda::ICS(oct_slope$tx, )

tx_ics_eigen <- tibble(
  index = factor(names(tx_ics$gen_kurtosis),
    levels = names(tx_ics$gen_kurtosis)
  ),
  gen_kurtosis = tx_ics$gen_kurtosis,
  H_dual = as.list(tx_ics$H_dual),
)

tx_ics_eigen |>
  ggplot(aes(index, gen_kurtosis, group = 1)) +
  geom_line(alpha = 0.5) +
  geom_point(aes(color = gen_kurtosis), size = 3) +
  labs(x = "", y = "ICS eigenvalues") +
  guides(color = "none")

Figure 7: Scree plot of ICS eigenvalues for the data set of slopes densities in Occitanie.

Code

tx_ics_eigen |>
  mutate(H_dual = as_dd(H_dual)) |>
  plot_funs(H_dual, color = gen_kurtosis) +
  facet_wrap(vars(index)) +
  geom_hline(
    yintercept = 1 / diff(oct_slope$tx[[1]]$basis$rangeval),
    linetype = "dotted"
  ) +
  scale_y_continuous(n.breaks = 3) +
  labs(x = "Temperature", y = "Density") +
  guides(color = "none")

Figure 8: ICS eigendensities for the data set of slope densities in Occitanie.

Invariant coordinates

Case study

Code

oct_slope |>
  cbind(as_tibble(tx_ics$scores)) |>
  ggplot(aes(IC.1, IC.2, color = dept)) +
  geom_point() +
  coord_fixed() +
  labs(color = "Department")

Figure 9: First two invariant coordinates coloured by department.

Theory

For \(1 \leq j \leq p\): \[z_j = \langle X - \mathbb EX, h_j \rangle\] is the \(i\)-th invariant coordinate of \(X\).

Reconstructing the curves

Case study

Code

h_dual_shift <- tx_ics$H_dual
h_dual_shift$coefs <- sweep(h_dual_shift$coefs, 1, -c(mean(oct_slope$tx))$coefs)
tx_ics_eigen |>
  mutate(H_dual_shift = as.list(as_dd(h_dual_shift))) |>
  plot_funs(H_dual_shift, color = gen_kurtosis) +
  geom_function(
    fun = function(x) eval(c(mean(oct_slope$tx)), x),
    color = "black", linetype = "dotted"
  ) +
  scale_y_continuous(n.breaks = 3) +
  facet_wrap(vars(index)) +
  labs(x = "Temperature", y = "Density") +
  guides(color = "none")

Figure 10: Mean slope density across Occitanie, shifted by each ICS dual eigendensity.

Theory

Proposition 1 (Reconstructing formula) \[X = \mathbb EX + \sum_{j=1}^p z_j h^*_j\] where \[\begin{aligned} H^* &= (h^*_j)_{1 \leq j \leq p} \\ &= (\color{#0072b2}{S_1} [X] h_j)_{1 \leq j \leq p} \end{aligned}\] is the dual basis of \(H=(h_j)_{1 \leq j \leq p}\).

A link between ICS problems

Proposition 2 Let \((E, \langle \cdot, \cdot \rangle_E) \overset{\varphi}{\rightarrow} (F, \langle \cdot, \cdot \rangle_F)\) be an isometry between two Euclidean spaces of dimension \(p\) and \(\color{#0072b2}{w_1}, \color{#d55e00}{w_2} : \mathbb R^+ \rightarrow \mathbb R\) two measurable functions. For any integrable \(E\)-valued random object \(X\) and for \(\ell \in \{ 1, 2 \}\) \[ \operatorname{Cov}_{w_\ell}^F [\varphi(X)] = \varphi \circ \operatorname{Cov}_{w_\ell}^E [X] \circ \varphi^{-1} \] Moreover, for any basis \(H = (h_1, \dots, h_p)\) of \(E\) and any \(\Lambda = (\lambda_1 \geq \ldots \geq \lambda_p)\)

\((H, \Lambda)\) solves \(\operatorname{ICS} (X, \color{#0072b2}{\operatorname{Cov}_{w_1}^E}, \color{#d55e00}{\operatorname{Cov}_{w_2}^E})\) in the space \(E\)
\((\varphi(H), \Lambda)\) solves \(\operatorname{ICS} (\varphi(X), \color{#0072b2}{\operatorname{Cov}_{w_1}^F}, \color{#d55e00}{\operatorname{Cov}_{w_2}^F})\) in the space \(F\)

are equivalent.

Implementation

Let

\(B\) be any basis of \(E\);
\(G_B = (\langle b_i, b_j \rangle)_{1 \leq i,j \leq p}\) its Gram matrix;
\([\cdot]_B: E \rightarrow \mathbb R^p\) the coordinates in basis \(B\).

Corollary 1 (of Proposition 2) For any basis \(H\) of \(E\), these are equivalent:


0.	\(H\)	solves	\(\operatorname{ICS} (X, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\)	over	\(E\)
1.	\(\color{#d55e00}{G_B^{1/2}} [H]_B\)	solves	\(\operatorname{ICS} (\color{#d55e00}{G_B^{1/2}} [X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\)	over	\(\mathbb R^p\)
2.	\([H]_B\)	solves	\(\operatorname{ICS} (\color{#d55e00}{G_B} [X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\)	over	\(\mathbb R^p\)
3.	\(\color{#d55e00}{G_B} [H]_B\)	solves	\(\operatorname{ICS} ([X]_B, \operatorname{Cov}_{w_1}, \operatorname{Cov}_{w_2})\)	over	\(\mathbb R^p\)

Outlier detection using ICS

Theory

Based on ICSOutlier (Archimbaud, Nordhausen, and Ruiz-Gazen 2018a).

Compute the invariant coordinates.

Select \(k\) components. Using the scree plot or D’Agostino normality tests, etc. Now compute the ICS distances: \(\sum_{j=1}^k |z_j|^2\).

Choose a cut-off \(\tau\). Based on Monte Carlo simulations to compute quantiles (Archimbaud, Nordhausen, and Ruiz-Gazen 2018b).

Case study

Code

tx_out <- dda::ICS_outlier(oct_slope$tx, index = 1)

oct_slope <- oct_slope |>
  mutate(
    distance = tx_out$ics_distances,
    outlying = tx_out$outliers == 1
  )

oct_slope |>
  mutate(name = ifelse(outlying, name, NA)) |>
  ggplot(aes(id, distance, color = dept, label = name)) +
  geom_point() +
  geom_text() +
  scale_color_discrete(limits = function(x) ifelse(is.na(x), "", x)) +
  labs(color = "Outlying\nstations") +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.x = element_blank()
  )

Figure 11: ICS distances for \(k=1\) selected component.

Code

oct_slope |>
  mutate(name = ifelse(outlying, name, NA)) |>
  ggplot(aes(id, distance, color = dept, label = name)) +
  geom_point() +
  geom_text() +
  scale_color_discrete(limits = function(x) ifelse(is.na(x), "", x)) +
  geom_hline(yintercept = tx_out$ics_dist_cutoff) +
  labs(color = "Outlying\nstations") +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.x = element_blank()
  )

Figure 12: ICS distances for k=1 selected component, with threshold 5.5.

Results for the case study

Code

oct_slope |>
  filter(outlying) |>
  plot_funs(tx, color = name) +
  geom_hline(
    yintercept = 1 / diff(oct_slope$tx[[1]]$basis$rangeval),
    linetype = "dotted"
  ) +
  geom_function(
    fun = function(x) eval(mean(c(oct_slope$tx)), x),
    color = "black"
  ) +
  labs(x = "Temperature", y = "Density") +
  guides(color = "none")

ggplot(ocdepts) +
  geom_sf(alpha = 0) +
  geom_sf(data = filter(oct_slope, outlying), aes(color = name)) +
  geom_sf_text(data = filter(oct_slope, outlying), aes(color = name, label = name)) +
  labs(x = "", y = "", color = "Station")

Figure 13: Outlying stations for the data set of slope densities in Occitanie.

Figure 14: Outlying stations on a map of Occitanie.

Comparison with other methods

Monte Carlo simulations

Generate density data with outliers according to 3 different outlier schemes.
(Optionally) apply a transformation:

log quantile density
quantile function

then choose a metric:

\(L^2 (a,b)\)
\(B^2 (a,b)\)
Wasserstein.

Compute scores for 4 different functional outlier detection methods adapted to density data:

Coordinate-free ICS
Distance to median (Murph et al. 2024)
MBD (modified band depth) (Sun and Genton 2011)
MUOD (Massive Unsupervised Outlier Detection) indices (Ojo et al. 2022).

ROC curves

Code

load("data/ICS_comparison.RData")

# Aggregate results (averaging over simulations)
comparison_avg <- comparison |>
  separate_wider_delim(Method, "_", names = c("Method", "Transf")) |>
  group_by(Scheme, Transf, Method, PP) |>
  summarize(
    n = n(),
    TPR_avg = mean(TPR),
    TPR_conf = qnorm(0.975) * sd(TPR) / sqrt(n()),
    FPR_avg = mean(FPR),
    FPR_conf = qnorm(0.975) * sd(FPR) / sqrt(n()),
    .groups = "drop"
  )

ggplot(comparison_avg, aes(FPR_avg, TPR_avg, color = Scheme, group = Scheme)) +
  geom_line() +
  geom_ribbon(aes(
    ymin = pmax(0, TPR_avg - TPR_conf),
    ymax = pmin(1, TPR_avg + TPR_conf),
    xmin = pmax(0, FPR_avg - FPR_conf),
    xmax = pmin(1, FPR_avg + FPR_conf),
    fill = Scheme
  ), alpha = 0.1, linetype = "dotted") +
  geom_function(fun = identity, linetype = "dotted") +
  facet_wrap(vars(paste(Method, Transf, sep = "_")), nrow = 2) +
  coord_fixed() +
  labs(
    x = "False Positive Rate (FPR)",
    y = "True Positive Rate (TPR)",
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Figure 15: Comparing ICS with other distributional outlier detection methods through simulated ROC curves for several outlier generating schemes.

Area under the curve (AUC)

Code

comparison_auc <- comparison_avg |>
  group_by(Scheme, Method, Transf) |>
  summarise(AUC = integrate(approxfun(FPR_avg, TPR_avg), 0, 1)$value)

comparison_auc_summary <- comparison_auc |>
  group_by(Method, Transf) |>
  summarise(AUC_avg = mean(AUC)) |>
  arrange(desc(AUC_avg))

ggplot(comparison_auc_summary, aes(reorder(paste(Method, Transf, sep = "\n"), -AUC_avg), AUC_avg, fill = Method)) +
  geom_col() +
  coord_cartesian(ylim = c(0.5, NA)) +
  labs(x = "")

Figure 16: Area under the ROC curve for each outlier detection method.

Thank you for listening!

Temperature distribution data: reducing the dimension with ICS
Pre-processing: ICS needs an inner product on a finite-dimensional space containing all the estimated densities
Coordinate-free ICS: defining the Invariant Coordinates without using a ‘canonical’ basis (change the inner product rather than transform the dataset)
Implementation: using coordinates in a CB-spline basis
Submit the ICSComplex package to CRAN
ICS in infinite-dimensional Hilbert spaces?

For more details: Mondon et al. (2025).

Any questions?

References

Archimbaud A., Nordhausen K. and Ruiz-Gazen A. 2018a. “ICSOutlier: Unsupervised Outlier Detection for Low- Dimensional Contamination Structure.” The R Journal 10 (1) : 234–50.

———. 2018b. “ICS for multivariate outlier detection with application to quality control.” Computational Statistics & Data Analysis 128 (December) : 184–99.

Machalová J., Talská R., Hron K. and Gába A. 2021. “Compositional splines for representation of density functions.” Computational Statistics 36 (2) : 1031–64.

Maier E.-M., Stöcker A., Fitzenberger B. and Greven S. 2025. “Additive density-on-scalar regression in Bayes Hilbert spaces with an application to gender economics.” The Annals of Applied Statistics 19 (1) : 680–700.

Mondon C., Trinh H. T., Ruiz-Gazen A. and Thomas-Agnan C. 2025. “ICS for complex data with application to outlier detection for density data.” arXiv.

Murph A. C., Strait J. D., Moran K. R., Hyman J. D. and Stauffer P. H. 2024. “Visualisation and outlier detection for probability density function ensembles.” Stat 13 (2) : e662.

Ojo O. T., Fernández Anta A., Lillo R. E. and Sguera C. 2022. “Detecting and classifying outliers in big functional data.” Advances in Data Analysis and Classification 16 (3) : 725–60.

Sun Y. and Genton M. G. 2011. “Functional Boxplots.” Journal of Computational and Graphical Statistics 20 (2) : 316–34.

Van Den Boogaart K. G., Egozcue J. J. and Pawlowsky-Glahn V. 2014. “Bayes Hilbert Spaces.” Australian & New Zealand Journal of Statistics 56 (2) : 171–94.

Appendix

Scatter operators

\((E, \langle \cdot, \cdot \rangle)\) Euclidean space of dimension \(p\).
\(X \in \mathcal M (\Omega, E)\) random variable over \(E\).
\(\mathcal S^{+} (E)\) space of non-negative symmetric operators over \(E\).
\(\mathcal {GS}^{+} (E)\) space of positive symmetric operators over \(E\).
\(S: \mathcal A \subseteq \mathcal M (\Omega, E) \rightarrow \mathcal S^{+} (E)\) scatter operator over \(\mathcal A\) if:
- Invariance by equality in distribution: \[\forall (X,Y) \in \mathcal A^2, X \sim Y \Rightarrow S[X] = S[Y] \]
- It is affine equivariant if: \[\forall A \in \mathcal{GL} (E), \forall b \in E, S[AX+b] = A S[X] A^*\]

Proof of Proposition 1

Proof.

Let us decompose \(S_1[X]^{-1} (X - \mathbb EX)\) over the basis \(H\), which is orthonormal in \((E, \langle \cdot, S_1[X] \cdot \rangle)\): \[\begin{aligned} S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p \langle S_1[X]^{-1} (X - \mathbb EX), S_1[X] h_j \rangle h_j \\ &= \sum_{i=1}^p \langle X - \mathbb EX, h_j \rangle h_j \\ S_1[X]^{-1} (X - \mathbb EX) &= \sum_{i=1}^p z_j h_j \end{aligned}\]

The dual basis \(H^*\) of \(H\) is the one that satisfies \(\langle h_j, h^*_j \rangle = \delta_{ij}\) for all \(1 \leq i,j \leq p\) and we know from the definition of ICS that this holds for \((S_1[X] h_j)_{1 \leq j \leq p}\).

Proof of Corollary 1

Proof. In ICS, the first step is to centre the data so wlog: \(\mathbb E [X] = 0\).
\(E\) is isometric to \(\mathbb R^p\) via the linear isomorphism: \[\phi_B: x \mapsto G_B^{1/2} [x]_B\] Then for any \((x,y) \in E^2\): \[ \begin{aligned} \langle \operatorname{Cov}_w [X] x, y \rangle_E &= \mathbb E [ w^2(X) \langle X, x \rangle_E \langle X, y \rangle_E ] \\ &= \mathbb E [ w^2([X]_B) \langle G_B^{1/2} [X]_B, G_B^{1/2} [x]_B \rangle \langle G_B^{1/2} [X]_B, G_B^{1/2} [y]_B \rangle ] \\ &= \langle \operatorname{Cov}_w (G_B^{1/2} [X]_B) G_B^{1/2} [x]_B, G_B^{1/2} [y]_B \rangle \\ &= \langle \operatorname{Cov}_w (G_B [X]_B) [x]_B, [y]_B \rangle \\ &= \langle \operatorname{Cov}_w ([X]_B) G_B [x]_B, G_B [y]_B \rangle \end{aligned} \]

Bayes space and CB-splines

Source: Van Den Boogaart, Egozcue, and Pawlowsky-Glahn (2014) and Machalová et al. (2021).

\(P = \frac1{\lambda(I)} \lambda\) uniform probability measure over \(I=[a,b]\).
\(L^2_0 (P) = \{ f \in L^2 (P) | \int_I f \mathrm dP = 0 \}\) is a separable Hilbert space.
Centred log-ratio transform \(\operatorname{clr}: p \in \mathbb R^I \mapsto \log(p) - \int_I \log(p) \mathrm dP\) (if it exists)
Bayes space \(B^2 (P) = \{ \operatorname{clr}^{-1} (\{f\}), f \in L^2_0 (I) \}\) inherits the separable Hilbert space structure of \(L^2_0 (P)\). Each equivalence class contains a density function defined almost everywhere.
B-splines basis \(B=(B_i)_{-q \leq i \leq k}\) of \(\mathcal S_{q+1}^\Delta (I) \subset L^2 (P)\), the subspace of polynomial splines on \(I\) of order \(q+1\) and with knots \(\Delta\).
ZB-splines basis \(Z=(Z_i)_{-q+1 \leq i \leq k-1} = \left(\frac{\mathrm d B_i}{\mathrm dx}\right)_{-q+1 \leq i \leq k-1}\) of \(\mathcal Z_q^\Delta (I) = \mathcal S_q^\Delta (I) \cap L^2_0 (P)\)
CB-splines basis \(C = (C_i)_{-q+1 \leq i \leq k-1} = (\operatorname{clr}^{-1} (Z_i))_{-q+1 \leq i \leq k-1}\) of \(\mathcal C_q^\Delta (I)\) which is a Euclidean subspace of \(B^2 (P)\) of dimension \(p=k+q-1\).