3  Outlier detection using ICS for compositional data

3.1 ICS on the ilr-space

Given what has been said earlier about outlier detection and compositional data, a first approach is to map the compositional data set isometrically to the \(\mathop{\mathrm{ilr}}\)-space and to apply ICS, under the elliptical mixture model, to \(\mathop{\mathrm{ilr}}\left(\mathbf{X}\right)\).
By results such as Theorem 3 of Tyler et al. (2009), we would know that, under some assumptions, the outliers are highlighted by an orthogonal projection onto the first or the last invariant coordinates of \(\mathop{\mathrm{ilr}}\left(\mathbf{X}\right)\).
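As a minimal sketch (assuming the compositions and ICS packages, and a data matrix X, a hypothetical name, whose rows are compositions), this first approach would read:

Code
# A minimal sketch, assuming the 'compositions' and 'ICS' packages and a matrix X
# (hypothetical name) whose rows are compositions; default scatter pair (cov, cov4).
library(compositions)
library(ICS)
.X_ilr <- unclass(ilr(acomp(X)))   # ilr-coordinates as a plain n x (D-1) matrix
.X_ics <- ics(.X_ilr)              # plain ICS in the ilr-space
.X_ic <- ics.components(.X_ics)    # invariant coordinates of ilr(X)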
However, many questions arise from this approach:

  • What is the dependence of this model on the choice of isometry \(\mathop{\mathrm{ilr}}\)?

  • Why use an \(\mathop{\mathrm{ilr}}\) transformation and not the \(\mathop{\mathrm{clr}}\) transformation?

  • How to interpret the ICS transformation (if it is unique) back in the simplex?

  • Are elliptical distributions on the simplex and the results of the ICS method meaningful, and can such distributions model real data?

The last question can already be partially answered, since elliptical distributions are a generalization of the normal distribution on the simplex, for which the central limit theorem (Theorem 6.24 in (Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado 2015)) still applies.
This suggests that the family of distributions with elliptical symmetry is still a worthy candidate to model real compositional data.
First, let us apply the elliptical mixture model after the isometric logratio transformation, and define an intrinsic model on the simplex.

3.2 ICS on the simplex: what is possible – what is not

3.2.1 Elliptical mixture models on the simplex

First, if the distribution function \(F_{\mathbf{X}^*}\) of \(\mathbf{X}^* = \mathop{\mathrm{ilr}}\left(\mathbf{X}\right)\) satisfies, in the \(\mathop{\mathrm{ilr}}\)-space: \[F_{\mathbf{X}^*} = \left(1-\varepsilon\right) F^*_0 + \sum_{k=1}^{q} \varepsilon_k F^*_k\] then, by definition of distribution functions on \(\mathcal{S}^D\) (as push-forward measures using the measurable function \(\mathop{\mathrm{ilr}}^{-1}\)): \[F_\mathbf{X}= \left(1-\varepsilon\right) F_0 + \sum_{k=1}^{q} \varepsilon_k F_k\] where, for \(0 \leq k \leq q\), we define \(F_k = F^*_k \circ \mathop{\mathrm{ilr}}\) (which does not depend on the choice of \(\mathop{\mathrm{ilr}}\)).
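Indeed, reading these equalities at the level of the push-forward measures, if \(\mathop{\mathrm{ilr}}_1\) and \(\mathop{\mathrm{ilr}}_2\) are two such isometries and \(F^{*_1}_k\), \(F^{*_2}_k\) denote the corresponding distributions in their respective \(\mathop{\mathrm{ilr}}\)-spaces, then: \[F^{*_2}_k = F^{*_1}_k \circ \left(\mathop{\mathrm{ilr}}_1 \circ \mathop{\mathrm{ilr}}_2^{-1}\right) \qquad \text{so that} \qquad F^{*_2}_k \circ \mathop{\mathrm{ilr}}_2 = F^{*_1}_k \circ \mathop{\mathrm{ilr}}_1 = F_k.\]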
Now, let us apply the Invariant Coordinate Selection method on the \(\mathop{\mathrm{ilr}}\) coordinates and try to get rid of the dependence on the isometric logratio transformation.

3.2.2 ICS from the ilr-space back to the simplex

Code
source("../functions.R")
.mixturecomp <- rmvnormmix.acomp(   # simulate n = 150 compositions from a
  150,                              # two-component Gaussian mixture model in S^5 (functions.R helper)
  c(0.92, 0.08),
  matrix(c(rep(1, 5), 4, 1, 1, 1, 1), 2, 5, TRUE),
  0.1 * identity.acomp(5)
)
.mixturecomp_index <- .mixturecomp$index   # component labels, used to colour the plots

.B <- matrix(rnorm(25), 5, 5)
# .B = identity.acomp(5)
mixturecomp <- clrInv(clr(acomp(.mixturecomp$data)) %*% .B)   # apply the clr-linear map .B
.mixturecomp_ICS <- ics.acomp(mixturecomp)                    # compositional ICS (functions.R helper)
mixturecomp_ICS_scores <- .mixturecomp_ICS$data

plot_data(
  mixturecomp,
  col = .mixturecomp_index,
  width = 14,
  height = 10,
)

plot_data(
  mixturecomp_ICS_scores,
  col = .mixturecomp_index,
  width = 14,
  height = 10,
)
Figure 3.1: Data generated according to a Gaussian mixture model in \(\mathcal{S}^5\).
Figure 3.2: Invariant Coordinates of the data set generated according to a Gaussian mixture model in \(\mathcal{S}^5\).
Table 3.1: Covariances between the \(\log\)-coordinates of the original data set and its Invariant Coordinates. \[\begin{array}{rrrrrr} & \mathtt{IC.1} & \mathtt{IC.2} & \mathtt{IC.3} & \mathtt{IC.4} & \mathtt{IC.5} \\ \mathtt{X_1} & 0.33 & 0.15 & 0.22 & -0.44 & -0.38 \\ \mathtt{X_2} & -0.00 & 0.18 & 0.01 & 0.21 & -0.22 \\ \mathtt{X_3} & 0.08 & -0.52 & 0.09 & 0.27 & 0.07 \\ \mathtt{X_4} & -0.65 & 0.62 & -0.24 & 0.52 & 0.57 \\ \mathtt{X_5} & -0.12 & -0.07 & 0.13 & 0.05 & 0.23 \\ \end{array}\]
Code
print_table(.mixturecomp_ICS$cov_log)
Code
# ICS distances: Aitchison norms of the amalgamation of the selected Invariant Coordinates
.mixturecomp_ICS_outliers <- norm.acomp(acompmargin(.mixturecomp_ICS$data, c(4)))

mixturecomp_ICS_distances <- data.frame(cbind(
  1:length(.mixturecomp_ICS_outliers),
  .mixturecomp_ICS_outliers
))
colnames(mixturecomp_ICS_distances) <- c("Index", "ICS distances")

plot_data(
  mixturecomp_ICS_distances,
  grid = TRUE,
  col = .mixturecomp_index,
  asp = 0,
)
Figure 3.3: Outliers in selected Invariant Coordinates of the data set generated according to a Gaussian mixture model.

As we saw earlier when we whitened compositional data, the \(\mathop{\mathrm{ilr}}\)-linear transformation corresponds, back in the simplex, to: \[\mathbf{Z}= \mathop{\mathrm{H}}^\top \boxdot \, \left(\mathbf{X}\ominus \mathbf{g}\left[\mathbf{X}\right]\right)\] where \(\mathop{\mathrm{H}}^* = \mathop{\mathrm{S}}_1^{*-\frac{1}{2}} \mathop{\mathrm{Q}}^*\) is associated with \(\mathop{\mathrm{H}}= {\mathop{\mathrm{S}}_1}^{-\frac{1}{2}} \mathop{\mathrm{Q}}\) (the inverse square root exists because the linear mapping associated with the \(\mathop{\mathrm{clr}}\)-matrix \(\mathop{\mathrm{S}}_1\) is positive-definite). Applying \(\mathop{\mathrm{H}}^\top\) thus means first applying \({\mathop{\mathrm{S}}_1}^{-\frac{1}{2}}\) (i.e. whitening, which amounts to changing the Aitchison inner product) and then applying the orthogonal matrix \({\mathop{\mathrm{Q}}}^\top\): the interpretation of the ICS transformation remains the same on the simplex.
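For illustration, here is a minimal sketch of this back-transport, assuming the compositions and expm packages and hypothetical objects: a contrast matrix V and the scatter matrices S1s and S2s already computed in the corresponding \(\mathop{\mathrm{ilr}}\)-space.

Code
# A minimal sketch (hypothetical names: V is a contrast matrix, S1s and S2s the
# ilr-space scatter matrices of the composition X); assumes 'compositions' and 'expm'.
library(compositions)
library(expm)
.S1s_isqrt <- solve(sqrtm(S1s))                             # S1*^(-1/2)
.Qstar <- eigen(.S1s_isqrt %*% S2s %*% .S1s_isqrt)$vectors  # Q*, orthogonal in the ilr-space
.H <- V %*% (.S1s_isqrt %*% .Qstar) %*% t(V)                # clr-matrix of H = S1^(-1/2) Q
.Xc <- acomp(X) - mean(acomp(X))                            # centering: perturbation by the inverse of g[X]
.Z <- clrInv(clr(.Xc) %*% .H)                               # Z = H^T applied row-wise to the centered composition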
What we need to keep in mind is that the ICS method is based on the interpretation of the eigenvalues of \({\mathop{\mathrm{S}}_1}^{-1} \mathop{\mathrm{S}}_2\) as a measure of kurtosis along each axis, and that a high kurtosis along a given direction suggests the existence of outliers in that particular direction: one way or another, we must choose \(\mathop{\mathrm{H}}^*\) so that this interpretation is not lost when coming back isometrically to the simplex.
Here, one could imagine an alternative way to find an orthogonal matrix \(\mathop{\mathrm{Q}}\): diagonalizing \(\mathop{\mathrm{A}}= {{\mathop{\mathrm{S}}}_1}^{-\frac{1}{2}} {\mathop{\mathrm{S}}}_2 \, {{\mathop{\mathrm{S}}}_1}^{-\frac{1}{2}}\) directly in the \(\mathop{\mathrm{clr}}\)-space. Naturally, the eigenvalues (omitting the 0 associated with the zero-sum property) would be preserved, but the eigenvectors would form a basis of \(\mathbb{R}^D\) instead of the \(\mathop{\mathrm{clr}}\)-space \(\mathop{\mathrm{\mathcal{H}}}\). However, when diagonalizing in an \(\mathop{\mathrm{ilr}}\)-space and then computing \(\mathop{\mathrm{Q}}= \mathop{\mathrm{V}}\mathop{\mathrm{Q}}^* \mathop{\mathrm{V}}^\top\), the matrix \(\mathop{\mathrm{Q}}\) does not diagonalize \(\mathop{\mathrm{A}}\) (it is not even orthogonal as a \(D \times D\)-matrix).
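A quick numerical check of this last point, assuming the compositions and pracma packages:

Code
# The clr-matrix Q = V Q* V^T satisfies Q^T Q = V V^T, the centering matrix,
# which is not the D x D identity: Q is not orthogonal as a D x D matrix.
library(compositions)
library(pracma)
.D <- 5
.V <- ilrBase(D = .D)          # a contrast matrix (orthonormal basis of the clr-space)
.Qstar <- randortho(.D - 1)    # some orthogonal matrix in the ilr-space
.Q <- .V %*% .Qstar %*% t(.V)  # associated clr-matrix
round(t(.Q) %*% .Q, 10)        # equals I_D - (1/D) * 1 1^T, not I_D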

Example 3.1 Before analyzing in detail the dependence of the method on the choice of \(\mathop{\mathrm{ilr}}\) transformation (which will be the object of the next section), we present an application of this compositional ICS method to a data set of size \(n = 150\) generated according to a compositional Gaussian mixture model.
We first compute the matrix \(\mathop{\mathrm{H}}\) and the Invariant Coordinates as described at the beginning of this subsection (see Figure 3.2).
Then, we select \(\mathtt{IC.1}\) and regroup the other components, and we see that the outliers are best identified using this particular amalgamation.
Finally, we plot the Aitchison norms of the selected components in Figure 3.3.

3.2.3 The effects of changing the contrast matrix

Let us suppose that we have two transformations \(\mathop{\mathrm{ilr}}_1\) and \(\mathop{\mathrm{ilr}}_2\), whose respective contrast matrices are \(\mathop{\mathrm{V}}_1\) and \(\mathop{\mathrm{V}}_2\).
Then for each \(i \in \left\{1,2\right\}\), we define an orthogonal matrix \(\mathop{\mathrm{Q}}_i^{*_i}\) diagonalizing \(\mathop{\mathrm{A}}^{*_i}\) and denote by \(\mathop{\mathrm{Q}}_i \in \mathcal{M}\) its associated \(\mathop{\mathrm{clr}}\)-matrix.
The aim of the subsection is to connect \(\mathop{\mathrm{Q}}_1\) to \(\mathop{\mathrm{Q}}_2\). Let us write the equalities we know this far: \[\begin{aligned} \mathop{\mathrm{A}}&= \mathop{\mathrm{V}}_1 \mathop{\mathrm{A}}^{*_1} {\mathop{\mathrm{V}}_1}^\top & & & \mathop{\mathrm{A}}&= \mathop{\mathrm{V}}_2 \mathop{\mathrm{A}}^{*_2} {\mathop{\mathrm{V}}_2}^\top \\ \end{aligned} \tag{3.1}\] \[\begin{aligned} \mathop{\mathrm{Q}}_1 &= \mathop{\mathrm{V}}_1 \mathop{\mathrm{Q}}_1^{*_1} {\mathop{\mathrm{V}}_1}^\top & & & \mathop{\mathrm{Q}}_1 &= \mathop{\mathrm{V}}_2 \mathop{\mathrm{Q}}_1^{*_2} {\mathop{\mathrm{V}}_2}^\top \\ \end{aligned} \tag{3.2}\] \[\begin{aligned} \mathop{\mathrm{Q}}_2 &= \mathop{\mathrm{V}}_1 \mathop{\mathrm{Q}}_2^{*_1} {\mathop{\mathrm{V}}_1}^\top & & & \mathop{\mathrm{Q}}_2 &= \mathop{\mathrm{V}}_2 \mathop{\mathrm{Q}}_2^{*_2} {\mathop{\mathrm{V}}_2}^\top \end{aligned} \tag{3.3}\] \[\begin{aligned} {\mathop{\mathrm{Q}}_1^{*_1}}^\top \mathop{\mathrm{A}}^{*_1} \mathop{\mathrm{Q}}_1^{*_1} &= \Delta^* & & & {\mathop{\mathrm{Q}}_2^{*_2}}^\top \mathop{\mathrm{A}}^{*_2} \mathop{\mathrm{Q}}_2^{*_2} &= \Delta^* \end{aligned} \tag{3.4}\]

Code
.D <- 5
.V1 <- ilrBase(D = .D)                                     # the default ilr contrast matrix
.V2 <- t(matrix(ilr2clr(randortho(.D - 1)), .D - 1, .D))   # a random contrast matrix

plot_data(
  ics.acomp(mixturecomp, V = .V1)$data,
  col = .mixturecomp_index,
  width = 14,
  height = 10,
)

plot_data(
  ics.acomp(mixturecomp, V = .V2)$data,
  col = .mixturecomp_index,
  width = 14,
  height = 10,
)
Figure 3.4: Invariant Coordinates of the data set generated according to a Gaussian mixture model in \(\mathcal{S}^5\), with two different contrast matrices: (a) \(\mathop{\mathrm{V}}_1\) (top) and (b) \(\mathop{\mathrm{V}}_2\) (bottom).

The equations of line Equation 3.4 use the fact that the positive semi-definite matrices \(\mathop{\mathrm{A}}\), \(\mathop{\mathrm{A}}^{*_1}\), \(\mathop{\mathrm{A}}^{*_2}\) have the same non-zero eigenvalues, which justifies the notation \(\Delta^*\) (without reference to \(\mathop{\mathrm{ilr}}_1\) or \(\mathop{\mathrm{ilr}}_2\)) for the diagonal matrix containing them in decreasing order.
By equating the two different expressions of \(\mathop{\mathrm{A}}\) on line (Equation 3.1), we get: \[\mathop{\mathrm{A}}^{*_2} = \mathop{\mathrm{V}}_{1,2} \mathop{\mathrm{A}}^{*_1} \mathop{\mathrm{V}}_{1,2}^\top\] where \(\mathop{\mathrm{V}}_{1,2} = {\mathop{\mathrm{V}}_2}^\top \mathop{\mathrm{V}}_1\) is a \(\left(D-1\right) \times \left(D-1\right)\)-orthogonal matrix, as stated in the remark after Proposition 2.1 on elliptical distributions on the simplex. Doing the same with lines Equation 3.2 and Equation 3.3, we obtain two similar equalities connecting \(\mathop{\mathrm{Q}}_i^{*_2}\) to \(\mathop{\mathrm{Q}}_i^{*_1}\) for \(i \in \left\{1,2\right\}\).
Let us suppose that \(\mathop{\mathrm{Q}}_1 = \mathop{\mathrm{Q}}_2\). Then: \[\mathop{\mathrm{Q}}_2^{*_2} = \mathop{\mathrm{V}}_{1,2} \mathop{\mathrm{Q}}_1^{*_1} \mathop{\mathrm{V}}_{1,2}^\top\] so, replacing \(\mathop{\mathrm{Q}}_2^{*_2}\) and \(\mathop{\mathrm{A}}^{*_2}\) by their expressions in terms of \(\mathop{\mathrm{Q}}_1^{*_1}\) and \(\mathop{\mathrm{A}}^{*_1}\) in the right-hand equation of line (Equation 3.4): \[\mathop{\mathrm{V}}_{1,2} {\mathop{\mathrm{Q}}_1^{*_1}}^\top \mathop{\mathrm{A}}^{*_1} \mathop{\mathrm{Q}}_1^{*_1} \mathop{\mathrm{V}}_{1,2}^\top = \Delta^*\] i.e. using this time the left-hand equation of line (Equation 3.4): \[\mathop{\mathrm{V}}_{1,2} \Delta^* \mathop{\mathrm{V}}_{1,2}^\top = \Delta^*\]

Since this last equality is generally not true, we can deduce that in general: \[\mathop{\mathrm{Q}}_1 \neq \mathop{\mathrm{Q}}_2\]
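For instance, a quick numerical check, assuming the pracma package, with arbitrary (hypothetical) values:

Code
# For a generic orthogonal matrix V_{1,2} and a diagonal matrix with distinct
# entries, conjugation does not preserve Delta*.
library(pracma)
set.seed(42)
.V12 <- randortho(4)                            # a generic orthogonal matrix
.Delta <- diag(c(4, 3, 2, 1))                   # distinct eigenvalues, in decreasing order
max(abs(.V12 %*% .Delta %*% t(.V12) - .Delta))  # clearly non-zero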

The intuition is that diagonalizing the matrix \(\mathop{\mathrm{A}}^{*_i}\) amounts to choosing an orthonormal basis of the simplex, just like picking a contrast matrix does. So when deducing \(\mathop{\mathrm{Q}}_i\) from \(\mathop{\mathrm{Q}}_i^{*_i}\) we should not use different contrast matrices, but the same one, which we will denote \(\mathop{\mathrm{V}}_0\).
Now let us redefine \(\mathop{\mathrm{Q}}_1\) and \(\mathop{\mathrm{Q}}_2\) with this adjustment: \[\begin{aligned} \mathop{\mathrm{Q}}_1 &= \mathop{\mathrm{V}}_1 \mathop{\mathrm{Q}}_1^{*_1} {\mathop{\mathrm{V}}_0}^\top & & & \mathop{\mathrm{Q}}_2 &= \mathop{\mathrm{V}}_2 \mathop{\mathrm{Q}}_2^{*_2} {\mathop{\mathrm{V}}_0}^\top \end{aligned}\] so that the equality \(\mathop{\mathrm{Q}}_1 = \mathop{\mathrm{Q}}_2\) now amounts to: \[\mathop{\mathrm{Q}}_2^{*_2} = \mathop{\mathrm{V}}_{1,2} \mathop{\mathrm{Q}}_1^{*_1}\]

Since the eigenvectors are never unique, this equality need not hold, and we still have in general: \[\mathop{\mathrm{Q}}_1 \neq \mathop{\mathrm{Q}}_2\]

But let us assume that they are unique. This is realistic since, under our continuous models, the probability that two eigenvalues are equal is zero, so each eigenvector is unique up to its sign; the sign ambiguity remains problematic, but we could design a convention that sets the signs (one such convention is sketched below). Then, since \(\mathop{\mathrm{V}}_{1,2} \mathop{\mathrm{Q}}_1^{*_1}\) and \(\mathop{\mathrm{Q}}_2^{*_2}\) both diagonalize \(\mathop{\mathrm{A}}^{*_2}\) with the eigenvalues in the same order, they coincide, and by going back up the previous reasoning we deduce that \(\mathop{\mathrm{Q}}_1 = \mathop{\mathrm{Q}}_2\).
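One such convention could be, for instance, the following sketch (an assumption on our side, not a canonical choice): flip each eigenvector so that its entry of largest magnitude is positive.

Code
# A possible sign-fixing convention (an assumption, not the only choice): flip each
# column of Q* so that its entry of largest magnitude is positive.
fix_signs <- function(Q) {
  s <- apply(Q, 2, function(v) sign(v[which.max(abs(v))]))
  sweep(Q, 2, s, `*`)
}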

Example 3.2 Figure 3.4 contains scatter plots of the Invariant Coordinates of the compositional mixture data set, obtained with two different contrast matrices. We notice that, because the signs were not set, only the plot of the amalgamation \(\left(\mathtt{IC.1}, \mathtt{IC.2}, *\right)\) is identical in both subfigures. Because of the contrast matrices, one small change to the \(\mathop{\mathrm{Q}}^*\) matrix (for instance a column replaced by its opposite) can induce larger changes on \(\mathop{\mathrm{Q}}\).

Globally, we conclude that this adaptation of the ICS method (using \(\mathop{\mathrm{ilr}}\)) depends on the choice of \(\mathop{\mathrm{ilr}}\) for the same reason that, in ICS, the expression ‘Invariant Coordinates’ and their definition are slightly improper: the non-uniqueness of the eigenvectors.
A way to avoid this problem would be to choose a privileged contrast matrix, for instance picking an orthonormal basis that diagonalizes \(\mathop{\mathrm{A}}\) in the \(\mathop{\mathrm{clr}}\)-space \(\mathop{\mathrm{\mathcal{H}}}\).
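A sketch of how such a privileged contrast matrix could be built (an assumption on our side, with A denoting the symmetric positive semi-definite \(\mathop{\mathrm{clr}}\)-matrix, whose kernel is spanned by the vector of ones):

Code
# A sketch with a hypothetical object A (the clr-matrix of S1^(-1/2) S2 S1^(-1/2)):
# its eigenvector with eigenvalue 0 is the ones direction; the remaining D-1
# eigenvectors are orthonormal, sum to zero, and can serve as a contrast matrix V0.
.eig <- eigen((A + t(A)) / 2)                # symmetrize for numerical safety
.V0 <- .eig$vectors[, seq_len(ncol(A) - 1)]  # drop the last eigenvector (eigenvalue 0)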

3.3 Application to automotive market data & interpretations

To conclude this chapter, let us apply our suggested adaptation of the ICS method for compositional data to the BDDSegX data set on the automotive market.

Code
.BDDSegXcomp_ICS <- ics.acomp(BDDSegXcomp)
BDDSegXcomp_ICS_scores <- .BDDSegXcomp_ICS$data

plot_data(
  BDDSegXcomp_ICS_scores,
  col = "Blues",
  width = 14,
  height = 10,
)
Figure 3.5: Invariant Coordinates of the BDDSegX data set (compositions).
Table 3.2: Covariances between the \(\log\)-coordinates of the original BDDSegX data set and its Invariant Coordinates. \[\begin{array}{rrrrrr} & \mathtt{IC.1} & \mathtt{IC.2} & \mathtt{IC.3} & \mathtt{IC.4} & \mathtt{IC.5} \\ \mathtt{S_A} & 0.14 & 0.11 & -0.12 & -0.17 & -0.08 \\ \mathtt{S_B} & 0.02 & 0.01 & -0.05 & 0.02 & -0.02 \\ \mathtt{S_C} & 0.01 & -0.03 & 0.04 & 0.04 & -0.04 \\ \mathtt{S_D} & -0.11 & -0.03 & 0.09 & -0.06 & 0.17 \\ \mathtt{S_E} & -0.32 & 0.05 & 0.17 & -0.03 & 0.30 \\ \end{array}\]
Code
print_table(.BDDSegXcomp_ICS$cov_log)
Code
# ICS distances: Aitchison norms of the amalgamation isolating IC.4
.BDDSegXcomp_ICS_outliers <- norm.acomp(groupparts(.BDDSegXcomp_ICS$data, 4, c(1:3, 5)))

BDDSegXcomp_ICS_distances <- data.frame(cbind(
  BDDSegX$Date,
  .BDDSegXcomp_ICS_outliers
))
colnames(BDDSegXcomp_ICS_distances) <- c("Date", "ICS distances")
BDDSegXcomp_ICS_distances$Date <- BDDSegX$Date   # restore the Date class lost in cbind

plot_data(
  BDDSegXcomp_ICS_distances,
  grid = TRUE,
  col = "Blues",
  asp = 0,
)
Figure 3.6: Distances of the selected Invariant Coordinates of the BDDSegX data set (compositions).

First, the Invariant Coordinates are computed, and the covariances between the \(\log\)-coordinates of the original and transformed data sets can be found in Table 3.2. The Invariant Coordinates are harder to interpret here, since we did not define a good notion of correlation on the simplex, and since the covariances are less meaningful when the total variances of the coordinates differ from one another.
The Invariant Coordinates are plotted in Figure 3.5, and what is interesting is that, even without selection, we can see outliers appear, as well as groups (recall that the shades of blue follow chronological order, from darkest to lightest).
But if we try to select some coordinates, the outliers appear neither on the first nor on the last Invariant Coordinates, but rather on the fourth coordinate.
Figure 3.6 is a scatter plot of the ICS distances, i.e. the Aitchison norms of the amalgamation isolating \(\mathtt{IC.4}\) from the other components. We clearly see that the majority of outliers occur between 2008 and 2010, the years of the financial crisis in Europe.
There is a noteworthy difference with Figure 1.8 (which plots the ICS distances, but for the data set in volumes): after 2014, new outliers appear. This suggests a reinterpretation of the outliers as major booms of the European automotive market, which happened around 2008 and around 2015.