Penalized log-spline density estimation: density_mpl
Model
Given observations with weights () on an interval , and a basis , the density is parameterized by as
The penalized negative log-likelihood minimized is where is the differential operator carried by WfdParobj$Lfd.
Gradient and expected Hessian
Differentiating gives Hence With the penalty, These are exactly loglfnden (gradient) and Varfnden+ (expected Hessian); the integrals , , are computed by Romberg integration in normden.phi, expectden.phi, expectden.phiphit.
Identifiability
For a B-spline (or any partition-of-unity) basis, , so shifting shifts by the constant , which is absorbed into . The likelihood is therefore invariant along the direction . Let be an orthonormal basis of (zerobasis(nbasis)). Reparametrize ; the reduced problem has (since is PD on , and the term is PSD).
Algorithm (Fisher scoring with line search)
- Initialize (from
WfdParobj$fd). - Compute ; reduce to ; Newton step , .
- Backtracking/interpolating line search (
stepit) on subject to box constraints (stepchk). - Update ; stop when or
iterlimreached. - Return and .
Correctness
Convexity. is the log-partition of an exponential family and is convex (its Hessian is PSD). Therefore is convex, and is convex (strictly, on , when ). Any stationary point of the reduced problem in is the unique global minimizer.
Descent direction. Since , so is a strict descent direction for (lines 227–238 check the slope and fall back to if numerical loss makes , line 340–343).
Monotone convergence. The line search returns with (Wolfe-style interpolation in stepit); the sequence is decreasing and bounded below (since is coercive on the identifiable subspace), hence converges. Combined with strict convexity on , , the unique minimizer.
Normalization. At return, is computed to tolerance by Romberg extrapolation, so by construction.
Output
density_mpl returns list(Wfdobj, C, Flist, iternum, iterhist) with the unique maximizer of the penalized log-likelihood in the chosen log-spline family.
Bugs in the R reference
Porting the algorithm to Rust surfaced two issues in fda::density.fd (which is what dda::density_mpl_legacy ports verbatim) that the new density_mpl_rust backend corrects.
1. Hessian scaling for frequency-weighted input
Let be the total weight. Differentiating twice in : The Fisher information is , not . In Varfnden (R/density-mpl-legacy.R) the code computes
Varphi <- nobs*(EDwDwt - outer(EDw,EDw))i.e. it always scales by nobs = length(x), regardless of .
- For a one-column input (,
nobs), this matches the correct formula. - For a two-column input with frequencies normalized to (see the
m == 2branch ofdensity_mpl_legacy), we have butnobs, so the Hessian is over-scaled by a factor of . The Newton step is therefore under-sized by a factor .
In R, fda::stepit masks this by aggressively expanding the trial step (its line search is Wolfe-style with quadratic interpolation). A plain backtracking line search — as in the Rust port — does not expand and the descent stalls. The fix in density.rs::varfnden replaces nobs with :
2. Early termination at non-stationary points
The R loop in density_mpl_legacy has two exit conditions:
if (abs(Flist$f - Foldstr$f) < conv) { ... return }
if (Flist$f >= Foldstr$f) break # not a return: just exitsThe second branch fires whenever the line search fails to find a step that strictly decreases . This is not the same as reaching a stationary point — stepit can fail purely from numerical limitations of its interpolation while is still on the order of to . The function then returns the current regardless.
For test problems where the true minimizer lies close to an active box constraint or where the Hessian is poorly conditioned along weakly identified directions (e.g. B-spline boundary coefficients with no data nearby), density_mpl_legacy terminates with and the returned is not the optimum of the penalized log-likelihood.
The Rust port uses Armijo backtracking on the (correctly scaled) Newton direction and converges to in our tests. Where the R and Rust solutions disagree, the Rust solution has the strictly smaller objective value.
Practical consequence
For un-penalized fits on data that doesn’t cover the basis range (e.g. a sample with ), the two implementations can return densities that disagree by outside the data support — both are local optima of an ill-posed problem and the boundary coefficients of the B-spline basis are weakly identified. Inside the data support the densities agree to under 1% relative difference.