---
title: "Control Polygon Reduction"
author: "Peter E. DeWitt"
output:
rmarkdown::html_vignette:
toc: true
number_sections: false
bibliography: references.bib
vignette: >
%\VignetteIndexEntry{Control Polygon Reduction}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r label = "setup", include = FALSE}
library(qwraps2)
options(qwraps2_markup = "markdown")
knitr::opts_chunk$set(collapse = TRUE)
```
```{r}
library(cpr)
packageVersion("cpr")
```
The purpose of this vignette is to illustrate how the Control Polygon
Reduction (CPR) method can be used to select a set of knots defining B-spline
to get a low degree of freedom and smooth fit to data. We start with a
primer on B-splines and control polygons then the development and use of CPR.
# B-splines and Control Polygons
The term "spline" is likely derived from shipwright or draftsmen splines,
thin wood strips, held in place by weights, used to define curves. These
splines naturally minimize strain energy and the use of additional weights at
strategic locations on the spline are needed to achieve specific curvatures.
Cubic B-splines are not dissimilar.
Splines are piecewise polynomial curves that are differentiable up to a
prescribed order. B-splines are based on a _basis_ of polynomial functions.
Definitions and notation for uni-variable and multi-variable B-splines and
the associated control polygons and control nets are presented in the
following. Two very good references for splines are @deboor2001 and
@prautzsch2002 if you wish to dig into the details.
## B-splines
A curve defined by $f\left(x\right)$ is a spline of order $k$ (degree $k
-1$), with knots $\xi_1, \xi_2, \ldots, \xi_m$ where $\xi_j \leq x_{j+1}$ and
$\xi_{j} < \xi_{j + k} \forall j$ if $f\left(x\right)$ is $k-r-1$ times
differentiable at any $r$-fold knot, and $f\left(x\right)$ is a polynomial of
order $\leq k$ over each interval $x \in \left[\xi_j, \xi_{j+1}\right]$ for
$j = 0, \ldots, m - 1.$
In particular, B-splines, are defined as an affine combination:
\begin{equation}
f \left( x \right) =
\sum_{j} \theta_j B_{j,k,\boldsymbol{\xi}}\left(x\right) =
\boldsymbol{B}_{k, \boldsymbol{\xi}} \left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}}
\label{eq:f}
\end{equation}
where $B_{j,k,\boldsymbol{\xi}}\left(x\right)$ is the $j^{th}$ basis spline
function, $\boldsymbol{\xi}$ is a sequence of $l$ interior knots and a total
of $2k$ boundary knots, i.e., the cardinality of the knot sequence is
$\left\lvert \boldsymbol{\xi} \right\rvert = 2k + l.$ The value of the $k$
boundary knots is arbitrary, but a common choice is to use $k$-fold knots on
the boundary:
$$ \xi_1 = \xi_2 = \cdots = \xi_k < \xi_{k+1} \leq \cdots \leq \xi_{k + l} =
\xi_{k + l + 1} = \cdots = \xi_{2k + l}.$$
Alternative boundary knots can be used so long as the sequence
$\boldsymbol{\xi}$ is non-decreasing. More on the implications of $k$-fold
boundary knots follow in the next section.
The $j^{th}$ B-spline is defined as:
\begin{equation}
B_{j,k,\boldsymbol{\xi}}\left(x\right) =
\omega_{j,k,\boldsymbol{\xi}} \left(x\right)
B_{j,k-1,\boldsymbol{\xi}}\left(x\right) +
\left(1 - \omega_{j+1,k,\boldsymbol{\xi}} \right)
B_{j+1,k-1,\boldsymbol{\xi}}\left(x\right),
\end{equation}
where
\begin{equation}
B_{j,k,\boldsymbol{\xi}}\left(x\right) = 0 \quad \text{for} \quad x \notin
\left[\xi_j, \xi_{j+k} \right), \quad \quad
B_{j,1,\boldsymbol{\xi}}\left(x\right) = \begin{cases}
1 & x \in \left[\xi_j, \xi_{j+k}\right) \\ 0 & \text{otherwise},
\end{cases}
\end{equation}
and
\begin{equation}
\omega_{j,k,\boldsymbol{\xi}}\left(x\right) = \begin{cases}
0 & x \leq \xi_j \\ \frac{x-\xi_j}{\xi_{j+k-1} - \xi_{j}} & \xi_j < x < \xi_{j+k-1} \\
1 & \xi_{j+k-1} \leq x.
\end{cases}
\end{equation}
For a set of observations, $\boldsymbol{x} = x_1, \ldots, x_n,$ the basis
functions defined above generalize to a matrix:
\begin{equation}
\boldsymbol{B}_{k, \boldsymbol{\xi}} \left(\boldsymbol{x}\right) =
\begin{pmatrix}
B_{1, k, \boldsymbol{\xi}} \left(x_1\right) &
B_{2, k, \boldsymbol{\xi}} \left(x_1\right) &
\ldots &
B_{k + l, k, \boldsymbol{\xi}} \left(x_1\right) \\
\vdots & \vdots & \ddots & \vdots \\
B_{1, k, \boldsymbol{\xi}} \left(x_n\right) &
B_{2, k, \boldsymbol{\xi}} \left(x_n\right) &
\ldots &
B_{k + l, k, \boldsymbol{\xi}} \left(x_n\right).
\end{pmatrix}
\end{equation}
Within the cpr package we can generate a basis matrix thusly:
```{r label = "basis_matrix"}
x <- seq(0, 5.9999, length.out = 5000)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
bmat
```
Note: the default order for
`r backtick(bsplines) `
is 4, and the default for the boundary knots is the range of
`r backtick(x) %s% "." `
However, relying on the default boundary knots can lead to unexpected
behavior as, by definition the splines on the $k$-fold upper boundary is 0.
We can quickly view the plot of each of these spline functions as well.
```{r label = "plot_bmat", fig.width = 7, fig.height = 4}
plot(bmat, show_xi = TRUE, show_x = TRUE)
```
A few of many important properties of the basis functions:
$$B_{j,k,\boldsymbol{xi}} \left(x\right) \in \left[0, 1\right], \forall x \in \mathbb{R};$$
```{r}
all(bmat >= 0)
all(bmat <= 1)
```
$$B_{j,k,\boldsymbol{\xi}} \left(x\right) > 0 \quad \text{for} \quad x \in \left(\xi_{j}, \xi_{j+k}\right);$$
and
$$\sum_j B_{j,k,\boldsymbol{xi}} \left(x\right) = \begin{cases} 1 & x \in
\left[\xi_1, \xi_{2k+l}\right) \\ 0 & otherwise. \end{cases}$$
```{r}
all.equal(rowSums(bmat), rep(1, nrow(bmat)))
```
### cpr::bsplines vs splines::bs
Part of the base R distribution is the splines package which build B-splines
by calling
`r backtick(bs) %s% "." `
There are three areas where the functions differ:
1. input arguments,
2. attributes of the returned matrix, and
3. behavior at the right boundary knot.
#### Input Arguments
```{r}
args(bsplines)
args(splines::bs)
```
| cpr::bsplines | splines::bs | Notes |
|:-------- |:----------- |:----- |
| x | x | numeric vector; the predictor variable |
| iknots | knots | internal knots |
| bknots | Boundary.knots | boundary knots |
| order | degree | polynomial order = polynomial degree + 1 |
| df | df | degrees of freedom |
| | intercept | |
| | warn.outside | |
`r backtick(bsplines) `
does not have the
`r backtick(intercept) `
nor the
`r backtick(warn.outside) `
arguments because the matrix generated by
`r backtick(bsplines) `
effectively _always_ has
`r backtick("intercept = TRUE", dequote = TRUE) `
and
`r backtick("warn.outside = TRUE", dequote = TRUE) %s% "." `
How values of
`r backtick(x) `
at the right boundary, and outside the boundary are treated also differ
between the two functions.
#### Attributes of the returned matrices
The default call for both B-spline functions returns a basis matrix for a
order 4 (degree 3; cubic) B-spline with boundary knots placed at
`r backtick(range(x)) %s% "." `
However, the returns are not the same.
```{r}
bs_mat <- splines::bs(x, knots = attr(bmat, "iknots"), Boundary.knots = attr(bmat, "bknots"))
str(attributes(bmat))
str(attributes(bs_mat))
```
The
`r backtick(bspline_mat) `
has additional attributes related to the control polygons.
The major difference is the in the dimension of the matrices.
By default
`r backtick(splines::bs) `
omits one column from the basis matrix such that when using using the
function is a regression formula the resulting design matrix is not rank
deficient. Using
`r backtick(bsplines) `
would suggest using a
`r backtick(+0) %s% " or " %s% backtick(-1) `
in the regression formula to omit the intercept (is nuance is handled in
calls to
`r backtick(cp) `
so the end user need not worry about it).
#### Right Continuity
By definition, the $\boldsymbol{B}_{j,k,\boldsymbol{\xi}}\left(x\right)$ are
non-negative right-continuous functions.
`r backtick(bsplines) `
adheres to the definition strictly, whereas
`r backtick(splines::bs) `
uses a pivoting method to allow for non-zero extrapolations outside the
support.
Example: for the
`r backtick(cpr::bsplines) `
call, notice that the first, third, and fifth rows, corresponding to values
outside the support are all zeros as are the row sums. Compare that to the
`r backtick(splines::bs) `
which returns negative values and in the matrix, and all rows sum to 1.
```{r}
bspline_eg <- bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5))
bs_eg <- splines::bs(c(0, 1, 2, 5, 6), Boundary.knots = c(1, 5), intercept = TRUE )
head(bspline_eg)
rowSums(bspline_eg)
head(bs_eg)
rowSums(bs_eg)
```
# Control Polygons
The spline $f\left(\boldsymbol{x}\right) = \boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right)$
is a convex sum of the coefficients $\boldsymbol{\theta}_{\boldsymbol{\xi}}.$
A meaningful geometric
relationship between $\boldsymbol{\xi}$ and
$\boldsymbol{\theta}_{\boldsymbol{\xi}}$ exist in the form of a
__control polygon__,
$CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}},$ a strong
convex hull for
$\boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right)
\boldsymbol{\theta}_{\boldsymbol{\xi}}.$
\begin{equation}
CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\xi}} =
\left\{ \left( \xi_{j}^{*}, \theta_{j,\boldsymbol{\xi}} \right) \right
\}_{j=1}^{\left\lvert\boldsymbol{\theta}_{\boldsymbol{\xi}}\right\rvert};
\quad \quad
\xi_{j}^{*} = \frac{1}{k-1} \sum_{i = 1}^{k-1} \xi_{j + i}.
\end{equation}
$CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}$ is a
sequence of $\left\lvert \boldsymbol{\theta}_{\boldsymbol{\xi}} \right\rvert
= k + l$ control vertices. The control polygon can be thought of as a
piecewise linear function approximating the spline function. Changes in
convexity and other subtle characteristics of the spline function are
exaggerated by the control polygon.
For example, using the basis matrix defined above and the following
coefficients we can easily define a spline function and control polygon.
```{r label = "define_theta"}
theta <- matrix(c(1, 0, 3.5, 4.2, 3.7, -0.5, -0.7, 2, 1.5), ncol = 1)
cp0 <- cp(bmat, theta)
```
Plotting the control polygon and the corresponding spline:
```{r label = "plot_cp", fig.width = 7, fig.height = 4}
plot(cp0, show_spline = TRUE)
```
# Knot Influence
## Spline Spaces and Inserting a Knot
Consider two knot sequences $\boldsymbol{\xi}$ and $\boldsymbol{\xi} \cup
\boldsymbol{\xi}'.$ Then, for a given polynomial order $k,$ the spline space
$\mathbb{S}_{k,\boldsymbol{\xi}} \subset \mathbb{S}_{k,\boldsymbol{\xi} \cup
\boldsymbol{\xi}'}$ [@deboor2001,pg135]. Given this relationship between
spline spaces, and the convex sums generating spline functions, @boehm1980
presented a method for inserting a knots into the knot sequence such that the
spline function is unchanged. Specifically, for
$$
\boldsymbol{\xi}' = \left\{\left. \xi_{j}' \right|
\min\left(\boldsymbol{\xi}\right) < \xi_j <
\max\left(\boldsymbol{\xi}\right) \quad \forall j\right\},
$$
then there exist a $\boldsymbol{\theta}_{\boldsymbol{\xi} \cup
\boldsymbol{\xi}'}$ such that
$$
\boldsymbol{B}_{k,\boldsymbol{\xi}} \left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}} =
\boldsymbol{B}_{k,\boldsymbol{\xi}\cup\boldsymbol{\xi}'}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi} \cup \boldsymbol{\xi}'}
\quad \forall x \in \left[\min\left(\boldsymbol{\xi}\right), \max\left(\boldsymbol{\xi} \right) \right].
$$
When inserting a singleton $\xi'$ into $\boldsymbol{\xi},$ then
$$
\boldsymbol{\theta}_{\boldsymbol{\xi} \cup \xi'} = \boldsymbol{W}_{k,
\boldsymbol{\xi}}\left(\xi'\right) \boldsymbol{\theta}_{\boldsymbol{\xi}}
$$
where $\boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right)$
is a $\left(
\left\lvert \boldsymbol{\theta} \right\rvert + 1\right) \times \left\lvert
\boldsymbol{\theta} \right\rvert$ lower bi-diagonal matrix
$$
\boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right) =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
1 - \omega_{1, k, \boldsymbol{\xi}}\left(\xi'\right) & \omega_{1, k, \boldsymbol{\xi}} \left(\xi'\right) & \cdots & 0 \\
0 & 1 - \omega_{2, k, \boldsymbol{\xi}}\left(\xi'\right) & \cdots 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & 1 - \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) & \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) \\
0 & 0 & 0 & 1
\end{pmatrix},
$$
with $\omega_{j,k,\boldsymbol{\xi}}\left(x\right)$ as defined above in the
de~Boor recursive algorithm. Through recursion we can insert as many knots
into $\boldsymbol{\xi}$ without changing the value of the spline function.
For an example, insert a knot $\xi' = 3$ into the control polygon defined
above.
```{r label = "insert_xi_prime3", fig.width = 7, fig.height = 4}
cp1 <- insert_a_knot(cp0, xi_prime = 3)
plot(cp0, cp1, color = TRUE, show_spline = TRUE)
```
## Assessing influence of $\xi_j$
Here we derive a metric for assessing how much influence $\xi_j \in
\boldsymbol{\xi}$ has on $\boldsymbol{B}_{k,\boldsymbol{\xi}}
\left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}}.$ Using the relationship
defined by @boehm1980, we can derive a metric for the influence of $\xi_j$
on the spline function.
Start with an defined $k, \boldsymbol{\xi},$ and
$\boldsymbol{\theta}_{k,\boldsymbol{\xi}}.$ The relationship
$$
\boldsymbol{\theta}_{\boldsymbol{\xi}} =
\boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right)
\boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}}
$$
holds if $\xi_j$ has zero influence. However, in practice, we would expect
that the relationship is
$$
\boldsymbol{\theta}_{\boldsymbol{\xi}} =
\boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right)
\boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} + \boldsymbol{d}
$$
for a set of deviations $\boldsymbol{d}$ which would be equal to
$\boldsymbol{0}$ if $\xi_j$ has no influence on the spline.
We can estimate $\boldsymbol{d}$ via least squares. To simplify the notation
in the following we drop some of the subscripts and parenthesis, that is, let
$\boldsymbol{W} =
\boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right).$
$$
\begin{align}
\boldsymbol{d} &= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} \\
&= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T} \boldsymbol{\theta}_{\boldsymbol{\xi}} \\
&= \left(\boldsymbol{I} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \\
\end{align}
$$
Finally, we define the influence of $\xi_j$ on
$CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}$ as
$$ w_j = \left\lVert \boldsymbol{d} \right\rVert_{2}^{2}. $$
The influence of knots on the spline used in the above section.
```{r fig.width = 7, fig.height = 7}
x <- influence_of_iknots(cp0)
summary(x)
```
Let's look at the following plots to explore the influence of $\xi_{7}$ (the
third interior knot in a $k=4$ order spline) on the spline. In panel (a)
we see the original control polygon and spline along with the coarsened
control polygon and spline. Note that the there are fewer control points, and
the spline deviates from the original. In panel (b) we see that the restored
control polygon is the result of inserting $\xi_7$ into the coarsened control
polygon of panel (a). These plots are also a good example of the local
support and strong convexity of the control polygons as there are only $k + 1 = 5$ control points which are
impacted by the removal and re-insertion of $\xi_7.$ Lastly, in panel (c) we
see all three control polygons plotted together.
```{r fig.width = 7, fig.height = 7}
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(x, j = 3, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, plot(x, j = 3, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, labels = c("(a)", "(b)")
, nrow = 1
)
, plot(x, j = 3, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "bottom")
, labels = c("", "(c)")
, nrow = 2
, ncol = 1
, heights = c(1, 2)
)
```
Next, consider the influence of $\xi_8,$ the fourth interior knot. By the
influence metric defined above, this is the least influential knot in the
sequence. This can be seen easily as the spline between the original and the
coarsened spline are very similar, this is despite the apparent large
difference in the magnitude of the control point ordinates between the
original and coarsened control polygons. However, when re-inserting $\xi_8$
the recovered control polygon is very similar to the original, hence the low
influence of $\xi_8.$
```{r fig.width = 7, fig.height = 7}
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(x, j = 4, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, plot(x, j = 4, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, labels = c("(a)", "(b)")
, nrow = 1
)
, plot(x, j = 4, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "bottom")
, labels = c("", "(c)")
, nrow = 2
, ncol = 1
, heights = c(1, 2)
)
```
If you were required to omit an internal knot, it would be preferable to omit
$\xi_8$ over $\xi_5, \xi_6, \xi_7,$ or $\xi_9$ as that will have the least
impact on the spline approximation of the original functional form.
# Fitting B-splines to noisy data
Start with the spline we have been using and add some noise to it.
```{r fig.width = 7, fig.height = 4}
set.seed(42)
x <- seq(0, 5.99999, length.out = 100)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
theta <- matrix(c(1, 0, 3.5, 4.2, 3.7, -0.5, -0.7, 2, 1.5), ncol = 1)
DF <- data.frame(x = x, truth = as.numeric(bmat %*% theta))
DF$y <- as.numeric(bmat %*% theta + rnorm(nrow(bmat), sd = 0.3))
original_data_ggplot_layers <-
list(
ggplot2::geom_point(data = DF
, mapping = ggplot2::aes(x = x, y = y)
, inherit.aes = FALSE
, color = "#6F263D"
, alpha = 0.2)
,
ggplot2::geom_line(data = DF
, mapping = ggplot2::aes(x = x, y = truth)
, inherit.aes = FALSE
, color = "#6F263D"
, alpha = 0.6)
)
ggplot2::ggplot(DF) + ggplot2::theme_bw() + original_data_ggplot_layers
```
To fit a spline and control polygon to the noisy data use a formula statement
in the
`r qwraps2::backtick(cp) `
call. In this example we will use the known internal knots and add one
extra.
```{r fig.width = 7, fig.height = 4}
initial_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3.0, 4, 4.5), bknots = c(0, 6))
, data = DF
, keep_fit = TRUE # default is FALSE
)
plot(initial_cp, show_spline = TRUE) + original_data_ggplot_layers
```
The plot above shows the fitted spline is does well at approximating the true
spline function. Just to make it perfectly clear, the regression
coefficients are the estimates of the ordinates for the control polygon:
```{r}
initial_cp$fit |> coef() |> unname()
initial_cp$cp$theta
```
Let's now look at the influence of the internal knots on the fit
```{r}
initial_cp |>
influence_of_iknots() |>
summary()
```
The least influential knot is $\xi_8 = 3.0,$ the extra knot inserted. Good,
we this is the expected result.
How would someone determine if the influence was significant? That is, how
can we test the null hypothesis $$H_0: w_{j} = 0$$
Under the null hypothesis that the knot has zero influence and under standard
ordinary least squares regression assumptions the regression coefficients,
(the ordinates of the control polygon are the regression coefficients), are
realizations of a Gaussian random variable. Then
$$
w_j =
\left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T}
\left[
\left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)
\boldsymbol{\Sigma}
\left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T}
\right]^{+}
\left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)
$$
where $\boldsymbol{H} = W (W^{T}W)^{-1} W^{T},$ $\boldsymbol{\Sigma}$ is
the variance-covariance matrix for the regression coefficients, and $+$
denotes the Moore-Penrose inverse of the matrix. By construction,
$\left(\boldsymbol{I} - \boldsymbol{H}\right) \boldsymbol{\Sigma} \left(\boldsymbol{I} - \boldsymbol{H}\right)^{T}$
is singular and thus the standard inverse does not exist and a generalized
inverse is necessary. This yields the test statistic following an F
distribution with 1, and $\nu_2$ degrees of freedom,
$$ w_j \sim F_{1,\nu_2}.$$
The second degree of
freedom is dependent on the sample size, the number of regression
parameters, and the type of regression used to estimate the ordinates.
To simplify the work, and generalize the approach, we will use the fact that
the limiting distribution of $F_{1, \nu_2}$ as $\nu_2 \rightarrow \infty$ is
$\chi_{1}^{2},$ that is,
$$ w_j \sim \chi_{1}^{2}.$$
Now, if we are interested in removing the knot with the lowest influence we
are interested in the minimum. So the hypothesis test we are actually
interested in is
$$ H_0: w_{(1)} = 0 $$ which follows the distribution
$$ \Pr \left[ w_{(1)} > w \right] = \sum_{j = 1}^{k+l} \binom{n+l}{j}
\left(F_{\chi_{1}^{2}}\left(w\right)\right)^{j} \left( 1 -
F_{\chi_{1}^{2}}\left(w\right) \right)^{k+l-j}$$
where $F_{\chi_{1}^{2}}\left(w\right)$ is the distribution function of the
chi-square distribution with one degree of freedom.
The results generated by calling
`r qwraps2::backtick(influence_of_iknots) %s% "." `
report two sets of p-values. The first is the p-value is the probability of
observed chisq value greater than reported, and the second p-value is the
probability of the rank order statistic exceeding the observed value.
```{r results = "asis"}
initial_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
It is worth remembering how fraught binary classification of statistical
(non)significance can be. Just because the p-value is low does not mean that
the knot is influential, just as a high p-value dose not mean that the knot
is not influential. Sample size, over-fitting, and other factors can/will
lead to poor selection of a model if you only consider these p-values.
That said, consider $\xi_9 = 4.0$ which has the lowest influence weight. Let's
omit that knot and refit the model.
```{r results = "asis"}
first_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3, 4.5), bknots = c(0, 6)), data = DF)
first_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
After omitting one knot and refitting the model we see that $\xi = 2.3$ is
the least influential. Just for fun, let's omit that knot, and refit. Let's
continue that process all the way down to zero knots.
```{r results = "asis"}
second_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 3, 4.5), bknots = c(0, 6)), data = DF)
second_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
The least influential knot in the second reduction is $\xi = 1.5$ and that
will be omitted for the third reduction.
```{r results = "asis"}
third_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 3, 4.5), bknots = c(0, 6)), data = DF)
third_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
Within the third reduction, the least influential knot is $\xi = 3.0$ and
that knot will be omitted for the fourth reduction.
```{r results = "asis"}
fourth_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 4.5), bknots = c(0, 6)), data = DF)
fourth_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
Of the two remaining internal knots, $\xi - 1.0$ is the least influential and
will be omitted for the fifth reduction.
```{r results = "asis"}
fifth_reduction_cp <-
cp(y ~ bsplines(x, iknots = 4.5, bknots = c(0, 6)), data = DF)
fifth_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
```
Only one knot can be omitted from the fifth to the sixth reduction. Having
the sixth reduction, where there are zero internal knots, lets us compare
model fits to a model with just a $k=4$ order polynomial.
```{r}
sixth_reduction_cp <-
cp(y ~ bsplines(x, bknots = c(0, 6)), data = DF)
sixth_reduction_cp |>
influence_of_iknots() |>
summary()
```
Let's compare all the fits. We will start by looking at the control polygons
and splines.
```{r fig.height = 4, fig.width = 7, warning = FALSE, echo = FALSE}
ggpubr::ggarrange(
plot( initial_cp , first_reduction_cp , second_reduction_cp , third_reduction_cp
, fourth_reduction_cp , fifth_reduction_cp , sixth_reduction_cp
, show_spline = FALSE , show_cp = TRUE , color = TRUE
)
,
plot( initial_cp , first_reduction_cp , second_reduction_cp , third_reduction_cp
, fourth_reduction_cp , fifth_reduction_cp , sixth_reduction_cp
, show_spline = TRUE , show_cp = FALSE , color = TRUE
)
, labels = c("(a)", "(b)")
, common.legend = TRUE
)
```
The control polygons, panel (a), we see that the sixth, fifth, and fourth reductions are
different from the initial, first, second, and third reductions, and for the
splines, the fifth and sixth reductions are easily identified as different.
Next, the graphic below will let us compare these models to the truth, and
the observed data.
```{r fig.width = 7, fig.height = 4, echo = FALSE}
ggpubr::ggarrange(
plot(initial_cp, show_spline = TRUE) +
ggplot2::ggtitle("Initial CP") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, plot(first_reduction_cp, show_spline = TRUE) +
ggplot2::ggtitle("First Reduction") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, plot(second_reduction_cp, show_spline = TRUE) +
ggplot2::ggtitle("Second Reduction") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, plot(third_reduction_cp, show_spline = TRUE) +
ggplot2::ggtitle("Third Reduction") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, plot(fourth_reduction_cp, show_spline = TRUE) +
ggplot2::ggtitle("Fourth Reduction") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, plot(fifth_reduction_cp, show_spline = TRUE) +
ggplot2::ggtitle("Fifth Reduction") +
ggplot2::coord_cartesian(ylim = c(-1, 5)) +
original_data_ggplot_layers
, common.legend = TRUE
)
```
The dashed black line is the spline fitted to the data (light burgundy dots) and
the true value of the target function is the burgundy line. In the fifth
reduction there is an easily noticeable difference between the fitted spline
and the target function. Between the initial control polygon and the first
three reductions it is difficult to visually discern any meaningful difference
between the fits.
Thus, I would argue that the third reduction is the preferable model as it
has the fewest degrees of freedom while providing a good quality of fit.
This conclusion is supported by looking at the residual standard error (rse)
$$ rse = \sqrt{ \frac{1}{df} \sum_{i=1}^{n} \left(y_i - f\left(x_i\right)\right)^2 },$$
where the degrees of freedom, $df,$ is the sample size $n$ minus the number
of regression parameters.
```{r results = 'asis'}
list( initial_cp , first_reduction_cp , second_reduction_cp , third_reduction_cp
, fourth_reduction_cp , fifth_reduction_cp , sixth_reduction_cp) |>
rev() |>
lapply(summary) |>
do.call(what = rbind, args = _) |>
cbind(data.frame(reduction = seq(6, 0, by = -1))) |>
knitr::kable(row.names = TRUE)
```
The
`r backtick(wiggle) `
is one measure of wiggliness defined as
$$
\int_{\min\left(\boldsymbol{\xi}\right)}^{\max\left(\boldsymbol{\xi}\right)}
\left( \frac{d^2}{dx^2} f\left(x\right) \right)^2 dx.
$$
`r backtick(fdsc) `
reports the number of times the first derivative has a sign change.
# Control Polygon Reduction
The exercise above of manually identifying and omitting the knot with the
smallest influence in each model would be tedious when working with a large
set of initial knots. Fortunately, the process has been automated. Calling
`r qwraps2::backtick(cpr) `
on a
`r qwraps2::backtick(cpr_cp) `
object defined by a function will automatically omit the internal knot with
the lowest influence.
## Example with known knots
Apply CPR to the
`r qwraps2::backtick(initial_cp) `
from the above example.
```{r results = 'hide', message = FALSE}
cpr0 <- cpr(initial_cp)
```
```{r}
cpr0
```
There are
`r length(cpr0) `
control polygons within the
`r backtick(cpr_cpr) `
object,
`r backtick(length(initial_cp$iknots) + 1) %s% "." `
The indexing is set such at that the $i^{th}$ element has $i-1$ internal
knots.
Before exploring the results, let's just verify that the results of the call
to
`r backtick(cpr) `
are the same as the manual results found about.
There are some differences in the metadata of the objects, but the important
parts, like the control polygons, are the same.
```{r}
all.equal( cpr0[["cps"]][[7]][["cp"]], initial_cp[["cp"]])
# some attributes are different with the last cp due to how the automation
# creates the call vs how the call was created manually.
call_idx <- which(names(cpr0[["cps"]][[6]]) == "call")
all.equal( cpr0[["cps"]][[6]][-call_idx], first_reduction_cp [-call_idx])
all.equal( cpr0[["cps"]][[5]][-call_idx], second_reduction_cp[-call_idx])
all.equal( cpr0[["cps"]][[4]][-call_idx], third_reduction_cp [-call_idx])
all.equal( cpr0[["cps"]][[3]][-call_idx], fourth_reduction_cp[-call_idx])
all.equal( cpr0[["cps"]][[2]][-call_idx], fifth_reduction_cp [-call_idx])
all.equal( cpr0[["cps"]][[1]][-call_idx], sixth_reduction_cp [-call_idx], check.attributes = FALSE)
```
In the manual process we identified
`r backtick(third_reduction_cp) `
as the preferable model. For the
`r backtick(cpr0) `
object we can quickly see a similar result as we did for the manual process.
```{r result = "asis"}
s0 <- summary(cpr0)
knitr::kable(s0, row.names = TRUE)
```
The additional columns in this summary,
`r backtick("loglik_elbow", dequote = TRUE) %s% " and " %s% backtick(rse_elbow) %s% "," `
indicate a location in the plot for either the loglik or rse by model
index (degrees of freedom) where the trade-off between additional degrees of
freedom and improvement in the fix statistic is negligible. See plot below.
This is determined by finding the breakpoint such that a continuous, but not
differentiable at breakpoint, quadratic spline fits the plot with minimal
residual standard error.
```{r fig.height = 7, fig.width = 7}
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(cpr0, color = TRUE)
, plot(cpr0, show_cp = FALSE, show_spline = TRUE, color = TRUE)
, ncol = 1
, common.legend = TRUE
)
,
ggpubr::ggarrange(
plot(s0, type = "rse")
, plot(s0, type = "rss")
, plot(s0, type = "loglik")
, plot(s0, type = "wiggle")
, plot(s0, type = "fdsc")
, plot(s0, type = "Pr(>w_(1))")
, common.legend = TRUE
)
, widths = c(2, 3)
)
```
## Example when knots are unknown
In practice it is be extremely unlikely to know where knots should be placed.
Analytic solutions are difficult, if not impossible to derive
[@jupp1978approximation].
However, an optimal solution may not be necessary.
From @deboor2001 (page 106)
"...a B-spline doesn't change much if one changes its $k+1$ knots a little
bit. Therefore, if one has multiple knots, then it is very easy to find
B-spline almost like with simple knots: Simply replace knot of multiplicity
$r > 1$ by $r$ simple knots nearby."
That is,
$$
\boldsymbol{B}_{k, \boldsymbol{\xi}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \approx
\boldsymbol{B}_{k, \boldsymbol{\xi'}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi'}}
$$
where
$$
\left\lvert \xi_j - \xi_j' \right\rvert < \delta \quad \text{for small} \quad \delta > 0.
$$
So, in the case when we are looking for a good set of knots (a good set of
knots should be parsimonious, and provide a good model fit) we start with an
initial knot sequence with a high cardinality, this is not without precedent
[@eilers1996,@eilers2010].
We then apply the CPR algorithm to find a good set of knots.
For example we will use 50 internal knots. Not surprisingly we have a fit
that is more "connect-the-dots" than a smooth fit.
```{r fig.width = 7, fig.height = 4}
initial_cp <- cp(y ~ bsplines(x, df = 54, bknots = c(0, 6)), data = DF)
ggpubr::ggarrange(
plot(initial_cp, show_cp = TRUE, show_spline = FALSE) + original_data_ggplot_layers
,
plot(initial_cp, show_cp = FALSE, show_spline = TRUE) + original_data_ggplot_layers
)
```
Apply CPR to the
`r backtick(initial_cp) `
and look at the summary. Only the first 10 of 51 rows are provided here.
```{r results = "hide", message = FALSE}
cpr1 <- cpr(initial_cp)
```
```{r}
x <- summary(cpr1)
knitr::kable(head(x, 10))
```
From this, the preferable model is suggested to be index 5, the model with
four internal knots. Inspection of the rse by index plot, I would argue from
a manual selection that index 5 is preferable overall.
```{r}
plot(x, type = "rse")
```
Let's compare the models in indices 3, 4, and 5.
```{r fig.width = 7, fig.height = 4}
ggpubr::ggarrange(
plot(cpr1[[3]], cpr1[[4]], cpr1[[5]], show_cp = TRUE, show_spline = FALSE, color = TRUE)
,
plot(cpr1[[3]], cpr1[[4]], cpr1[[5]], show_cp = FALSE, show_spline = TRUE, color = TRUE)
,
common.legend = TRUE
)
```
The noticeable differences are, for the most part, located on the left side
of the support between between 0 and 1. For the fifth index, there is change
in convexity of the spline. Knowing that the models at index 3 and 4 are good
fits too, then it would be easy not select index 5 due to extra "wiggle" in
the spline.
In practice I would likely pick the model in index 4 to have a smooth (small
wiggle) and low rse. Compare the selected model to the original data.
```{r fig.width = 7, fig.height = 4}
ggpubr::ggarrange(
plot(cpr1[[3]], show_cp = FALSE, show_spline = TRUE) +
ggplot2::ggtitle("Model Index 3") +
original_data_ggplot_layers
, plot(cpr1[[4]], show_cp = FALSE, show_spline = TRUE) +
ggplot2::ggtitle("Model Index 4") +
original_data_ggplot_layers
, plot(cpr1[[5]], show_cp = FALSE, show_spline = TRUE) +
ggplot2::ggtitle("Model Index 5") +
original_data_ggplot_layers
, nrow = 1
, legend = "none"
)
```
# Extensions to higher dimensions
CPR works for uni-variable B-splines. By taking tensor products of
B-splines, and building a control-net, the higher-dimensional analog of a
control polygon, we can apply similar methods to estimate a surface.
Details on the Control Net Reduction method are presented in
```{r echo = TRUE, eval = FALSE}
vignette(topic = "cnr", package = "cpr")
```
# References