This vignette assumes some familiarity with principal component analysis1, canonical variate analysis2, and/or linear discriminant analysis3; for a more thorough treatment of those methods, see the appropriate references. Here we give a brief mathematical motivation for reduced-rank regression and then show that principal component analysis, canonical variate analysis, and linear discriminant analysis are special cases of it. For a more thorough treatment of reduced-rank regression and its special cases, see Modern Multivariate Statistical Techniques.4 The mathematical formulation below is intended to be minimally sufficient for anyone who needs an introduction to reduced-rank regression for use in practice; it is not intended as a rigorous treatment of the concept. Other packages carry out principal component analysis, canonical variate analysis, and linear discriminant analysis. This package should prove most useful to readers of the above text, as it adopts the same hierarchical framework of reduced-rank regression.
Let X = (X1, X2, …, Xr)τ and Y = (Y1, Y2, …, Ys)τ be jointly distributed random vectors with
$$ \mathrm{E}\left\{ \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{pmatrix} \right\} = \begin{pmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \\ \end{pmatrix}, \quad \mathrm{cov}\left\{ \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \\ \end{pmatrix} \right\} = \begin{pmatrix} \mathbf{\Sigma}_{XX} & \mathbf{\Sigma}_{XY} \\ \mathbf{\Sigma}_{YX} & \mathbf{\Sigma}_{YY} \\ \end{pmatrix} $$
The classical multivariate regression model is given by
$$ \overset{s \times 1}{\mathbf{Y}} = \overset{s \times 1}{\boldsymbol{\mu}} + \overset{s \times r}{\mathbf{C}} \; \overset{r \times 1}{\mathbf{X}} + \overset{s \times 1}{\varepsilon} $$
with
E(ε) = 0, cov(ε) = Σεε
and ε distributed independently of X.
To estimate μ and C we minimize the least-squares criterion
E[(Y − μ − CX)(Y − μ − CX)τ],
with the expectation taken over the joint distribution of (Xτ, Yτ), under the assumption that ΣXX is nonsingular and therefore invertible.
This is minimized when
$$ \begin{aligned} \boldsymbol{\mu} & = \boldsymbol{\mu}_Y - \mathbf{C} \boldsymbol{\mu}_X \\ \mathbf{C} & = \mathbf{\Sigma}_{YX} \mathbf{\Sigma}_{XX}^{-1} \end{aligned} $$
The least-squares estimator of C is given by
$$ \hat{\mathbf{C}} = \hat{\mathbf{\Sigma}}_{YX} \hat{\mathbf{\Sigma}}_{XX}^{-1} $$
Note that C – and hence $\hat{\mathbf{C}}$ – contains no term that takes into account the correlation among the Yis. This is a surprising result, since we would expect correlation among the responses.
In other words, to find the least-squares estimate $\hat{\mathbf{C}}$ of C, one need only regress each Yi separately on X and bind the resulting multiple-regression coefficient vectors together into the estimated coefficient matrix $\hat{\mathbf{C}}$.
In an important sense, then, the classical multivariate regression model is not truly multivariate.
tobacco Data Set
We see that the tobacco data set5 has 9 variables and 25 observations. There are 6 Xi predictor variables – representing the percentages of nitrogen, chlorine, potassium, phosphorus, calcium, and magnesium, respectively – and 3 Yj response variables – representing cigarette burn rate in inches per 1,000 seconds, percent sugar in the leaf, and percent nicotine in the leaf, respectively.
Below we see that there is not only correlation among the Xis but also among the Yis. The classical multivariate model will not capture that information.
We can get a good idea of the correlation structure using GGally::ggcorr(). GGally is a package that extends the functionality of ggplot2 and is used in rrr to create pairwise plots, as seen below.
We can see correlation between the Xis, which will be accounted for in the classical multivariate regression model.
There is clearly correlation in the Yis, especially between percent nicotine and percent sugar. Any regression model that we fit should take this into account.
## multivariate regression
x <- as.matrix(tobacco_x)
y <- as.matrix(tobacco_y)
multivar_reg <- t(cov(y, x) %*% solve(cov(x)))
## separate multiple regression
lm1 <- lm(y[,1] ~ x)$coeff
lm2 <- lm(y[,2] ~ x)$coeff
lm3 <- lm(y[,3] ~ x)$coeff
As expected, the multivariate regression coefficients are the same as the multiple-regression coefficients for each of the Yis:
## Y1.BurnRate Y2.PercentSugar Y3.PercentNicotine
## X1.PercentNitrogen 0.06197282 -4.3186662 0.5521620
## X2.PercentChlorine -0.16012848 1.3262863 -0.2785609
## X3.PercentPotassium 0.29211810 1.5899470 0.2175877
## X4.PercentPhosphorus -0.65798016 13.9526510 -0.7231067
## X5.PercentCalcium 0.17302593 0.5525913 0.3230914
## X6.PercentMagnesium -0.42834825 -3.5021083 2.0048603
## lm1 lm2 lm3
## (Intercept) 1.41113730 13.6329133 -1.5648236
## xX1.PercentNitrogen 0.06197282 -4.3186662 0.5521620
## xX2.PercentChlorine -0.16012848 1.3262863 -0.2785609
## xX3.PercentPotassium 0.29211810 1.5899470 0.2175877
## xX4.PercentPhosphorus -0.65798016 13.9526510 -0.7231067
## xX5.PercentCalcium 0.17302593 0.5525913 0.3230914
## xX6.PercentMagnesium -0.42834825 -3.5021083 2.0048603
One way to introduce a multivariate component into the model is to allow for the possibility that C is rank-deficient, i.e., of reduced rank t.
rank(C) = t ≤ min(r, s)
In other words, we allow for the possibility that there are unknown linear constraints on C.
Without loss of generality, we consider the case where r > s, so that min(r, s) = s and the reduced-rank case is t < s.
When t = s, the regression model is full-rank and can be fit using multiple regression on each Yi ∈ Y, as seen above. When t < s, C can be decomposed into non-unique matrices A (s × t) and B (t × r) such that C = AB, and the multivariate regression model is given by
$$ \overset{s \times 1}{\mathbf{Y}} = \overset{s \times 1}{\boldsymbol{\mu}} + \overset{s \times t}{\mathbf{A}} \; \overset{t \times r}{\mathbf{B}} \; \overset{r \times 1}{\mathbf{X}} + \overset{s \times 1}{\varepsilon} $$
Estimating μ, A, B, and the reduced-rank regression coefficient C(t), is done by minimizing the weighted sum-of-squares criterion
E[(Y − μ − ABX)τΓ(Y − μ − ABX)]
where Γ is a positive-definite symmetric (s × s)-matrix of weights. This expectation is taken over the joint distribution (Xτ, Yτ)τ. The weighted sum-of-squares criterion is minimized when
$$ \begin{aligned} \boldsymbol{\mu}^{\left(t\right)} & = \boldsymbol{\mu}_Y - \mathbf{A}^{\left(t\right)}\mathbf{B}^{\left(t\right)}\boldsymbol{\mu}_X \\ \mathbf{A}^{\left(t\right)} & = \mathbf{\Gamma}^{-1/2}\mathbf{V}_t \\ \mathbf{B}^{\left(t\right)} & = \mathbf{V}_t^\tau \boldsymbol{\Gamma}^{-1/2}\mathbf{\Sigma}_{YX}\mathbf{\Sigma}_{XX}^{-1} \\ \end{aligned} $$
where Vt = (v1, …, vt) is an (s × t)-matrix, with vj the eigenvector associated with the jth largest eigenvalue of
Γ1/2ΣYXΣXX−1ΣXYΓ1/2
In applications we try out different values of Γ. Two popular choices – and ones that lead to interesting results, as we will see – are Γ = Is and Γ = ΣYY−1; the latter choice is equivalent to performing canonical variate analysis.
Since the reduced-rank regression coefficient relies on inverting ΣXX and, possibly, ΣYY, we want to take into consideration the cases when ΣXX, ΣYY are singular or difficult to invert.
We can perturb the diagonal of the covariance matrices by some small constant k. This ensures that the covariance matrices are invertible while only slightly altering the data. The motivation for this comes from ridge regression6 in the multiple-regression context and from the idea of softly shrunken reduced-rank regression.7 Thus, we carry out the reduced-rank regression procedure using
$$ \begin{aligned} \hat{\boldsymbol{\Sigma}}_{XX}^{\left(k\right)} & = \hat{\boldsymbol{\Sigma}}_{XX} + k \mathbf{I}_r \\ \hat{\boldsymbol{\Sigma}}_{YY}^{\left(k\right)} & = \hat{\boldsymbol{\Sigma}}_{YY} + k \mathbf{I}_s \end{aligned} $$
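In R, this amounts to adding k to the diagonal of the sample covariance matrix. A minimal sketch, using the tobacco_x matrix from above and an arbitrary illustrative value of k:

# Ridge-perturbed covariance estimate; k = 0.1 is an arbitrary illustrative value.
k <- 0.1
sigma_xx_k <- cov(as.matrix(tobacco_x)) + k * diag(ncol(tobacco_x))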
rank_trace()
## function (x, y, type = "identity", k = 0, plot = TRUE, interactive = FALSE)
## NULL
$\hat{\mathbf{C}}$ is calculated from sample observations, so its mathematical rank will always be full; its statistical rank t, however, is an unknown hyperparameter that needs to be estimated.
One method of estimating t is to plot the rank trace. Along the X-axis, we plot a measure of the difference between the rank-t coefficient matrix and the full-rank coefficient matrix for each value of t. Along the Y-axis, we plot the reduction in residual covariance between the rank-t residuals and the full-rank residuals for each value of t.
Set plot = FALSE to print a data frame of the rank trace coordinates.
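The coordinates below were presumably produced by a call along these lines (a sketch, assuming the tobacco_x and tobacco_y objects defined above):

# Rank trace coordinates for the tobacco data; type and k are left at their defaults.
rank_trace(tobacco_x, tobacco_y, plot = FALSE)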
## # A tibble: 4 × 3
## ranks dC dEE
## <int> <dbl> <dbl>
## 1 0 1 1
## 2 1 0.202 0.0119
## 3 2 0.0842 0.00310
## 4 3 0 0
When the weight matrix Γ takes on a more complicated form, the rank trace may plot points outside the unit square, or may not be a smooth, monotonic curve. When this is the case, we can change the value of k to smooth the rank trace; this value of k is then used as the ridge perturbation described above.
The main function in the rrr package is rrr(), which fits a reduced-rank regression model and outputs the coefficients.
rrr()
## function (x, y, type = "identity", rank = "full", k = 0)
## NULL
rrr() takes as inputs the data frames, or matrices, of input and response variables, the type of reduced-rank regression procedure to perform, the rank (defaulting to full rank), and the ridge constant k. The argument type defaults to type = "identity", which sets Γ = I, but can be set to type = "pca", type = "cva", or type = "lda" to perform principal component analysis, canonical variate analysis, or linear discriminant analysis, respectively.
rrr() outputs the appropriate coefficients depending on the type of reduced-rank regression performed.
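The output below corresponds to a full-rank identity fit on the tobacco data; a sketch of the call, assuming the tobacco_x and tobacco_y objects from above:

# Full-rank reduced-rank regression with Gamma = I and no ridge perturbation.
rrr(tobacco_x, tobacco_y, type = "identity", rank = "full", k = 0)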
## $mean
## [,1]
## Y1.BurnRate 1.411137
## Y2.PercentSugar 13.632913
## Y3.PercentNicotine -1.564824
##
## $A
## [,1] [,2] [,3]
## Y1.BurnRate 0.03107787 -0.4704307 0.8818895
## Y2.PercentSugar -0.97005030 0.1984637 0.1400521
## Y3.PercentNicotine 0.24090782 0.8598297 0.4501736
##
## $B
## X1.PercentNitrogen X2.PercentChlorine X3.PercentPotassium
## [1,] 4.3242696 -1.35864835 -1.4808316
## [2,] -0.4114869 0.09903401 0.3652138
## [3,] -0.3016163 -0.08086722 0.5782436
## X4.PercentPhosphorus X5.PercentCalcium X6.PercentMagnesium
## [1,] -13.729424 -0.4528289 3.86689562
## [2,] 2.456879 0.3060762 1.23030547
## [3,] 1.048309 0.3754285 0.03430168
##
## $C
## X1.PercentNitrogen X2.PercentChlorine X3.PercentPotassium
## Y1.BurnRate 0.06197282 -0.1601285 0.2921181
## Y2.PercentSugar -4.31866620 1.3262863 1.5899470
## Y3.PercentNicotine 0.55216201 -0.2785609 0.2175877
## X4.PercentPhosphorus X5.PercentCalcium X6.PercentMagnesium
## Y1.BurnRate -0.6579802 0.1730259 -0.4283482
## Y2.PercentSugar 13.9526510 0.5525913 -3.5021083
## Y3.PercentNicotine -0.7231067 0.3230914 2.0048603
##
## $eigenvalues
## [1] 3.28209974 0.03782978 0.01015996
We can see that rrr() with rank = "full" and k = 0 returns the classical multivariate regression coefficients computed above. They differ only by a transpose; the coefficients are presented this way in rrr as a matter of convention, and it is this form that appears in the literature.8
residuals()
## function (x, y, type = "identity", rank = "full", k = 0, plot = TRUE)
## NULL
We can visually check the model assumptions with residuals(). The leftmost column of the scatter plot can be used to look for serial patterns in the residuals. The diagonal can be used to look at the distribution and visually assess whether or not it is symmetric, has a mean of zero, etc.
To print a data frame of the residuals, set plot = FALSE.
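A sketch of the corresponding call follows; the choice of rank = 1 here is an assumption based on the rank-trace discussion above, and the actual chunk may use a different rank:

# Residuals from a rank-1 identity fit on the tobacco data, returned as a tibble.
residuals(tobacco_x, tobacco_y, rank = 1, plot = FALSE)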
## # A tibble: 25 × 3
## Y1.BurnRate Y2.PercentSugar Y3.PercentNicotine
## <dbl> <dbl> <dbl>
## 1 -0.154 3.92 -0.902
## 2 -0.210 0.698 -0.697
## 3 -0.0730 3.35 -0.950
## 4 -0.201 2.98 -0.199
## 5 -0.124 1.64 -0.362
## 6 -0.0495 0.320 -1.24
## 7 -0.0419 2.08 -0.339
## 8 -0.145 2.75 -0.190
## 9 -0.0979 1.53 -0.588
## 10 -0.355 2.61 -0.333
## # ℹ 15 more rows
Set
$$ \begin{aligned} \mathbf{Y} & \equiv \mathbf{X} \\ \mathbf{\Gamma} & = \mathbf{I}_r \end{aligned} $$
Then, the least squares criterion
E[(X − μ − ABX)(X − μ − ABX)τ]
is minimized when
$$ \begin{aligned} \mathbf{A}^{\left(t\right)} & = \left(\mathbf{v}_1, \dots, \mathbf{v}_t\right) \\ \mathbf{B}^{\left(t\right)} & = \mathbf{A}^{\left(t\right) \tau} \\ \boldsymbol{\mu}^{\left(t\right)} & = \left(\mathbf{I}_r - \mathbf{A}^{\left(t\right)}\mathbf{B}^{\left(t\right)}\right)\boldsymbol{\mu}_X \\ \end{aligned} $$
where vj = vj(ΣXX) is the eigenvector associated with the jth largest eigenvalue of ΣXX.
The best reduced-rank approximation to the original X is
$$ \begin{aligned} \hat{\mathbf{X}}^{\left(t\right)} & = \boldsymbol{\mu}^{\left(t\right)} + \mathbf{A}^{\left(t\right)}\mathbf{B}^{\left(t\right)} \mathbf{X} \\ & \mathrm{or} \\ \hat{\mathbf{X}} & = \mathbf{A}^{\left(t\right)}\mathbf{B}^{\left(t\right)}\mathbf{X}_c \\ \end{aligned} $$
where Xc is the vector X after mean-centering.
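As a quick check of the statement that A(t) collects the leading eigenvectors of ΣXX, here is a minimal sketch using the tobacco predictors from above:

# With Y = X and Gamma = I, the columns of A^(t) are the first t eigenvectors
# (principal component loadings) of the sample covariance matrix.
x <- as.matrix(tobacco_x)
eigen(cov(x))$vectors[, 1:2]   # loadings for the first two principal components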
pendigits Data Set
Forty-four writers hand-wrote the digits 0-9 250 times in random order in 500x500 pixel boxes on a pressure-sensitive tablet with an integrated LCD screen. The first 10 digits were thrown out – without telling the writers – to ignore variation from the writers gaining familiarity with the device.9 The raw coordinate data were cleaned and translated.
We can get a good visualization of the correlation structure using GGally::ggcorr(). Below we see that there is very heavy correlation among the variables.
The ratio
$$ \frac{\lambda_{t + 1} + \cdots + \lambda_r}{\lambda_1 + \cdots + \lambda_r} $$
is a goodness-of-fit measure of how well the last r − t principal components explain the total variation in X.
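This ratio can be computed directly from the eigenvalues of the sample covariance matrix; a minimal sketch (the helper gof() below is hypothetical and not part of rrr):

# Proportion of total variation left to the last r - t principal components,
# i.e. unexplained by the first t; assumes t is less than the number of variables.
gof <- function(x, t) {
  lambda <- eigen(cov(as.matrix(x)))$values
  sum(lambda[(t + 1):length(lambda)]) / sum(lambda)
}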
The function rrr() (see below) outputs this goodness-of-fit measure.
## PC1 PC2 PC3 PC4 PC5
## 7.168777e-01 4.680599e-01 3.144671e-01 2.243293e-01 1.664009e-01
## PC6 PC7 PC8 PC9 PC10
## 1.181315e-01 8.737814e-02 6.065970e-02 4.138801e-02 2.763994e-02
## PC11 PC12 PC13 PC14 PC15
## 1.896530e-02 1.222037e-02 7.761240e-03 3.816942e-03 1.958674e-03
## PC16 PC17 PC18 PC19 PC20
## 3.133204e-04 1.279543e-04 4.290656e-17 2.262607e-17 3.834998e-18
## PC21 PC22 PC23 PC24 PC25
## -1.182099e-17 -2.301542e-17 -3.065407e-17 -3.661062e-17 -4.133512e-17
## PC26 PC27 PC28 PC29 PC30
## -4.099315e-17 -4.034315e-17 -3.913493e-17 -3.545734e-17 -3.149124e-17
## PC31 PC32 PC33 PC34
## -2.610128e-17 -1.990436e-17 -1.032623e-17 0.000000e+00
rank_trace()
Print a data frame of the rank trace coordinates by setting plot = FALSE.
## # A tibble: 35 × 3
## rank delta_C delta_residuals
## <int> <dbl> <dbl>
## 1 0 1 1
## 2 1 0.985 0.717
## 3 2 0.970 0.468
## 4 3 0.955 0.314
## 5 4 0.939 0.224
## 6 5 0.924 0.166
## 7 6 0.907 0.118
## 8 7 0.891 0.0874
## 9 8 0.874 0.0607
## 10 9 0.857 0.0414
## # ℹ 25 more rows
pairwise_plot()
## function (x, y, type = "pca", pair_x = 1, pair_y = 2, rank = "full",
## k = 0, interactive = FALSE, point_size = 2.5)
## NULL
A common PCA method of visualization is to plot the jth sample PC scores against the kth PC scores,
$$ \left(\xi_{ij}, \xi_{ik}\right) = \left(\hat{\mathbf{v}}_j^\tau \mathbf{X}_i, \hat{\mathbf{v}}_k^\tau \mathbf{X}_i\right), \quad i = 1, 2, \dots, n $$
Since the first two principal components capture the most variance – and hence the most useful information – of all possible pairs of principal components, we typically set j = 1, k = 2 and plot the first two sample PC scores against each other. In rrr this is the default.
We can set the x- and y-axes to whichever pair of PC scores we would like to plot by changing the pair_x and pair_y arguments.
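For example, to plot the first sample PC scores against the third (the object names digits_x and digits_class are hypothetical stand-ins for however the pendigits features and digit labels were split above):

# Hypothetical names: digits_x holds the feature columns, digits_class the digit labels.
pairwise_plot(digits_x, digits_class, type = "pca", pair_x = 1, pair_y = 3)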
allpairs_plot()
## function (x, y, type = "pca", rank, k = 0)
## NULL
Alternatively, we can look at structure in the data by plotting all PC pairs, along with some other visual diagnostics, with allpairs_plot(). Along with plotting principal component scores against each other, the plot matrix also shows histograms and box plots to show how the points are distributed along the principal component axes.
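A sketch of such a call, with the same hypothetical object names as above; note that rank has no default in allpairs_plot() and must be supplied:

# Plot matrix of all PC score pairs up to the chosen rank (rank = 3 is illustrative).
allpairs_plot(digits_x, digits_class, type = "pca", rank = 3)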
rrr()
## $means
## V1 V2 V3 V4 V5 V6 V7 V8
## 38.814320 85.120269 40.605622 83.774199 49.770378 65.573144 51.220251 44.498999
## V9 V10 V11 V12 V13 V14 V15 V16
## 56.868541 33.695961 60.516376 34.826510 55.022289 34.937045 47.287482 28.845342
## V17 V18 V19 V20 V21 V22 V23 V24
## 4.431587 5.341794 38.814320 85.120269 40.605622 83.774199 49.770378 65.573144
## V25 V26 V27 V28 V29 V30 V31 V32
## 51.220251 44.498999 56.868541 33.695961 60.516376 34.826510 55.022289 34.937045
## V33 V34
## 47.287482 28.845342
##
## $C
## # A tibble: 34 × 34
## V1 V2 V3 V4 V5 V6 V7 V8
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.152 0.0420 0.0576 0.0328 -0.0831 -0.00510 -0.129 -0.0361
## 2 0.0420 0.0131 0.0106 0.00544 -0.0279 -0.0101 -0.0393 -0.0212
## 3 0.0576 0.0106 0.0571 0.0430 0.00514 0.0486 -0.0471 0.0255
## 4 0.0328 0.00544 0.0430 0.0348 0.0144 0.0412 -0.0307 0.0193
## 5 -0.0831 -0.0279 0.00514 0.0144 0.0836 0.0544 0.0700 0.0567
## 6 -0.00510 -0.0101 0.0486 0.0412 0.0544 0.0743 0.0120 0.0654
## 7 -0.129 -0.0393 -0.0471 -0.0307 0.0700 0.0120 0.125 0.0568
## 8 -0.0361 -0.0212 0.0255 0.0193 0.0567 0.0654 0.0568 0.0899
## 9 -0.0438 -0.0180 -0.0366 -0.0382 -0.00298 -0.0138 0.0765 0.0514
## 10 -0.0516 -0.0206 -0.0229 -0.0228 0.0200 0.00720 0.0746 0.0570
## # ℹ 24 more rows
## # ℹ 26 more variables: V9 <dbl>, V10 <dbl>, V11 <dbl>, V12 <dbl>, V13 <dbl>,
## # V14 <dbl>, V15 <dbl>, V16 <dbl>, V17 <dbl>, V18 <dbl>, V19 <dbl>,
## # V20 <dbl>, V21 <dbl>, V22 <dbl>, V23 <dbl>, V24 <dbl>, V25 <dbl>,
## # V26 <dbl>, V27 <dbl>, V28 <dbl>, V29 <dbl>, V30 <dbl>, V31 <dbl>,
## # V32 <dbl>, V33 <dbl>, V34 <dbl>
##
## $PC
## # A tibble: 34 × 3
## PC1 PC2 PC3
## <dbl> <dbl> <dbl>
## 1 0.0286 0.143 -0.362
## 2 0.0468 0.0335 -0.0990
## 3 -0.139 -0.0467 -0.189
## 4 -0.103 -0.0832 -0.132
## 5 -0.160 -0.197 0.139
## 6 -0.241 -0.118 -0.0513
## 7 -0.106 -0.0317 0.336
## 8 -0.290 0.00231 0.0778
## 9 -0.117 0.263 0.215
## 10 -0.144 0.151 0.191
## # ℹ 24 more rows
##
## $goodness_of_fit
## PC1 PC2 PC3
## 0.7168777 0.4680599 0.3144671
Canonical Variate Analysis10 is a method of linear dimensionality reduction that turns the original X and Y into new variables ξ and ω, respectively. Canonical variate analysis can be performed as a special case of reduced-rank regression.
Set Γ = ΣYY−1. Then, the t new pairs of canonical variables (ξi, ωi), i = 1, …, t are calculated by fitting a reduced-rank – rank t – regression equation. The canonical variate scores are given by
ξ(t) = G(t)X, ω(t) = H(t)Y,
with
$$ \begin{aligned} \mathbf{G}^{\left(t\right)} & = \mathbf{B}^{\left(t\right)} \\ \mathbf{H}^{\left(t\right)} & = \mathbf{A}^{\left(t\right)-} \\ \end{aligned} $$
where A(t), B(t) are the matrices from the reduced-rank regression formulation above.
Note that H(t) = A(t)− is a generalized inverse of A(t). When t = s, H(s) = A(s)+ is the unique Moore-Penrose generalized inverse of A(s).
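To make the relationship concrete, here is a sketch of how the scores could be formed from the reduced-rank pieces; A, B, and the centered data matrices Xc and Yc are assumed to come from a rank-t fit, and MASS::ginv() supplies the generalized inverse:

# Canonical variate scores from the reduced-rank decomposition:
# G = B (t x r) applied to X, and H = A^- (t x s) applied to Y.
library(MASS)
xi    <- Xc %*% t(B)         # (n x r) %*% (r x t) = n x t canonical X scores
omega <- Yc %*% t(ginv(A))   # (n x s) %*% (s x t) = n x t canonical Y scores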
COMBO17 Data Set
### COMBO-17 galaxy data
library(dplyr)   # provides as_tibble(), select(), and the pipe used below
data(COMBO17)
galaxy <- as_tibble(COMBO17) %>%
    select(-starts_with("e."), -Nr, -UFS:-IFD) %>%
    na.omit()
This data set11 comes from a public catalogue of objects in the Chandra Deep Field South, an area of the sky. This subset of the catalogue consists of all the objects classified as "Galaxies", keeping only observations with no missing values.12
We can see above the heavy correlation among the Xis and among the Yis. This data set therefore makes a good candidate for canonical variate analysis.
Estimate t and k with rank_trace().
Plot and print residuals with residuals(), setting type = "cva".
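A sketch of the call behind the residuals below, assuming the galaxy data frame has been split into predictor and response sets (galaxy_x and galaxy_y are hypothetical names) and taking rank 2 as an assumption:

# Canonical-variate residuals at an assumed rank of 2.
residuals(galaxy_x, galaxy_y, type = "cva", rank = 2, plot = FALSE)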
## # A tibble: 3,462 × 3
## index CV1 CV2
## <int> <dbl> <dbl>
## 1 1 -0.320 0.561
## 2 2 -0.337 -0.324
## 3 3 0.234 0.224
## 4 4 0.219 0.671
## 5 5 -0.718 1.53
## 6 6 -1.02 0.187
## 7 7 0.278 0.592
## 8 8 -0.124 0.689
## 9 9 0.254 0.270
## 10 10 -0.686 -0.169
## # ℹ 3,452 more rows
Plot canonical variate scores with pairwise_plot().
Choose which pair of canonical variate scores to plot by changing the argument pair_x.
Fit the model with rrr(), setting type = "cva".
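A sketch of the corresponding fit, with the same hypothetical galaxy_x / galaxy_y split; rank 2 is assumed, matching the two rows of G and H in the output below:

# Reduced-rank regression with Gamma = Sigma_YY^(-1), i.e. canonical variate analysis.
rrr(galaxy_x, galaxy_y, type = "cva", rank = 2)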
## $mean
## [,1]
## Rmag 28.523043
## ApDRmag 1.130048
## mumax 26.803027
## Mcz -1.194108
## MCzml -1.050103
## chi2red 2.088291
##
## $G
## UjMAG BjMAG VjMAG usMAG gsMAG rsMAG
## [1,] 0.1315681 0.003160348 0.3435075 0.1039602 0.06062877 -0.2369377
## [2,] -0.9238507 0.030925869 -2.0418261 1.2086276 0.38691864 1.9345899
## UbMAG BbMAG VnMAG S280MAG W420FE W462FE
## [1,] 0.1732176 -0.11817306 0.01467586 0.006356947 -0.4377527 -0.4795137
## [2,] -0.6174437 0.01606138 -0.07708084 0.039012047 -3.9767464 -2.3909886
## W485FD W518FE W571FS W604FE W646FD W696FE
## [1,] -0.4020248 -0.2363471 -0.1188914 0.04050373 0.1684356 0.1456478
## [2,] -1.3831963 0.6639545 -0.2649454 0.42582774 -0.1510068 -0.8822436
## W753FE W815FS W856FD W914FD W914FE
## [1,] 0.29307078 0.0239559 0.0266074 -0.2535550 -0.06707906
## [2,] 0.08978215 -1.7740911 0.2191533 0.5857208 0.77233519
##
## $H
## Rmag ApDRmag mumax Mcz MCzml chi2red
## [1,] 0.5596114 0.47045788 -0.1373677 -1.4375179 -1.3869182 0.5791827
## [2,] 0.2922679 -0.09798891 0.4366839 0.5862785 0.5703691 -0.2163637
##
## $canonical_corr
## [1] 0.927311439 0.354708001 0.063114244 0.018542243 0.005976463 0.003749350
Print canonical variate scores with scores(), setting type = "cva".
## $xi
## # A tibble: 3,462 × 2
## xi1 xi2
## <dbl> <dbl>
## 1 0.108 0.185
## 2 -0.0439 1.46
## 3 -0.938 0.0593
## 4 0.0512 0.231
## 5 0.0486 -0.283
## 6 -0.572 0.374
## 7 0.447 0.486
## 8 0.527 0.228
## 9 0.112 0.179
## 10 3.05 -0.252
## # ℹ 3,452 more rows
##
## $omega
## # A tibble: 3,462 × 2
## omega1 omega2
## <dbl> <dbl>
## 1 -0.212 0.746
## 2 -0.381 1.13
## 3 -0.704 0.284
## 4 0.270 0.902
## 5 -0.670 1.25
## 6 -1.59 0.561
## 7 0.726 1.08
## 8 0.403 0.917
## 9 0.366 0.450
## 10 2.37 -0.421
## # ℹ 3,452 more rows
##
## $canonical_corr
## [1] 0.927311439 0.354708001 0.063114244 0.018542243 0.005976463 0.003749350
Linear discriminant analysis is a classification procedure. We can turn it into a regression procedure – specifically a reduced-rank canonical variate procedure – in the following way.
Let each observation i = 1, 2, …, n belong to one, and only one, of K = s + 1 distinct classes.
We can construct an indicator response matrix Y, where each row i is an indicator response vector for the ith observation. The vector has a 1 in the column that represents the class to which the observation belongs and is 0 elsewhere.
We then regress this binary response matrix Y against the matrix X of predictor variables.
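For example, the indicator matrix for the iris classes (used below) can be built with model.matrix(); a minimal sketch:

# One indicator column per species; each row has a single 1 in its class's column.
iris_x <- as.matrix(iris[, 1:4])
iris_y <- model.matrix(~ Species - 1, data = iris)
head(iris_y)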
Linear discriminant analysis requires the assumptions that each class is normally distributed and that all classes share a common covariance matrix.
While these assumptions will not be met in all cases, when they are – and when the classes are well separated – linear discriminant analysis is a very efficient classification method.
iris Data Set
This is R.A. Fisher's classic iris data set that comes packaged in base R.
Assessing the rank t of this reduced-rank regression is equivalent to determining the number of linear discriminant functions that best discriminate between the K classes, with min(r, s) = min(r, K − 1) being the maximum number of linear discriminant functions.
Generally, plotting linear discriminant functions against each other – e.g., the first against the second – is used to determine whether sufficient discrimination is obtained.
Plotting techniques are discussed in the following section.
Plot LDA pairs with pairwise_plot(), setting type = "lda".
A typical graphical display for multiclass LDA is to plot the jth discriminant scores for the n points against the kth discriminant scores.
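A sketch of such a call on iris; whether the species labels should be passed as a one-column data frame, as here, or as a factor is an assumption:

# First vs. second linear discriminant scores, coloured by class.
pairwise_plot(iris[, 1:4], iris[5], type = "lda")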
Fit the LDA model with rrr(), setting type = "lda".
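A sketch of the fit behind the output below, with the same assumption about how the class labels are passed and rank 2 taken to match the two discriminant functions shown:

# Reduced-rank regression on the indicator response, i.e. linear discriminant analysis.
rrr(iris[, 1:4], iris[5], type = "lda", rank = 2)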
## $G
## # A tibble: 4 × 2
## LD1 LD2
## <dbl> <dbl>
## 1 0.143 -0.00971
## 2 0.264 -0.906
## 3 -0.379 0.388
## 4 -0.483 -1.18
##
## $H
## # A tibble: 2 × 2
## LD1 LD2
## <dbl> <dbl>
## 1 1.12 0.332
## 2 0.265 1.11
scores()
Print linear discriminant scores and the class means with scores(), setting type = "lda".
## $scores
## # A tibble: 150 × 3
## LD1 LD2 class
## <dbl> <dbl> <fct>
## 1 1.39 -0.126 setosa
## 2 1.23 0.329 setosa
## 3 1.29 0.111 setosa
## 4 1.17 0.280 setosa
## 5 1.40 -0.215 setosa
## 6 1.33 -0.611 setosa
## 7 1.24 -0.149 setosa
## 8 1.31 0.00479 setosa
## 9 1.13 0.425 setosa
## 10 1.26 0.396 setosa
## # ℹ 140 more rows
##
## $class_means
## # A tibble: 3 × 3
## LD1 LD2 class
## <dbl> <dbl> <fct>
## 1 1.35 -0.405 setosa
## 2 -0.324 1.37 versicolor
## 3 -1.03 -0.966 virginica
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24, 417-441, 498-520.↩︎
Hotelling, H. (1936). Relations between two sets of variates, Biometrika, 28, 321-377.↩︎
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179-188.↩︎
Izenman, A.J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer.↩︎
Anderson, R.L. and Bancroft, T.A. (1952). Statistical Theory in Research, New York: McGraw-Hill, p. 205.↩︎
Hoerl, A.E. and Kennard, R. (1970). Ridge regression: Biased estimation for non-orthogonal problems. Technometrics 12: 55-67. Reprinted in Technometrics, 42 (2000), 80-86.↩︎
Aldrin, M. (2000). Multivariate prediction using softly shrunk reduced-rank regression, The American Statistician, 54(1), 29.↩︎
Izenman, A.J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer.↩︎
Alimoglu, F. (1995). Combining multiple classifiers for pen-based handwritten digit recognition, M.Sc. thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University, Istanbul, Turkey.↩︎
Hotelling, H. (1936). Relations between two sets of variates, Biometrika, 28, 321-377.↩︎
Wolf, C., Meisenheimer, M., Kleinheinrich, M., Borch, A., Dye, S., Gray, M., Wisotzki, L., Bell, E.F., Rix, H.W., Cimatti, A., Hasinger, G., and Szokoly, G. (2004). A catalogue of the Chandra Deep Field South with multi-colour classification and photometric redshifts from COMBO-17, Astronomy & Astrophysics, https://arxiv.org/pdf/astro-ph/0403666.pdf↩︎
Donald Richards in the Department of Statistics at Pennsylvania State University helped Dr. Izenman in understanding the variables in this data set. The package author, by extension, would also like to acknowledge Dr. Richards for help in understanding the data set used in the text Modern Multivariate Statistical Techniques.↩︎