The Covariance Matrix and relation with PCA

Description of this Post
Author
Published

October 12, 2023

Recall: Variance measures the variation of a single random variable (like the height of a person in a population), whereas covariance is a measure of how much two random variables vary together (like the height and the weight of a person in a population).

In a two-dimensional feature space with 10 data points, the covariance matrix provides a measure of the relationship between the two features (dimensions) and how they vary together. It is a 2x2 matrix that quantifies the degree to which the two features change together.

Let’s say you have two features, \(\textbf{x}\) and \(\textbf{y}\), and you have 10 data points with values \((x_1, y_1), (x_2, y_2), \ldots, (x_{10}, y_{10})\). The covariance matrix \(\Sigma\) is calculated as:

\[ \Sigma = \begin{bmatrix} \text{cov}(\textbf{x}, \textbf{x}) & \text{cov}(\textbf{x}, \textbf{y}) \\ \text{cov}(\textbf{y}, \textbf{x}) & \text{cov}(\textbf{y}, \textbf{y}) \end{bmatrix} \]

Where:

  • \(\text{cov}(\textbf{x}, \textbf{x})\) and \(\text{cov}(\textbf{y}, \textbf{y})\) are the variances of \(\textbf{x}\) and \(\textbf{y}\) (the diagonal entries)
  • \(\text{cov}(\textbf{x}, \textbf{y}) = \text{cov}(\textbf{y}, \textbf{x})\) is the covariance between the two features, so the matrix is symmetric

The covariances of two random variables (the two features) are calculated using the formula:

\[ \text{cov}(\textbf{x}, \textbf{y}) = \frac{1}{N} \sum_{n=1}^{N} (x_n - {\bar{x}})({y}_n - \bar{y}) \]

Where:

  • \(x_n\) and \(y_n\) are the values of the two features for the \(n\)-th data point
  • \(\bar{x}\) and \(\bar{y}\) are the means of \(\textbf{x}\) and \(\textbf{y}\)
  • \(N\) is the number of data points (here 10)

Note: \(\text{cov}(\textbf{x}, \textbf{x})\) is just the variance of \(\textbf{x}\), so the diagonal of \(\Sigma\) holds the variances of the individual features. Dividing by \(N\) gives the population covariance; NumPy's np.cov divides by \(N-1\) by default (the unbiased sample estimate), which can be switched to \(N\) with bias=True.
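
As a quick sanity check of the formula above (a minimal sketch, not from the original post; the tiny x and y arrays are made up for illustration), we can compute \(\text{cov}(\textbf{x}, \textbf{y})\) by hand and compare it with NumPy:

Code
import numpy as np

# Tiny made-up example: N = 3 data points
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])

# 1/N * sum_n (x_n - x_bar)(y_n - y_bar)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)                          # 1.666...

# Same value from NumPy (bias=True divides by N, matching the formula)
print(np.cov(x, y, bias=True)[0, 1])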

The calculation of the covariance matrix can also be expressed in matrix form as:

\[ \Sigma = \frac{1}{N} \sum_{n=1}^{N} (\textbf{x}_n - \mathbf{\bar{x}})(\textbf{x}_n - \mathbf{\bar{x}})^T \]

Where:

  • \(\textbf{x}_n = (x_n, y_n)^T\) is the \(n\)-th data point written as a column vector
  • \(\mathbf{\bar{x}}\) is the mean vector of the data
  • \(N\) is the number of data points

Below is an example:


1 Python Example

Code
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use('ggplot')
# plt.rcParams['figure.figsize'] = (12, 8)

# Normal distributed x and y vector with mean 0 and standard deviation 1
x = np.random.normal(0, 1, 500)
y = np.random.normal(0, 1, 500)
X = np.vstack((x, y)).T

plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal')
plt.show()

If the two feature vectors are independent (or uncorrelated), the covariance matrix would be:

\[ \Sigma = \begin{bmatrix} \text{cov}(\textbf{x}, \textbf{x}) & 0 \\ 0 & \text{cov}(\textbf{y}, \textbf{y}) \end{bmatrix} \]

Since this data was generated independently with unit \(\text{cov}(\textbf{x}, \textbf{x})\) and unit \(\text{cov}(\textbf{y}, \textbf{y})\) (standard deviation 1), the covariance matrix is approximately the identity matrix. A quick check is shown below.
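
As that check (a minimal sketch assuming the X array from the code above), we can compute the covariance matrix of the generated data both with the matrix form of the formula and with np.cov; the result should be close to the identity matrix:

Code
# Covariance matrix of the generated data (assumes X from the code above)
X_centered = X - X.mean(axis=0)

# 1/N * sum_n (x_n - x_bar)(x_n - x_bar)^T
Sigma = (X_centered.T @ X_centered) / X.shape[0]
print(Sigma)

# NumPy's version (rows of X.T are the variables; bias=True divides by N)
print(np.cov(X.T, bias=True))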


  • If the off-diagonal entries are positive, for example \(\text{cov}(x_1, x_2) = 0.8\), then as \(x_1\) increases \(x_2\) tends to increase too: we have a positive correlation in the data (see the sketch below).
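
For illustration (a minimal sketch; the 0.8 figure from the bullet above is used here as an assumed true covariance), we can generate correlated data and check that the off-diagonal entries of the estimated covariance matrix come out close to 0.8:

Code
# Generate positively correlated data with an assumed true covariance of 0.8
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
X_corr = np.random.multivariate_normal(mean=[0, 0], cov=true_cov, size=500)

print(np.cov(X_corr.T, bias=True))    # off-diagonal entries close to 0.8

plt.scatter(X_corr[:, 0], X_corr[:, 1])
plt.title('Correlated Data')
plt.axis('equal')
plt.show()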

2 Relation with PCA

When you repeatedly multiply a vector by the covariance matrix \(\Sigma\) (multiply, then multiply the result again, and so on), the vector turns more and more towards the direction of greatest variance in the data, i.e. the direction along which the points are most spread out. A sketch of this idea follows.
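
A minimal sketch of this idea (power iteration, assuming the X_corr data from the sketch above): repeatedly multiply a random vector by \(\Sigma\) and normalise it; the direction stops changing once it lines up with the direction of greatest variance.

Code
# Power iteration on the covariance matrix of the correlated data
Sigma_corr = np.cov(X_corr.T, bias=True)

v = np.random.rand(2)                  # random starting direction
for _ in range(50):
    v = Sigma_corr @ v                 # multiply by the covariance matrix...
    v = v / np.linalg.norm(v)          # ...and normalise so it does not blow up

print(v)   # direction of greatest variance (the leading eigenvector)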


  • If we multiply a vector \(e_2\) that already points in such a direction by the covariance matrix, it will not turn; it only gets longer and longer while still pointing in the same direction.
  • Thus we want to find vectors \(e\) whose direction does not change when multiplied by the covariance matrix, i.e. \(\Sigma e = \lambda e\). These vectors are called eigenvectors and the scalars \(\lambda\) are called eigenvalues.
  • The eigenvectors with the largest eigenvalues are our principal components (see the sketch after this list).
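
A minimal sketch of this last step, assuming the Sigma_corr matrix from the power-iteration sketch above: np.linalg.eigh gives all eigenvalues and eigenvectors of the symmetric covariance matrix, and sorting them by eigenvalue yields the principal components.

Code
# Eigen-decomposition of the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma_corr)

# Sort in decreasing order of eigenvalue
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals)          # variances along the principal directions
print(eigvecs[:, 0])    # first principal component (largest eigenvalue)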