What is the difference between sample and population covariance?

Population covariance divides by N, where N is the total number of observations in the population. Sample covariance divides by N − 1 to correct bias when you only observe a sample. In practice, sample covariance is used when you estimate from data.

How should I format the input data?

Each row should be one observation, and each column should be one variable. You can separate values within a row by commas, semicolons or whitespace. All rows must have the same number of numeric entries and you need at least two variables and two observations.

When should I use the covariance matrix instead of the correlation matrix?

The covariance matrix keeps the original units and is needed for computations such as multivariate normal densities, Kalman filters, and portfolio optimisation. The correlation matrix standardises variances to 1 and is useful to compare strength and direction of linear relationships independently of the scale of each variable.

Why can the covariance matrix be singular or ill-conditioned?

If one variable is a linear combination of others or the number of observations is too small relative to the number of variables, the covariance matrix can become singular or nearly singular. In that case, some multivariate methods such as inversion or PCA need regularisation or dimensionality reduction.

Covariance Matrix Calculator – Variance & Correlation from Data

Q: What is a covariance matrix?

A covariance matrix is a square matrix that summarises the covariances between all pairs of components of a random vector. Its diagonal entries are variances of individual variables and off-diagonal entries are covariances between variables.

Q: What does this covariance matrix calculator do?

The calculator takes tabular data with multiple variables, computes the sample or population means of each variable, and then returns the covariance matrix and optionally the corresponding correlation matrix. It also reports basic summary statistics such as the number of observations and the variances of each variable.

Paste or type your multivariate data and this tool computes the covariance matrix and, optionally, the corresponding correlation matrix. Choose between sample and population covariance and get summary statistics that are ready for PCA, portfolio theory, multivariate normal models and more.

What is a covariance matrix?

Suppose you have a random vector \[ X = (X_1, X_2, \dots, X_p)^\top. \] The covariance matrix of \(X\) is the \(p \times p\) matrix \[ \Sigma = \operatorname{Cov}(X) = \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \dots & \operatorname{Cov}(X_1, X_p) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \dots & \operatorname{Cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_p, X_1) & \operatorname{Cov}(X_p, X_2) & \dots & \operatorname{Var}(X_p) \end{bmatrix}. \]

It compactly summarises all pairwise linear relationships between your variables: variances on the diagonal, covariances off the diagonal.

Sample vs population covariance matrix

In practice you estimate covariance from a data matrix \(X\) with \(n\) observations (rows) and \(p\) variables (columns). Let \(x_{i j}\) be the value of variable \(j\) in observation \(i\), and let \(\bar{x}_j\) be the sample mean of variable \(j\).

Sample covariance matrix

For \(n\) observations and \(p\) variables, the sample covariance between variables \(j\) and \(k\) is \[ s_{j k} = \frac{1}{n - 1} \sum_{i=1}^n (x_{i j} - \bar{x}_j)\,(x_{i k} - \bar{x}_k). \] The sample covariance matrix \(S\) collects all \(s_{j k}\) in a \(p \times p\) matrix.

Dividing by \(n - 1\) gives an unbiased estimator of the population covariance when data are independent and identically distributed.

Population covariance matrix

If you can treat the data as the entire population, you may use \[ \sigma_{j k} = \frac{1}{n} \sum_{i=1}^n (x_{i j} - \bar{x}_j)\,(x_{i k} - \bar{x}_k), \] i.e. divide by \(n\) instead of \(n - 1\). This is the population covariance.

From covariance matrix to correlation matrix

The covariance between variables depends on the measurement units (for example, metres vs kilometres). To remove the effect of scale you can compute the correlation matrix, whose entries are the Pearson correlation coefficients: \[ \rho_{j k} = \frac{\operatorname{Cov}(X_j, X_k)}{\sqrt{\operatorname{Var}(X_j)\operatorname{Var}(X_k)}}. \]

In matrix form, if \(D\) is the diagonal matrix of standard deviations, the correlation matrix is \[ R = D^{-1} \Sigma D^{-1}. \] The diagonal entries of \(R\) are all 1, and each off-diagonal element lies between −1 and +1.

Applications of the covariance matrix

Principal component analysis (PCA) — PCA diagonalises the covariance matrix to find directions of maximum variance.
Portfolio theory — in quantitative finance, the covariance matrix of asset returns is central to risk modelling and optimisation.
Multivariate normal models — the covariance matrix parameterises the spread and shape of multivariate Gaussian distributions.
State estimation & Kalman filtering — process and measurement noise covariances are key design inputs.
Machine learning & data preprocessing — covariance is used in whitening, feature scaling and understanding feature interactions.

Numerical considerations

If the number of variables \(p\) is close to or larger than the number of observations \(n\), the covariance matrix may become singular or highly unstable.
Strong collinearity (nearly linear relationships between variables) can cause near-singularity and large condition numbers, which affect matrix inversion and eigen-decomposition.
For high-dimensional or ill-conditioned problems, consider regularised estimators (for example, shrinkage covariance) rather than the plain sample covariance.

Covariance Matrix Calculator

Data input & configuration

Results