Covariance Matrix Calculator
Multivariate variance & correlation
Paste or type your multivariate data and this tool computes the covariance matrix and, optionally, the corresponding correlation matrix. Choose between sample and population covariance and get summary statistics that are ready for PCA, portfolio theory, multivariate normal models and more.
What is a covariance matrix?
Suppose you have a random vector \[ X = (X_1, X_2, \dots, X_p)^\top. \] The covariance matrix of \(X\) is the \(p \times p\) matrix \[ \Sigma = \operatorname{Cov}(X) = \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \dots & \operatorname{Cov}(X_1, X_p) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \dots & \operatorname{Cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_p, X_1) & \operatorname{Cov}(X_p, X_2) & \dots & \operatorname{Var}(X_p) \end{bmatrix}. \]
It compactly summarises all pairwise linear relationships between your variables: variances on the diagonal, covariances off the diagonal.
Sample vs population covariance matrix
In practice you estimate covariance from a data matrix \(X\) with \(n\) observations (rows) and \(p\) variables (columns). Let \(x_{i j}\) be the value of variable \(j\) in observation \(i\), and let \(\bar{x}_j\) be the sample mean of variable \(j\).
Sample covariance matrix
For \(n\) observations and \(p\) variables, the sample covariance between variables \(j\) and \(k\) is \[ s_{j k} = \frac{1}{n - 1} \sum_{i=1}^n (x_{i j} - \bar{x}_j)\,(x_{i k} - \bar{x}_k). \] The sample covariance matrix \(S\) collects all \(s_{j k}\) in a \(p \times p\) matrix.
Dividing by \(n - 1\) gives an unbiased estimator of the population covariance when data are independent and identically distributed.
Population covariance matrix
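If the data constitute the entire population rather than a sample drawn from it, use \[ \sigma_{j k} = \frac{1}{n} \sum_{i=1}^n (x_{i j} - \bar{x}_j)\,(x_{i k} - \bar{x}_k), \] i.e. divide by \(n\) instead of \(n - 1\). Here \(\bar{x}_j\) is the exact population mean of variable \(j\), so no correction for estimating the mean is needed. This is the population covariance.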
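The two formulas differ only in the divisor, which a short NumPy sketch makes concrete. The function name `covariance_matrix`, its `sample` flag and the small data set are illustrative assumptions, not part of this tool.

```python
import numpy as np

def covariance_matrix(data, sample=True):
    """Covariance of an (n, p) array: rows are observations, columns are variables."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    centered = data - data.mean(axis=0)   # subtract each column mean
    divisor = n - 1 if sample else n      # n - 1 for sample, n for population
    return centered.T @ centered / divisor

# Made-up data: 4 observations of 2 variables
data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 5.0],
                 [4.0, 9.0]])

S = covariance_matrix(data, sample=True)           # divides by n - 1
Sigma_pop = covariance_matrix(data, sample=False)  # divides by n
```

NumPy's built-in `np.cov(data, rowvar=False)` should give the same sample matrix, and `np.cov(data, rowvar=False, ddof=0)` the population version. The `rowvar=False` argument matters because `np.cov` treats rows as variables by default.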
From covariance matrix to correlation matrix
The covariance between variables depends on the measurement units (for example, metres vs kilometres). To remove the effect of scale you can compute the correlation matrix, whose entries are the Pearson correlation coefficients: \[ \rho_{j k} = \frac{\operatorname{Cov}(X_j, X_k)}{\sqrt{\operatorname{Var}(X_j)\operatorname{Var}(X_k)}}. \]
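In matrix form, let \(D = \operatorname{diag}\bigl(\sqrt{\operatorname{Var}(X_1)}, \dots, \sqrt{\operatorname{Var}(X_p)}\bigr)\) be the diagonal matrix of standard deviations, all assumed to be nonzero. The correlation matrix is then \[ R = D^{-1} \Sigma D^{-1}. \] The diagonal entries of \(R\) are all 1, and each off-diagonal entry lies in the interval \([-1, 1]\).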
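As a minimal sketch of the matrix formula, the following NumPy snippet rescales a covariance matrix into a correlation matrix. The helper name `correlation_from_covariance` and the example matrix are assumptions chosen for illustration, and the code assumes every variance is strictly positive.

```python
import numpy as np

def correlation_from_covariance(cov):
    """Return R = D^{-1} Sigma D^{-1}, where D holds the standard deviations."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))      # standard deviations from the diagonal
    d_inv = np.diag(1.0 / std)       # D^{-1}; assumes no zero variances
    return d_inv @ cov @ d_inv

# Made-up covariance matrix: variances 4 and 9, covariance 2
cov = np.array([[4.0, 2.0],
                [2.0, 9.0]])
R = correlation_from_covariance(cov)
# Off-diagonal entry: 2 / (2 * 3) = 1/3
```

A variable with zero variance would cause a division by zero here, so constant columns need to be dropped or handled separately before rescaling.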
Applications of the covariance matrix
- Principal component analysis (PCA) — PCA diagonalises the covariance matrix to find directions of maximum variance.
- Portfolio theory — in quantitative finance, the covariance matrix of asset returns is central to risk modelling and optimisation.
- Multivariate normal models — the covariance matrix parameterises the spread and shape of multivariate Gaussian distributions.
- State estimation & Kalman filtering — process and measurement noise covariances are key design inputs.
- Machine learning & data preprocessing — covariance is used in whitening, feature scaling and understanding feature interactions.
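To illustrate the PCA item above, here is a rough sketch of how diagonalising a sample covariance matrix yields principal components. The synthetic data, the mixing matrix and all variable names are assumptions made for the example.

```python
import numpy as np

# Synthetic data: 200 observations of 3 correlated variables
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
data = rng.normal(size=(200, 3)) @ mixing

S = np.cov(data, rowvar=False)            # sample covariance matrix

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]         # reorder to descending variance
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Project the centred data onto the principal directions
scores = (data - data.mean(axis=0)) @ eigvecs
```

The columns of `eigvecs` are the principal directions and `eigvals` are the variances along them. The covariance matrix of `scores` should be approximately diagonal, which is what "diagonalises" means in this context.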
Numerical considerations
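- If the number of variables \(p\) is at least the number of observations \(n\), the sample covariance matrix is necessarily singular, because its rank is at most \(n - 1\). When \(p\) is smaller than but close to \(n\), the matrix is usually invertible but often ill-conditioned, so its inverse and smallest eigenvalues are unreliable.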
- Strong collinearity (nearly linear relationships between variables) can cause near-singularity and large condition numbers, which affect matrix inversion and eigen-decomposition.
- For high-dimensional or ill-conditioned problems, consider regularised estimators (for example, shrinkage covariance) rather than the plain sample covariance.
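The points above can be sketched with a small experiment: with more variables than observations the sample covariance is rank-deficient, and a simple linear shrinkage toward a scaled identity restores full rank. The function `shrink_covariance` and the weight `alpha=0.2` are hand-picked assumptions for illustration, not a recommended or data-driven choice.

```python
import numpy as np

def shrink_covariance(S, alpha):
    """Blend S with a scaled identity that has the same average variance."""
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(0)
n, p = 5, 8                                # fewer observations than variables
data = rng.normal(size=(n, p))

S = np.cov(data, rowvar=False)             # p x p, but rank at most n - 1
rank_S = np.linalg.matrix_rank(S)

S_shrunk = shrink_covariance(S, alpha=0.2) # alpha is an arbitrary choice here
rank_shrunk = np.linalg.matrix_rank(S_shrunk)
cond_shrunk = np.linalg.cond(S_shrunk)     # finite once the matrix is full rank
```

Shrinkage of this form keeps the trace (the total variance) unchanged while lifting the zero eigenvalues away from zero. In practice the weight is usually estimated from the data rather than fixed by hand.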