Cross-Correlation Calculator – Time Series Correlation by Lag
Paste two time series, choose a maximum lag, and compute the cross-correlation function to see how strongly they move together as one series is shifted in time.
Cross-correlation calculator
Paste numbers separated by commas, spaces, tabs or line breaks. X is treated as the leading series.
Y is shifted forward/backward relative to X to compute correlation at each lag.
Lags will run from −maxLag to +maxLag, limited by series length.
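Results: the calculator reports the maximum correlation and the lag at which it occurs, the zero-lag correlation (Pearson r at lag 0), the number of overlapping pairs and the two series lengths, followed by a table listing the lag k, the number of overlapping pairs and rXY(k) for every lag.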
What is cross-correlation?
Given two time series \( X_t \) and \( Y_t \), the cross-correlation function (CCF) tells you how strongly they move together when one series is shifted forward or backward in time.
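For a lag \( k \), we correlate \( X_t \) with \( Y_{t+k} \). A strong correlation at a positive lag suggests that Y follows X; a strong correlation at a negative lag suggests that Y tends to lead X.
At each lag the calculator computes the sample Pearson correlation of the overlapping pairs, where \( \bar X_k \) and \( \bar Y_k \) denote the means of the overlapping segments at lag \( k \). It also offers a cross-covariance mode where the numerator \(\sum (X_i - \bar X_k)(Y_{i+k} - \bar Y_k)\) is reported directly.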
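For reference, the per-lag coefficient can be written out as follows. This is a sketch that assumes the standard Pearson form, with all sums running over the indices \( i \) for which both \( X_i \) and \( Y_{i+k} \) exist and with \( \bar X_k \), \( \bar Y_k \) taken as the means over those same pairs; the page itself only states the numerator.

\[
r_{XY}(k) = \frac{\sum_i (X_i - \bar X_k)(Y_{i+k} - \bar Y_k)}{\sqrt{\sum_i (X_i - \bar X_k)^2 \; \sum_i (Y_{i+k} - \bar Y_k)^2}}
\]

The numerator is the quantity reported in cross-covariance mode; dividing by the square root in the denominator rescales it into the range \([-1, 1]\).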
Zero-lag correlation vs. cross-correlation
The usual Pearson correlation coefficient between \( X \) and \( Y \) pairs each value at time \( t \) in the first series with the value at the same time in the second series. This is the lag 0 entry of the cross-correlation function.
Cross-correlation generalizes this idea by exploring a range of lags: \(-k_{\max}, \dots, -1, 0, 1, \dots, k_{\max}\). This is useful when:
- One series is a delayed response to the other (e.g., downstream sensor data).
- You suspect lead–lag effects between markets or economic indicators.
- You are aligning signals in engineering (vibration, audio, radar, seismology, etc.).
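As an illustration of scanning a range of lags, here is a minimal pure-Python sketch. It is not the calculator's actual code: the function name `cross_correlation`, the row layout and the choice to return `None` when a lag has fewer than two pairs or a constant segment are all assumptions made for this example.

```python
from math import sqrt

def cross_correlation(x, y, max_lag):
    """Return (lag, n_pairs, r) rows for lags -max_lag..+max_lag.

    At lag k, x[i] is paired with y[i + k]; means are taken over the
    overlapping segment only. r is None when it cannot be computed.
    """
    rows = []
    for k in range(-max_lag, max_lag + 1):
        # Range of i for which both x[i] and y[i + k] exist.
        start = max(0, -k)
        stop = min(len(x), len(y) - k)
        n = max(0, stop - start)
        if n < 2:
            rows.append((k, n, None))
            continue
        xs = x[start:stop]
        ys = y[start + k:stop + k]
        mx = sum(xs) / n
        my = sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        # A constant segment has zero variance, so r is undefined.
        r = sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else None
        rows.append((k, n, r))
    return rows

# Toy data: y is x delayed by two steps (padded with zeros at the start).
x = [1, 3, 2, 5, 4, 6, 8, 7]
y = [0, 0, 1, 3, 2, 5, 4, 6]
for lag, n_pairs, r in cross_correlation(x, y, 3):
    print(lag, n_pairs, r)
```

With this toy data the strongest correlation appears at lag +2, using 6 overlapping pairs, which matches the two-step delay built into `y`.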
How to use this cross-correlation calculator
- Paste your data. Put the first time series in the X box and the second in the Y box. They may have different lengths; the tool only uses overlapping indices for each lag.
- Choose maximum lag. A conservative starting point is 10–20% of the shorter series length. Very large lags can produce noisy estimates with very few pairs.
- Choose normalization. For most statistical applications, use the correlation coefficient (Pearson r), which is unit-free and bounded between −1 and 1. Use cross-covariance if you need the magnitude of the joint fluctuations in the data's original units.
- Run the calculator and interpret. Look at the lag where the absolute value of the correlation is strongest, and compare it to the zero-lag correlation.
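The lag choice and the final interpretation step can be sketched in a few lines. The helper names `suggest_max_lag` and `summarize`, the default fraction of 0.15 and the example rows are all hypothetical; the rows are made-up numbers in the (lag, pairs, r) layout of the results table, not output from real data.

```python
def suggest_max_lag(n_x, n_y, fraction=0.15):
    """Starting max lag as a fraction of the shorter series length (at least 1)."""
    return max(1, round(min(n_x, n_y) * fraction))

def summarize(rows):
    """rows: (lag, n_pairs, r) tuples. Returns (peak_row, zero_lag_row)."""
    valid = [row for row in rows if row[2] is not None]
    # Strongest relationship by absolute value, regardless of sign.
    peak = max(valid, key=lambda row: abs(row[2]))
    zero = next((row for row in valid if row[0] == 0), None)
    return peak, zero

# Made-up rows for illustration only.
rows = [(-2, 98, 0.05), (-1, 99, -0.12), (0, 100, 0.31),
        (1, 99, 0.58), (2, 98, -0.74), (3, 97, 0.40)]
peak, zero = summarize(rows)
print("suggested max lag:", suggest_max_lag(100, 120))
print("peak:", peak)
print("zero lag:", zero)
```

Here the peak is the negative correlation at lag +2, which is stronger in magnitude than the zero-lag value, so looking only at lag 0 would understate the relationship.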
Interpreting the sign and lag
- A positive correlation at lag \( k \) suggests that when X is above its mean, Y (shifted by \( k \)) also tends to be above its mean.
- A negative correlation at lag \( k \) means that high values of X are associated with low values of Y at that lag (and vice versa).
- A maximum at positive lag means X tends to lead Y (Y follows after some delay).
- A maximum at negative lag means Y tends to lead X.
Remember that even a high correlation may still be due to chance or shared external drivers. Always combine cross-correlation analysis with domain knowledge, plots of the original series, and where appropriate, formal hypothesis testing.
FAQ – cross-correlation calculator
My correlation values are greater than 1 in magnitude. Is that possible?
That can happen only if you are viewing cross-covariance instead of correlation coefficients, since covariance is not bounded between −1 and 1. If you see values outside [−1, 1] while in correlation mode, check that you have enough overlapping points and that your data do not contain non-numeric values.
How many data points do I need for reliable cross-correlation?
There is no universal rule, but with very few overlapping points (e.g., n < 10 per lag) correlation estimates become unstable. This tool reports how many overlapping pairs are used at each lag so you can judge the robustness of each estimate.
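To see in advance how quickly the pair count shrinks, the overlap at each lag can be computed from the series lengths alone. This sketch assumes the pairing of \( X_i \) with \( Y_{i+k} \) described earlier; the function names and the default threshold of 10 (taken from the rough figure above) are illustrative choices.

```python
def overlap_pairs(n_x, n_y, k):
    """Number of indices i with 0 <= i < n_x and 0 <= i + k < n_y."""
    start = max(0, -k)
    stop = min(n_x, n_y - k)
    return max(0, stop - start)

def thin_lags(n_x, n_y, max_lag, min_pairs=10):
    """Lags in -max_lag..+max_lag that have fewer than min_pairs pairs."""
    return [k for k in range(-max_lag, max_lag + 1)
            if overlap_pairs(n_x, n_y, k) < min_pairs]

# Example: X has 50 points, Y has 40 points.
print(overlap_pairs(50, 40, 0))   # 40
print(overlap_pairs(50, 40, 5))   # 35
print(thin_lags(50, 40, 35))      # [31, 32, 33, 34, 35]
```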
Does this tool account for non-stationarity or trends?
No. It computes classical sample cross-correlation on the raw series segments. In serious time-series work you would normally inspect for non-stationarity, detrend or difference the data, and consider more advanced models (ARIMA, transfer functions, state-space models, etc.).
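Differencing is the simplest of the preprocessing steps mentioned above and can be done before pasting the data. The following is a minimal sketch, not a feature of the calculator; the function name `difference` and its `order` parameter are assumptions for illustration.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (each pass drops one point)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

print(difference([3, 5, 7, 9]))            # [2, 2, 2]
print(difference([2, 4, 7, 11]))           # [2, 3, 4]
print(difference([2, 4, 7, 11], order=2))  # [1, 1]
```

A pure linear trend becomes a constant after one pass. If you difference one series, difference the other the same way so that the time indices of the two series still line up.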
Are missing values supported?
The current implementation expects numeric values only. Any non-numeric entry will trigger an error. If your series contains missing values, remove or impute them before using the calculator.
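The input rules described on this page (values separated by commas, spaces, tabs or line breaks, with non-numeric entries rejected) can be mimicked when cleaning data beforehand. This is a hedged sketch: `parse_series` is a hypothetical helper, and treating `nan` and `inf` as invalid is an assumption of this example rather than documented behavior of the tool.

```python
import math
import re

def parse_series(text):
    """Split on commas and whitespace; raise ValueError on any bad entry."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    values = []
    for pos, tok in enumerate(tokens, start=1):
        try:
            value = float(tok)
        except ValueError:
            raise ValueError(f"entry {pos} is not numeric: {tok!r}") from None
        # float() accepts "nan" and "inf"; reject them explicitly here.
        if not math.isfinite(value):
            raise ValueError(f"entry {pos} is not a finite number: {tok!r}")
        values.append(value)
    return values

print(parse_series("1, 2\t3\n4.5"))  # [1.0, 2.0, 3.0, 4.5]
```

A placeholder such as `NA` raises a `ValueError` that names the offending position, which makes it easy to find the values that need to be removed or imputed.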