
Granger Causality: What It Is & How to Test for It 2026

A practical guide to Granger causality - what it really means, how to run the test in Python, how to interpret results, and applications in finance and trading.

What Is Granger Causality?

Granger causality is a statistical concept, assessed with a hypothesis test, that asks whether one time series is useful for forecasting another. If past values of series \( X \) improve predictions of series \( Y \) beyond what \( Y \)'s own past values already provide, then \( X \) is said to "Granger-cause" \( Y \). The concept was introduced in 1969 by Clive Granger, who later received the Nobel Prize in economics, and it remains one of the most widely used tools in econometrics and quantitative finance today.

The intuition is straightforward. Suppose you're forecasting tomorrow's FTSE 100 return. You build a model using only past FTSE 100 returns, and it produces a mean squared error of 0.04. Then you add past S&P 500 returns to the model, and the error drops to 0.03. Provided that drop is larger than chance alone would produce (adding any regressor lowers in-sample error a little, which is why a formal test is needed), the S&P 500 contains information about the FTSE 100's future that the FTSE 100 itself doesn't capture. In Granger's framework, the S&P 500 Granger-causes the FTSE 100.

Formally, consider two stationary time series \( X_t \) and \( Y_t \). We say \( X \) Granger-causes \( Y \) if:

\[ P(Y_{t+1} \mid Y_t, Y_{t-1}, \ldots, X_t, X_{t-1}, \ldots) \neq P(Y_{t+1} \mid Y_t, Y_{t-1}, \ldots) \]

In other words, the conditional distribution of \( Y \) changes when you include the history of \( X \). In practice, this is tested by comparing two linear regression models, one with only lagged \( Y \) values as predictors and one with both lagged \( Y \) and lagged \( X \) values, and asking whether the added \( X \) terms are jointly significant.

A critical point that trips up many beginners: Granger causality is not real causation. It's predictive causality. If \( X \) Granger-causes \( Y \), it means \( X \) contains predictive information about \( Y \), nothing more. It doesn't mean \( X \) actually causes \( Y \) in any physical or economic sense. We'll explore this distinction in detail below.


Granger Causality vs Real Causation

The name "Granger causality" is arguably misleading, and Granger himself acknowledged this. What the test detects is temporal precedence combined with predictive power - not a causal mechanism. Understanding this distinction is essential before applying the test.

Real causation implies a mechanism: raising interest rates causes borrowing to become more expensive, which reduces spending. Granger causality only asks: do past interest rate changes help predict future spending, after accounting for spending's own history? The answer could be "yes" for reasons that have nothing to do with a direct causal link.

Here are three common ways Granger causality can mislead:

Confounding variables. Suppose variable \( Z \) causes both \( X \) and \( Y \), but affects \( X \) slightly earlier than \( Y \). Then \( X \) will appear to Granger-cause \( Y \), even though the actual driver is \( Z \). For example, weather (\( Z \)) might influence both agricultural futures (\( X \)) and food retail stocks (\( Y \)), with futures reacting faster. You'd find that agricultural futures Granger-cause retail stocks, but the real driver is the weather.

Spurious regression. If both series are non-stationary (they have unit roots), the standard Granger causality test can produce spurious results, finding significant relationships where none exist, because the F-statistic no longer follows its usual distribution. This is why the standard test requires stationary data, and you should always check for stationarity before running it.

Feedback loops. In many economic systems, \( X \) Granger-causes \( Y \) and \( Y \) Granger-causes \( X \) simultaneously. Stock prices and trading volume are a classic example: high volume can predict future price moves, and large price moves can predict future volume. Bilateral Granger causality doesn't necessarily mean both variables cause each other; it can simply mean they're both responding to the same underlying dynamics at different speeds.

The bottom line: treat Granger causality as a test for predictive information flow, not as evidence of a causal mechanism. It's a useful screening tool for identifying potential lead-lag relationships, but any causal claims require economic theory and further analysis.


The Granger Causality Test

The standard Granger causality test is built on a vector autoregressive (VAR) model framework. Here's how it works step by step.

Step 1: Set Up the VAR Model

To test whether \( X \) Granger-causes \( Y \), you estimate two models. The restricted model includes only lagged values of \( Y \):

\[ Y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i Y_{t-i} + \epsilon_t \]

The unrestricted model adds lagged values of \( X \):

\[ Y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i Y_{t-i} + \sum_{j=1}^{p} \beta_j X_{t-j} + \epsilon_t \]

Here, \( p \) is the number of lags included in the model.

Step 2: Test Whether the X Lags Are Jointly Significant

The null hypothesis is that \( X \) does not Granger-cause \( Y \):

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \]

Under the null, the lagged \( X \) terms add no predictive power. You test this using an F-test that compares the residual sum of squares (RSS) from the restricted and unrestricted models:

\[ F = \frac{(RSS_{\text{restricted}} - RSS_{\text{unrestricted}}) / p}{RSS_{\text{unrestricted}} / (T - 2p - 1)} \]

where \( T \) is the number of observations used in the regressions (the first \( p \) observations are lost to lagging) and \( 2p + 1 \) is the number of parameters in the unrestricted model. Under the null, \( F \) follows an \( F(p,\, T - 2p - 1) \) distribution (exactly with normally distributed errors, approximately otherwise). If the F-statistic is large enough (or equivalently, the p-value is below your significance level), you reject the null and conclude that \( X \) Granger-causes \( Y \).
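To make the formula concrete, here is a minimal NumPy sketch of the restricted-versus-unrestricted comparison. It fits both regressions by ordinary least squares on the same \( T \) usable observations and takes the p-value from the upper tail of \( F(p, T - 2p - 1) \) via `scipy.stats.f.sf`. The helper names `lagged_design` and `granger_f_test` are hypothetical; the computation is intended to mirror the `ssr_ftest` entry that statsmodels reports, but treat it as an illustration rather than a replacement for a tested library.

```python
import numpy as np
from scipy import stats


def lagged_design(series: np.ndarray, p: int) -> np.ndarray:
    """Columns are series[t-1], ..., series[t-p] for t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - i:n - i] for i in range(1, p + 1)])


def granger_f_test(y, x, p: int):
    """F-test of H0: beta_1 = ... = beta_p = 0 (lags of x add nothing to y's own lags)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]                                  # Y_t for the T usable observations
    T = len(target)
    ones = np.ones((T, 1))                          # intercept alpha_0

    X_r = np.hstack([ones, lagged_design(y, p)])    # restricted: own lags only
    X_u = np.hstack([X_r, lagged_design(x, p)])     # unrestricted: adds lags of x

    def rss(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_denom = T - 2 * p - 1                        # T minus the 2p + 1 unrestricted parameters
    f_stat = ((rss_r - rss_u) / p) / (rss_u / df_denom)
    p_value = float(stats.f.sf(f_stat, p, df_denom))  # upper tail of F(p, T - 2p - 1)
    return f_stat, p_value, rss_r, rss_u


# Usage on simulated data where x leads y by one period
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.normal()

f_stat, p_value, rss_r, rss_u = granger_f_test(y, x, p=2)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```

Because the unrestricted model nests the restricted one, \( RSS_{\text{unrestricted}} \le RSS_{\text{restricted}} \) always holds, so \( F \ge 0 \); the test asks whether that reduction is large relative to the noise that remains.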

Choosing the Lag Order

The lag order \( p \) matters. Too few lags and you miss genuine predictive relationships. Too many and you lose statistical power and risk overfitting. The standard approach is to select lags using an information criterion:

  • AIC (Akaike Information Criterion): tends to select more lags; useful when you want to minimise prediction error.
  • BIC (Bayesian Information Criterion): penalises extra parameters more heavily; tends to select fewer lags.

In practice, fit a VAR model across a range of lag orders (say 1 to 12 for monthly data, or 1 to 20 for daily data) and pick the lag that minimises your chosen criterion. Many researchers report results across multiple lag orders as a robustness check.

You can also use the likelihood ratio test to compare nested VAR models at different lag orders, though information criteria are more common in applied work.
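The information-criterion search can be sketched in NumPy as follows, under the assumption that a plain least-squares VAR with an intercept is adequate. For each \( p \) from 1 to `max_lag` it fits a VAR(\( p \)) on a common sample (the first `max_lag` rows are dropped for every model, so all fits are compared on the same data) and scores it with the usual multivariate forms \( \text{AIC}(p) = \ln|\hat\Sigma_p| + 2k/T \) and \( \text{BIC}(p) = \ln|\hat\Sigma_p| + k \ln T / T \), where \( k = pK^2 \) counts the lag coefficients of a \( K \)-variable VAR (intercepts are the same for every \( p \), so they don't change the ranking). The helpers `var_info_criteria` and `select_lag` are hypothetical; in practice statsmodels' `VAR(...).select_order` does this for you.

```python
import numpy as np


def var_info_criteria(data: np.ndarray, max_lag: int) -> dict:
    """Return {p: (aic, bic)} for VAR(p), p = 1..max_lag, on a common sample."""
    data = np.asarray(data, dtype=float)
    n, K = data.shape
    target = data[max_lag:]                     # the same T rows for every p
    T = len(target)
    out = {}
    for p in range(1, max_lag + 1):
        # Regressors: intercept plus lags 1..p of every series
        lag_blocks = [data[max_lag - i:n - i] for i in range(1, p + 1)]
        design = np.hstack([np.ones((T, 1))] + lag_blocks)
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        sigma = resid.T @ resid / T             # ML estimate of the residual covariance
        _, logdet = np.linalg.slogdet(sigma)
        k = p * K * K                           # number of lag coefficients
        out[p] = (logdet + 2 * k / T, logdet + np.log(T) * k / T)
    return out


def select_lag(data: np.ndarray, max_lag: int):
    """Lag orders minimising AIC and BIC (ties go to the smaller order)."""
    crit = var_info_criteria(data, max_lag)
    orders = sorted(crit)
    aic_lag = orders[int(np.argmin([crit[p][0] for p in orders]))]
    bic_lag = orders[int(np.argmin([crit[p][1] for p in orders]))]
    return aic_lag, bic_lag


# Usage: a stable two-variable VAR(2) simulated for illustration
rng = np.random.default_rng(1)
n = 1000
A1 = np.array([[0.4, 0.2], [0.0, 0.3]])    # lag-1 coefficient matrix
A2 = np.array([[0.2, 0.0], [0.2, 0.1]])    # lag-2 coefficient matrix
z = np.zeros((n, 2))
for t in range(2, n):
    z[t] = A1 @ z[t - 1] + A2 @ z[t - 2] + rng.normal(size=2)

aic_lag, bic_lag = select_lag(z, max_lag=8)
print(f"AIC picks {aic_lag} lags, BIC picks {bic_lag}")
```

Because BIC's per-parameter penalty \( \ln T \) exceeds AIC's 2 whenever \( T > e^2 \approx 7.4 \), the BIC choice can never exceed the AIC choice on the same sample, which is the "BIC selects fewer lags" tendency described above.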

Assumptions and Prerequisites

Before running the Granger causality test, check that:

  1. Both series are stationary. Apply unit root tests (ADF, KPSS) first. If series are non-stationary, difference them or consider a cointegration framework instead.
  2. No serial correlation in residuals. Residual autocorrelation violates the assumptions of the F-test and usually means the lag order is too low. Use the Ljung-Box or Breusch-Godfrey test to check; the Durbin-Watson statistic is biased when lagged values of the dependent variable are among the regressors, as they always are here.
  3. Adequate sample size. The test requires enough observations to estimate \( 2p + 1 \) parameters reliably. A common rule of thumb is at least 50 observations per variable, though more is better.
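For the residual check in point 2, the Ljung-Box statistic is \( Q = T(T+2)\sum_{k=1}^{h} \hat\rho_k^2 / (T-k) \), where \( \hat\rho_k \) is the lag-\( k \) sample autocorrelation of the residuals (for example, those of the unrestricted regression). The sketch below computes it by hand; the p-value uses a \( \chi^2_h \) reference distribution via `scipy.stats.chi2.sf` and, as a simplifying assumption, ignores any degrees-of-freedom adjustment for fitted parameters. statsmodels also provides `acorr_ljungbox` for this.

```python
import numpy as np
from scipy import stats


def ljung_box(resid, h: int):
    """Ljung-Box Q statistic for lags 1..h and its chi-square(h) p-value."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    denom = e @ e
    # Sample autocorrelations rho_1, ..., rho_h
    rho = np.array([(e[k:] @ e[:-k]) / denom for k in range(1, h + 1)])
    q = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, h + 1)))
    return float(q), float(stats.chi2.sf(q, df=h))


# Usage: white-noise residuals versus residuals that are still autocorrelated
rng = np.random.default_rng(2)
white = rng.normal(size=1000)
autocorrelated = np.zeros(1000)
for t in range(1, 1000):
    autocorrelated[t] = 0.5 * autocorrelated[t - 1] + white[t]

q_white, p_white = ljung_box(white, h=10)
q_ar, p_ar = ljung_box(autocorrelated, h=10)
print(f"white noise:    Q = {q_white:.1f}, p = {p_white:.3f}")
print(f"autocorrelated: Q = {q_ar:.1f}, p = {p_ar:.2e}")
```

A small p-value for the residuals of your unrestricted model is a signal to add lags before trusting the Granger F-test.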

Running the Granger Causality Test in Python

The statsmodels library provides a convenient function for Granger causality testing. Here's a complete example that checks stationarity and then runs the built-in test in both directions on simulated data.

Using statsmodels.tsa.stattools.grangercausalitytests

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests, adfuller


def check_stationarity(series: pd.Series, name: str) -> bool:
    """Check if a series is stationary using the ADF test."""
    result = adfuller(series, autolag="AIC")
    p_value = result[1]
    is_stationary = p_value < 0.05
    print(f"{name}: ADF stat = {result[0]:.4f}, "
          f"p-value = {p_value:.4f}, "
          f"stationary = {is_stationary}")
    return is_stationary


def run_granger_test(
    data: pd.DataFrame,
    cause_col: str,
    effect_col: str,
    max_lag: int = 10,
    significance: float = 0.05,
) -> dict:
    """
    Run the Granger causality test.

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame with at least two columns of stationary data.
    cause_col : str
        Name of the column hypothesised to be the cause.
    effect_col : str
        Name of the column hypothesised to be the effect.
    max_lag : int
        Maximum number of lags to test.
    significance : float
        Significance level for the test.

    Returns
    -------
    dict with results for each lag order.
    """
    # grangercausalitytests expects [effect, cause] column order
    test_data = data[[effect_col, cause_col]].dropna()

    print(f"\nTesting: does '{cause_col}' Granger-cause "
          f"'{effect_col}'?")
    print(f"Observations: {len(test_data)}")
    print("-" * 55)

    results = grangercausalitytests(test_data, maxlag=max_lag,
                                    verbose=False)

    summary = {}
    for lag in range(1, max_lag + 1):
        f_test = results[lag][0]["ssr_ftest"]
        f_stat = f_test[0]
        p_value = f_test[1]
        granger_causes = p_value < significance

        summary[lag] = {
            "f_statistic": f_stat,
            "p_value": p_value,
            "granger_causes": granger_causes,
        }

        marker = "*" if granger_causes else ""
        print(f"Lag {lag:2d}: F = {f_stat:8.4f}, "
              f"p = {p_value:.6f} {marker}")

    return summary


# --- Example: simulated lead-lag relationship ---
np.random.seed(42)
n = 1000

# X leads Y by one period
noise_x = np.random.normal(0, 1, n)
noise_y = np.random.normal(0, 1, n)

x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + noise_x[t]
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + noise_y[t]

data = pd.DataFrame({"X": x, "Y": y})

# Verify stationarity
print("Stationarity checks:")
check_stationarity(data["X"], "X")
check_stationarity(data["Y"], "Y")

# Test if X Granger-causes Y
results_xy = run_granger_test(data, cause_col="X",
                              effect_col="Y", max_lag=5)

# Test the reverse direction
results_yx = run_granger_test(data, cause_col="Y",
                              effect_col="X", max_lag=5)
```

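This example creates two stationary series where \( X \) genuinely leads \( Y \) by one period. The test should find that \( X \) Granger-causes \( Y \) (very small p-values at every lag order, since each order includes lag 1), while the reverse test should generally not reject (p-values mostly above 0.05, although under a true null each lag order still has roughly a 5% chance of a false positive).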

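A Financial Granger Causality Example with Simulated Returns

Here's a more practical example testing whether oil price returns Granger-cause equity market returns, a relationship that's been studied extensively in the finance literature. To keep the example self-contained, it uses simulated return series built to mimic that relationship rather than downloaded market data.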

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests, adfuller
from statsmodels.tsa.api import VAR


def select_lag_order(
    data: pd.DataFrame, max_lag: int = 15
) -> dict:
    """Select optimal lag order using information criteria."""
    model = VAR(data)
    # select_order fits every lag order up to max_lag once and exposes
    # the order chosen by each criterion as an attribute
    selected = model.select_order(maxlags=max_lag)
    results = {
        criterion: getattr(selected, criterion)
        for criterion in ["aic", "bic", "hqic"]
    }

    print("Optimal lag orders:")
    for criterion, lag in results.items():
        print(f"  {criterion.upper()}: {lag}")
    return results


def bilateral_granger_test(
    data: pd.DataFrame,
    col_a: str,
    col_b: str,
    max_lag: int = 5,
    significance: float = 0.05,
) -> dict:
    """
    Run bilateral Granger causality tests between two series.

    Tests both directions: A -> B and B -> A.
    """
    results = {}
    for cause, effect in [(col_a, col_b), (col_b, col_a)]:
        direction = f"{cause} -> {effect}"
        # Column order is [effect, cause]
        test_data = data[[effect, cause]].dropna()
        gc_results = grangercausalitytests(
            test_data, maxlag=max_lag, verbose=False
        )

        # p-value of the SSR-based F-test at each lag order
        p_values = {
            lag: gc_results[lag][0]["ssr_ftest"][1]
            for lag in range(1, max_lag + 1)
        }
        # Picking the smallest p-value across lag orders is optimistic
        # (it is itself a form of multiple testing), so treat it as a
        # screening heuristic rather than a formal test result.
        best_lag = min(p_values, key=p_values.get)
        min_p = p_values[best_lag]

        results[direction] = {
            "min_p_value": min_p,
            "best_lag": best_lag,
            "granger_causes": min_p < significance,
        }

        status = "YES" if min_p < significance else "NO"
        print(f"{direction}: {status} "
              f"(p = {min_p:.6f} at lag {best_lag})")

    return results


# --- Generate simulated financial return series ---
np.random.seed(123)
n = 2000

# Simulate oil returns and equity returns with a lead-lag
oil_shocks = np.random.normal(0, 0.02, n)
equity_shocks = np.random.normal(0, 0.015, n)

oil_returns = np.zeros(n)
equity_returns = np.zeros(n)

for t in range(2, n):
    oil_returns[t] = 0.05 * oil_returns[t - 1] + oil_shocks[t]
    equity_returns[t] = (
        0.03 * equity_returns[t - 1]
        - 0.15 * oil_returns[t - 1]
        - 0.08 * oil_returns[t - 2]
        + equity_shocks[t]
    )

data = pd.DataFrame({
    "oil_returns": oil_returns,
    "equity_returns": equity_returns,
})

# Check stationarity
print("Stationarity checks:")
for col in data.columns:
    adf_result = adfuller(data[col], autolag="AIC")
    print(f"  {col}: ADF p-value = {adf_result[1]:.6f}")

# Select lag order
print()
select_lag_order(data)

# Bilateral Granger causality test
print("\nBilateral Granger causality test:")
bilateral_granger_test(data, "oil_returns", "equity_returns", max_lag=5)
```

In this simulation, oil returns have a negative effect on equity returns with a one-to-two-period delay, a stylised version of the observation that oil price spikes tend to hurt stock markets with a lag. The bilateral test should confirm that oil Granger-causes equities but not the reverse.


Interpreting Granger Causality Results

Getting the test output is easy. Interpreting it correctly requires more care.

Reading the P-Values

The Granger causality test produces a p-value for each lag order tested. Keep in mind what each one tests: the result for lag order \( p \) is a joint test of lags 1 through \( p \) of \( X \), not a test of lag \( p \) on its own. A p-value below your significance threshold (typically 0.05) means you reject the null hypothesis of no Granger causality for that lag order. But don't just look at individual lag orders in isolation.

If the test is significant from lag order 1 onwards and the F-statistic shrinks as you add lags, the predictive relationship is concentrated in the most recent observation: \( X \) from yesterday helps predict \( Y \) today, and older values of \( X \) add little beyond that (the extra lags mostly dilute the test). This is common in highly efficient financial markets.

If the test only becomes significant once the lag order reaches 2 or 3, the information in \( X \) takes a few periods to be absorbed into \( Y \). This suggests a slower transmission mechanism, perhaps the kind you'd see in macroeconomic data where policy changes take time to affect the real economy.

If significance only appears at high lag orders (say 4 or 5 but not 1 or 2), be cautious. This pattern can be an artefact of overfitting or seasonal effects rather than a genuine predictive relationship.

Bilateral Testing

Always test both directions. If \( X \) Granger-causes \( Y \) but not vice versa, you have a clear directional relationship: \( X \) leads \( Y \). If both directions are significant, you have a feedback relationship, and neither variable is cleanly "leading" the other. If neither direction is significant, there's no detectable lead-lag relationship at the tested frequencies.

Bilateral Granger causality is especially common in financial markets where information flows quickly between related assets. For example, the spot and futures prices of the same commodity almost always Granger-cause each other, since both reflect the same information with slightly different timing.

Sensitivity to Lag Selection

Results can change dramatically with different lag orders. A relationship that looks strong at lag 2 might vanish at lag 5, or vice versa. This is why it's good practice to:

  1. Use information criteria (AIC/BIC) to select the lag order formally.
  2. Report results across a range of lags as a robustness check.
  3. Be sceptical of results that only show up at a single, arbitrary lag order.

If your conclusion flips depending on whether you use 3 or 4 lags, the evidence is weak regardless of any individual p-value.


Applications in Finance and Trading

Granger causality is widely used across finance, from academic research to production trading systems. Here are the most common applications.

Lead-Lag Relationships Between Markets

One of the earliest and most studied applications is testing whether one market leads another. The US equity market, being the largest and most liquid, often Granger-causes smaller markets. Research has consistently shown that S&P 500 returns help predict next-day returns in European and Asian indices, but the reverse is weaker.

This matters for traders because lead-lag relationships are potential alpha sources. If overnight S&P 500 futures moves predict FTSE 100 opening moves, a systematic strategy can trade on that signal. The Granger causality test is the standard first step in identifying whether such a relationship exists and how many periods the lead extends.

Macroeconomic Forecasting

Do interest rates Granger-cause GDP growth? Does money supply lead inflation? These are classic questions in macroeconomics, and the Granger causality test was originally developed to address them. Central bank researchers routinely use Granger causality to examine the transmission of monetary policy to the real economy.

A well-known finding is that the term spread (difference between long-term and short-term interest rates) Granger-causes GDP growth and recession probabilities. This relationship has been remarkably stable across decades, and many economists consider the yield curve one of the best recession predictors available.

Oil Prices and Stock Returns

The relationship between oil prices and equity returns has been tested using Granger causality in hundreds of academic papers. The general finding is that oil price shocks Granger-cause stock returns - particularly for energy-intensive sectors - but the reverse is weaker. This is consistent with oil acting as a cost input for firms: when oil prices rise, profit margins shrink, and stock prices fall with a lag.

Volatility Spillovers

Granger causality isn't limited to returns. You can test whether volatility in one market predicts volatility in another. For example, does the VIX (implied volatility from S&P 500 options) Granger-cause the VSTOXX (implied volatility from EURO STOXX 50 options)? These tests are central to understanding how financial stress propagates across global markets.

Pairs Trading and Statistical Arbitrage

While cointegration is the primary tool for pairs trading, Granger causality plays a supporting role. If stock A Granger-causes stock B, movements in A have tended to precede movements in B in your sample. This information can improve entry timing for a pairs trade: rather than waiting for the spread to widen, you can anticipate the spread movement based on A's recent behaviour.

Cryptocurrency Markets

Granger causality has become a popular tool for analysing cryptocurrency markets, where lead-lag structures shift rapidly. Researchers test whether Bitcoin returns Granger-cause altcoin returns, whether social media sentiment Granger-causes crypto prices, and whether stablecoin flows predict broader market moves. The results tend to be less stable than in traditional markets, reflecting the faster evolution of crypto market microstructure.


Limitations and Pitfalls

The Granger causality test is useful, but it's not without significant limitations. Being aware of these will stop you from over-interpreting results.

It's Not Real Causation

This point bears repeating. Finding that \( X \) Granger-causes \( Y \) does not mean \( X \) causes \( Y \). A third variable might be driving both, with \( X \) simply reacting faster. In financial data, where thousands of variables are correlated, this is more the rule than the exception.

Omitted Variable Bias

If a relevant variable is left out of the model, the Granger causality test can give misleading results - either finding causality where none exists or failing to detect a genuine relationship. For instance, if you test whether gold prices Granger-cause the US dollar without controlling for interest rates, you might get a spurious result because interest rates drive both.

Multivariate Granger causality tests (using a full VAR model with all relevant variables) partially address this, but you can never include every relevant variable.
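A small simulation makes the omitted-variable problem, and the confounding example from earlier, concrete. In this sketch a hypothetical driver \( Z \) moves \( X \) one period ahead and \( Y \) two periods ahead, and \( X \) has no direct effect on \( Y \). A bivariate test finds that \( X \) "Granger-causes" \( Y \); adding lags of \( Z \) to both the restricted and unrestricted regressions (a conditional, or multivariate, Granger test) makes the effect disappear. The data-generating process and the helper names `lags` and `conditional_granger_f` are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def lags(series: np.ndarray, p: int) -> np.ndarray:
    """Columns series[t-1], ..., series[t-p] for t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - i:n - i] for i in range(1, p + 1)])


def conditional_granger_f(y, x, p, controls=()):
    """F-test that lags of x help predict y, given lags of y and of each control."""
    y = np.asarray(y, dtype=float)
    target = y[p:]
    T = len(target)
    # Restricted model: intercept, own lags, and lags of every control series
    base = [np.ones((T, 1)), lags(y, p)]
    base += [lags(np.asarray(c, dtype=float), p) for c in controls]
    X_r = np.hstack(base)
    # Unrestricted model: the same regressors plus lags of x
    X_u = np.hstack([X_r, lags(np.asarray(x, dtype=float), p)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    df_denom = T - X_u.shape[1]          # observations minus unrestricted parameters
    f_stat = ((rss(X_r) - rss(X_u)) / p) / (rss(X_u) / df_denom)
    return f_stat, float(stats.f.sf(f_stat, p, df_denom))


# Hypothetical process: Z drives X after one period and Y after two;
# X has no direct effect on Y.
rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)
x = np.zeros(n)
y = np.zeros(n)
x[1:] = z[:-1] + 0.5 * rng.normal(size=n - 1)   # X_t = Z_{t-1} + noise
y[2:] = z[:-2] + 0.5 * rng.normal(size=n - 2)   # Y_t = Z_{t-2} + noise

f_biv, p_biv = conditional_granger_f(y, x, p=2)                  # bivariate test
f_cond, p_cond = conditional_granger_f(y, x, p=2, controls=[z])  # controls for Z
print(f"bivariate:   F = {f_biv:8.2f}, p = {p_biv:.2e}")
print(f"conditional: F = {f_cond:8.2f}, p = {p_cond:.3f}")
```

Once \( Z \) is controlled for, the null is true, so the conditional p-value behaves like a random draw and varies from seed to seed; the point is that the very large bivariate statistic collapses. With real data you only get this correction for confounders you actually include.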

Non-Stationarity

The test requires stationary data. If your series have unit roots and you run the test on levels, the standard F-test distributions are wrong, and your p-values are unreliable. Always test for stationarity first. If series are non-stationary, either difference them or use the Toda-Yamamoto approach, which modifies the VAR to accommodate integrated series.

Be particularly careful with price-level data. Stock prices, exchange rates, and commodity prices are almost always non-stationary. You need to work with returns (first differences of log prices) or apply other transformations before testing.
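Converting a price series into log returns takes one line with pandas. The prices below are made-up numbers for illustration; the first observation has no predecessor, so it is dropped.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0], name="close")  # hypothetical prices
log_returns = np.log(prices).diff().dropna()   # r_t = ln(P_t) - ln(P_{t-1})
print(log_returns.round(4).tolist())
```

Log returns add up over time, so the sum of the returns equals the log of the last price divided by the first.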

Lag Selection Sensitivity

Different lag orders can produce contradictory conclusions. A relationship might appear significant at lag 3 but not at lag 5, or vice versa. There's no universally "correct" lag order, and information criteria sometimes disagree with each other (AIC might suggest 4 lags while BIC suggests 2).

The best practice is to report results across a range of lags and only trust relationships that are consistent. If your finding disappears when you change the lag from 3 to 4, it's fragile.

Multiple Testing Problem

If you test 20 pairs of variables for Granger causality at the 5% level and none of the relationships is real, you'd still expect about one significant result by chance alone (20 × 0.05 = 1). Financial researchers often test dozens or hundreds of pairs, making false discoveries likely. Apply a Bonferroni correction or false discovery rate (FDR) adjustment when testing multiple relationships.
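Both corrections are short to implement. The sketch below applies a Bonferroni adjustment and the Benjamini-Hochberg FDR procedure to a list of p-values, for example one per tested pair. The p-values are made up for illustration; `statsmodels.stats.multitest.multipletests` offers tested versions of these and other methods.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / len(p)


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank whose p-value clears its threshold
        reject[order[:k + 1]] = True       # reject it and every smaller p-value
    return reject


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(p_vals))
print(benjamini_hochberg(p_vals))
```

Unadjusted, five of these ten p-values fall below 0.05; Bonferroni keeps only the first, and Benjamini-Hochberg keeps the first two, reflecting its less conservative error target.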

Structural Breaks

Economic relationships change over time. A Granger causal relationship that held during 2010-2020 might not hold in 2021-2026. The global financial crisis, COVID-19, and shifts in monetary policy all represent structural breaks that can alter lead-lag dynamics. Consider testing for Granger causality in rolling windows to check whether the relationship is stable.
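A rolling-window check can be sketched by re-running the same F-test on overlapping sub-samples. Here the window length, step size, and the simulated break (the lead-lag link switches off halfway through the sample) are illustrative assumptions, and `granger_pvalue` and `rolling_granger_pvalues` are hypothetical helpers, not library functions.

```python
import numpy as np
from scipy import stats


def granger_pvalue(y: np.ndarray, x: np.ndarray, p: int) -> float:
    """p-value of the F-test that lags 1..p of x improve an AR(p) model of y."""
    n = len(y)
    target = y[p:]
    T = len(target)

    def lags(s):
        return np.column_stack([s[p - i:n - i] for i in range(1, p + 1)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    X_r = np.hstack([np.ones((T, 1)), lags(y)])   # own lags only
    X_u = np.hstack([X_r, lags(x)])               # plus lags of x
    df_denom = T - 2 * p - 1
    f_stat = ((rss(X_r) - rss(X_u)) / p) / (rss(X_u) / df_denom)
    return float(stats.f.sf(f_stat, p, df_denom))


def rolling_granger_pvalues(y, x, p=1, window=250, step=50):
    """List of (window start index, p-value) for each rolling window."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    starts = range(0, len(y) - window + 1, step)
    return [(s, granger_pvalue(y[s:s + window], x[s:s + window], p)) for s in starts]


# Simulated series whose lead-lag link disappears halfway through
rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    beta = 0.5 if t < n // 2 else 0.0
    y[t] = 0.2 * y[t - 1] + beta * x[t - 1] + rng.normal()

for start, pv in rolling_granger_pvalues(y, x):
    print(f"window starting at {start:4d}: p = {pv:.4f}")
```

The early windows should reject strongly, while windows after the break behave like draws under the null. In real data, a relationship that only shows up in some windows is a warning sign that it may not persist.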

Linear Assumption

The standard Granger causality test only captures linear predictive relationships. If \( X \) predicts \( Y \) through a nonlinear function, the test might miss it entirely. Nonlinear Granger causality tests exist (based on kernel methods or neural networks), but they require more data and are more complex to implement and interpret.


Granger Causality and Cointegration

Granger causality and cointegration are related but distinct concepts, and understanding how they connect is important for applied work.

The Granger Representation Theorem states that if two I(1) series are cointegrated, there must be Granger causality in at least one direction. This makes intuitive sense: if two series share a long-run equilibrium, at least one of them must be adjusting when the spread deviates from equilibrium, and the other series' past values provide information about that adjustment.

However, the reverse doesn't hold. Granger causality can exist between stationary series that aren't cointegrated (because cointegration only applies to integrated series). And two non-stationary series can Granger-cause each other without being cointegrated.
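The adjustment mechanism behind the Granger Representation Theorem can be illustrated with a simulated cointegrated pair. In this sketch \( X \) is a random walk and \( Y \) closes 20% of the gap \( Y_{t-1} - X_{t-1} \) each period; regressing \( \Delta Y_t \) on the lagged spread recovers that adjustment speed, and because the spread contains \( X_{t-1} \), past values of \( X \) help predict changes in \( Y \), which is the Granger causality the theorem guarantees. The cointegrating vector (1, -1) is assumed known here; with real data it has to be estimated (for example with the Johansen procedure) before fitting a VECM.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
alpha_true = -0.2                       # share of the gap Y closes each period
x = np.cumsum(rng.normal(size=n))       # X_t: a random walk, so I(1)
y = np.zeros(n)
for t in range(1, n):
    spread = y[t - 1] - x[t - 1]        # deviation from the assumed equilibrium Y = X
    y[t] = y[t - 1] + alpha_true * spread + rng.normal()

# Error-correction regression: dY_t = c + alpha * (Y_{t-1} - X_{t-1}) + e_t
dy = np.diff(y)                         # dy[j] = Y_{j+1} - Y_j
ect = (y - x)[:-1]                      # ect[j] = Y_j - X_j, the lagged spread
design = np.column_stack([np.ones(n - 1), ect])
(c_hat, alpha_hat), *_ = np.linalg.lstsq(design, dy, rcond=None)
print(f"estimated adjustment speed: {alpha_hat:.3f} (true {alpha_true})")
```

Both levels are non-stationary, but the regression uses only stationary quantities (the change in \( Y \) and the spread), which is why the error-correction form avoids the spurious-regression problem.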

In practice, the workflow for analysing financial data is:

  1. Test for stationarity using ADF or KPSS tests.
  2. If series are I(1), test for cointegration. If cointegrated, use a vector error correction model (VECM), which automatically captures Granger causal relationships through the error correction term.
  3. If series are stationary (or differenced to stationarity), apply the Granger causality test directly using a VAR model.

Running the standard Granger causality test on the levels of non-stationary, non-cointegrated series is invalid and will produce unreliable results.


Frequently Asked Questions

What does Granger causality actually tell you?

Granger causality tells you whether past values of one variable contain useful information for predicting another variable, after controlling for that variable's own history. If \( X \) Granger-causes \( Y \), then knowing what \( X \) did in the past improves your forecast of \( Y \) today. It doesn't tell you why: the mechanism could be a direct causal effect, a confounding variable, or simply that \( X \) reacts faster to common shocks. Treat it as evidence of predictive information flow, not as proof of a causal mechanism.

How do you choose the right number of lags?

Use information criteria. Fit a VAR model with increasing lag orders and pick the lag that minimises the AIC or BIC. AIC tends to select more lags and is better when your priority is prediction accuracy. BIC penalises complexity more and is better when you want a parsimonious model. When they disagree, report both and check whether your conclusions change. As a robustness exercise, always test across a range of lags (say 1 to 10) and be wary of results that only appear at a single lag order.

Can you use Granger causality on non-stationary data?

Not with the standard test. The F-test used in Granger causality assumes stationary data, and applying it to series with unit roots produces invalid p-values. You have two options: difference the data to make it stationary (the most common approach for financial returns), or use the Toda-Yamamoto procedure, which estimates a VAR in levels with extra lags to correct the asymptotic distribution. If your non-stationary series are cointegrated, you should use a VECM instead, which incorporates both the long-run equilibrium and the short-run Granger causal dynamics.

What is the difference between Granger causality and correlation?

Correlation measures the contemporaneous linear association between two variables - when one goes up, does the other tend to go up at the same time? Granger causality measures whether the past of one variable helps predict the future of another. Two series can be highly correlated but have no Granger causal relationship (they move together simultaneously but neither leads the other). Conversely, two series can have low contemporaneous correlation but strong Granger causality (one predicts the other with a delay). For trading strategies, Granger causality is often more useful because it identifies lead-lag relationships that can generate trading signals.
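The distinction is easy to see in simulated data. In the first pair below, \( Y \) copies \( X \) with a one-period delay, so their same-day correlation is near zero while the lag-1 correlation is near one. The second pair is driven by a shared same-day shock, so it is highly correlated without either series leading the other. Both data-generating processes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

# Pair 1: Y_t = X_{t-1} + small noise (lead-lag, little same-day co-movement)
x = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
y[1:] = x[:-1] + 0.1 * rng.normal(size=n - 1)
same_day = np.corrcoef(x, y)[0, 1]
lagged = np.corrcoef(x[:-1], y[1:])[0, 1]        # corr(X_{t-1}, Y_t)

# Pair 2: A_t and B_t share a common shock at time t (co-movement, no lead)
common = rng.normal(size=n)
a = common + 0.3 * rng.normal(size=n)
b = common + 0.3 * rng.normal(size=n)
ab_same_day = np.corrcoef(a, b)[0, 1]
ab_lagged = np.corrcoef(a[:-1], b[1:])[0, 1]     # corr(A_{t-1}, B_t)

print(f"X, Y: same-day {same_day:.2f}, lag-1 {lagged:.2f}")
print(f"A, B: same-day {ab_same_day:.2f}, lag-1 {ab_lagged:.2f}")
```

A Granger test would flag the first pair and not the second, even though a simple correlation check ranks them the other way round.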

Is Granger causality useful for trading?

Yes, but with caveats. Granger causality can identify lead-lag relationships between assets, which are potential sources of alpha. If asset A's returns predict asset B's returns with a one-day lag, a systematic strategy can trade on that signal. However, in liquid, efficient markets, strong Granger causal relationships tend to be short-lived - once enough participants identify and trade on them, the edge disappears. The test is most useful as a screening tool: test hundreds of pairs, identify candidates with strong lead-lag relationships, then validate those candidates with out-of-sample testing and economic reasoning before putting real capital at risk.
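The out-of-sample step can be sketched as a simple forecasting horse race, the same mean-squared-error comparison as in the FTSE/S&P example at the start: fit the restricted model (own lags only) and the unrestricted model (own lags plus lags of the candidate leader) on a training period, then compare forecast errors on a later hold-out period. The 70/30 split, the single lag, and the simulated data are assumptions for illustration; a production check would typically re-estimate on rolling or expanding windows and account for trading costs.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.normal()

# One-lag regressions: Y_t on [1, Y_{t-1}] (restricted) or [1, Y_{t-1}, X_{t-1}]
target = y[1:]
X_r = np.column_stack([np.ones(n - 1), y[:-1]])
X_u = np.column_stack([X_r, x[:-1]])

split = int(0.7 * len(target))   # first 70% for fitting, the rest held out


def oos_mse(design):
    """Fit on the training rows, return mean squared error on the hold-out rows."""
    beta, *_ = np.linalg.lstsq(design[:split], target[:split], rcond=None)
    errors = target[split:] - design[split:] @ beta
    return float(np.mean(errors ** 2))


mse_restricted = oos_mse(X_r)
mse_unrestricted = oos_mse(X_u)
print(f"out-of-sample MSE: own lags {mse_restricted:.3f}, "
      f"with X lags {mse_unrestricted:.3f}")
```

Unlike the in-sample F-test, this comparison can come out against the larger model, which is exactly what you want to learn before committing capital.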
