Finance · 20 min read · 9 April 2026

Pairs Trading: Complete Strategy Guide with Python 2026

A hands-on guide to pairs trading: how to find cointegrated pairs, calculate the spread, and build entry and exit signals.

What Is Pairs Trading?

Pairs trading is a market-neutral strategy that profits from the relative movement between two related securities. You identify two stocks (or other instruments) whose prices move together over time, wait for their relationship to temporarily break down, then go long the underperformer and short the outperformer. When the spread between them reverts to its historical norm, you close both positions and pocket the difference.

The strategy doesn't care whether the broader market goes up or down. Because you hold one long and one short position simultaneously, your exposure to the overall market direction is close to zero. What matters is the relative movement - whether the gap between the two securities narrows or widens. This market neutrality is why pairs trading has been a staple at quantitative hedge funds since the 1980s, when Nunzio Tartaglia's group at Morgan Stanley first systematised the approach.

Here's a simple example. Suppose Shell and BP both trade at £25. Over the next month, Shell drops to £22 while BP stays flat. If the historical relationship between them is stable, this £3 gap is likely temporary. You'd buy Shell (expecting it to recover) and short BP (as a hedge). When the spread narrows back to normal - perhaps Shell rises to £24 and BP dips to £24.50 - you close both legs and profit from the convergence.
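The arithmetic of that example can be checked in a few lines. This is a minimal sketch that assumes one share per leg and ignores costs, financing, and dividends; the variable names are illustrative only.

```python
# Hypothetical prices from the Shell/BP example (GBP per share).
entry = {"shell": 22.00, "bp": 25.00}   # gap has widened to 3.00
exit_ = {"shell": 24.00, "bp": 24.50}   # gap has narrowed to 0.50

# Long one Shell share: profit if it rises.
long_pnl = exit_["shell"] - entry["shell"]
# Short one BP share: profit if it falls.
short_pnl = entry["bp"] - exit_["bp"]
total_pnl = long_pnl + short_pnl

# The combined profit equals the amount by which the gap narrowed.
spread_narrowing = (entry["bp"] - entry["shell"]) - (exit_["bp"] - exit_["shell"])

print(f"Long leg:  {long_pnl:+.2f}")
print(f"Short leg: {short_pnl:+.2f}")
print(f"Total:     {total_pnl:+.2f} (gap narrowed by {spread_narrowing:.2f})")
```

Both legs contribute here, but that is not required: the trade profits whenever the gap narrows, whichever leg does the moving.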

The critical question is: how do you know the spread will revert? That's where the statistical foundation comes in, and it rests on a concept called cointegration.

The Theory Behind Pairs Trading

Three ideas underpin pairs trading: cointegration, mean reversion of the spread, and market neutrality. Understanding all three is essential before writing a single line of code.

Cointegration

Two price series are cointegrated if, despite each wandering randomly on its own, a linear combination of them is stationary - it fluctuates around a stable mean rather than drifting off without bound. This is a stronger condition than correlation. Two stocks can be highly correlated (they move in the same direction most days) yet not cointegrated (the gap between their prices widens steadily over time). Conversely, two stocks can have modest daily correlation but a spread that reliably snaps back to a mean.

Formally, if stock \( Y_t \) and stock \( X_t \) are both integrated of order one (I(1) - i.e. random walks), they're cointegrated if there exists a coefficient \( \beta \) such that:

\[ Z_t = Y_t - \beta X_t \]

is stationary (I(0)). The series \( Z_t \) is the spread, and \( \beta \) is the hedge ratio. For a full treatment of the underlying statistics, see the cointegration guide.

Mean Reversion of the Spread

If the spread is stationary, it has a well-defined mean and variance, and deviations from the mean are temporary. This is the property that makes the strategy work. When the spread is unusually wide (\( Z_t \) well above its mean), you expect it to narrow - so you short the spread. When it's unusually narrow (\( Z_t \) well below its mean), you expect it to widen - so you go long the spread. The theory of mean reversion gives you the statistical tools to quantify "unusually" and estimate how quickly the spread will revert.

Market Neutrality

By holding offsetting long and short positions, pairs trading aims to be dollar-neutral (equal capital on each side) and beta-neutral (equal market exposure on each side). If the market drops 5%, your long leg loses roughly 5% and your short leg gains roughly 5%, leaving the overall position close to flat. Your profit or loss comes entirely from the relative performance of the two securities.
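Dollar neutrality and beta neutrality generally imply different leg sizes. The sketch below assumes each stock's market beta is already known; the betas of 1.1 and 0.9 and the function name are hypothetical.

```python
def leg_notionals(capital: float, beta_long: float, beta_short: float):
    """Split gross capital so that the net market beta is zero.

    Solves: long * beta_long == short * beta_short, long + short == capital.
    """
    long_notional = capital * beta_short / (beta_long + beta_short)
    short_notional = capital - long_notional
    return long_notional, short_notional


long_n, short_n = leg_notionals(100_000, beta_long=1.1, beta_short=0.9)
net_beta_exposure = long_n * 1.1 - short_n * 0.9
dollar_imbalance = short_n - long_n

print(f"Long:  {long_n:,.0f}")
print(f"Short: {short_n:,.0f}")
print(f"Net beta exposure: {net_beta_exposure:,.2f}")
print(f"Dollar imbalance:  {dollar_imbalance:,.0f}")
```

The two conditions coincide only when both betas are equal; otherwise a beta-neutral book carries a small net dollar position, and vice versa.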

In practice, perfect neutrality is hard to achieve. The two stocks will have slightly different betas, sector exposures, and factor loadings. But the residual market exposure is far smaller than for a directional strategy, which is why institutional investors value pairs trading as a source of uncorrelated returns.

How to Find Pairs

Finding good pairs is the hardest part of the strategy. You need pairs that are cointegrated - not just correlated - and where the cointegrating relationship is economically sensible and likely to persist. There are three main approaches.

Sector-Based Screening

Start with stocks in the same industry or sector. Companies that face the same cost inputs, compete for the same customers, and are subject to the same regulatory environment are natural candidates. Oil majors (Shell and BP), beverage companies (Coca-Cola and Pepsi), and banks (Barclays and Lloyds) are classic examples. The economic logic is clear: shared fundamental drivers create a long-run equilibrium between their prices.

Sector screening reduces the search space dramatically. Instead of testing every possible pair in a 500-stock universe (124,750 pairs), you might test pairs within 10 sectors of 50 stocks each (12,250 pairs) - a 90% reduction that also improves the quality of results since same-sector pairs are more likely to be genuinely cointegrated.
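The pair counts quoted above follow from the binomial coefficient "n choose 2", which a few lines confirm:

```python
from math import comb

all_pairs = comb(500, 2)            # every pair in a 500-stock universe
sector_pairs = 10 * comb(50, 2)     # 10 sectors of 50 stocks each
reduction = 1 - sector_pairs / all_pairs

print(f"All pairs:        {all_pairs:,}")
print(f"Same-sector only: {sector_pairs:,}")
print(f"Reduction:        {reduction:.1%}")
```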

Cointegration Testing

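The formal test is the Engle-Granger two-step method. For each candidate pair, run an OLS regression of one price series on the other, then test the residuals for stationarity using the augmented Dickey-Fuller (ADF) test. If the test rejects a unit root in the residuals (p-value below 0.05), the pair is treated as cointegrated. Because the residuals come from an estimated regression, the test needs cointegration-specific critical values rather than the standard ADF ones; the coint function in statsmodels applies them for you.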

When testing many pairs, you must correct for multiple comparisons. Testing 1,000 pairs at the 5% level will produce roughly 50 false positives even if no true cointegration exists. Apply a Bonferroni or Holm-Bonferroni correction, or control the false discovery rate (FDR) using the Benjamini-Hochberg procedure.
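The three corrections can be compared side by side with statsmodels' `multipletests`. This is a minimal sketch: the ten p-values are made up for illustration and do not come from real pairs.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from ten cointegration tests.
pvals = np.array([0.0004, 0.003, 0.012, 0.021, 0.04,
                  0.08, 0.15, 0.33, 0.57, 0.91])

survivors = {"uncorrected": int((pvals < 0.05).sum())}
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    survivors[method] = int(reject.sum())

for name, count in survivors.items():
    print(f"{name:>12}: {count} of {len(pvals)} pairs pass")
```

Bonferroni and Holm control the chance of any false positive, so they are stricter; Benjamini-Hochberg controls the expected share of false positives among the pairs you keep, which usually lets more pairs through.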

Distance Method

An older, simpler approach that doesn't require cointegration testing. Normalise each stock's price to start at 1, then measure the sum of squared differences between each pair of normalised price series over a formation period. Pairs with the smallest distance (most similar price paths) are selected. You then trade these pairs over a subsequent trading period when their normalised prices diverge.

The distance method is easier to implement but less statistically grounded than cointegration testing. It works as a screening tool but shouldn't replace formal stationarity tests on the spread.
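A compact version of the distance ranking might look like the following. It is a sketch on synthetic formation-period data; the function name `distance_rank` and the tickers A, B, and C are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def distance_rank(prices: pd.DataFrame) -> pd.Series:
    """Rank pairs by the sum of squared differences of normalised prices."""
    normalised = prices / prices.iloc[0]      # every series starts at 1
    distances = {
        (a, b): float(((normalised[a] - normalised[b]) ** 2).sum())
        for a, b in combinations(prices.columns, 2)
    }
    return pd.Series(distances).sort_values() # smallest distance first


# Synthetic formation period: B tracks A closely, C is independent.
rng = np.random.default_rng(1)
base = 100 + np.cumsum(rng.normal(0, 1, 250))
prices = pd.DataFrame({
    "A": base,
    "B": 2 * base + rng.normal(0, 0.5, 250),
    "C": 100 + np.cumsum(rng.normal(0, 1, 250)),
})

ranking = distance_rank(prices)
print(ranking)
```

Normalising first means the different price levels of A and B do not matter; only the shape of the path does.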

Building a Pairs Trading Strategy Step by Step

Here's the systematic process that quant teams follow when constructing a pairs trading strategy.

Step 1: Screen the Universe

Define your tradeable universe. For UK equities, this might be the FTSE 350. For US equities, the S&P 500 or Russell 1000. Filter for adequate liquidity (minimum average daily volume) and shortability (the stock must be available for borrowing). Group stocks by sector or industry using a classification scheme like GICS or ICB.

Step 2: Test for Cointegration

For each candidate pair within a sector, run the Engle-Granger cointegration test on a training window (typically 1 - 3 years of daily closing prices). Reject pairs where the p-value exceeds 0.05 after multiple testing correction. From the surviving pairs, further filter by economic sensibility - does it make fundamental sense that these two companies share a long-run equilibrium?

Step 3: Estimate the Hedge Ratio

The hedge ratio \( \beta \) from the cointegrating regression tells you the position sizing for each leg. Because the regression here is run on price levels, \( \beta \) is a ratio of share quantities: if \( \beta = 1.2 \), you hold 1.2 shares of stock X against every share of stock Y. (A regression on log prices would instead give a ratio in notional terms.) Some teams use a rolling or expanding OLS estimate to let the hedge ratio adapt. Others use the Kalman filter, which updates the hedge ratio as each new data point arrives and provides a natural measure of estimation uncertainty.
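A rolling estimate can be sketched with pandas alone, using the fact that the OLS slope of y on x (with an intercept) equals cov(y, x) / var(x). The 120-day window and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def rolling_hedge_ratio(y: pd.Series, x: pd.Series, window: int = 120) -> pd.Series:
    """Rolling OLS slope of y on x (with intercept): cov(y, x) / var(x)."""
    return y.rolling(window).cov(x) / x.rolling(window).var()


# Synthetic pair sharing one random-walk factor.
rng = np.random.default_rng(7)
common = 100 + np.cumsum(rng.normal(0, 1, 600))
x = pd.Series(common + rng.normal(0, 0.5, 600))
y = pd.Series(1.2 * common + rng.normal(0, 0.5, 600))

beta_t = rolling_hedge_ratio(y, x)
print(beta_t.dropna().describe())
```

The first `window - 1` values are NaN because the window is not yet full. Shorter windows adapt faster but give noisier estimates.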

Step 4: Calculate the Z-Score of the Spread

Compute the spread \( Z_t = Y_t - \beta X_t \), then standardise it:

\[ z_t = \frac{Z_t - \bar{Z}}{\sigma_Z} \]

where \( \bar{Z} \) and \( \sigma_Z \) are the rolling mean and standard deviation of the spread over a lookback window. The z-score tells you how many standard deviations the current spread is from its recent average. A z-score of +2 means the spread is two standard deviations above its mean - stock Y is "too expensive" relative to stock X.

Step 5: Define Entry and Exit Rules

Standard thresholds for a pairs trading strategy:

| Signal | Z-Score Threshold | Action |
| --- | --- | --- |
| Long entry | z < -2.0 | Buy Y, short X |
| Short entry | z > +2.0 | Short Y, buy X |
| Exit (long) | z > -0.5 | Close both legs |
| Exit (short) | z < +0.5 | Close both legs |
| Stop loss | \|z\| > 4.0 | Close both legs |

These thresholds are starting points. The optimal values depend on the specific pair's spread dynamics - its half-life, volatility, and transaction costs. You can optimise them on a training set, but be careful of overfitting. A simple, slightly suboptimal rule that's stable out of sample is better than a finely tuned rule that breaks on new data.

Full Python Implementation

Here's a complete pairs trading system in Python, from pair selection through signal generation and backtesting. It uses pandas, numpy, and statsmodels - the standard toolkit for quantitative analysis in Python.

Pair Selection and Cointegration Screening

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint, adfuller
from itertools import combinations


def screen_pairs(
    prices: pd.DataFrame,
    significance: float = 0.05,
) -> list[dict]:
    """
    Screen all pairs in a price DataFrame for cointegration.

    Parameters
    ----------
    prices : pd.DataFrame
        DataFrame where each column is a stock's daily closing price.
    significance : float
        Family-wise significance level. It is divided by the number of
        pairs tested (Bonferroni) before being compared with each p-value.

    Returns
    -------
    List of dicts with pair names, test statistics, and hedge ratios.
    """
    tickers = prices.columns.tolist()
    n_pairs = len(tickers) * (len(tickers) - 1) // 2
    bonferroni_sig = significance / n_pairs

    results = []

    for ticker_y, ticker_x in combinations(tickers, 2):
        y = prices[ticker_y].dropna()
        x = prices[ticker_x].dropna()
        common_idx = y.index.intersection(x.index)
        y, x = y.loc[common_idx], x.loc[common_idx]

        if len(y) < 250:
            continue

        score, pvalue, _ = coint(y, x)

        if pvalue < bonferroni_sig:
            x_const = sm.add_constant(x)
            model = sm.OLS(y, x_const).fit()
            hedge_ratio = model.params.iloc[1]
            spread = y - hedge_ratio * x
            hl = _half_life(spread)

            results.append({
                "ticker_y": ticker_y,
                "ticker_x": ticker_x,
                "coint_pvalue": pvalue,
                "coint_statistic": score,
                "hedge_ratio": hedge_ratio,
                "half_life": hl,
            })

    results.sort(key=lambda r: r["coint_pvalue"])
    return results


def _half_life(spread: pd.Series) -> float:
    """Estimate mean-reversion half-life from a spread series."""
    lagged = spread.shift(1)
    delta = spread - lagged
    lagged_const = sm.add_constant(lagged)

    mask = delta.notna() & lagged.notna()
    model = sm.OLS(delta[mask], lagged_const[mask]).fit()
    lam = model.params.iloc[1]

    if lam >= 0:
        return float("inf")
    return -np.log(2) / lam

Spread Calculation and Signal Generation

class PairsTradingStrategy:
    """
    Z-score-based pairs trading strategy with rolling spread
    statistics and cointegration monitoring.
    """

    def __init__(
        self,
        lookback: int = 60,
        entry_z: float = 2.0,
        exit_z: float = 0.5,
        stop_z: float = 4.0,
    ):
        self.lookback = lookback
        self.entry_z = entry_z
        self.exit_z = exit_z
        self.stop_z = stop_z

    def compute_spread(
        self,
        y: pd.Series,
        x: pd.Series,
        hedge_ratio: float,
    ) -> pd.Series:
        return y - hedge_ratio * x

    def compute_z_score(self, spread: pd.Series) -> pd.Series:
        roll_mean = spread.rolling(window=self.lookback).mean()
        roll_std = spread.rolling(window=self.lookback).std()
        return (spread - roll_mean) / roll_std

    def generate_positions(
        self, z_score: pd.Series
    ) -> pd.Series:
        position = pd.Series(0.0, index=z_score.index)

        for i in range(1, len(z_score)):
            z = z_score.iloc[i]
            prev = position.iloc[i - 1]

            if np.isnan(z):
                position.iloc[i] = 0.0
                continue

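            if prev == 0:
                # Enter only between the entry and stop thresholds, so a
                # stopped-out trade is not re-opened on the very next bar.
                if -self.stop_z < z < -self.entry_z:
                    position.iloc[i] = 1.0
                elif self.entry_z < z < self.stop_z:
                    position.iloc[i] = -1.0
                else:
                    position.iloc[i] = 0.0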
            elif prev > 0:
                if z > -self.exit_z or z < -self.stop_z:
                    position.iloc[i] = 0.0
                else:
                    position.iloc[i] = prev
            elif prev < 0:
                if z < self.exit_z or z > self.stop_z:
                    position.iloc[i] = 0.0
                else:
                    position.iloc[i] = prev

        return position

Backtesting Engine with Transaction Costs

def backtest_pair(
    y: pd.Series,
    x: pd.Series,
    hedge_ratio: float,
    lookback: int = 60,
    entry_z: float = 2.0,
    exit_z: float = 0.5,
    stop_z: float = 4.0,
    cost_bps: float = 5.0,
) -> pd.DataFrame:
    """
    Backtest a pairs trading strategy on a single pair.

    Parameters
    ----------
    y, x : pd.Series
        Price series for the two legs.
    hedge_ratio : float
        Hedge ratio from cointegrating regression.
    lookback : int
        Rolling window for z-score calculation.
    entry_z, exit_z, stop_z : float
        Z-score thresholds.
    cost_bps : float
        One-way transaction cost in basis points.

    Returns
    -------
    pd.DataFrame with spread, z-score, positions, PnL, and
    cumulative PnL.
    """
    strategy = PairsTradingStrategy(
        lookback=lookback,
        entry_z=entry_z,
        exit_z=exit_z,
        stop_z=stop_z,
    )

    spread = strategy.compute_spread(y, x, hedge_ratio)
    z_score = strategy.compute_z_score(spread)
    position = strategy.generate_positions(z_score)

    # PnL from spread changes
    spread_returns = spread.diff()
    gross_pnl = position.shift(1) * spread_returns

    # Transaction costs on position changes
    notional = y + hedge_ratio * x
    turnover = position.diff().abs() * notional
    costs = turnover * cost_bps / 10_000
    net_pnl = gross_pnl - costs

    return pd.DataFrame({
        "spread": spread,
        "z_score": z_score,
        "position": position,
        "gross_pnl": gross_pnl,
        "costs": costs,
        "net_pnl": net_pnl,
        "cumulative_pnl": net_pnl.cumsum(),
    })


# --- Full example with simulated cointegrated stocks ---
np.random.seed(42)
n = 1000

common_factor = np.cumsum(np.random.normal(0, 1, n)) + 100
noise_y = np.random.normal(0, 1.0, n)
noise_x = np.random.normal(0, 0.8, n)

stock_y = common_factor + noise_y + 50
stock_x = 0.75 * common_factor + noise_x + 30

y = pd.Series(stock_y, name="Stock_Y")
x = pd.Series(stock_x, name="Stock_X")

# Verify cointegration
score, pvalue, _ = coint(y, x)
print(f"Cointegration p-value: {pvalue:.6f}")

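# Estimate hedge ratio (on the full sample for brevity; this is in-sample,
# so use a separate formation window for any real evaluation)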
x_const = sm.add_constant(x)
model = sm.OLS(y, x_const).fit()
hedge_ratio = model.params.iloc[1]
print(f"Hedge ratio: {hedge_ratio:.4f}")

# Run backtest
results = backtest_pair(
    y, x,
    hedge_ratio=hedge_ratio,
    lookback=60,
    entry_z=2.0,
    exit_z=0.5,
    stop_z=4.0,
    cost_bps=5.0,
)

# Performance summary
daily_pnl = results["net_pnl"].dropna()
total_pnl = daily_pnl.sum()
total_costs = results["costs"].sum()
# Each round trip is one entry plus one exit, i.e. two position changes
position_changes = (results["position"].diff().abs() > 0).sum()
n_round_trips = position_changes // 2
# Share of profitable days among days with non-zero PnL (not a per-trade rate)
active_pnl = daily_pnl[daily_pnl != 0]
daily_hit_rate = (active_pnl > 0).mean()
sharpe = daily_pnl.mean() / daily_pnl.std() * np.sqrt(252)

print("\nBacktest Results")
print(f"{'='*40}")
print(f"Total PnL (net):   {total_pnl:.2f}")
print(f"Total costs:       {total_costs:.2f}")
print(f"Round-trip trades: {n_round_trips}")
print(f"Daily hit rate:    {daily_hit_rate:.1%}")
print(f"Annualised Sharpe: {sharpe:.2f}")

This implementation is deliberately straightforward. A production system would add rolling hedge ratio re-estimation, dynamic lookback windows, and real-time cointegration monitoring that reduces exposure when the ADF test statistic weakens. But the core logic - compute spread, normalise to z-score, trade the extremes - is the same at every scale.

Risk Management for Pairs Trading

Pairs trading looks low-risk on paper because it's market-neutral. In practice, the strategy has specific risks that can produce large losses if not managed properly.

Spread Divergence Risk

The biggest danger is that the spread doesn't revert. You enter a trade expecting convergence, but the spread continues to widen - the underperformer keeps falling and the outperformer keeps rising. This can happen because the fundamental relationship between the two companies has changed (a new product launch, a regulatory shift, management changes) or because the market is pricing in information that your model hasn't captured.

The defence is a strict stop loss. If you enter at a z-score of 2 and set a stop at 4, you're capping the maximum loss per trade at roughly two standard deviations of the spread beyond your entry. Without a stop, a single pair breakdown can wipe out months of small gains.

Regime Change

Cointegrating relationships are not permanent. A pair that was reliably cointegrated for five years can break down permanently because of a structural change in one or both companies. Oil companies might decouple if one pivots to renewables. Banks might diverge after a merger. Tech companies might separate after a product cycle shifts.

Monitor the rolling ADF test statistic on the spread. If the p-value rises above 0.10 - 0.15, the cointegration evidence is weakening. Reduce position size or exit the pair entirely. Re-test periodically (monthly or quarterly) and be willing to drop pairs that no longer pass.

Position Sizing

Size positions based on spread volatility, not individual stock volatility. If stock Y has daily volatility of 2% and stock X has daily volatility of 1.8%, but the spread's daily volatility is only 0.5%, the relevant risk measure is the 0.5% figure. A common approach: size so that a 2-standard-deviation adverse spread move represents 1 - 2% of total portfolio equity.
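That sizing rule reduces to one line of arithmetic. The sketch below assumes the spread volatility is measured in currency per unit of the spread (one share of Y against \( \beta \) shares of X); the function name and the example figures are hypothetical.

```python
def spread_units(
    equity: float,
    spread_vol: float,
    risk_fraction: float = 0.01,
    adverse_sd: float = 2.0,
) -> float:
    """Units of the spread such that an `adverse_sd`-sigma adverse move
    costs `risk_fraction` of equity."""
    if spread_vol <= 0:
        raise ValueError("spread_vol must be positive")
    return (equity * risk_fraction) / (adverse_sd * spread_vol)


# 1m of equity, spread standard deviation of 0.40 per unit, 1% risk budget.
units = spread_units(equity=1_000_000, spread_vol=0.40)
print(f"Spread units to hold: {units:,.0f}")
```

With entry at a z-score of 2 and a stop at 4, a two-standard-deviation adverse move is roughly the distance from entry to stop, so the risk budget lines up with the stop-loss rule.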

Also consider the concentration risk from having many pairs in the same sector. If you run 10 pairs, all in UK banks, a sector-wide shock will hit every pair simultaneously. Diversify across sectors and geographies where possible.

Stop Losses

There are two common stop-loss approaches for pairs trading:

Z-score stop. Exit if the absolute z-score exceeds a fixed threshold (typically 3.5 - 4.0) in the direction that hurts the position. This is natural and consistent with the entry logic.

Time stop. Exit if the trade hasn't converged within a set number of days (for example, twice the estimated half-life of the spread). If reversion should take 10 days on average and it hasn't happened after 20 days, the pair may have broken down.

Using both together provides a belt-and-braces approach. Exit on whichever stop triggers first.
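Combining the two stops can be sketched as a single predicate. The function name, the argument names, and the default of twice the half-life are illustrative assumptions rather than a fixed convention.

```python
import math


def should_exit(
    z: float,
    days_held: int,
    half_life: float,
    position: int,
    stop_z: float = 4.0,
    time_multiple: float = 2.0,
) -> bool:
    """True if either the z-score stop or the time stop has triggered.

    position: +1 for long the spread, -1 for short the spread.
    """
    # Z-score stop: the spread has moved further against the position.
    z_stop = (position > 0 and z < -stop_z) or (position < 0 and z > stop_z)
    # Time stop: held longer than a multiple of the estimated half-life.
    time_stop = math.isfinite(half_life) and days_held > time_multiple * half_life
    return z_stop or time_stop


print(should_exit(z=-4.5, days_held=3, half_life=10, position=1))   # z-score stop
print(should_exit(z=-2.5, days_held=25, half_life=10, position=1))  # time stop
print(should_exit(z=-2.5, days_held=5, half_life=10, position=1))   # keep holding
```

This covers only the stops; the normal profit-taking exit near the mean is handled separately by the exit threshold.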

Real-World Pairs Trading Examples

Coca-Cola and Pepsi

The most cited pairs trading example in textbook after textbook. Both companies dominate the global beverage market, face nearly identical input costs (sugar, aluminium, logistics), and compete for the same shelf space. Their stock prices individually follow random walks, but the spread between them has historically been mean-reverting with a half-life in the range of 15 - 30 trading days. When one stock gets "too cheap" relative to the other - often after an earnings miss or a short-term news event - the gap tends to close within a few weeks.

This pair illustrates both the appeal and the limits of the strategy. The spread does revert, but the magnitude of typical dislocations has shrunk over the years as more traders have targeted this exact pair. The average profit per trade is small, so transaction costs and execution quality matter enormously.

Gold Miners and Gold ETFs

Gold mining companies (e.g. Barrick Gold, Newmont) derive revenue from selling gold, so their share prices are tied to the gold spot price. A cointegrating relationship often exists between individual miners and a gold ETF like GLD or the gold spot price itself. The spread captures the market's assessment of mining costs, operational efficiency, and management quality - factors that fluctuate but tend to mean-revert.

This pair is interesting because the hedge ratio isn't 1:1. Mining companies have operational gearing (fixed costs amplify the effect of gold price changes), so the hedge ratio is typically greater than 1. Getting the hedge ratio right is critical; a small error leads to residual gold-price exposure that overwhelms the pairs signal.

Cross-Listed Stocks

Companies listed on multiple exchanges provide near-arbitrage opportunities. A stock trading on both the NYSE and the LSE should, after adjusting for the exchange rate, trade at the same price. Deviations create a spread that reverts as arbitrageurs act. This is one of the cleanest forms of pairs trading because the cointegrating relationship is enforced by a no-arbitrage condition rather than a statistical tendency.

In practice, the deviations are small (a few basis points for liquid names) and require low-latency infrastructure to capture. But for less liquid cross-listings, the deviations can be larger and slower to correct, creating opportunities for strategies with longer holding periods.

Common Pitfalls

In-Sample Overfitting

The most common mistake in pairs trading research. You test 10,000 pairs, find 200 that are "cointegrated", optimise the z-score thresholds on the same data, and produce a beautiful backtest. Then the strategy fails in live trading because most of those "cointegrated" pairs were statistical coincidences and the optimised parameters don't generalise.

The fix: always split data into formation (training) and trading (test) periods. Find pairs and estimate parameters on the first half, then trade on the second half without any re-optimisation. If the strategy doesn't work out of sample, it won't work live.
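A formation/trading split can be sketched as follows. Everything is estimated on the first part of the sample and then frozen; the 50/50 split, the function name, and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def split_and_score(y: pd.Series, x: pd.Series, formation_frac: float = 0.5):
    """Fit on the formation period only, then score the trading period."""
    cut = int(len(y) * formation_frac)
    y_form, x_form = y.iloc[:cut], x.iloc[:cut]
    y_trade, x_trade = y.iloc[cut:], x.iloc[cut:]

    # Hedge ratio and spread statistics come from formation data only.
    beta, _intercept = np.polyfit(x_form, y_form, 1)
    spread_form = y_form - beta * x_form
    mu, sigma = spread_form.mean(), spread_form.std()

    # The trading-period z-score uses the frozen formation parameters.
    z_trade = (y_trade - beta * x_trade - mu) / sigma
    return beta, z_trade


rng = np.random.default_rng(11)
n = 1000
common = 100 + np.cumsum(rng.normal(0, 1, n))
x = pd.Series(common + rng.normal(0, 1, n))
y = pd.Series(common + rng.normal(0, 1, n) + 20)

beta, z_trade = split_and_score(y, x)
print(f"Formation hedge ratio: {beta:.3f}")
print(f"Trading-period bars:   {len(z_trade)}")
```

Nothing from the trading period leaks into the hedge ratio, mean, or standard deviation, which is the point of the split.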

Ignoring Transaction Costs

Pairs trading generates frequent trades with small average profits. A strategy that returns 0.05% per trade before costs might return -0.02% after costs - turning a winner into a loser. You need to model bid-ask spreads, commissions, borrowing costs for the short leg, and market impact for larger positions.

Borrowing costs deserve special attention. Shorting a stock requires borrowing it, and the borrow fee can be anything from 0.5% per annum for a large-cap FTSE 100 stock to 10%+ for a small-cap with limited lendable supply. A pair where the short leg has high borrow costs needs a correspondingly larger spread dislocation to be worth trading.
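A rough break-even calculation shows how borrow fees change the picture. This sketch assumes equal notional on each leg, a flat one-way cost per leg, and a borrow fee accrued on calendar days; the function name and inputs are hypothetical.

```python
def breakeven_move_bps(
    one_way_cost_bps: float,
    borrow_fee_annual: float,
    holding_days: int,
) -> float:
    """Gross spread move, in basis points of per-leg notional, needed
    to cover trading costs plus borrow on the short leg."""
    trading_cost = 4 * one_way_cost_bps   # 2 legs x (entry + exit)
    borrow_cost = borrow_fee_annual * holding_days / 365 * 10_000
    return trading_cost + borrow_cost


cheap = breakeven_move_bps(5, 0.005, 20)   # large-cap borrow at 0.5% p.a.
costly = breakeven_move_bps(5, 0.10, 20)   # hard-to-borrow name at 10% p.a.
print(f"Break-even, easy borrow: {cheap:.1f} bps")
print(f"Break-even, hard borrow: {costly:.1f} bps")
```

Over a 20-day hold the expensive borrow more than triples the move needed just to break even.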

Pair Breakdown

Even well-researched pairs can break down. A classic example: two energy companies are cointegrated for years, then one announces a major acquisition that transforms its business model. The spread widens permanently, and a trader who keeps adding to the position (expecting reversion) suffers escalating losses.

The solution is disciplined risk management (stop losses, as discussed above) combined with fundamental awareness. If a material corporate event occurs that could plausibly alter the long-run relationship, exit the pair and re-evaluate from scratch rather than assuming the statistical relationship will hold through the disruption.

Ignoring Execution Risks

Pairs trades require simultaneous execution of two legs. If you fill the long leg but can't immediately fill the short leg (because the stock is hard to borrow or the order book is thin), you're temporarily running an unhedged directional position. In volatile markets, this "leg risk" can produce unexpected losses.

Use limit orders or algorithmic execution to manage leg risk. Some traders place both orders as a package using a broker's pairs execution algorithm, which attempts to fill both legs simultaneously.

Advanced Topics

Dynamic Hedge Ratios with the Kalman Filter

A static hedge ratio estimated from a single OLS regression can go stale as the relationship between the two stocks evolves. The Kalman filter provides a way to update the hedge ratio continuously, treating it as a hidden state variable that changes over time.

import numpy as np
import pandas as pd


def kalman_hedge_ratio(
    y: pd.Series,
    x: pd.Series,
    delta: float = 1e-4,
    ve: float = 1e-3,
) -> pd.DataFrame:
    """
    Estimate a time-varying hedge ratio using the Kalman filter.

    Parameters
    ----------
    y, x : pd.Series
        Price series for the two assets.
    delta : float
        Transition covariance scaling (controls how fast
        the hedge ratio can change).
    ve : float
        Observation noise variance.

    Returns
    -------
    pd.DataFrame with columns 'hedge_ratio' and 'spread'.
    """
    n = len(y)
    hedge_ratios = np.zeros(n)
    spread = np.zeros(n)

    # State: [intercept, hedge_ratio]
    theta = np.zeros(2)
    P = np.eye(2)
    R = np.array([[ve]])
    Q = delta * np.eye(2)

    for t in range(n):
        F = np.array([[1.0, x.iloc[t]]])

        # Prediction
        theta_pred = theta
        P_pred = P + Q

        # Update
        y_hat = F @ theta_pred
        e = y.iloc[t] - y_hat[0]
        S = F @ P_pred @ F.T + R
        K = P_pred @ F.T / S[0, 0]

        theta = theta_pred + K.flatten() * e
        P = P_pred - K @ F @ P_pred

        hedge_ratios[t] = theta[1]
        spread[t] = e

    return pd.DataFrame({
        "hedge_ratio": hedge_ratios,
        "spread": spread,
    }, index=y.index)


# Example usage
np.random.seed(42)
n = 500
common = np.cumsum(np.random.normal(0, 1, n)) + 100
y = pd.Series(common + np.random.normal(0, 1.0, n) + 20)
x = pd.Series(0.8 * common + np.random.normal(0, 0.8, n) + 10)

kf_result = kalman_hedge_ratio(y, x)
print(f"Final hedge ratio: {kf_result['hedge_ratio'].iloc[-1]:.4f}")
print(f"Hedge ratio range: [{kf_result['hedge_ratio'].min():.4f}, "
      f"{kf_result['hedge_ratio'].max():.4f}]")

The Kalman filter is especially useful for pairs where the fundamental relationship shifts gradually - for example, when one company's revenue mix changes over time. The filter adapts the hedge ratio smoothly without the abrupt jumps you get from rolling OLS windows.

Rolling Cointegration Monitoring

Rather than testing cointegration once and assuming it holds forever, production systems test it continuously. Here's a function that tracks cointegration strength over time:

import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint


def rolling_cointegration(
    y: pd.Series,
    x: pd.Series,
    window: int = 252,
    step: int = 21,
) -> pd.DataFrame:
    """
    Track cointegration strength over rolling windows.

    Parameters
    ----------
    y, x : pd.Series
        Price series for the pair.
    window : int
        Size of rolling window in trading days.
    step : int
        Step size between tests (default: monthly).

    Returns
    -------
    pd.DataFrame with test dates, p-values, and hedge ratios.
    """
    records = []

    # end is exclusive, so len(y) + 1 allows a window ending on the last bar
    for end in range(window, len(y) + 1, step):
        start = end - window
        y_win = y.iloc[start:end]
        x_win = x.iloc[start:end]

        score, pvalue, _ = coint(y_win, x_win)

        x_const = sm.add_constant(x_win)
        model = sm.OLS(y_win, x_const).fit()

        records.append({
            # Label each test with the last observation inside its window
            "date": y.index[end - 1],
            "p_value": pvalue,
            "coint_stat": score,
            "hedge_ratio": model.params.iloc[1],
        })

    return pd.DataFrame(records)

When the rolling p-value rises above 0.10, reduce position size. When it rises above 0.20, exit the pair. This adaptive approach prevents you from staying in a pair whose cointegrating relationship has deteriorated.
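Those thresholds translate directly into a position-scaling rule. In the sketch below, the half-size position in the warning zone is an assumption; the text only says to reduce size.

```python
def exposure_multiplier(
    p_value: float,
    reduce_at: float = 0.10,
    exit_at: float = 0.20,
) -> float:
    """Scale factor for position size given the latest rolling p-value."""
    if p_value > exit_at:
        return 0.0    # cointegration evidence too weak: go flat
    if p_value > reduce_at:
        return 0.5    # assumed half size in the warning zone
    return 1.0        # full size while the evidence is strong


for p in (0.03, 0.12, 0.35):
    print(f"p-value {p:.2f} -> exposure x{exposure_multiplier(p):.1f}")
```

Multiply the target position by this factor each time the rolling test is refreshed.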

Pairs Trading Strategy Checklist

Before going live with a pairs trading strategy, work through this checklist:

Economic rationale. Can you explain why these two securities should maintain a long-run equilibrium? If the answer is purely statistical, the relationship is more likely to be spurious.
Cointegration evidence. Does the pair pass the Engle-Granger test at the 5% level after multiple-testing correction? Does the Johansen test agree?
Half-life. Is the half-life of the spread between 5 and 60 trading days? Shorter is better for active trading. Longer than 60 days may not be practical.
Out-of-sample validation. Does the strategy produce positive returns on data the model hasn't seen?
Transaction costs. Is the average profit per trade large enough to cover bid-ask spreads, commissions, and borrow costs?
Risk controls. Are stop losses and position sizing rules defined and implemented?
Monitoring. Do you have a system to track rolling cointegration strength and alert you when it weakens?

This checklist won't guarantee success, but skipping any item significantly raises the odds of failure.

Frequently Asked Questions

What is a pairs trading example in simple terms?

Suppose that over the past two years the ratio of Tesco's share price to Sainsbury's has averaged about 1.20 and has always reverted to that level after temporary deviations. Today Tesco trades at £2.70 and Sainsbury's at £2.00, a ratio of 1.35 - Tesco looks expensive relative to Sainsbury's. A pairs trader would short Tesco and buy Sainsbury's, betting that the ratio falls back toward 1.20. If Tesco drops to £2.52 and Sainsbury's rises to £2.10, the ratio returns to 1.20 and the trader profits from both legs. The key point is that it doesn't matter whether the overall market goes up or down - what matters is the relative movement between the two stocks.

How do you find cointegrated pairs for pairs trading?

Start by grouping stocks within the same sector or industry - companies that share the same economic drivers are the best candidates. Then, for each pair, run the Engle-Granger cointegration test using a training window of 1 - 3 years of daily prices. Reject pairs where the p-value exceeds 0.05 after adjusting for multiple comparisons (since you're testing many pairs). From the remaining pairs, check the half-life of the spread - anything between 5 and 60 trading days is practical. Finally, apply a sanity check: does the pair make economic sense? A statistical test alone isn't enough. You need a fundamental reason to believe the relationship will persist.

Is pairs trading still profitable in 2026?

Yes, but it's harder than it was 20 years ago. The classic textbook pairs (Coca-Cola/Pepsi, Shell/BP) have been traded so heavily that the average spread dislocation has shrunk and the profit per trade is slim. Profitability in 2026 comes from three sources: finding less obvious pairs that aren't widely traded, using better statistical methods (Kalman filter hedge ratios, rolling cointegration monitoring), and executing more efficiently to minimise transaction costs. Pairs trading also works well in combination with other quantitative strategies as one component of a diversified systematic portfolio.

What is the difference between pairs trading and statistical arbitrage?

Pairs trading is the simplest form of statistical arbitrage. It involves exactly two securities - you go long one and short the other based on their cointegrating relationship. Statistical arbitrage (stat arb) is the broader category that includes multi-leg trades, basket strategies, and portfolio-level approaches. A stat arb fund might simultaneously trade hundreds of pairs or construct optimised long-short portfolios across an entire stock universe, using factor models and machine learning alongside cointegration. Pairs trading is where most people start; stat arb is where they end up after scaling the approach.

What are the main risks of pairs trading?

The five key risks are: (1) spread divergence - the spread keeps widening instead of reverting, causing losses on both legs; (2) regime change - the cointegrating relationship breaks down permanently because of a structural change in one company; (3) execution risk - you fill one leg but not the other, leaving an unhedged position; (4) crowding - too many traders target the same pair, compressing returns and amplifying losses during unwinds; and (5) borrowing risk - the short leg becomes expensive or impossible to borrow, forcing a premature exit. Strict stop losses, diversification across multiple pairs and sectors, and continuous cointegration monitoring are the standard defences.

What Python libraries do I need for pairs trading?

The core stack is pandas for data handling, numpy for numerical computation, and statsmodels for the cointegration tests (coint for Engle-Granger, coint_johansen for Johansen) and OLS regression. For data retrieval, yfinance is the quickest way to pull historical stock prices. For visualisation, matplotlib or plotly will let you plot spreads, z-scores, and equity curves. If you move to dynamic hedge ratios, pykalman or a manual Kalman filter implementation (as shown earlier) handles the state estimation. All of these libraries are open-source, well-maintained, and widely used in quantitative finance in 2026.
