
Network Speeds and Latency in Financial Systems

Why latency matters in trading, how to measure it, where the bottlenecks are, and what firms do to minimise it — from co-location to kernel bypass.

Why Nanoseconds Matter

In most software, a few milliseconds of latency is invisible. In trading, it can be the difference between profit and loss. If your system receives a market data update 1 millisecond after a competitor and both systems want to trade on it, you lose.

This creates an arms race where trading firms invest heavily in reducing latency at every level — network, hardware, software, even physical location. Understanding where latency comes from helps you make informed decisions about where to optimise and when the effort is (or is not) worth it.


Where Latency Lives

Speed of Light

The absolute physical limit. In fibre optic cable, light travels about 200 km per millisecond (roughly two-thirds of its vacuum speed, or about 5 ns per metre). London to New York is roughly 5,500 km — about 27 ms one way through undersea cables. Nothing can make this faster except a shorter physical path.

Route                   Distance     One-Way Latency
London ↔ New York       ~5,500 km    ~27 ms
Chicago ↔ New York      ~1,200 km    ~6 ms
Same data centre        ~1 m         ~5 ns
Same rack               ~0.5 m       ~2 ns

This is why co-location exists: trading firms place their servers in the same data centre as the exchange. At the speed of light, being 1km closer saves 5 microseconds. Over millions of trades, that adds up.
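
A back-of-the-envelope check of these numbers: propagation delay is simply distance multiplied by roughly 5 ns per metre. A minimal C++ sketch using the approximate distances from the table above:

#include <cstdio>

int main() {
    // Speed of light in fibre is ~2/3 c: ~200,000 km/s, i.e. ~5 ns per metre
    constexpr double ns_per_metre = 5.0;

    struct Route { const char* name; double metres; };
    const Route routes[] = {
        {"London <-> New York",  5'500'000.0},
        {"Chicago <-> New York", 1'200'000.0},
        {"Same data centre",     1.0},
        {"Same rack",            0.5},
    };

    for (const auto& r : routes) {
        // One-way propagation delay, reported in microseconds
        std::printf("%-22s ~%g us one way\n", r.name,
                    r.metres * ns_per_metre / 1000.0);
    }
}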

Network Stack

Even within a data centre, the network stack adds latency:

Application code         ~100 ns - 10 us
System call overhead     ~1-5 us
Kernel network stack     ~5-15 us
Network interface card   ~1-5 us
Switch/router            ~1-5 us
Cable propagation        ~5 ns per metre

For a standard TCP request within a data centre, total round-trip latency is typically 50-200 microseconds. For low-latency trading, even this is too much, which is why firms invest in:

Kernel Bypass

The OS kernel network stack is generic — it handles every type of traffic equally. Kernel bypass techniques (DPDK, Solarflare OpenOnload, Mellanox VMA) let applications read network data directly from the network card, skipping the kernel entirely. This can cut latency from ~50us to ~5us.
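
To make this concrete, here is a minimal sketch of a kernel-bypass receive loop in the style of DPDK's poll-mode API. rx_loop and handle_market_data are hypothetical names, and EAL/port initialisation (rte_eal_init, device and queue setup) is omitted. The application busy-polls the NIC's receive ring directly from user space, so no system call or interrupt sits on the hot path:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

// Hypothetical callback that decodes one market data packet.
void handle_market_data(const uint8_t* data, uint16_t len);

void rx_loop(uint16_t port_id) {
    constexpr uint16_t BURST = 32;
    rte_mbuf* bufs[BURST];

    for (;;) {
        // Pull up to BURST packets straight off the NIC's RX ring, in user space
        const uint16_t n = rte_eth_rx_burst(port_id, /*queue_id=*/0, bufs, BURST);
        for (uint16_t i = 0; i < n; ++i) {
            const uint8_t* payload = rte_pktmbuf_mtod(bufs[i], const uint8_t*);
            handle_market_data(payload, rte_pktmbuf_data_len(bufs[i]));
            rte_pktmbuf_free(bufs[i]);  // return the buffer to the mempool
        }
    }
}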

FPGA Network Cards

Taking it further, some firms use FPGAs built into the network card itself to parse market data messages before they even reach the CPU. The hardware acceleration guide covers this in more detail.


Measuring Latency

You cannot improve what you do not measure. The key metrics:

Median latency (p50) — typical performance. Important but not sufficient.

Tail latency (p99, p99.9) — worst-case performance. This is where problems hide. A system with 100us median but 50ms p99 has occasional catastrophic slowdowns.

Jitter — variance in latency. High jitter means unpredictable performance, which for many trading strategies is worse than latency that is consistently high.

import numpy as np

# measure_latencies() is a placeholder assumed to return samples in microseconds
latencies = measure_latencies(n_samples=10000)

print(f"Median (p50): {np.percentile(latencies, 50):.1f} us")
print(f"p90:          {np.percentile(latencies, 90):.1f} us")
print(f"p99:          {np.percentile(latencies, 99):.1f} us")
print(f"p99.9:        {np.percentile(latencies, 99.9):.1f} us")
print(f"Max:          {np.max(latencies):.1f} us")
print(f"Std dev:      {np.std(latencies):.1f} us")

Software Latency Optimisation

Before investing in exotic hardware, software optimisations often provide the biggest gains:

Memory Allocation

Dynamic memory allocation (malloc/new) is slow and unpredictable. Low-latency systems pre-allocate all memory at startup:

// Bad: allocating on the hot path
void process_order(const Message& msg) {
    auto order = new Order(msg);  // Heap allocation — unpredictable latency
    // ...
    delete order;
}

// Good: use a pre-allocated pool
class OrderPool {
    std::array<Order, 10000> pool_;  // all memory reserved at startup
    size_t next_ = 0;

public:
    Order* acquire() { return &pool_[next_++]; }  // no bounds check; sketch only
    void release() { next_--; }  // LIFO only: release in reverse order of acquire
};

Lock-Free Data Structures

Mutex locks cause threads to sleep and wake — adding microseconds of latency. Lock-free queues using atomic operations avoid this entirely.
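
As an illustration, here is a minimal single-producer/single-consumer ring buffer (the classic Lamport queue) sketched with C++ atomics. SpscQueue is an illustrative name; a production version would also pad head_ and tail_ onto separate cache lines to avoid false sharing:

#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, size_t N>  // N must be a power of two
class SpscQueue {
    std::array<T, N> buf_;
    std::atomic<size_t> head_{0};  // advanced by the consumer
    std::atomic<size_t> tail_{0};  // advanced by the producer

public:
    bool push(const T& item) {  // producer thread only
        const size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;  // full
        buf_[t & (N - 1)] = item;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    std::optional<T> pop() {  // consumer thread only
        const size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T item = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // free the slot
        return item;
    }
};

One thread may call push while another calls pop; neither ever takes a lock, sleeps, or makes a system call.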

Avoid System Calls

Every system call (file I/O, network I/O, memory mapping) involves a context switch between user space and kernel space. On the hot path, minimise or eliminate them.
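
One common pattern is keeping logging off the hot path: instead of a write() per message (a context switch each time), append to a user-space buffer and flush from a background thread. A minimal sketch with a hypothetical HotPathLog class; synchronisation between the two threads is omitted and would in practice use something like the lock-free queue above:

#include <cstring>
#include <unistd.h>

class HotPathLog {
    char buf_[1 << 20];  // pre-allocated user-space buffer
    size_t used_ = 0;

public:
    void append(const char* msg, size_t len) {  // hot path: no system calls
        if (used_ + len <= sizeof(buf_)) {
            std::memcpy(buf_ + used_, msg, len);
            used_ += len;
        }  // else: drop the message rather than block the hot path
    }

    void flush(int fd) {  // background thread, off the hot path
        ::write(fd, buf_, used_);  // the only system call
        used_ = 0;
    }
};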

CPU Pinning and Isolation

Dedicate specific CPU cores to latency-critical threads. Prevent the OS from scheduling other work on those cores:

# Isolate CPU cores 2 and 3 from the OS scheduler
# (kernel boot parameter)
isolcpus=2,3

# Pin a process to specific cores
taskset -c 2 ./trading_engine
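
The same pinning can be done from inside the process with Linux's thread-affinity API, which is useful when each latency-critical thread needs its own isolated core. A minimal sketch (pin_to_core is an illustrative helper):

#include <pthread.h>
#include <sched.h>

// Pin the calling thread to a single core (Linux-specific).
bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// e.g. pin the market data thread to isolated core 2:
//   pin_to_core(2);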

The Latency Hierarchy in Practice

For most financial applications (not HFT), the priorities are:

  1. Architecture — are you making unnecessary network calls? Can you cache data locally?
  2. Database queries — are your queries optimised with proper indexes?
  3. Serialisation — are you using efficient data formats? JSON is slower than binary formats (see the sketch after this list).
  4. Connection management — are you reusing connections or opening new ones for each request?
  5. Code efficiency — are your algorithms appropriate? Are you using appropriate languages for hot paths?
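
On the serialisation point, the gap is easy to see in code: JSON encoding formats text, while a fixed binary layout is a single byte copy. A minimal sketch with a hypothetical Quote record; a real protocol would also pin byte order, alignment, and versioning:

#include <cstdint>
#include <cstdio>
#include <cstring>

struct Quote {  // hypothetical market data record
    uint64_t instrument_id;
    int64_t  price_ticks;
    uint32_t quantity;
};

// Text encoding: formatting dominates the cost.
size_t encode_json(const Quote& q, char* out, size_t cap) {
    int n = std::snprintf(out, cap,
        R"({"id":%llu,"px":%lld,"qty":%u})",
        (unsigned long long)q.instrument_id,
        (long long)q.price_ticks, q.quantity);
    return n > 0 ? (size_t)n : 0;
}

// Binary encoding: a fixed layout, just a byte copy.
size_t encode_binary(const Quote& q, char* out) {
    std::memcpy(out, &q, sizeof(q));  // assumes both sides agree on the layout
    return sizeof(q);
}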

Only HFT firms need to worry about kernel bypass, FPGA, and CPU cache optimisation. But understanding the full picture helps everyone make better design decisions. Even reducing API latency from 200ms to 20ms by adding a cache can transform the user experience of a trading application.

The difference between "fast enough" and "needs hardware optimisation" depends entirely on your use case. Understanding networking fundamentals helps you identify where the bottleneck actually is before investing in solutions.
