What Quant Developer Interviews Actually Test
Quant developer interviews differ from FAANG-style software engineering interviews in three ways. First, the bar is higher - top quant firms expect harder algorithmic problems solved more cleanly and more quickly, plus a depth in systems design that FAANG only tests at senior levels. Second, the interviews are heavily weighted toward low-latency systems thinking - cache effects, lock-free programming, the C++ memory model - in a way most product engineering interviews aren't. Third, interviewers are typically working senior engineers, not recruiters, so the conversation goes deep from minute one.
This guide collects 25 worked examples from recent Hudson River Trading, Citadel Securities, Two Sigma, Jane Street, Jump Trading and DRW developer interviews. For broader context, see our quant developer career guide, C++ in quantitative finance, and hardware acceleration for quant.
Section 1: Systems Design (Questions 1-7)
1. Order book design
Design an in-memory order book supporting add(), cancel() and get_top_of_book() with O(log N) per operation.
Answer: Two sorted containers (e.g., std::map): bids descending by price, asks ascending. Each price level holds a doubly-linked list of orders (preserves time priority). Hash map from order_id to a tuple (side, price, list_node) gives O(1) cancellation lookup; the actual list erase is O(1); the level removal (if list becomes empty) is O(log N) via the sorted map. Top-of-book is O(1) via begin().
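A minimal sketch of that layout, assuming integer tick prices and unique order ids. The names OrderBook, Handle and TopOfBook are illustrative, not taken from any firm's codebase, and the sketch ignores matching, modifies and quantities at the top of book.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <iterator>
#include <list>
#include <map>
#include <optional>
#include <unordered_map>

enum class Side { Bid, Ask };

struct Order {
    std::uint64_t id;
    std::int64_t price;  // integer ticks (assumption), avoids floating-point keys
    std::uint32_t qty;
};

struct TopOfBook {
    std::optional<std::int64_t> bid;
    std::optional<std::int64_t> ask;
};

class OrderBook {
    using Level = std::list<Order>;                       // FIFO order = time priority
    std::map<std::int64_t, Level, std::greater<>> bids_;  // best (highest) bid at begin()
    std::map<std::int64_t, Level> asks_;                  // best (lowest) ask at begin()

    struct Handle { Side side; std::int64_t price; Level::iterator node; };
    std::unordered_map<std::uint64_t, Handle> index_;     // order id -> location

    template <typename Book>
    static void erase_from(Book& book, const Handle& h) {
        auto level = book.find(h.price);   // O(log N) in the number of price levels
        level->second.erase(h.node);       // O(1) list erase
        if (level->second.empty()) book.erase(level);
    }

public:
    void add(Side side, Order order) {
        assert(index_.count(order.id) == 0 && "duplicate order id");
        Level& level = side == Side::Bid ? bids_[order.price] : asks_[order.price];
        level.push_back(order);
        index_.emplace(order.id, Handle{side, order.price, std::prev(level.end())});
    }

    bool cancel(std::uint64_t id) {
        auto it = index_.find(id);         // O(1) average lookup
        if (it == index_.end()) return false;
        if (it->second.side == Side::Bid) erase_from(bids_, it->second);
        else erase_from(asks_, it->second);
        index_.erase(it);
        return true;
    }

    TopOfBook get_top_of_book() const {
        TopOfBook top;
        if (!bids_.empty()) top.bid = bids_.begin()->first;
        if (!asks_.empty()) top.ask = asks_.begin()->first;
        return top;
    }
};
```

std::list is used here because its iterators stay valid when other orders are erased, which is what makes the stored handle safe. A production book would likely replace it with an intrusive list over pooled nodes to avoid per-order allocation.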
2. Real-time risk service
Design a service that consumes a real-time order flow feed, computes per-trader and per-strategy P&L and risk exposures, and alerts when limits are breached. Latency budget: 1ms per event.
Answer: Single-writer threading per shard; lock-free queue from feed reader to risk engine. Pre-allocated risk objects, no allocation on hot path. Limit checks via precomputed sorted set per shard. Alerts go to a separate slow-path thread to avoid blocking. For multi-shard aggregation, periodic snapshot with eventual consistency.
3. Market data normaliser
Design a service that consumes raw exchange feeds (e.g., NYSE, NASDAQ, BATS) and produces a normalised internal feed.
Approach: Per-exchange parser threads; common output format. Sequence numbering and gap detection per exchange. Deduplication if multiple subscribers. Snapshot+delta pattern - publish a periodic full snapshot for late joiners, then deltas. For ultra-low-latency, prefer UDP multicast with a separate TCP gap-fill channel.
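The per-exchange gap detection step can be sketched as a small state machine. This assumes each feed carries a strictly increasing 64-bit sequence number starting at 1; SequenceTracker, Gap and the Status values are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Inclusive range of sequence numbers that were never seen.
struct Gap {
    std::uint64_t first_missing;
    std::uint64_t last_missing;
};

class SequenceTracker {
    std::uint64_t next_expected_;

public:
    enum class Status { InOrder, Stale, GapDetected };

    struct Result {
        Status status;
        std::optional<Gap> gap;  // set only when status == GapDetected
    };

    explicit SequenceTracker(std::uint64_t first_seq = 1) : next_expected_(first_seq) {}

    Result on_message(std::uint64_t seq) {
        if (seq < next_expected_) {
            // Already processed or already reported as missing: drop it here.
            return {Status::Stale, std::nullopt};
        }
        if (seq == next_expected_) {
            ++next_expected_;
            return {Status::InOrder, std::nullopt};
        }
        // Jumped ahead: report the missing range so the caller can request
        // a gap fill, then carry on from the message that did arrive.
        Gap gap{next_expected_, seq - 1};
        next_expected_ = seq + 1;
        return {Status::GapDetected, gap};
    }
};
```

In this sketch the retransmitted messages would be handled by a separate recovery path, since the tracker treats anything below next_expected_ as stale.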
4. Time series database
You need to store and query market data: trades, quotes, order book snapshots. Read-heavy workload with rare writes (data ingested once per day at end-of-day). What architecture?
Approach: Columnar storage (Parquet, kdb+, ClickHouse). Partition by date and symbol. Compress per column. For low-latency reads, keep recent data (last week) in memory; older data in cold storage with compressed format. Query layer should support time-range and symbol-range filters efficiently.
5. Distributed configuration
You have 100 trading processes that need to share configuration (limits, symbol lists, parameters). How do you distribute updates?
Approach: Centralised config service (Zookeeper, etcd, Consul) for source-of-truth. Local in-memory cache per process with subscription to changes. Versioned updates. For latency-critical paths, push updates eagerly; for non-critical, periodic refresh works.
6. Trade reconciliation
You have an internal order entry log and an exchange execution report. They disagree (one trade in your log isn't in the exchange's, or vice versa). How do you reconcile?
Approach: Sort by timestamp; match trades by quantity, price, symbol, side. Allow small timestamp tolerance. Three buckets: matched, internal-only (likely failed sends or never-acked), exchange-only (likely slow/lost ack). For each unmatched, surface to ops; mark as resolved when investigated.
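A minimal sketch of the three-bucket match, assuming both logs have already been parsed into a common Trade record. The field names and the reconcile() function are illustrative, and the inner scan is deliberately naive (quadratic); a real job would index the exchange side by symbol and side first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Trade {
    std::string symbol;
    char side;            // 'B' or 'S'
    std::int64_t price;   // integer ticks (assumption)
    std::uint32_t qty;
    std::int64_t ts_us;   // timestamp in microseconds
};

struct ReconResult {
    std::vector<std::pair<Trade, Trade>> matched;  // (internal, exchange)
    std::vector<Trade> internal_only;
    std::vector<Trade> exchange_only;
};

inline bool same_economics(const Trade& a, const Trade& b) {
    return a.symbol == b.symbol && a.side == b.side &&
           a.price == b.price && a.qty == b.qty;
}

ReconResult reconcile(std::vector<Trade> internal, std::vector<Trade> exchange,
                      std::int64_t tolerance_us) {
    auto by_time = [](const Trade& a, const Trade& b) { return a.ts_us < b.ts_us; };
    std::sort(internal.begin(), internal.end(), by_time);
    std::sort(exchange.begin(), exchange.end(), by_time);

    ReconResult out;
    std::vector<bool> used(exchange.size(), false);
    for (const Trade& mine : internal) {
        bool found = false;
        for (std::size_t i = 0; i < exchange.size() && !found; ++i) {
            const std::int64_t dt = exchange[i].ts_us - mine.ts_us;
            if (!used[i] && same_economics(mine, exchange[i]) &&
                dt >= -tolerance_us && dt <= tolerance_us) {
                used[i] = true;  // each exchange trade matches at most once
                out.matched.emplace_back(mine, exchange[i]);
                found = true;
            }
        }
        if (!found) out.internal_only.push_back(mine);
    }
    for (std::size_t i = 0; i < exchange.size(); ++i) {
        if (!used[i]) out.exchange_only.push_back(exchange[i]);
    }
    return out;
}
```

The two unmatched buckets are what would be surfaced to ops for investigation.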
7. Multi-exchange smart order router
A client wants to buy 1M shares of XYZ; price is the same across exchanges, but liquidity is fragmented. How do you split the order?
Approach: (1) Snapshot top-of-book from each exchange. (2) Send fragmented orders proportional to displayed liquidity, with timing offset to avoid signalling. (3) For larger orders, use VWAP-like algorithms. (4) Be aware of post-trade transparency requirements (some venues report immediately; others delayed). (5) Monitor fill rates per venue; adjust allocations adaptively.
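Step (2) can be sketched as a proportional split with integer rounding. This covers only the allocation arithmetic; timing offsets, adaptive reweighting and venue fees are out of scope, and split_by_displayed is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Splits `total` shares across venues in proportion to displayed size.
// Returns one allocation per venue; the allocations sum to `total` unless
// no venue displays any liquidity, in which case nothing is allocated.
std::vector<std::uint64_t> split_by_displayed(std::uint64_t total,
                                              const std::vector<std::uint64_t>& displayed) {
    std::vector<std::uint64_t> alloc(displayed.size(), 0);
    const std::uint64_t sum =
        std::accumulate(displayed.begin(), displayed.end(), std::uint64_t{0});
    if (sum == 0) return alloc;

    std::uint64_t assigned = 0;
    for (std::size_t i = 0; i < displayed.size(); ++i) {
        // Floor of the proportional share; assumes the product fits in 64 bits.
        alloc[i] = total * displayed[i] / sum;
        assigned += alloc[i];
    }

    // Flooring leaves fewer shares unassigned than there are venues with
    // liquidity, so hand them out one at a time, deepest venue first.
    std::vector<std::size_t> order(displayed.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return displayed[a] > displayed[b];
    });
    for (std::size_t r = 0; assigned < total; ++r, ++assigned) ++alloc[order[r]];
    return alloc;
}
```

Note that a purely proportional split can allocate more than a venue displays when the order is larger than total displayed size; capping and working the residual over time is where the VWAP-like logic in step (3) would come in.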
Section 2: Low-Latency C++ (Questions 8-14)
8. False sharing
Two threads each increment their own counter. The counters happen to be in the same cache line. What goes wrong, and how do you fix it?
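Answer: Cache line ping-pong (false sharing). Each write invalidates the other core's copy of the line, so that core's next access has to pull the line back through the cache coherence protocol, even though the two threads never touch each other's data. Performance can be 10-100x slower than expected. Fix: pad the counters to separate cache lines (typically 64 bytes), or use alignas(64).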
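A sketch of the alignas fix, assuming a 64-byte cache line (typical on x86-64, but worth checking on the target hardware). PaddedCounter and count_in_parallel are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte lines

// alignas pads the struct out to a full line, so adjacent array elements
// can never share one.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counter starts on a line boundary");

// Two threads, each incrementing its own counter. With PaddedCounter the
// writes land on different cache lines; with two plain adjacent atomics
// they would most likely share one.
std::uint64_t count_in_parallel(std::uint64_t iterations) {
    PaddedCounter counters[2];
    auto work = [iterations](PaddedCounter& c) {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            c.value.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread a(work, std::ref(counters[0]));
    std::thread b(work, std::ref(counters[1]));
    a.join();
    b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Timing this against an unpadded version on the target machine is the honest way to see how much the padding buys; the ratio varies a lot by CPU and by how the threads are placed.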
9. Memory ordering
What's the difference between memory_order_relaxed, memory_order_acquire, memory_order_release, and memory_order_seq_cst?
Answer: Relaxed: atomicity only, no ordering. Acquire (on a load): prevents subsequent reads/writes from being reordered before the load. Release (on a store): prevents prior reads/writes from being reordered after the store. Seq_cst: acquire-release semantics plus a single total order over all seq_cst operations that every thread agrees on. Use the weakest ordering that is sufficient. Acquire-release pairs are the standard for producer-consumer patterns.
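The acquire-release pairing can be illustrated with a one-shot publication flag. This is a minimal sketch with a hypothetical Mailbox type, assuming one producer thread calls publish() once and consumer threads call try_consume().

```cpp
#include <atomic>
#include <cassert>
#include <optional>

struct Mailbox {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag

    // Producer: write the data first, then publish with a release store.
    // The release keeps the payload write from moving after the flag write.
    void publish(int value) {
        payload = value;
        ready.store(true, std::memory_order_release);
    }

    // Consumer: an acquire load that observes `true` synchronises with the
    // release store, so the payload read that follows sees the written value.
    std::optional<int> try_consume() const {
        if (!ready.load(std::memory_order_acquire)) return std::nullopt;
        return payload;
    }
};
```

With memory_order_relaxed on both sides the flag would still be atomic, but nothing would order the payload access relative to it, and reading payload would be a data race.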
10. Lock-free SPSC queue
Implement a single-producer single-consumer ring buffer.
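#include <atomic>
#include <cstddef>

// Single-producer single-consumer ring buffer. One slot is always left
// empty to tell "full" from "empty", so usable capacity is N - 1.
template <typename T, std::size_t N>
class SPSCQueue {
    alignas(64) std::atomic<std::size_t> head{0};  // next write index, owned by the producer
    alignas(64) std::atomic<std::size_t> tail{0};  // next read index, owned by the consumer
    T buffer[N];

public:
    // Producer thread only.
    bool push(const T& item) {
        std::size_t h = head.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % N;
        if (next == tail.load(std::memory_order_acquire)) return false;  // full
        buffer[h] = item;
        head.store(next, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Consumer thread only.
    bool pop(T& out) {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;  // empty
        out = buffer[t];
        tail.store((t + 1) % N, std::memory_order_release);  // hand the slot back
        return true;
    }
};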
Key points: head and tail on different cache lines (alignas 64); release-store on producer-side head; acquire-load on consumer-side head.
11. Cache-friendly data layout
Struct of arrays (SoA) vs Array of structs (AoS) - when do you choose each?
Answer: SoA when you frequently access only a subset of fields - the unused fields don't pollute the cache. Common in numerical workloads (e.g., loops over coordinates of points). AoS when you typically access all fields together (e.g., processing a single object end-to-end). For trading systems, AoS for individual order processing; SoA for batch analytics over fields.
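The two layouts side by side, as a sketch with a hypothetical Quote record. Both functions compute the same total spread, but the SoA loop only streams the bid and ask columns through the cache, while the AoS loop drags the quantity fields along with every record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Array of structs: one record per quote, all fields adjacent in memory.
struct Quote {
    std::int64_t bid;
    std::int64_t ask;
    std::uint32_t bid_qty;
    std::uint32_t ask_qty;
};

// Struct of arrays: one contiguous column per field.
struct QuoteColumns {
    std::vector<std::int64_t> bid, ask;
    std::vector<std::uint32_t> bid_qty, ask_qty;

    void push_back(const Quote& q) {
        bid.push_back(q.bid);
        ask.push_back(q.ask);
        bid_qty.push_back(q.bid_qty);
        ask_qty.push_back(q.ask_qty);
    }
};

// AoS: every 24-byte record is loaded even though only 16 bytes are used.
std::int64_t total_spread_aos(const std::vector<Quote>& quotes) {
    std::int64_t total = 0;
    for (const Quote& q : quotes) total += q.ask - q.bid;
    return total;
}

// SoA: only the two columns the loop needs are touched.
std::int64_t total_spread_soa(const QuoteColumns& c) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < c.bid.size(); ++i) total += c.ask[i] - c.bid[i];
    return total;
}
```

The gap widens as records get fatter: the more unused fields per record, the more cache bandwidth the AoS loop wastes.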
12. Branch prediction
You have a loop with an if that's true less than 1% of the time. How to optimise?
Answer: Mark with [[unlikely]] (C++20) or __builtin_expect (GCC/Clang). Restructure to avoid the branch entirely if possible (e.g., arithmetic instead of conditional). Modern CPUs predict well, so always profile first - the optimisation may not be needed.
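Both options can be sketched on a toy loop that counts rare error flags. The UNLIKELY macro wraps __builtin_expect and falls back to a no-op on other compilers; on C++20 the same hint could be written as an [[unlikely]] attribute on the branch. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Option 1: keep the branch, but tell the compiler which side is cold so
// it can lay out the hot path as straight-line code.
std::size_t count_errors_hinted(const std::vector<std::uint8_t>& flags) {
    std::size_t errors = 0;
    for (std::uint8_t f : flags) {
        if (UNLIKELY(f != 0)) ++errors;  // rare path
    }
    return errors;
}

// Option 2: remove the branch entirely by turning the condition into
// arithmetic (a bool converts to 0 or 1).
std::size_t count_errors_branchless(const std::vector<std::uint8_t>& flags) {
    std::size_t errors = 0;
    for (std::uint8_t f : flags) errors += static_cast<std::size_t>(f != 0);
    return errors;
}
```

An optimising compiler may well generate the same code for both, which is one more reason to profile before and after.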
13. Object pool
Why use a custom memory pool instead of malloc on a hot path?
Answer: (1) Latency consistency: malloc has unpredictable tail latency from fragmentation, system calls, lock contention. (2) Cache locality: pool-allocated objects are co-located. (3) No global synchronisation: thread-local pool avoids cross-thread contention. The trade-off: complexity and reduced flexibility (can't easily resize).
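A minimal fixed-capacity pool along those lines, as a sketch. It assumes single-threaded (or thread-local) use, never grows, and leaves it to the caller to release every object before the pool is destroyed. ObjectPool, acquire and release are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t Capacity>
class ObjectPool {
    alignas(T) unsigned char storage_[Capacity * sizeof(T)];  // all memory up front
    std::array<std::size_t, Capacity> free_;                  // stack of free slot indices
    std::size_t free_count_ = Capacity;

public:
    ObjectPool() {
        for (std::size_t i = 0; i < Capacity; ++i) free_[i] = Capacity - 1 - i;
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in a free slot; returns nullptr when the pool is
    // exhausted. No heap allocation, no system call, no lock.
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (free_count_ == 0) return nullptr;
        const std::size_t slot = free_[--free_count_];
        void* where = storage_ + slot * sizeof(T);
        return ::new (where) T(std::forward<Args>(args)...);
    }

    // Destroys the object and returns its slot to the free stack.
    // `obj` must have come from acquire() on this pool.
    void release(T* obj) {
        obj->~T();
        const std::size_t slot = static_cast<std::size_t>(
            reinterpret_cast<unsigned char*>(obj) - storage_) / sizeof(T);
        free_[free_count_++] = slot;
    }

    std::size_t available() const { return free_count_; }
};
```

The free stack is last-in first-out, so the most recently released slot, which is the one most likely to still be in cache, is reused first.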
14. Inlining
When does inlining hurt performance?
Answer: Inlining hurts when it causes: (1) instruction cache pressure (a large function is now duplicated at every call site); (2) branch predictor pressure (each inlined copy of a branch is a separate branch as far as the CPU is concerned, so it takes its own predictor entry and has to be trained separately); (3) loss of code locality, if rarely executed code is inlined into the middle of a hot path. The general rule: inline small hot functions; don't inline large or rare ones.
Section 3: Algorithms (Questions 15-20)
15. Top-K stream
Given a stream of integers, return the K largest seen on demand. O(log K) per insert; O(K log K) per read.
Answer: Min-heap of size K. On each new element, if it is larger than the heap top, pop and push. On read, copy out the K elements and sort them if ordered output is required (O(K log K)); if order doesn't matter, return them as-is in O(K).
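A sketch of the size-K min-heap using std::priority_queue. TopK is an illustrative name, and the read here returns the elements in descending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

class TopK {
    std::size_t k_;
    // std::greater turns priority_queue into a min-heap: top() is the
    // smallest of the K largest values seen so far.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap_;

public:
    explicit TopK(std::size_t k) : k_(k) {}

    // O(log K): only values that beat the current K-th largest get in.
    void insert(int x) {
        if (k_ == 0) return;
        if (heap_.size() < k_) {
            heap_.push(x);
        } else if (x > heap_.top()) {
            heap_.pop();
            heap_.push(x);
        }
    }

    // O(K log K): drain a copy of the heap (ascending), then reverse.
    std::vector<int> largest() const {
        auto copy = heap_;
        std::vector<int> out;
        out.reserve(copy.size());
        while (!copy.empty()) {
            out.push_back(copy.top());
            copy.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }
};
```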
16. Stream median
Implement add(x) and median() in O(log n) and O(1) respectively. (Hudson River Trading classic.)
Answer: Two heaps - max-heap of lower half, min-heap of upper half. Keep them balanced. Median is at the top of one (odd total) or average of both tops (even total).
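The two-heap scheme can be sketched as follows. RunningMedian is an illustrative name; the sketch keeps the lower half equal in size to the upper half or one element larger, so the median is always readable from the heap tops.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

class RunningMedian {
    std::priority_queue<int> lower_;                                       // max-heap: lower half
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper_;  // min-heap: upper half

public:
    // O(log n): insert into the right half, then rebalance by moving at
    // most one element across.
    void add(int x) {
        if (lower_.empty() || x <= lower_.top()) lower_.push(x);
        else upper_.push(x);

        if (lower_.size() > upper_.size() + 1) {
            upper_.push(lower_.top());
            lower_.pop();
        } else if (upper_.size() > lower_.size()) {
            lower_.push(upper_.top());
            upper_.pop();
        }
    }

    // O(1): the median sits on the heap tops. Requires at least one add().
    double median() const {
        assert(!lower_.empty());
        if (lower_.size() > upper_.size()) return lower_.top();
        return (static_cast<double>(lower_.top()) + upper_.top()) / 2.0;
    }
};
```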
17. Concurrent map
Design a thread-safe hash map for read-heavy workloads.
Answer: RW lock at the bucket level. Or fully lock-free: open addressing with atomic CAS. For latency-critical reads, RCU (read-copy-update) - readers never block, writers create new versions and let readers continue with old ones until they leave the critical section.
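A sketch of the reader-writer-lock option, using lock striping: one std::shared_mutex per shard rather than per individual bucket, which is a coarser variant of the same idea. ShardedMap and its shard count are assumptions for illustration; the lock-free and RCU variants are considerably more involved and not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

template <typename K, typename V, std::size_t Shards = 16>
class ShardedMap {
    struct Shard {
        mutable std::shared_mutex mutex;
        std::unordered_map<K, V> data;
    };
    std::array<Shard, Shards> shards_;

    Shard& shard_for(const K& key) {
        return shards_[std::hash<K>{}(key) % Shards];
    }
    const Shard& shard_for(const K& key) const {
        return shards_[std::hash<K>{}(key) % Shards];
    }

public:
    // Writers take the shard's lock exclusively; other shards are unaffected.
    void put(const K& key, V value) {
        Shard& s = shard_for(key);
        std::unique_lock lock(s.mutex);
        s.data.insert_or_assign(key, std::move(value));
    }

    // Readers share the shard's lock, so concurrent gets do not block each
    // other. Returns a copy so no reference escapes the lock.
    std::optional<V> get(const K& key) const {
        const Shard& s = shard_for(key);
        std::shared_lock lock(s.mutex);
        auto it = s.data.find(key);
        if (it == s.data.end()) return std::nullopt;
        return it->second;
    }
};
```

Readers still write to the lock's internal state when they acquire it, which is the cost RCU avoids on the read side.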
18. Sliding window maximum
Given a stream and window size N, return the rolling maximum in O(1) amortised per element (O(n) total over a stream of n elements).
Answer: Monotonic deque. Each element is added and removed at most once. Pop elements from the back while smaller than current; push current; pop from front if outside window.
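A streaming sketch of the monotonic deque. SlidingMax is an illustrative name; push() returns the maximum over the most recent `window` values, or over everything seen so far while the window is still filling.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

class SlidingMax {
    std::size_t window_;
    std::size_t count_ = 0;  // index of the next element to arrive
    // (index, value) pairs with strictly decreasing values front to back,
    // so the front is always the current maximum.
    std::deque<std::pair<std::size_t, int>> candidates_;

public:
    explicit SlidingMax(std::size_t window) : window_(window) { assert(window > 0); }

    int push(int x) {
        // Anything at the back that is not larger than x can never be the
        // maximum again: x is at least as big and will outlive it.
        while (!candidates_.empty() && candidates_.back().second <= x) {
            candidates_.pop_back();
        }
        candidates_.emplace_back(count_, x);

        // At most one element falls out of the window per push.
        if (candidates_.front().first + window_ <= count_) candidates_.pop_front();

        ++count_;
        return candidates_.front().second;
    }
};
```

Each element is pushed once and popped at most once, which is where the amortised constant cost per element comes from.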
19. LRU cache with TTL
Implement a cache with capacity K and time-to-live per entry. Both put and get O(1).
Answer: Two data structures. (1) Doubly-linked list ordered by recency for LRU eviction. (2) Sorted structure (e.g., bucketed by expiry second) for TTL eviction. Hash map maps keys to (list_node, bucket_node) pairs.
20. Median of two sorted arrays
Given two sorted arrays of total length n, find the median in O(log n).
Answer: Binary search on the partition point in the smaller array; the partition point in the larger is determined. Carefully handle edge cases (empty arrays, partitions at boundaries). This is a notoriously hard "easy idea" problem - worth practising the implementation.
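One way to write the partition search, as a sketch. It assumes both inputs are sorted ascending, at least one is non-empty, and no element equals the int limits that are used as sentinels; median_sorted is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

double median_sorted(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() > b.size()) return median_sorted(b, a);  // search the smaller array
    const std::size_t m = a.size(), n = b.size();
    assert(m + n > 0);

    constexpr int kLow = std::numeric_limits<int>::min();
    constexpr int kHigh = std::numeric_limits<int>::max();
    const std::size_t half = (m + n + 1) / 2;  // size of the combined left part

    std::size_t lo = 0, hi = m;
    while (lo <= hi) {
        const std::size_t i = lo + (hi - lo) / 2;  // elements taken from a
        const std::size_t j = half - i;            // elements taken from b
        // Sentinels stand in for "nothing on this side of the cut".
        const int a_left = i == 0 ? kLow : a[i - 1];
        const int a_right = i == m ? kHigh : a[i];
        const int b_left = j == 0 ? kLow : b[j - 1];
        const int b_right = j == n ? kHigh : b[j];

        if (a_left > b_right) {
            hi = i - 1;  // took too many from a (i > 0 here, so no underflow)
        } else if (b_left > a_right) {
            lo = i + 1;  // took too few from a
        } else {
            const int left_max = std::max(a_left, b_left);
            if ((m + n) % 2 == 1) return left_max;
            const int right_min = std::min(a_right, b_right);
            return (static_cast<double>(left_max) + right_min) / 2.0;
        }
    }
    throw std::invalid_argument("inputs must be sorted");
}
```

Searching the smaller array is what keeps j inside the bounds of the larger one, and it tightens the cost to O(log min(m, n)).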
Section 4: Real Engineering Scenarios (Questions 21-25)
21. Debugging a latency spike
Your trading system has p99 latency of 50µs steady-state, but every few minutes you see a 5ms spike. How do you investigate?
Approach: (1) GC pauses (if using a GC'd language) - profile with GC logs. (2) System scheduler interference (kernel preemption) - pin threads to isolated cores, isolate cores via kernel boot params. (3) NUMA effects - check whether memory is on the right NUMA node. (4) Network jitter - check NIC queue drops, interrupt processing. (5) Page faults - mlock memory. (6) Other processes on the same machine - run perf top during a spike.
22. Memory leak in production
A trading process's RSS grows by 100MB per day. It runs for weeks at a time; eventually it gets OOM-killed. How do you debug?
Approach: (1) Heap profiling - jemalloc's heap profiler, heaptrack, or Valgrind's massif. (2) Check for unbounded std::vector growth - vectors that are appended to with push_back() but never cleared. (3) Check for stale references in containers (caches that never evict). (4) Check kernel-level allocations (file descriptors, memory mappings). (5) Take heap snapshots over time and diff.
23. Thread pinning
Why do trading systems pin specific threads to specific CPU cores?
Answer: (1) Cache locality - the thread's working set stays in the same L1/L2 cache. (2) Predictable latency - no preemption by other processes. (3) NUMA affinity - keep memory access local. (4) Avoid cache thrashing from migration. Pinning is typically combined with isolating those cores from the OS scheduler (isolcpus=...) and disabling things like irqbalance for the pinned cores.
24. Why is my code slow?
"My critical loop runs at 100M iterations per second on my dev machine but 30M on production. What could cause this?"
Approach: (1) CPU frequency differences (dev runs at 4 GHz, prod at 2.5 GHz turbo-throttled). (2) Different cache sizes. (3) Production has other workloads competing. (4) Different compiler flags or optimisation level. (5) Different memory layout (NUMA, hugepages). (6) Production data is different (cold cache, branch mispredictions on real distribution).
25. Pre-mortem
You're about to deploy a new market-making strategy. Walk me through what you'd check before going live.
Answer: (1) Backtest passes with realistic transaction costs. (2) Out-of-sample test on most recent unseen month. (3) Code review by another engineer. (4) Stress test - what happens at 10x normal flow? At 0 flow? With malformed messages? (5) Risk limits configured and tested. (6) Kill switch tested. (7) Phased rollout - start with very small position limits, scale up over days. (8) Monitoring dashboards in place. (9) Runbook for common issues. (10) Designated person on-call during initial trading sessions.
How to Use This Guide
For Hudson River Trading, Citadel Securities and Jump Trading, the C++ depth questions (Section 2) are the most common single source of failure. For Jane Street and Two Sigma, Section 1 (systems design) and Section 4 (real engineering scenarios) are weighted more heavily.
For broader prep:
- Quant interview questions hub
- Quant developer career guide
- C++ in quantitative finance
- Hardware acceleration for quant
- Networking fundamentals for developers