A technical comparison for capital markets distributed systems

For technical leaders and architects building high-performance, fault-tolerant distributed systems in capital markets, the choice of underlying technology is critical. These systems demand low and consistent latency, high throughput, resilience and robust auditability to meet strict regulatory and business requirements.

While both Apache Kafka and Aeron Sequencer leverage a distributed log as a foundational component, their architectural philosophies and performance characteristics make them suited for very different use cases.
This post examines both technologies, highlighting their strengths and ideal use cases, to help you make an informed decision for your next distributed system architecture.

The core architectural pattern: State Machine Replication

Before comparing Aeron and Kafka, it helps to understand the architectural pattern they both enable (to different degrees): State Machine Replication (SMR). SMR is fundamental to building the high-performance, always-on, fault-tolerant systems required in modern capital markets.

In an SMR architecture, multiple nodes each maintain an identical copy of a state machine (e.g., the business logic). They deterministically process the exact same sequence of inputs, in the exact same order, which guarantees they all produce the exact same outputs and end in the exact same state.
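
To make the idea concrete, here is a minimal sketch in Python. The `OrderBookReplica` class and its `add`/`cancel` commands are hypothetical, invented for illustration; they are not part of Aeron or Kafka. Two replicas that apply the same log in the same order end in the same state, while a replica that sees the same commands in a different order diverges.

```python
from dataclasses import dataclass, field


@dataclass
class OrderBookReplica:
    """Toy state machine (hypothetical): resting quantity per price level."""
    levels: dict = field(default_factory=dict)
    outputs: list = field(default_factory=list)

    def apply(self, command: tuple) -> None:
        # Deterministic: the result depends only on current state and the command.
        action, price, qty = command
        if action == "add":
            self.levels[price] = self.levels.get(price, 0) + qty
        elif action == "cancel":
            remaining = self.levels.get(price, 0) - qty
            if remaining > 0:
                self.levels[price] = remaining
            else:
                self.levels.pop(price, None)
        # Record an output event: the quantity left at this price level.
        self.outputs.append((action, price, self.levels.get(price, 0)))


def replay(commands) -> OrderBookReplica:
    replica = OrderBookReplica()
    for command in commands:
        replica.apply(command)
    return replica


# The replicated log: one agreed sequence of inputs.
log = [("add", 100, 5), ("add", 101, 3), ("cancel", 100, 2), ("cancel", 101, 3)]

replica_a = replay(log)
replica_b = replay(log)

# Same commands, different order: the cancel now arrives before the add.
reordered = [log[2], log[0], log[1], log[3]]
replica_c = replay(reordered)

print(replica_a.levels == replica_b.levels)  # identical input order, identical state
print(replica_a.levels == replica_c.levels)  # different input order, state diverges
```

The second comparison is the reason the ordering guarantee of the underlying log matters so much: determinism in the business logic is only useful if every replica is fed the same sequence.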

Why State Machine Replication is critical for capital markets:

  • Handling Contended State: SMRs excel at managing complex, contended data structures such as an exchange’s order book. Logic and state can be co-located in-memory, eliminating the massive performance overhead of writing to an external database after every single transaction.
  • Simplified, Deterministic Logic: The State Machine Replication model allows for a powerful separation of concerns. Developers can write single-threaded business logic that acts directly on in-memory data structures (avoiding complex object-relational mapping). This separates the core application logic from the underlying infrastructure concerns (like replication and consensus). The result is a system that is far simpler to reason about, test, and debug, which is a massive advantage in complex financial domains.
  • Fault Tolerance & Recovery: If a node fails, it can be restored by simply replaying the log of inputs. For 24/7 systems, snapshots (or checkpoints) can be used to dramatically shorten this recovery time, avoiding a full replay from the beginning.
  • Consistency & Auditability: By processing a single, totally ordered stream of events, all replicas are guaranteed to be consistent. This deterministic logic is simpler to debug, and the log itself becomes a perfect, redundant, and auditable record, which is essential for meeting regulatory requirements like MiFID II.
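
The recovery point can be sketched in a few lines of Python. Everything here is an illustrative assumption (the `PositionKeeper` class, the in-memory list standing in for the log, the snapshot format); it shows only the general technique of restoring from a snapshot and replaying the tail of the log, not how any particular product implements it.

```python
import copy


class PositionKeeper:
    """Toy state machine (hypothetical): net position per instrument."""

    def __init__(self):
        self.positions = {}
        self.last_applied = -1  # index of the last log entry applied

    def apply(self, index, entry):
        symbol, signed_qty = entry
        self.positions[symbol] = self.positions.get(symbol, 0) + signed_qty
        self.last_applied = index

    def snapshot(self):
        # Capture the state together with the log position it reflects.
        return {
            "positions": copy.deepcopy(self.positions),
            "last_applied": self.last_applied,
        }

    @classmethod
    def restore(cls, snap, log):
        node = cls()
        node.positions = copy.deepcopy(snap["positions"])
        node.last_applied = snap["last_applied"]
        replayed = 0
        # Replay only the entries recorded after the snapshot was taken.
        for index in range(node.last_applied + 1, len(log)):
            node.apply(index, log[index])
            replayed += 1
        return node, replayed


log = [("VOD", 100), ("BP", -50), ("VOD", -30), ("BP", 20), ("VOD", 10)]

primary = PositionKeeper()
snap = None
for i, entry in enumerate(log):
    primary.apply(i, entry)
    if i == 2:
        snap = primary.snapshot()  # checkpoint after the third entry

recovered, replayed = PositionKeeper.restore(snap, log)
print(recovered.positions == primary.positions, replayed)
```

The recovered node reaches the same state as the primary after replaying two entries instead of five. On a log with billions of entries, that difference is what makes fast restarts practical.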

The “streaming technology” in this context is the distributed log: the core component that provides this sequenced, redundant, and persistent stream of inputs. The performance and design of this log (e.g., Kafka or Aeron Sequencer) directly dictate the performance, latency, and reliability of the entire system.

Kafka: a strong foundation for stream processing

Apache Kafka, originally developed by LinkedIn, is a distributed stream-processing platform known for its high throughput and scalability. Its architecture is built around topics, which are divided into multiple partitions. These partitions are ordered, immutable sequences of records distributed across brokers to handle high data volumes and provide fault tolerance.

Key characteristics of Kafka:

  • Partition-based Ordering: Kafka provides a strong ordering guarantee, but only within a single partition. This is a crucial distinction for SMRs, which require one totally ordered stream of inputs shared by all state machines. To see a total order, a single state machine must consume from a single partition, which caps its maximum throughput at that of one partition.
  • Throughput & Latency: While Kafka is known for its high overall throughput, it achieves this by sharding data across partitions. A single partition’s throughput is often around 10 MB/s, which equates to roughly 35,000 messages per second (for 288-byte messages). Kafka’s latency is generally in the range of tens to hundreds of milliseconds.
  • High Availability: Kafka uses consumer groups to manage high availability and scale message consumption. If a consumer fails, partitions are automatically reassigned to other consumers. However, achieving an active/active producer setup for deterministic processing is complex and can reduce capacity.
  • Durability and Reliability: Kafka ensures durability by replicating messages across multiple brokers and uses acknowledgments (acks) to confirm message delivery. Log compaction also helps maintain the latest state of a record.
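
The ordering and throughput points above can be illustrated with a small Python sketch. It does not use the Kafka client; `partition_for` is a deliberately simple stand-in for a real partitioner's hash, the instrument keys are made up, and the arithmetic assumes 1 MB = 1,000,000 bytes.

```python
from collections import defaultdict


def messages_per_second(bytes_per_second: int, message_bytes: int) -> int:
    # Per-partition message rate implied by a byte-rate budget.
    return bytes_per_second // message_bytes


def partition_for(key: str, num_partitions: int) -> int:
    # Toy hash (assumption): sum of the key's bytes, modulo the partition count.
    return sum(key.encode()) % num_partitions


def route(events, num_partitions):
    partitions = defaultdict(list)
    for arrival_seq, key in events:
        partitions[partition_for(key, num_partitions)].append((arrival_seq, key))
    return partitions


print(messages_per_second(10_000_000, 288))  # roughly 35,000 as quoted above

# Each event is (arrival sequence, instrument key).
events = list(enumerate(["VOD", "BP", "VOD", "HSBA", "BP", "VOD"]))
partitions = route(events, num_partitions=2)

# Within a partition, arrival order is preserved.
per_partition_ordered = all(
    [seq for seq, _ in msgs] == sorted(seq for seq, _ in msgs)
    for msgs in partitions.values()
)

# A consumer that drains partition 0 and then partition 1 sees a different
# interleaving from the one in which the events originally arrived.
merged = [msg for p in sorted(partitions) for msg in partitions[p]]
print(per_partition_ordered, merged == events)
```

Each key's events stay in order relative to each other, but nothing in the partitioned layout records how events for different keys interleaved. A state machine that depends on that interleaving has to read from one partition only.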

 

Ideal Use Cases

Kafka is a robust solution for use cases like real-time analytics, log aggregation, and general stream processing where global ordering across all messages is not a hard requirement and where latency in the milliseconds range is acceptable.

 

Aeron Sequencer: engineered for low-latency finance

Aeron is an open-source, high-performance messaging and clustering technology. Aeron includes several key components:

  • Aeron Transport is a layer 4 message transport for reliable and efficient message delivery via unicast, multicast, and IPC.
  • Aeron Archive provides persistent storage and replay capabilities for streams sent over Aeron Transport.
  • Aeron Cluster is a fault-tolerant, Raft-based framework that leverages the Transport and Archive to create a distributed log, enabling State Machine Replication.

Aeron Sequencer is built directly on top of Aeron Cluster. It provides a single, strongly consistent, and replicated log, optimized specifically for the demands of capital markets. It allows multiple, independently deployed SMR applications to connect to and use this log, ensuring a totally ordered sequence of events for all participants.
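
As a rough mental model only, the shape of this arrangement can be sketched in Python. `ToySequencer` and `Subscriber` are hypothetical in-memory stand-ins and bear no relation to the actual Aeron API; the sketch shows independent applications polling one log at their own pace and still observing the same sequence.

```python
class ToySequencer:
    """In-memory stand-in for a single totally ordered log (not the Aeron API)."""

    def __init__(self):
        self._log = []

    def append(self, message) -> int:
        self._log.append(message)
        return len(self._log) - 1  # the sequence number assigned to the message

    def read_from(self, position: int):
        # Yield (sequence, message) pairs starting at the given position.
        for seq in range(position, len(self._log)):
            yield seq, self._log[seq]


class Subscriber:
    """An independently deployed application tracking its own log position."""

    def __init__(self, name):
        self.name = name
        self.position = 0
        self.seen = []

    def poll(self, sequencer):
        for seq, message in sequencer.read_from(self.position):
            self.seen.append((seq, message))
            self.position = seq + 1


sequencer = ToySequencer()
risk, matching = Subscriber("risk"), Subscriber("matching")

for message in ["new order", "amend order", "cancel order"]:
    sequencer.append(message)

matching.poll(sequencer)          # matching keeps up with the log
sequencer.append("new order")
risk.poll(sequencer)              # risk polls later and catches up in one go
matching.poll(sequencer)

print(risk.seen == matching.seen)
```

Because sequence numbers are assigned in one place, a slower application sees exactly what a faster one saw, just later.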

Key characteristics of Aeron Sequencer:

  • Total Ordering and Throughput: Aeron Sequencer provides a totally ordered log that can sequence millions of messages per second with capacity to spare, enough to power the world’s largest exchanges. This design suits SMR-based architectures and lets them handle the highest-volume workloads.
  • High Availability: Aeron Sequencer offers built-in primitives for running applications in both active/active and active/passive modes. An active/active configuration allows redundant services to compete to write messages to the log, with the fastest one winning. This design effectively mitigates the performance impact of issues like garbage collection (GC) pauses or other system-level events, ensuring minimal to no downtime.
  • Performance & Latency: Designed from the ground up for financial workloads, Aeron Sequencer delivers consistent sub-100 microsecond (µs) round-trip time (RTT) at p99.9. This is two to three orders of magnitude faster than the tens to hundreds of milliseconds typical of Kafka. Its use of UDP transport is particularly effective for bursty financial workloads, avoiding the flow control limitations of TCP.
  • State Machine Support: As infrastructure specifically designed for SMRs, Aeron Sequencer provides essential features not natively offered by Kafka, such as checkpoints and snapshots, which dramatically reduce recovery time. It also offers a framework for composing and orchestrating state machines, enabling developers to co-locate logic and reduce network hops.
  • Safety and Auditability: Aeron is designed to be safe-by-default. The Sequencer log is redundantly stored across nodes and managed by a formally proven Raft consensus algorithm. Messages flowing between the Sequencer and state machines are guaranteed to be delivered exactly once. This philosophy contrasts with systems that often present a trade-off between performance and safety; for instance, Kafka reaches its highest performance only with safety settings turned off, and those settings must be reconfigured for full safety. Aeron’s approach ensures critical systems are durable and auditable out of the box, meeting strict regulatory requirements (SEC, MiFID II).
  • Adaptability: Aeron’s design is resilient to modern infrastructure realities, such as hardware changes in cloud environments. It supports rolling upgrades for both applications and the underlying infrastructure without downtime, and canary SMRs allow for testing new software versions with real-time data.
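
The active/active behaviour described above, where redundant services compete to write and the fastest one wins, can be pictured with a short Python sketch. This is one plausible way to get that effect, deduplicating on a command id, and it is an assumption made for illustration; `DedupingLog` is hypothetical and does not describe how Aeron Sequencer implements it.

```python
class DedupingLog:
    """Toy log (hypothetical): accepts the first copy of each command id."""

    def __init__(self):
        self.entries = []
        self._seen_ids = set()

    def offer(self, command_id: str, payload: str, sender: str) -> bool:
        if command_id in self._seen_ids:
            return False  # a redundant instance already won this command
        self._seen_ids.add(command_id)
        self.entries.append((command_id, payload, sender))
        return True


log = DedupingLog()

# Two redundant instances, A and B, emit the same commands. The arrival order
# simulates instance A stalling (for example on a GC pause) before cmd-2.
arrivals = [
    ("cmd-1", "new order", "A"),
    ("cmd-1", "new order", "B"),
    ("cmd-2", "amend order", "B"),
    ("cmd-2", "amend order", "A"),
]
accepted = [log.offer(*arrival) for arrival in arrivals]

print(accepted)
print([(command_id, sender) for command_id, _, sender in log.entries])
```

Each command lands in the log exactly once, written by whichever instance got there first, so a pause on one instance delays nothing as long as its peer is healthy.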

Conclusion: A comparison of philosophies

Ultimately, the choice between Aeron Sequencer and Kafka hinges on the specific problem you are trying to solve.

  • Kafka is a widely adopted solution for general stream processing, where high throughput and scalability are needed, and where the latency profile is acceptable. Its partition-based architecture makes it a good fit for building data pipelines, event sourcing, and analytics platforms where data can be parallelized and processed at a scale that’s appropriate for many business applications.
  • Aeron Sequencer is a specialized, performance-oriented solution for mission-critical capital market applications. Its unique design, centered around a single, totally ordered log and built-in State Machine Replication primitives, makes it the superior choice for systems like exchanges and trading algorithms where very low and consistent latency, deterministic processing, and uncompromising availability are non-negotiable requirements.

For organizations building the next generation of financial trading systems, understanding this technical distinction is key to choosing an architecture that provides a real competitive advantage.

 

Architecture Checklist – choosing the right tool for the job

| Feature | Aeron Sequencer | Kafka |
| --- | --- | --- |
| Ordering | A single, totally ordered log for all inputs. | Ordering is guaranteed only within a single partition. |
| Latency | Consistent sub-100µs at p99.9. | Tens to hundreds of milliseconds. |
| Throughput | Millions of messages/sec on a single log. | Up to ~35k messages/sec per partition. |
| High Availability | Built-in primitives for active/active or active/passive SMRs. | Requires external components for active/active producers. |
| State Machine Support | Designed for SMRs with snapshots and checkpoints. | Requires custom implementation of features like snapshots. |
| Ideal Use Case | High-performance, fault-tolerant systems in capital markets. | Real-time analytics, log aggregation, and stream processing. |

 

James Watson
Head of Platform Engineering at Aeron