Contrary to popular belief, achieving consensus across a distributed set of stateful nodes can be fast.
At the recent Aeron MeetUp held in London in November 2024, Martin Thompson, co-creator of Aeron, gave an introduction to the RAFT consensus algorithm and its implementation in Aeron Cluster. This blog summarizes some of the key findings; if you would rather get the detailed explanation, the full recording is linked at the end of this post.
Trading system design faces two major issues: 1) the complexity of multi-threaded domain models, and 2) the need for fault tolerance in high-throughput systems.
The problem with concurrency
Concurrency in software systems is notoriously difficult to manage. Many developers have attempted to scale applications by adding more threads to a domain model, only to encounter significant challenges. These include race conditions, deadlocks, and the complexity of debugging multi-threaded applications. The core issue with concurrency is that it introduces unpredictability and complexity. When multiple threads attempt to access and modify shared data simultaneously, it can lead to inconsistent states and hard-to-reproduce bugs. This is particularly problematic in high-throughput systems where performance and reliability are critical.
Aeron Cluster addresses these issues by using a single-threaded state machine model, which simplifies the design and ensures consistent behavior. This approach eliminates the need for complex concurrency control mechanisms, making the system more reliable and easier to maintain.
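To make the single-threaded state machine idea concrete, here is a minimal sketch (illustrative only, not Aeron's API): all commands flow through one queue and are applied by one thread, so the domain model needs no locks and its behavior is fully deterministic.

```python
from queue import Queue

class Account:
    """Toy domain model: balances keyed by account id."""
    def __init__(self):
        self.balances = {}

    def apply(self, command):
        # All state mutation happens on a single thread, so no locks
        # or atomic operations are needed.
        kind, account, amount = command
        if kind == "deposit":
            self.balances[account] = self.balances.get(account, 0) + amount
        elif kind == "withdraw":
            self.balances[account] = self.balances.get(account, 0) - amount
        return self.balances[account]

def run(model, inbox):
    """Single-threaded event loop: drain commands in order, one at a time."""
    results = []
    while not inbox.empty():
        results.append(model.apply(inbox.get()))
    return results

inbox = Queue()
for cmd in [("deposit", "a1", 100), ("withdraw", "a1", 30), ("deposit", "a2", 50)]:
    inbox.put(cmd)

print(run(Account(), inbox))  # → [100, 70, 50]
```

Because every command is applied in a single, well-defined order, any bug is reproducible by replaying the same command sequence, which is precisely what multi-threaded domain models lose.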
State machine replication and RAFT consensus
Active/Passive and Primary/Secondary set-ups have been around for a long time. However, state machine replication is another way of addressing common system challenges. State machine replication is a technique where multiple copies of a state machine run in parallel to ensure fault tolerance. If one copy fails, another can take over seamlessly in an automated fashion. The RAFT consensus algorithm plays a crucial role in this process by coordinating the state machines and ensuring they remain in sync.
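The core property that state machine replication relies on can be shown in a few lines (a generic sketch, not Aeron code): if the state machine is deterministic and every replica applies the same agreed-upon log, every replica ends up in the same state.

```python
def apply_log(log):
    """A deterministic state machine: replaying the same log
    always produces the same state."""
    state = 0
    for op, value in log:
        if op == "add":
            state += value
        elif op == "mul":
            state *= value
    return state

# Three replicas each apply the identical, agreed-upon log
# and arrive at identical state.
log = [("add", 5), ("mul", 3), ("add", 2)]
replicas = [apply_log(log) for _ in range(3)]
print(replicas)  # → [17, 17, 17]
```

The job of the consensus algorithm is therefore reduced to one thing: getting every node to agree on the contents and order of the log.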
RAFT [AKA Replicated And Fault Tolerant] was chosen for its simplicity and understandability. Unlike other consensus algorithms such as Paxos, RAFT is designed to be accessible to developers. Implementing it is still not without challenges, however: the original RAFT paper provides a high-level overview but leaves many implementation details unspecified, so developers must fill in the gaps themselves.
Challenges and innovations in RAFT implementation [Index, batching, performance]
One of the key challenges in implementing RAFT is ensuring high performance. Traditional consensus algorithms can be slow, making them unsuitable for high-throughput applications. The implementation of RAFT in Aeron Cluster addresses this by using techniques like batching, byte indexing, asynchronous operations, and pipelining to ensure high performance.
Batching involves grouping multiple messages together and processing them as a single unit. This reduces the overhead associated with handling each message individually and improves throughput. Asynchronous operations allow different parts of the system to work in parallel, further enhancing performance.
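The payoff of batching is easy to see in a sketch (illustrative, not Aeron's wire protocol): the fixed per-send overhead, such as a syscall, a header, or an acknowledgment round trip, is paid once per batch instead of once per message.

```python
def send(messages, batch_size):
    """Group messages into batches so fixed per-send overhead is
    amortized across the whole batch."""
    return [messages[i:i + batch_size]
            for i in range(0, len(messages), batch_size)]

msgs = list(range(10))
print(len(send(msgs, 1)))  # → 10 sends without batching
print(len(send(msgs, 4)))  # → 3 sends with batches of up to 4
```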
Aeron Cluster also uses byte indexing instead of message indexing. Positions in the replicated log are addressed as byte offsets rather than message sequence numbers, which simplifies the design and improves efficiency. This innovation unlocks significant performance gains, enabling Aeron Cluster to handle millions of events per second with low latency.
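A minimal sketch of the byte-indexing idea (illustrative only; Aeron's actual log format differs): each appended record advances the log position by its encoded length, so "commit position" is simply a byte offset, independent of how many messages those bytes contain.

```python
class ByteIndexedLog:
    """Sketch of byte indexing: log positions are byte offsets,
    not message sequence numbers."""
    def __init__(self):
        self.buffer = bytearray()

    def append(self, payload: bytes) -> int:
        # Length-prefix each record; the returned position is the
        # byte offset just past this record.
        self.buffer += len(payload).to_bytes(4, "big") + payload
        return len(self.buffer)

log = ByteIndexedLog()
p1 = log.append(b"order-1")  # 4-byte header + 7 bytes → position 11
p2 = log.append(b"ok")       # + 4 + 2 → position 17
print(p1, p2)  # → 11 17
```

One convenient consequence is that replication, flushing to disk, and commit tracking can all work on contiguous byte ranges without decoding individual messages.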
Pipelining is another critical technique used in Aeron Cluster. By breaking down tasks into smaller stages and processing them in a pipeline, the system can achieve higher throughput and lower latency.
To bring these concepts to life, consider the example of the “append entries” operation in RAFT. In a traditional synchronous setup, appending entries involves multiple steps: writing to disk, waiting for acknowledgment, passing the data to the service, and then responding back. Each of these steps adds to the overall latency and increases service time. However, by using asynchronous operations and pipelining, Aeron Cluster can perform these steps in parallel, significantly reducing the service time and increasing throughput.
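The effect of pipelining on the append-entries path can be estimated with classic pipeline timing (a back-of-the-envelope model with hypothetical stage costs, not measured Aeron numbers): sequentially, each entry pays the full sum of all stage costs; pipelined, the first entry pays the full latency and each subsequent entry completes one slowest-stage interval later.

```python
def sequential_finish(n, stage_costs):
    """Each entry runs all stages to completion before the next starts."""
    return n * sum(stage_costs)

def pipelined_finish(n, stage_costs):
    """Pipeline timing: full latency for the first entry, then one
    slowest-stage interval per additional entry."""
    return sum(stage_costs) + (n - 1) * max(stage_costs)

# Hypothetical per-stage costs (ms): disk write, replication ack,
# apply to service, respond.
stages = [2, 3, 1, 1]
print(sequential_finish(8, stages))  # → 56
print(pipelined_finish(8, stages))   # → 28
```

The throughput of the pipeline is bounded by its slowest stage rather than by the sum of all stages, which is why overlapping the disk write, the acknowledgment, and the service invocation pays off so heavily.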
RAFT election process: canvassing/pre-votes and vetoes
The election process in RAFT is critical for maintaining consistency and fault tolerance. When a leader fails, a new leader must be elected quickly to ensure the system remains operational. Aeron Cluster introduces a canvassing phase in the election process, where nodes gather opinions before initiating an election. This reduces the likelihood of failed elections and ensures a smoother (faster and more reliable) transition of leadership.
In addition to voting for a leader, nodes can also vote against a candidate or veto an election. This prevents scenarios where outdated nodes could disrupt the system by electing an unsuitable leader. The canvassing phase and veto mechanism enhance the robustness of the election process, ensuring the system remains stable even in the face of failures.
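The canvassing and veto ideas can be sketched as follows (a simplified illustration of the concepts from the talk, not Aeron's actual election protocol): before bumping its term and triggering a real election, a candidate asks its peers whether it would win; a peer with a more up-to-date log vetoes a stale candidate outright.

```python
def canvass(candidate_log_pos, candidate_term, peers):
    """Pre-vote/canvassing round (simplified sketch): the candidate asks
    peers whether it *would* win before starting a real election."""
    votes_for = 1  # candidate votes for itself
    for peer in peers:
        if peer["log_pos"] > candidate_log_pos:
            return "vetoed"  # an up-to-date node blocks a stale candidate
        if peer["term"] <= candidate_term:
            votes_for += 1
    majority = (len(peers) + 1) // 2 + 1
    return "proceed" if votes_for >= majority else "abandon"

peers = [{"log_pos": 90, "term": 3}, {"log_pos": 100, "term": 3}]
print(canvass(100, 3, peers))  # → proceed
print(canvass(80, 3, peers))   # → vetoed
```

Because the canvassing round does not increment the term, a node that has merely been partitioned cannot disrupt a healthy cluster with a doomed election attempt.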
Snapshots and the pitfalls of dynamic membership
Snapshots are essential for maintaining the state of the system, ensuring fault tolerance and fast recovery. In Aeron Cluster, any node can take a snapshot, not just the leader. This avoids latency pauses and ensures the system can recover quickly from failures. The snapshot mechanism is inspired by the work on Viewstamped Replication and has been adapted to fit the needs of Aeron Cluster.
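The recovery pattern that snapshots enable looks like this in miniature (a generic sketch, not Aeron's snapshot format): a snapshot captures the state together with the log position it covers, so a restarting node loads the snapshot and replays only the log tail instead of the entire history.

```python
def take_snapshot(state, pos):
    """A snapshot is the state plus the log position it covers."""
    return {"state": dict(state), "pos": pos}

def recover(snapshot, log):
    """Recovery: load the snapshot, then replay only the log tail."""
    state = dict(snapshot["state"])
    for key, value in log[snapshot["pos"]:]:
        state[key] = value
    return state

log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
live = {}
for key, value in log[:2]:
    live[key] = value
snap = take_snapshot(live, 2)  # taken after applying the first 2 entries
print(recover(snap, log))  # → {'a': 3, 'b': 2, 'c': 4}
```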
Dynamic membership, the ability to add or remove nodes from the cluster, is another complex feature. While it seems like a useful capability, it can introduce significant risks. Aeron Cluster has deprecated this feature in favor of more reliable methods, such as using standby nodes for replacement. This approach ensures the system remains stable and avoids the pitfalls associated with dynamic membership.
Dealing with modern networking challenges
Modern networking environments, such as Kubernetes, present unique challenges. When applications start, the networking infrastructure may not be fully established, leading to issues like unresolved hostnames and missing network interfaces. Aeron Cluster addresses these challenges by continuously resolving the network state and adapting to changes dynamically.
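The shape of that adaptation can be sketched as a resolution loop (illustrative only; the resolver here is injected for the example, whereas a real system would query DNS, and Aeron's actual mechanism keeps re-resolving continuously at runtime rather than only at startup):

```python
import time

def resolve_with_retry(hostname, resolver, attempts=5, delay=0.0):
    """Keep trying to resolve a peer hostname until the network
    infrastructure has caught up (e.g. a pod's DNS entry appearing)."""
    for _ in range(attempts):
        try:
            return resolver(hostname)
        except KeyError:
            time.sleep(delay)  # name not yet registered; try again
    raise RuntimeError(f"could not resolve {hostname}")

# Simulate a name that only becomes resolvable on the third attempt.
calls = {"n": 0}
def flaky_resolver(name):
    calls["n"] += 1
    if calls["n"] < 3:
        raise KeyError(name)
    return "10.0.0.7"

print(resolve_with_retry("node-2.cluster.local", flaky_resolver))  # → 10.0.0.7
```

The hostname `node-2.cluster.local` and the address are made up for the example; the point is that a failed resolution at startup is treated as a transient condition rather than a fatal error.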
The discipline of building consensus systems
Bringing it back to our session title: building a consensus system is akin to sailing in open water—it requires discipline and adherence to best practices. Aeron Cluster leverages decades of research and practical experience to deliver a robust, high-performance solution for distributed systems. By addressing the challenges of concurrency, state machine replication, and modern networking, Aeron Cluster provides a reliable foundation for mission-critical applications.
For a deeper dive into the RAFT consensus algorithm and its implementation in Aeron Cluster, watch the full recording of Martin Thompson’s talk at the Aeron MeetUp, and gain insights into the challenges and innovations that make Aeron Cluster a robust solution for high-throughput, fault-tolerant systems.