Consensus Algorithms (Raft, Paxos)

Nov 02, 2025

Understanding Consensus Algorithms: How Raft and Paxos Work

Every time you book an Uber, update a Google Doc with teammates, or post a message in Slack that instantly appears for everyone, consensus algorithms are quietly ensuring all servers agree on the exact order of events. Without these algorithms, distributed systems would be chaos; servers would disagree about which ride request came first, whose edit won, or what messages were sent. Consensus algorithms like Raft and Paxos are the invisible foundation that makes modern distributed applications reliable and consistent.

What Are Consensus Algorithms?

Consensus algorithms solve a deceptively simple problem: getting multiple unreliable computers to agree on a single sequence of decisions, even when some servers crash or networks fail. Agreement is reached by majority quorum: a decision is committed once more than half of the replicas accept it, so tolerating f crash failures requires at least 2f+1 replicas.

[Figure: Consensus Algorithms (Raft, Paxos) - Basic Concept]

Here’s why this math matters: with 3 servers, you need 2 to agree (tolerating 1 failure). With 5 servers, you need 3 to agree (tolerating 2 failures). The magic is that any two majorities must overlap in at least one server that “remembers” prior decisions, preventing conflicting choices from being committed.
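The quorum arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and do not come from any particular library.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Largest f such that replicas >= 2f + 1."""
    return (replicas - 1) // 2


def min_overlap(replicas: int) -> int:
    """Fewest servers that any two majority quorums must share."""
    return 2 * quorum_size(replicas) - replicas


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n), min_overlap(n))
```

Note that 4 servers need a quorum of 3 yet still tolerate only 1 failure, which is why clusters are usually sized with an odd number of replicas.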

Consensus systems prioritize two critical properties. Safety ensures the system never commits conflicting decisions, and it must hold under all conditions, including crashes and network partitions. Liveness ensures the system eventually makes progress when a majority of servers is up and can communicate; it can be temporarily suspended during network partitions.

Both Raft and Multi-Paxos maintain a replicated, ordered log across servers, but they differ in approach. Multi-Paxos evolved as an optimization of basic Paxos, where a stable leader handles proposals after an initial election. However, the original literature leaves many implementation details unspecified, making it notoriously difficult to implement correctly.
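To make the replicated log a little more concrete, here is a minimal sketch of how a leader might decide which log entries are stored on a majority and are therefore candidates for commit. It assumes the leader tracks, for each server, the highest log index known to be stored there (Raft calls this value matchIndex). The function name is hypothetical, and the sketch deliberately omits Raft's additional rule that a leader only commits entries from its own term by counting replicas.

```python
def majority_replicated_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of servers.

    match_index holds one value per server, including the leader:
    the highest log index known to be stored on that server.
    """
    quorum = len(match_index) // 2 + 1
    # After sorting in descending order, the value at position
    # quorum - 1 is held by at least `quorum` servers.
    return sorted(match_index, reverse=True)[quorum - 1]
```

For example, with values 7, 4, 6, 2, 5 across five servers, three servers hold index 5 or higher, so entries up to index 5 are on a majority.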

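Raft explicitly breaks the problem into three clear components: leader election using randomized timeouts, log replication where the leader manages indexed entries, and safety rules that restrict which servers can win an election. It also specifies a mechanism for cluster membership changes. When a follower receives no heartbeat within its election timeout, which is randomized (the Raft paper suggests 150-300 milliseconds for low-latency networks; wide-area deployments need longer timeouts), it starts a new election. This clarity is a major reason systems like etcd, Consul, and CockroachDB chose Raft over Paxos.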
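The randomized election timeout can be sketched as follows. This is a simplified illustration, not a full Raft implementation: the class and method names are hypothetical, time is advanced manually through tick() instead of a real clock, and the 150-300 millisecond range is the one quoted above.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def new_election_timeout(rng: random.Random) -> float:
    """Pick a fresh random timeout so followers rarely time out together."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


class Follower:
    def __init__(self, rng: random.Random):
        self.rng = rng
        self.state = "follower"
        self.elapsed_ms = 0.0
        self.timeout_ms = new_election_timeout(rng)

    def on_heartbeat(self) -> None:
        # A heartbeat from the leader resets the timer with a new timeout.
        self.elapsed_ms = 0.0
        self.timeout_ms = new_election_timeout(self.rng)

    def tick(self, ms: float) -> None:
        # Advance simulated time; become a candidate once the timeout expires.
        self.elapsed_ms += ms
        if self.state == "follower" and self.elapsed_ms >= self.timeout_ms:
            self.state = "candidate"
```

Because each server draws its own timeout, one follower usually times out first and can win the election before the others start competing.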

Real-World Implementation

Google Spanner uses Paxos groups whose replicas can be spread across regions; a configuration with 5 replicas across 3 or more regions accepts roughly 50-200 milliseconds of commit latency in exchange for exceptional durability. Each Paxos group manages a shard of data, and Spanner runs thousands of these groups to scale horizontally rather than trying to scale a single consensus group.
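As a rough illustration of routing requests to per-shard consensus groups, the sketch below maps a key to a group using sorted split points. It assumes range-based sharding, which is one common choice; it is not a description of Spanner's internals, and the class name, group names, and split points are all hypothetical.

```python
import bisect


class ShardMap:
    """Maps keys to consensus groups by contiguous key ranges."""

    def __init__(self, split_points: list[str], groups: list[str]):
        # n split points define n + 1 contiguous key ranges.
        if len(groups) != len(split_points) + 1:
            raise ValueError("need exactly one more group than split points")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.groups = groups

    def group_for(self, key: str) -> str:
        # bisect_right sends a key equal to a split point to the range
        # that starts at that split point.
        return self.groups[bisect.bisect_right(self.split_points, key)]


shards = ShardMap(["g", "p"], ["group-0", "group-1", "group-2"])
print(shards.group_for("monkey"))
```

Each group then runs its own independent consensus instance, so adding groups adds write capacity without enlarging any single quorum.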

The performance characteristics are predictable: a 3-node cluster in a single availability zone with NVMe storage achieves 2-6 milliseconds p50 latency. This breaks down to roughly 0.2-2ms for the leader’s fsync (durably writing to disk), 0.5-1ms for network transmission, and 0.2-2ms for follower fsync, plus protocol overhead.
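The latency breakdown can be added up directly. The sketch below uses only the ranges quoted above and assumes the three steps happen one after another, which is a simplification; the names are illustrative.

```python
# (low, high) in milliseconds, taken from the figures in the text.
COMPONENTS_MS = {
    "leader_fsync": (0.2, 2.0),
    "network": (0.5, 1.0),
    "follower_fsync": (0.2, 2.0),
}


def serial_budget(components: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high ends, assuming the steps run sequentially."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 3), round(high, 3)


print(serial_budget(COMPONENTS_MS))  # (0.9, 5.0)
```

The components sum to 0.9-5.0 ms, which leaves about 1 ms of protocol overhead to reach the quoted 2-6 ms. Implementations that send entries to followers while the leader's own fsync is still in flight may come in lower.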

Cross-region deployments face stark trade-offs. Multi-zone configurations within one region add only 1-2ms round-trip time while protecting against zone failures—an excellent balance for most applications. Cross-region setups sacrifice latency (70-100ms coast-to-coast, 150-250ms transoceanic) but provide resilience against entire region outages.
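One way to reason about these trade-offs: a leader only waits for the fastest followers that complete a majority, so the round trip that matters is the one to the nearest quorum, not the farthest replica. The sketch below captures that idea under simplifying assumptions (it ignores disk and processing time, and the function name and RTT values are illustrative).

```python
def commit_rtt_ms(follower_rtts_ms: list[float]) -> float:
    """Network round trip the leader waits for before a majority has acked.

    The leader counts itself, so it needs acknowledgments from
    quorum - 1 followers and waits for the slowest of the fastest ones.
    """
    cluster_size = len(follower_rtts_ms) + 1
    acks_needed = cluster_size // 2  # equals quorum - 1
    if acks_needed == 0:
        return 0.0
    return sorted(follower_rtts_ms)[acks_needed - 1]


# 3 replicas across zones in one region.
print(commit_rtt_ms([1.5, 2.0]))
# 5 replicas: one nearby follower, two on the other coast, one overseas.
print(commit_rtt_ms([2.0, 80.0, 85.0, 200.0]))
```

In the 5-replica example the overseas replica does not affect commit latency, but the leader still has to wait for one coast-to-coast round trip because a single nearby follower is not enough for a majority.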

Production systems scale throughput by sharding data across independent consensus groups, each with its own leader. Within a group, batching small entries amortizes fsync costs, but batches must complete within 10-50ms to keep tail latencies acceptable.
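Batching within a group can be sketched as a buffer that flushes when it is full or when the oldest pending entry has waited too long. This is a minimal illustration with hypothetical names; the caller supplies timestamps so the logic stays deterministic, and a real system would issue one fsync and one replication round per returned batch.

```python
from typing import Optional


class Batcher:
    def __init__(self, max_entries: int, max_delay_ms: float):
        self.max_entries = max_entries
        self.max_delay_ms = max_delay_ms
        self.pending: list[bytes] = []
        self.opened_at_ms: Optional[float] = None

    def add(self, entry: bytes, now_ms: float) -> Optional[list[bytes]]:
        """Buffer an entry; return a batch if one is ready to flush."""
        if not self.pending:
            self.opened_at_ms = now_ms  # the first entry starts the clock
        self.pending.append(entry)
        return self._flush_if_due(now_ms)

    def poll(self, now_ms: float) -> Optional[list[bytes]]:
        """Call periodically so a small batch is not held past the deadline."""
        return self._flush_if_due(now_ms)

    def _flush_if_due(self, now_ms: float) -> Optional[list[bytes]]:
        if not self.pending:
            return None
        full = len(self.pending) >= self.max_entries
        expired = now_ms - self.opened_at_ms >= self.max_delay_ms
        if full or expired:
            batch, self.pending = self.pending, []
            self.opened_at_ms = None
            return batch
        return None
```

The max_delay_ms bound is what keeps tail latency in check: a larger value amortizes fsync cost over more entries but makes the first entry in each batch wait longer.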

Key Takeaways

Consensus algorithms enable distributed systems to maintain consistency despite failures. The choice between 3 and 5 replicas directly impacts both availability (how many failures you tolerate) and performance (how many acknowledgments you wait for).
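The availability side of the 3-versus-5 choice can be estimated with a simple binomial model. The sketch below assumes every node is up independently with the same probability, which real deployments only approximate (correlated failures are common), and the function name is illustrative.

```python
from math import comb


def quorum_availability(replicas: int, node_availability: float) -> float:
    """Probability that at least a majority of replicas is up."""
    quorum = replicas // 2 + 1
    p = node_availability
    return sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k)
        for k in range(quorum, replicas + 1)
    )


for n in (3, 5):
    print(n, quorum_availability(n, 0.99))
```

Under this model, going from 3 to 5 replicas raises the chance that a quorum is reachable, while each write still waits for a larger number of acknowledgments (3 instead of 2).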

The fundamental trade-off is geographic: single-region deployments offer low latency but regional vulnerability, while multi-region setups provide disaster resilience at the cost of write latency. Most systems start with multi-zone single-region deployments for the best balance.

[Figure: Consensus Algorithms (Raft, Paxos) - Complete System]

Understanding consensus algorithms such as Raft and Paxos is foundational for building scalable systems.
