CAP Theorem
Definition: The CAP Theorem (Brewer’s Theorem) states that in a distributed data system, it is impossible to simultaneously guarantee all three of the following properties: Consistency, Availability, and Partition Tolerance. In the presence of a network partition, a system can provide at most two of these guarantees.
The CAP theorem states that a distributed system can simultaneously guarantee at most two of the following three properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response), and Partition Tolerance (the system continues to operate despite network failures). Since network partitions are inevitable in real-world distributed systems, designers must choose between prioritizing Consistency (CP systems) or Availability (AP systems) when a partition occurs.
The three properties of the CAP Theorem
Consistency:
Every read operation should receive the most recent write value or an error message. All nodes in the distributed system have the same data at the same time.
Availability:
Every request must receive a response, even if some nodes are down. A system that is highly available may return outdated data during a partition.
Partition Tolerance:
The system continues to function and process requests even when there are network failures, leading to partitions where nodes can't communicate with each other.
The Trade-off
In a distributed system, network partitions are a fact of life. During a network partition, the system must make a choice:
- Choose Consistency over Availability (CP): If a partition occurs, the system will stop responding to requests that would require accessing data on the partitioned nodes to maintain data consistency. This prevents outdated data from being served but makes the system temporarily unavailable.
- Choose Availability over Consistency (AP): If a partition occurs, the system will continue to serve responses to all requests, even if it means returning outdated data to some users. This ensures the system stays responsive but might lead to different nodes having different views of the data.
Examples
- Relational databases like those used for banking often prioritize Consistency (CP) because maintaining accurate financial information is critical.
- Social media platforms like Facebook and Twitter often prioritize Availability (AP) to ensure users have a smooth experience, even if the data they see is slightly stale.
Properties
1. Consistency (C)
- Definition:Every read receives the most recent write or an error.
- Meaning:All nodes see the same data at the same time.
- Example: Banking systems where account balances must be accurate across all branches.
2. Availability (A)
- Definition:Every request to a non-failing node receives a response.
- Meaning:Response may not contain the latest data during a partition.
- Example: Social media feeds that always load, even if slightly stale.
3. Partition Tolerance (P)
- Definition:The system continues to operate despite network failures that split communication between nodes.
- Meaning:Handles message loss or delays gracefully.
- Example: E-commerce sites that still allow browsing during data center outages.
Trade-Offs
Since network partitions are inevitable, designers must choose between:
- CP (Consistency + Partition Tolerance): Sacrifice availability during partitions. Example: HBase, MongoDB (in certain configs).
- AP (Availability + Partition Tolerance): Sacrifice strict consistency during partitions. Example: Cassandra, CouchDB.
- CA (Consistency + Availability): Only possible if no partitions occur — unrealistic at scale.
Example Scenarios
- Banking Transactions (CP): Accuracy is critical; may reject requests during partitions.
- Social Media Newsfeed (AP): Always responsive; tolerates temporary stale data.
- Hybrid Shopping Cart: AP for browsing, CP for checkout/payment.
Key Points
- CAP is a guiding principle for distributed system design, not a strict “pick two” rule in all cases.
- Partition Tolerance is non-negotiable in real-world distributed systems.
- PACELC Theorem extends CAP by adding the trade-off between Latency and Consistency when no partition exists.
Related Topics
- ACID vs BASE
- PACELC Theorem
- Eventual Consistency
- Distributed Databases