CAP Theorem - hurdle to a perfect system

Introduction

A software developer always aims to build a perfect system: a product that handles heavy loads with low latency, stays responsive to the user, and tolerates failures. But building a system for deployment is very different from building a college project, and it forces one to take other aspects of performance into consideration.

The CAP theorem restricts the design of any system that tries to guarantee all three golden properties: Consistency, Availability and Partition Tolerance. Let's define these properties one by one :

Consistency : This property ensures that if a user of our system performs a read operation on a resource, the value read is that of the latest write done on that resource.

Availability : The system is always available to the user for performing operations on a resource, i.e. a user will always receive a 'response' to a 'request'.

Partition Tolerance : In case of a network partition, the system continues to operate with the same or reduced performance. 'Network Partition' refers to different parts of a system not being able to communicate with each other, for one or more reasons (message drops, delays, and failures of nodes). The causes of a network failure vary: an undersea cable may get cut or a data centre may catch fire. Anything can partition a unified network.



The CAP theorem states that a distributed system cannot guarantee all three of the properties above at the same time. Since modern systems are scalable and distributed, and network errors occur all the time, Partition Tolerance is effectively a requirement. In practice, the real choice is therefore between Consistency and Availability for as long as a partition lasts.

Let's understand the theorem with two examples.

Example 1 : An ATM banking system

Suppose you are asked to design a basic ATM banking system with only two allowed operations : Deposit Money ('a write'), and View Account Balance ('a read'). You start with just two ATMs, connected to a central database over a network. A user can deposit money at any ATM and can view the balance at any ATM. Now, due to some error, a network partition occurs at ATM A as shown below.

A user wants to deposit Rs. 1000 at ATM A. The system designer can choose one of the following implementations to handle the network partition :
  • ATM A can deny the 'Deposit Operation' to the user, as it cannot record the change while the network is partitioned. The user can come back later or deposit the money at ATM B (ensure consistency, compromise availability).
  • ATM A can continue with the 'Deposit Service' and store the update to the account balance in its local memory. Once the partition heals, it can write the update to the database. The problem here is that if the user goes to ATM B and queries the account balance, ATM B will show the old balance because ATM A has not yet performed the update. It will appear to the user that the Rs. 1000 has been lost (ensure availability, compromise consistency).
As is evident, you cannot ensure both properties. In banking and financial applications like this one, consistency is preferred over availability, since inconsistent behaviour may cause a lot of damage and loss.
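
The two choices can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how a real banking system is built: the names CentralDatabase, ATM, Unavailable, prefer_consistency and heal are all made up for this illustration, and the partition is just a boolean flag.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when an ATM refuses a request because it cannot reach the database."""


@dataclass
class CentralDatabase:
    balance: int = 0


@dataclass
class ATM:
    db: CentralDatabase
    prefer_consistency: bool                      # True: refuse writes, False: buffer them
    partitioned: bool = False                     # hypothetical flag standing in for a real partition
    pending: list = field(default_factory=list)   # deposits accepted but not yet in the database

    def deposit(self, amount):
        if not self.partitioned:
            self.db.balance += amount
        elif self.prefer_consistency:
            # Option 1: give up availability to stay consistent.
            raise Unavailable("database unreachable, deposit refused")
        else:
            # Option 2: stay available and reconcile later.
            self.pending.append(amount)

    def view_balance(self):
        # This sketch only models reads at an ATM that can reach the database.
        if self.partitioned:
            raise Unavailable("database unreachable, balance unknown")
        return self.db.balance

    def heal(self):
        """The partition is over: replay the buffered deposits."""
        self.partitioned = False
        for amount in self.pending:
            self.db.balance += amount
        self.pending.clear()


db = CentralDatabase(balance=5000)
atm_a = ATM(db, prefer_consistency=False, partitioned=True)
atm_b = ATM(db, prefer_consistency=False)

atm_a.deposit(1000)            # accepted locally by ATM A
print(atm_b.view_balance())    # 5000: ATM B still shows the old balance
atm_a.heal()
print(atm_b.view_balance())    # 6000: the deposit is visible after the partition heals
```

Creating ATM A with prefer_consistency=True instead makes the same deposit raise Unavailable, which is the first option from the list above.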

Example 2 : A social media application

Let's look at a simple social media application which allows users to post an image, and to comment on as well as up-vote an image. The front-end application talks to a set of distributed back-end servers, which store and retrieve data from a set of unified database servers (each server is located in one of two data centres).


Again, as in the previous example, the application has two options in case of a network partition :
  • It can stop responding to requests from users until the fault is detected and repaired. Users will not be able to post or comment in such a situation (ensure consistency, compromise availability).
  • It can store post and comment updates on local disk and write them to the main database once the partition is removed. In this case, a post or a comment made by a user may not be instantly visible to other users, but it will eventually appear (ensure availability, compromise consistency).
As you can see, availability matters more in this use case, since a social media platform is expected to be available all the time. Also, even if some user data like a post or a comment is lost or delayed, it is not 'critical information' like the account balance in the previous example.

A network partition does not only happen between the application and the servers; it can also happen among the servers of the data centres. The main point is that the systems above cannot ensure both consistency and availability when network errors occur, and this is what the CAP theorem states.

Conclusion

Modern systems such as Google Docs, Google Drive and Apache Cassandra have devised a number of ways to work around the CAP theorem. Instead of providing fully available or fully consistent services, they provide different levels of these properties. For example, Cassandra offers multiple levels of consistency at the cost of some latency, i.e. the more consistency you ask for, the more lag your system suffers. I will write a separate blog on the design of the Cassandra system.
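
To get a feel for such consistency levels, here is a rough sketch of the usual quorum arithmetic: with N replicas, a read that waits for R of them is guaranteed to overlap a write acknowledged by W of them when R + W > N. The level names ONE, QUORUM and ALL exist in Cassandra, but the functions below are hypothetical helpers written for this post, not part of any Cassandra API, and they ignore the multi-data-centre levels.

```python
def replicas_required(level, replication_factor):
    """How many replicas must answer before a request at this level succeeds."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1   # a strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when every read set must share at least one replica with every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level, 3))
```

With three replicas, ONE/ONE does not guarantee that a read sees the latest write, while QUORUM/QUORUM and ONE/ALL do. The stronger combinations wait for more replicas on every request, which is where the extra latency comes from.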

Transaction-based applications such as relational databases ensure the 'ACID' properties, where 'C' stands for consistency (every transaction leaves the database in a valid state, which is a different notion from the consistency in CAP). On the other hand, key-value data stores implement the 'BASE' properties. Although both terms are more mnemonic than precise, the BASE acronym is a bit more awkward : 'Basically Available, Soft state, Eventually consistent'. Soft state and eventual consistency are techniques that work well in the presence of partitions and thus promote availability. 'Eventual Consistency' implies that if writes to a system stop, the system will become consistent after some time. Which model to choose mainly depends on the use case and the properties you want your system to hold. Given below is a diagram with example systems.
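
Eventual consistency as described above can be illustrated with a small sketch. It assumes a 'last write wins' rule with unique, increasing timestamps, which is only one of several possible reconciliation strategies; the names Replica, sync and anti_entropy_round are invented for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Replica:
    timestamp: int = 0
    value: str = ""

    def write(self, timestamp, value):
        # Last write wins: keep the incoming value only if it is newer.
        if timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value


def sync(a, b):
    """Exchange state so that both replicas end up holding the newer write."""
    a.write(b.timestamp, b.value)
    b.write(a.timestamp, a.value)


def anti_entropy_round(replicas):
    """Let every pair of replicas exchange state once."""
    for a, b in combinations(replicas, 2):
        sync(a, b)


replicas = [Replica(), Replica(), Replica()]
replicas[0].write(1, "first comment")
replicas[2].write(2, "edited comment")

print([r.value for r in replicas])   # replicas disagree while updates are still spreading
anti_entropy_round(replicas)         # writes have stopped, replicas exchange state
print([r.value for r in replicas])   # every replica now holds "edited comment"
```

While writes are still propagating, a read may return a different value depending on which replica answers. Once writes stop and the replicas have exchanged state, they all agree.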




This is my first tech blog. I will soon write more about consistency, distributed systems, the problems modern systems face and how they overcome them. I will also write short hands-on tutorial blogs on Android development, server development, algorithms and more. If computer science is what excites you, do subscribe to the blog and comment!
