Data-centric Consistency

Sequential Consistency + Causal Consistency + Entry Consistency + Eventual Consistency

Reasons for Replication

  1. Reliability

    • If a file has been replicated, it may be possible to continue working after one replica crashes by simply switching to one of the other replicas.

    • Also, by maintaining multiple copies, it becomes possible to provide better protection against corrupted data.

  2. Performance

    • Scaling with respect to size occurs, for example, when an increasing number of processes need to access data that are managed by a single server.

    • In that case, performance can be improved by replicating the server and subsequently dividing the workload among the replicas.

Problem with Replication

  • The problem with replication is that having multiple copies may lead to consistency problems.

  • Whenever a copy is modified, that copy becomes different from the rest.

  • Consequently, modifications have to be carried out on all copies to ensure consistency.

  • Exactly when and how those modifications need to be carried out determines the price of replication.

  • For example, Web browsers cache Web pages to increase performance but users might not get the latest version of those pages.

Replication as Scaling Technique

  • Replication and caching for performance are widely applied as scaling techniques.

  • A possible trade-off that needs to be made is that keeping copies up to date may require more network bandwidth.

  • A more serious problem, however, is that keeping multiple copies consistent may itself be subject to serious scalability problems.

  • Difficulties come from the fact that we need to synchronize all replicas.

    • For example, replicas may need to decide on a global ordering of operations using Lamport timestamps, or let a coordinator assign such an order (see the sketch after this list).

    • Global synchronization simply takes a lot of communication time, especially when replicas are spread across a wide-area network.

  • In many cases, the only real solution is to relax the consistency constraints.
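
Below is a minimal sketch of how Lamport timestamps could be used to impose such a global ordering. The class and method names are assumptions for illustration, not something the textbook prescribes.

```python
# Minimal Lamport-clock sketch (illustrative assumption).
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event (e.g., issuing a write) advances the counter.
        self.time += 1
        return self.time

    def update(self, received_time):
        # On receiving a timestamped update, jump past the sender's clock.
        self.time = max(self.time, received_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t1 = a.tick()       # replica A issues a write, timestamped 1
t2 = b.update(t1)   # replica B receives it; its next operation is ordered after
print(t1 < t2)      # True: timestamps yield a total order (ties broken by replica id)
```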

Distributed Data Store

  • Traditionally, consistency has been discussed in the context of read and write operations on shared data, available by means of (distributed) shared memory, a (distributed) shared database, or a (distributed) file system.

  • Each process reads data from a local copy. Writes are propagated to the other copies.
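
A minimal sketch of this read-local, propagate-writes model follows. The class names are illustrative assumptions; exactly when and how the propagation happens is what a consistency model pins down.

```python
# Sketch of a replicated data store: reads are local, writes are propagated.
class LocalCopy:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)          # reads are served from the local copy

class DataStore:
    def __init__(self, n_processes):
        self.copies = [LocalCopy() for _ in range(n_processes)]

    def write(self, pid, key, value):
        self.copies[pid].data[key] = value
        # Propagate the write to all other copies (here: immediately and
        # synchronously, which is only one of many possible choices).
        for i, copy in enumerate(self.copies):
            if i != pid:
                copy.data[key] = value

store = DataStore(3)
store.write(0, "x", 42)
print(store.copies[2].read("x"))           # 42 once the write has propagated
```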

Consistency Models

  • A consistency model is essentially a contract between processes and the data store. It says that if processes agree to obey certain rules, the store promises to work correctly.

  • Normally, a process that performs a read operation on a data item expects the operation to return a value that shows the results of the last write operation on that data.

  • In the absence of a global clock, it is difficult to define precisely which write operation is the last one.

  • As an alternative, we need to provide other definitions, leading to a range of consistency models.

  • Each model effectively restricts the values that a read operation on a data item can return.

Model 1: Sequential Consistency

The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.

  • What this definition means is that when processes run concurrently on (possibly) different machines, any valid interleaving of read and write operations is acceptable behavior, but all processes see the same interleaving of operations.

  • Note that nothing is said about time; that is, there is no reference to the "most recent" write operation on a data item.

  • Also, a process "sees" the writes from all processes but only through its own reads.

  • Important: study the example in Textbook Section 7.2 Figure 7.5.
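
To make the definition concrete, the sketch below brute-forces every interleaving that preserves each process's program order and checks whether some interleaving explains all reads. It is only an illustration in the spirit of Figure 7.5, not the textbook's example verbatim.

```python
# Brute-force sequential-consistency check for tiny histories (illustrative).
# Each process's history is a list of operations in program order:
#   ("W", var, value) for a write, ("R", var, observed_value) for a read.

def explains(sequence):
    """True if every read in this single global order returns the latest write."""
    state = {}
    for op, var, val in sequence:
        if op == "W":
            state[var] = val
        elif state.get(var) != val:
            return False
    return True

def interleavings(histories):
    """Yield all interleavings that preserve every process's program order."""
    if all(len(h) == 0 for h in histories):
        yield []
        return
    for i, h in enumerate(histories):
        if h:
            rest = histories[:i] + [h[1:]] + histories[i + 1:]
            for tail in interleavings(rest):
                yield [h[0]] + tail

def sequentially_consistent(histories):
    return any(explains(s) for s in interleavings(histories))

P1 = [("W", "x", "a")]
P2 = [("W", "x", "b")]
P3 = [("R", "x", "b"), ("R", "x", "a")]
P4 = [("R", "x", "b"), ("R", "x", "a")]
print(sequentially_consistent([P1, P2, P3, P4]))      # True: all see b then a

P4_bad = [("R", "x", "a"), ("R", "x", "b")]           # P4 sees the opposite order
print(sequentially_consistent([P1, P2, P3, P4_bad]))  # False
```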

Model 2: Causal Consistency

Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.

  • Causality: if event b is caused or influenced by an earlier event a, causality requires that everyone else first sees a and then sees b.

  • On the other hand, if two processes spontaneously and simultaneously write two different data items, these are not causally related. Operations that are not causally related are said to be concurrent.

  • Important: study the example in Textbook Section 7.2 Figure 7.10, 7.11, and 7.12.
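
One common way to capture the "potentially causally related" relation is with vector clocks. The sketch below is an illustrative assumption; the textbook section does not prescribe a particular mechanism.

```python
# Minimal vector-clock sketch for deciding "causally related" vs. "concurrent".
class VectorClock:
    def __init__(self, n, pid):
        self.clock = [0] * n            # one counter per process
        self.pid = pid

    def local_event(self):
        self.clock[self.pid] += 1       # e.g., issuing a write
        return list(self.clock)

    def merge(self, other):
        # Observing another process's write: take the element-wise maximum,
        # then advance the local counter.
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.pid] += 1

def happened_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
w_x = p0.local_event()            # W(x)a on P0
p1.merge(w_x)                     # P1 reads x = a ...
w_y = p1.local_event()            # ... then writes y: causally after W(x)a
w_z = p2.local_event()            # an independent write on P2

print(happened_before(w_x, w_y))  # True: everyone must see W(x)a before W(y)b
print(concurrent(w_y, w_z))       # True: replicas may order these differently
```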

Grouping Operations

  • The fine granularity of these consistency models (elementary read and write operations) in many cases does not match the granularity provided by applications.

  • What we see at the application level is that concurrency between programs sharing data is generally kept under control through synchronization mechanisms for mutual exclusion and transactions.

  • Effectively, what happens is that at the program level read and write operations are bracketed by the pair of operations ENTER_CS and LEAVE_CS.

  • A process that has successfully executed ENTER_CS is assured that all the data in its local store are up to date.

  • At that point, it can safely execute a series of read and write operations on that store, and subsequently wrap things up by calling LEAVE_CS.

  • The data and instructions between ENTER_CS and LEAVE_CS are denoted as a critical section. It is treated as an atomically executed unit => raises the level of granularity.
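
A minimal single-machine sketch of this bracketing is shown below. In a distributed data store, ENTER_CS would additionally pull in pending updates and LEAVE_CS would push out the local writes; the names here are assumptions for illustration.

```python
import threading

_cs_lock = threading.Lock()        # stands in for the ENTER_CS / LEAVE_CS mechanism
shared = {"x": 0, "y": 0}          # the shared data guarded by the critical section

def enter_cs():
    # In a distributed store this would also bring the local copies up to date.
    _cs_lock.acquire()

def leave_cs():
    # ...and propagate the writes made inside the critical section.
    _cs_lock.release()

def transfer(amount):
    enter_cs()
    shared["x"] -= amount          # the whole bracketed block behaves as one
    shared["y"] += amount          # coarse-grained, atomically executed unit
    leave_cs()

transfer(10)
print(shared)                      # {'x': -10, 'y': 10}
```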

Model 3: Entry Consistency

  • A lock has shared data items associated with it, and each shared data item is associated with at most one lock.

  • When a process enters a critical section it should acquire the relevant locks, and likewise when it leaves the critical section, it releases these locks.

  • While having exclusive access to a lock, a single process is allowed to perform read and write operations.

  • It is also possible for several processes to simultaneously have nonexclusive access to a lock; they can read, but not write, the associated data.

  • Acquiring a lock can succeed only when all updates to its associated shared data have completed.

  • Exclusive access to a lock can succeed only if no other process has exclusive or nonexclusive access to that lock.

  • Non-exclusive access to a lock is allowed only if any previous exclusive access has been completed, including updates to that lock’s associated data.

  • Important: study the example in Textbook Section 7.2 Figure 7.13.
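
The sketch below illustrates these rules with a lock that guards its associated data items and supports exclusive and nonexclusive modes. It runs on a single machine only; in entry consistency, acquiring the lock would additionally wait until all updates to the associated data have arrived. All names are illustrative assumptions.

```python
import threading

class DataLock:
    """A lock with associated shared data, granting exclusive (write) or
    nonexclusive (read) access. Minimal, unfair sketch for illustration."""
    def __init__(self, items):
        self.items = dict(items)            # the shared data this lock guards
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, exclusive):
        with self._cond:
            if exclusive:
                # Exclusive access only if no one else holds the lock at all.
                while self._writer or self._readers > 0:
                    self._cond.wait()
                self._writer = True
            else:
                # Nonexclusive access only once any exclusive holder is done.
                while self._writer:
                    self._cond.wait()
                self._readers += 1

    def release(self, exclusive):
        with self._cond:
            if exclusive:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()

lock_x = DataLock({"x": 0})
lock_x.acquire(exclusive=True)
lock_x.items["x"] += 1                      # sole writer of the guarded data
lock_x.release(exclusive=True)

lock_x.acquire(exclusive=False)
print(lock_x.items["x"])                    # readers may share nonexclusive access
lock_x.release(exclusive=False)
```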

Motivation for Eventual Consistency

  • To what extent processes actually operate in a concurrent fashion, and to what extent consistency needs to be guaranteed, may vary.

  • E.g., Web

    • Web pages are updated by a single authority => no write-write conflicts to resolve.

    • Browsers and Web proxies keep a fetched page in a local cache, so users may be served a stale copy => many users find this inconsistency acceptable.

  • E.g., DNS

    • Only the authority is allowed to update its part of the name space => no write-write conflicts.

    • It is possible to have read-write conflicts => acceptable to propagate an update in a lazy fashion.

Model 4: Eventual Consistency

  • If no updates take place for a long time, all replicas will gradually (eventually) become consistent.

  • Eventual consistency only requires that updates are guaranteed to propagate to all replicas eventually; it says nothing about how soon that happens, only that it happens at some point.

  • Eventual consistency is sufficient for systems that can tolerate a relatively high degree of inconsistency.

  • Write-write conflicts are often relatively easy to solve when assuming that only a small group of processes can perform updates.

  • Although eventual consistency is a weak model, we still need it because it trades consistency for scalability.
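
The sketch below shows one way replicas can converge once updates stop: each replica periodically pulls from a neighbour and keeps the value with the newest timestamp. The ring-based gossip and all names are assumptions for illustration.

```python
# Minimal anti-entropy sketch: replicas converge when no new updates arrive.
class Replica:
    def __init__(self):
        self.store = {}                        # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def merge(self, other):
        # Keep whichever copy carries the newest timestamp (last writer wins).
        for key, (ts, val) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, val)

def anti_entropy_round(replicas):
    # Each replica pulls from its ring neighbour; a few rounds suffice here.
    n = len(replicas)
    for i, r in enumerate(replicas):
        r.merge(replicas[(i + 1) % n])

replicas = [Replica() for _ in range(5)]
replicas[0].write("page", "v2", ts=1)          # a single authority updates the item

for _ in range(len(replicas)):                 # no further updates take place
    anti_entropy_round(replicas)

print([r.store.get("page") for r in replicas]) # all replicas now hold (1, 'v2')
```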
