Introduction

Reasons for replication

  1. Reliability

    • If a file has been replicated, it may be possible to continue working after one replica crashes by simply switching to one of the other replicas.

    • Also, by maintaining multiple copies, it becomes possible to provide better protection against corrupted data.
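
Both reliability points can be illustrated with a minimal Python sketch. The helper names (`read_with_failover`, `majority_read`, `crashed`, `healthy`) are hypothetical, replicas are modelled as plain callables, and a crash is assumed to surface as a `ConnectionError`. Majority voting is only one possible way to guard against a corrupted copy.

```python
from collections import Counter


def read_with_failover(replicas, key):
    """Return the first successful read, skipping crashed replicas."""
    for read in replicas:
        try:
            return read(key)
        except ConnectionError:
            continue  # this replica is down; switch to the next one
    raise ConnectionError("all replicas are unavailable")


def majority_read(values):
    """Return the value held by a strict majority of the copies."""
    value, votes = Counter(values).most_common(1)[0]
    if 2 * votes <= len(values):
        raise ValueError("no majority among the copies")
    return value


def crashed(key):
    # Hypothetical replica that has crashed.
    raise ConnectionError("replica down")


def healthy(key):
    # Hypothetical replica that still serves its copy of the data.
    return {"x": 42}[key]


# Reading continues even though the first replica has crashed.
first_ok = read_with_failover([crashed, healthy], "x")

# One of three copies is corrupted; the other two outvote it.
voted = majority_read([b"data", b"data", b"d\x00ta"])
```

With three copies, a single corrupted copy is outvoted by the two intact ones. If all copies disagree, the sketch refuses to guess and raises an error.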

  2. Performance

    • Scaling with respect to size occurs, for example, when an increasing number of processes needs to access data that are managed by a single server.

    • In that case, performance can be improved by replicating the server and subsequently dividing the workload among the replicas.
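
As a rough illustration of dividing work across replicated servers, the sketch below hands out requests round-robin. The class `ReplicatedService` and the replica names are hypothetical, and round-robin is only one simple policy. Real systems may pick a replica by load or by proximity instead.

```python
from collections import Counter
from itertools import cycle


class ReplicatedService:
    """Spreads incoming requests over a fixed set of replicas."""

    def __init__(self, replica_names):
        self._next_replica = cycle(replica_names)  # endless round-robin
        self.handled = Counter()                   # requests per replica

    def handle(self, request):
        replica = next(self._next_replica)
        self.handled[replica] += 1
        return replica


service = ReplicatedService(["replica-a", "replica-b", "replica-c"])
for request_id in range(9):
    service.handle(request_id)

# Each of the three replicas has now served three of the nine requests.
```

A single server would have handled all nine requests itself. Here each replica sees only a third of the load.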

Problem with replication

  • The problem with replication is that having multiple copies may lead to consistency problems.

  • Whenever a copy is modified, that copy becomes different from the rest.

  • Consequently, modifications have to be carried out on all copies to ensure consistency.

  • Exactly when and how those modifications need to be carried out determines the price of replication.

  • For example, Web browsers cache Web pages to increase performance but users might not get the latest version of those pages.
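
The browser-cache example can be made concrete with a small sketch. It assumes a hypothetical `TTLCache` that keeps a fetched page for a fixed time-to-live, with the clock injected so the run is deterministic. Real browsers follow more elaborate HTTP caching rules. The sketch only shows how a cached copy can lag behind the origin.

```python
class TTLCache:
    """Caches fetched values and reuses them until they expire."""

    def __init__(self, fetch, ttl_seconds, clock):
        self._fetch = fetch      # function that contacts the origin
        self._ttl = ttl_seconds  # how long a cached copy is trusted
        self._clock = clock      # injected time source
        self._entries = {}       # url -> (value, time fetched)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]      # cache hit: possibly stale
        value = self._fetch(url)
        self._entries[url] = (value, now)
        return value


origin = {"/index.html": "v1"}
now = [0.0]  # simulated time in seconds
cache = TTLCache(fetch=origin.__getitem__, ttl_seconds=60, clock=lambda: now[0])

first = cache.get("/index.html")  # miss: fetched from the origin
origin["/index.html"] = "v2"      # the page changes at the origin
now[0] = 30.0
stale = cache.get("/index.html")  # hit: the old copy is returned
now[0] = 61.0
fresh = cache.get("/index.html")  # expired: fetched again
```

The second read is fast but returns the outdated page. A shorter time-to-live narrows that window at the cost of more traffic to the origin.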

Replication as scaling technique

  • Replication and caching for performance are widely applied as scaling techniques.

  • A possible trade-off that needs to be made is that keeping copies up to date may require more network bandwidth.

  • A more serious problem, however, is that keeping multiple copies consistent may itself be subject to serious scalability problems.

  • Difficulties come from the fact that we need to synchronize all replicas.

    • For example, replicas may need to decide on a global ordering of operations using Lamport timestamps, or let a coordinator assign such an order.

    • Global synchronization simply takes a lot of communication time, especially when replicas are spread across a wide-area network.

  • In many cases, the only real solution is to relax the consistency constraints.
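
To make the idea of a global ordering concrete, here is a minimal sketch of Lamport timestamps with ties broken by process identifier. The class `LamportClock` and the sample operations are hypothetical. A full totally ordered multicast would also need acknowledgements before an operation is delivered, and that extra message exchange is where much of the communication cost arises. The sketch covers only the ordering rule.

```python
class LamportClock:
    """Logical clock that stamps events as (time, process id)."""

    def __init__(self, process_id):
        self.process_id = process_id
        self.time = 0

    def tick(self):
        """Advance the clock for a local event or a send; return the stamp."""
        self.time += 1
        return (self.time, self.process_id)

    def receive(self, stamp):
        """Merge a received timestamp, then count the receive event."""
        self.time = max(self.time, stamp[0]) + 1


a, b = LamportClock("A"), LamportClock("B")

op1 = (a.tick(), "x = 1")      # A issues an update
op2 = (b.tick(), "x = 2")      # B issues a concurrent update
b.receive(op1[0])              # B learns of A's update
op3 = (b.tick(), "x = x + 1")  # B issues a later update

# Each replica may receive the operations in a different order...
seen_by_a = [op1, op2, op3]
seen_by_b = [op2, op3, op1]

# ...but sorting by (time, process id) gives the same order everywhere.
order_a = [op for _, op in sorted(seen_by_a)]
order_b = [op for _, op in sorted(seen_by_b)]
```

Because both replicas apply the updates in the same order, they end in the same state, even though the two concurrent updates carry the same clock value and are separated only by process identifier.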
