REPLICATED DATA
8.1 INTRODUCTION

A replicated database is a distributed database in which multiple copies of some data items are stored at multiple sites. The main reason for using replicated data is to increase DBS availability. By storing critical data at multiple sites, the DBS can operate even though some sites have failed. Another goal is improved performance. Since there are many copies of each data item, a transaction is more likely to find the data it needs close by, as compared to a single copy database. This benefit is mitigated by the need to update all copies of each data item. Thus, Reads may run faster at the expense of slower Writes.

Our goal is to design a DBS that hides all aspects of data replication from users’ transactions. That is, transactions issue Reads and Writes on data items, and the DBS is responsible for translating those operations into Reads and Writes on one or more copies of those data items. Before looking at the architecture of a DBS that performs these functions, let’s first determine what it means for such a system to behave correctly.
Correctness

We assume that a DBS managing a replicated database should behave like a DBS managing a one-copy (i.e., nonreplicated) database insofar as users can tell. In a one-copy database, users expect the interleaved execution of their
transactions to be equivalent to a serial execution of those transactions. Since replicated data should be transparent to them, they would like the interleaved execution of their transactions on a replicated database to be equivalent to a serial execution of those transactions on a one-copy database. Such executions are called one-copy serializable (or 1SR). This is the goal of concurrency control for replicated data.

This concept of one-copy serializability is essentially the same as the one we used for multiversion data in Chapter 5. In both cases we are giving the user a one-copy view of a database that may have multiple copies (replicated copies or multiple versions) of each data item. The only difference is that here we are abstracting replicated copies, rather than multiple versions, from the users’ view.
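For a concrete illustration (ours, not the text’s), suppose data item x has copies xA and xB at sites A and B. The execution w1[xA] w1[xB] c1 r2[xB] w2[yC] c2 is one-copy serializable: it is equivalent to the serial one-copy execution w1[x] c1 r2[x] w2[y] c2, because T1 wrote every copy of x, so T2 reads T1’s value no matter which copy it reads. Trouble begins when a transaction writes only some copies: if T1 wrote only xA and T2 read xB, then T2 would miss T1’s update, and the execution would be 1SR only if it were still equivalent to some serial one-copy execution, here one in which T2 precedes T1.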
The Write-All Approach

In an ideal world where sites never fail, a DBS can easily manage replicated data. It translates each Read(x) into Read(xA), where xA is any copy of data item x (xA denotes the copy of x at site A). It translates each Write(x) into {Write(xA1), . . ., Write(xAm)}, where {xA1, . . ., xAm} are all copies of x. And it uses any serializable concurrency control algorithm to synchronize access to copies. We call this the write-all approach to replicated data.

To see why the write-all approach works, consider any execution produced by the DBS. Since the DBS is using a serializable concurrency control algorithm, this execution is equivalent to some serial execution. In that serial execution, each transaction that writes into a data item x writes into all copies of x. From the viewpoint of the next transaction in the serial execution, all copies of x were written simultaneously. So, no matter which copy of x the next transaction reads, it reads the same value, namely, the one written by the last transaction that wrote all copies of x. Thus, the execution behaves as though it were operating on a single copy database.

Unfortunately, the world is less than ideal: sites can fail and recover. This is a problem for the write-all approach, because it requires that the DBS process each Write(x) by writing into all copies of x, even if some have failed. Since there will be times when some copies of x are down, the DBS will not always be able to write into all copies of x at the time it receives a Write(x) operation. If the DBS were to adhere to the write-all approach in this situation, it would have to delay processing Write(x) until it could write into all copies of x. Such a delay is obviously bad for update transactions. If...
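The following is a minimal sketch of this translation layer, not an implementation from the text. The Site, Copy, and ReplicatedItem abstractions are hypothetical stand-ins, and the serializable concurrency control protocol (e.g., two-phase locking) that would synchronize these operations in a real DBS is omitted.

```python
class Site:
    """A site that may fail and recover."""
    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def is_up(self):
        return self.up


class Copy:
    """One copy of a data item, stored at a particular site."""
    def __init__(self, site):
        self.site = site
        self.value = None


class ReplicatedItem:
    """Translates logical Reads/Writes on x into operations on copies of x."""
    def __init__(self, copies):
        self.copies = copies  # all copies {xA1, ..., xAm}

    def read(self):
        # Read(x) -> Read(xA) for any one copy, e.g., the closest available one.
        available = [c for c in self.copies if c.site.is_up()]
        if not available:
            raise RuntimeError("no copy of x is available")
        return available[0].value

    def write(self, value):
        # Write(x) -> {Write(xA1), ..., Write(xAm)}: every copy must be written.
        # The write-all approach offers no choice here: if any site is down,
        # the DBS must delay Write(x) until that site's copy can be written.
        if not all(c.site.is_up() for c in self.copies):
            raise RuntimeError("a copy's site is down; Write(x) must be delayed")
        for c in self.copies:
            c.value = value


# Usage: write-all succeeds while all sites are up, blocks once one fails.
site_a, site_b = Site("A"), Site("B")
x = ReplicatedItem([Copy(site_a), Copy(site_b)])
x.write(42)         # writes both xA and xB
print(x.read())     # 42, from whichever copy is read
site_b.up = False   # site B fails
# x.write(43) would now raise: write-all cannot proceed until B recovers
```

Note how the asymmetry in the sketch mirrors the text: read needs only one live copy, but write needs every copy, which is exactly why site failures hurt update transactions under this approach.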