When a computer software application needs to make changes to stored information, such as bank balances, merchandise orders, or travel reservations, care must be taken to ensure that the correct changes are made regardless of whether changes are being made at the same time on behalf of different customers. For example, a bank account balance must correctly reflect concurrent debits and credits. Additionally, the stored information must survive a failure of the computer and/or communication equipment that is involved. Transactions are a software technique for preserving the correctness of stored information in the face of concurrent updates and system crashes.
An application programmer specifies a transaction by bracketing a group of reads and writes with begin_transaction and commit_transaction calls to the system that implements the transactions, usually called a transaction-processing monitor. Assuming the application programmer has chosen a correct set of reads and writes, the transaction processing monitor ensures that these four properties will hold:
These are referred to as the ACID properties.
Of the four ACID properties, three are the responsibility of the underlying transaction-processing monitor, and one – consistency – is the responsibility of the application programmer. Maintaining consistency involves understanding the data model of the application, that is the correspondence between data items and entities in the real world, and ensuring that each transaction changes data items in a way that corresponds to a legitimate change in the real world. For example, moving money from bank account A to bank account B requires that A has a sufficient balance and the customer has signature authority over it, and that the final balance of B is increased by the amount debited from A. To achieve the other three properties – atomicity, isolation, and durability – modern transaction processing systems typically use locking and logging.
Locking is technique for assuring isolation of concurrent transactions. A lock is associated with some data item. Whenever a lock is in the unlocked state, exactly one transaction can acquire it, changing it to the locked state and causing other transactions attempting to acquire it to wait. When the transaction holding the lock is finished with the associated data item, it releases the lock, allowing one of the waiting transactions to acquire it. A transaction is well-formed if it acquires the lock for each data item before reading or writing that data item. A transaction is two-phase if it does not acquire any additional locks after the first time it releases a lock. Gray proved that if all transactions are well-formed and two-phase, then isolation is guaranteed: each transaction sees a state of the world as if all transactions were run sequentially.
Since reads are often more frequent than writes and since multiple transactions reading the same data item do not interfere, it is possible to increase concurrency (hence throughput) while maintaining isolation by distinguishing between locking for reading and locking for writing. Another way of increasing concurrency is to allow locking at different levels of granularity, for example an entire database, a particular table in that database, or a single row within a particular table. Yet another way to increase concurrency is to relax the requirement for isolation. Several useful reduced levels of isolation have been discovered that allow improved performance in certain situations. Gray did pioneering work in all these aspects of locking.
Logging is a technique for atomicity and durability of transactions. A log is a sequential file that records proposed changes to the data items constituting an application. Committing a transaction is performed by ensuring the log records for all its writes have been flushed to stable storage (e.g., disk) and then writing a final commit record to the log. Atomicity is achieved by defining the transaction to have been committed if and only if the commit record is present in the log. Durability is achieved by reading the log whenever the system starts and ensuring that every committed transaction has had its writes performed in the database. (In practice, a technique known as the Write-Ahead Log Protocol is normally used: each write to the database first writes a log record containing both the old and the new value for the data item; this log record must be flushed to stable storage before the data item itself is flushed. At system restart, uncommitted transactions are undone using the old values, while committed transactions are completed using the new values.) Gray pioneered or refined many of the techniques for logging.
For more information on transaction properties and implementation techniques, see:
Jim Gray and Andreas Reuter: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, 1993.