Introduction
The transaction log and how SQL uses it seems to be one of the most misunderstood topics among newcomers to the DBA role. I’m going to see if I can shed a little light on what the transaction log is, why SQL uses it, how the different recovery modes affect the log and how to manage it.
What is the Transaction Log?
Introduction
The transaction log and how SQL uses it seems to be one of the most misunderstood topics among newcomers to the DBA role. I’m going to see if I can shed a little light on what the transaction log is, why SQL uses it, how the different recovery modes affect the log and how to manage it.
What is the Transaction Log?
At its simplest, the transaction log is a log of all transactions run against a database and all database modifications made by those transactions. The transaction log is a critical part of the database’s architecture.
The transaction log is not an audit log. It’s not there so that the DBA can see who did what to the database. It’s also not a data recovery tool. There are third-party tools that can get audit or data-recovery info from the log, but that is not its primary purpose.
The transaction log is predominantly used by the SQL engine to ensure database integrity, to allow transaction rollbacks and for database recovery.
How does SQL use the log?
When changes are made to a database, whether it be in an explicit transaction or an auto-committed transaction, those changes are first written (hardened) to the log file and the data pages are changed in memory. Once the record of the changes is in the log, the transaction is considered complete. The data pages will be written to the disk at a later time either by the lazy writer or by the checkpoint process.
Transaction log entries are considered active until the data pages that were modified by that transaction have been written to disk. Once that occurs, the log entries are considered inactive and are no longer necessary for database recovery.
If transactions are rolled back, either by an explicit ROLLBACK TRANSACTION, by an error if XACT_ABORT is on, or due to a loss of connection to the client, the transaction log is used to undo the modifications made by that transaction.
When a server is restarted, SQL uses the transaction log to see if, at the point the server shut down there were any transactions that had completed but whose changes may not been written to disk, or any transactions that had not completed. If there are then the modifications that may not have been written to disk are replayed (rolled forward) and any that had not completed are rolled back. This is done to ensure that the database is in a consistent state after a restart.
Lastly, backups made of the transaction log can be used to recover a database to a point-in-time in case of a failure.
The transaction log is also used to support replication, database mirroring, change data capture and log shipping. I won’t be going into how they affect the log here.
Recovery models and the Transaction Log
The database recovery model does not (with the exception of bulk operations) affect what is written to the transaction log. Rather it affects how long log entries remain in the log.
Simple Recovery Model
In the simple recovery model, the transactions log entries are kept only for the purpose of database integrity and are not kept for database recovery purposes. Once the log entries are marked as inactive, that is once the associated data pages have been written to disk, the log entries can be discarded. In simple recovery mode, when a checkpoint operation runs, all inactive log records are removed from the transaction log and the space is made available for reuse.
In simple recovery, operations that qualify as bulk operations are minimally logged
This is the simplest recovery mode in terms of log management as the log manages itself. The downside of simple recovery is that because transaction log backups cannot be made, recovery of the database can only be done up until the latest full or differential database backup.
Some of the high-availability features in SQL do not work if the database is in simple recovery; specifically log shipping and database mirroring.
Full Recovery model
In full recovery model transaction log entries are kept for both database integrity and database recovery purposes. Inactive log records are retained in the transaction log until a lob backup occurs.
In full recovery, all operations are fully logged, including operations that qualify as bulk operations.
Full recovery can be difficult to manage as the log can grow beyond expected if transaction log backups don’t occur, or if there’s an increase in the amount of database activity that occurs between log backups. Because log records are not discarded until they have been backed up, a database in full recovery mode can be recovered to any time using a combination of full, differential and log backups.
Bulk-logged recovery model
Bulk-logged is very similar to full recovery, except that in bulk-logged, bulk operations are minimally logged. When operations are minimally logged, the full details are not written to the transaction log, however all extent and page allocations are still logged.
The advantage of bulk-logged recovery is that if there are bulk operations occurring, the impact those operations have on the transaction log is less than it would be if the database was in full recovery mode. However the transaction log backups may be much larger than the transaction log itself since the log backups include any data pages modified by bulk operations since the previous log backup
Managing transaction logs
Picking a recovery model
The key to effectively managing transaction logs is to know what the availability and recovery requirements are for the database. The choice of recovery model should not be chosen because of performance issues or space concerns.
If there is no requirement for point-in-time recovery and it is acceptable, in the case of a disaster, to restore the database to the last full/differential backup, then simple recovery model can be used. In reality, it’s not that common to have a database where the loss of several hours of data is acceptable, so in general simple recovery model should be limited to development or testing environments or databases that can be completely recreated from a source if they fail.
If is a requirement for point-in-time recovery and minimal or no data loss in the case of a disaster, then the database should be in full or bulk-logged recover model and regular log backups should be done. The frequency that the log backups get done at should be determined by the maximum acceptable data loss in the case of a complete disaster.
If the database is in full or bulk-logged recovery model then log backups must be done. Without log backups the log entries will never be discarded from the log and the log file will grow without bound. Since one of the main reasons for having a database in full or bulk-logged recovery model is to allow the database to be restored without data loss, it’s important to have an unbroken log chain to allow a restore to the point of failure, if necessary.
Log chains
Log backups form a chain which starts with the first full backup done to the database (or the first full backup after switching to full recovery). To be able to restore to a point in time, the log chain must stretch unbroken from a full or diff backup to the point that the database needs to be recovered to. If the log chain is broken, either by a log truncation, a missing log backup file or a switch to simple recovery mode, then the database cannot be restored past that point.
For this reason it is very important not to truncate the transaction log of a database. Truncating the transaction log of a database that is in full or bulk-logged recovery means discarding log records that may be needed for database recovery. The same applies to changing the database to simple recovery and back to full/bulk-logged. Either way, the log chain is broken and a new full or differential backup of the database is needed to restart the log chain.
Log size
The transaction log should be sized based on the amount of data modifications made to a database and the frequency of the log backups. Large data modifications, such as data loads or index rebuilds should be taken into account when calculating a log file size.
In simple recovery model the transaction log should not grow as the interval between checkpoints (which truncate the log) is based on the amount of data modifications made. If the log does grow, it may be that there are long-running transactions or transactions that have been left open. Either may indicate a problem with the application.
In full or bulk-logged recovery model, if the transaction log grows it may indicate that the frequency of data modifications has increased and as such, the interval between log backups should be decreased. It may also indicate long running transactions or that the log backup jobs are not running properly.
Conclusion
The transaction log is a crucial piece of a database and at least a basic understanding of it is necessary to effectively manage a system. I hope this has clarified some of the details of what the