Small. Fast. Reliable.
Choose any three.

This document was originally created in early 2004 when SQLite version 2 was still in widespread use and was written to introduce the new concepts of SQLite version 3 to readers who were already familiar with SQLite version 2. But these days, most readers of this document have probably never seen SQLite version 2 and are only familiar with SQLite version 3. Nevertheless, this document continues to serve as an authoritative reference to how database file locking works in SQLite version 3.

This document only describes locking for the older rollback-mode transaction mechanism. Locking for the newer write-ahead log or WAL mode is described separately.

1.0 File Locking And Concurrency In SQLite Version 3

SQLite Version 3.0.0 introduced a new locking and journaling mechanism designed to improve concurrency over SQLite version 2 and to reduce the writer starvation problem. The new mechanism also allows atomic commits of transactions involving multiple database files. This document describes the new locking mechanism. The intended audience is programmers who want to understand and/or modify the pager code and reviewers working to verify the design of SQLite version 3.

2.0 Overview

Locking and concurrency control are handled by the pager module. The pager module is responsible for making SQLite "ACID" (Atomic, Consistent, Isolated, and Durable). The pager module makes sure changes happen all at once, that either all changes occur or none of them do, that two or more processes do not try to access the database in incompatible ways at the same time, and that once changes have been written they persist until explicitly deleted. The pager also provides a memory cache of some of the contents of the disk file.

The pager is unconcerned with the details of B-Trees, text encodings, indices, and so forth. From the point of view of the pager the database consists of a single file of uniform-sized blocks. Each block is called a "page" and is usually 1024 bytes in size. The pages are numbered beginning with 1. So the first 1024 bytes of the database are called "page 1" and the second 1024 bytes are called "page 2" and so forth. All other encoding details are handled by higher layers of the library. The pager communicates with the operating system using one of several modules (Examples: os_unix.c, os_win.c) that provides a uniform abstraction for operating system services.
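The fixed-size pages are visible directly in the database file. As a minimal illustration (using Python's sqlite3 module rather than the C-level pager this document describes, and relying on the SQLite file-format detail that the page size is stored as a 2-byte big-endian value at byte offset 16 of the database header):

```python
import os
import sqlite3
import struct
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

con = sqlite3.connect(db)
con.execute("CREATE TABLE t(x)")
con.commit()
con.close()

# Read the 100-byte database header from page 1 and decode the page size.
# (File-format note: the special stored value 1 would mean 65536.)
with open(db, "rb") as f:
    header = f.read(100)
(page_size,) = struct.unpack(">H", header[16:18])
print(page_size)  # commonly 4096 on modern builds; older builds used 1024
```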

The pager module effectively controls access for separate threads, or separate processes, or both. Throughout this document whenever the word "process" is written you may substitute the word "thread" without changing the truth of the statement.

3.0 Locking

From the point of view of a single process, a database file can be in one of five locking states:

UNLOCKED
    No locks are held on the database. The database may be neither read nor written. Any internally cached data is considered suspect and subject to verification against the database file before being used. Other processes can read or write the database as their own locking states permit. This is the default state.
SHARED
    The database may be read but not written. Any number of processes can hold SHARED locks at the same time, hence there can be many simultaneous readers. But no other thread or process is allowed to write to the database file while one or more SHARED locks are active.
RESERVED
    A RESERVED lock means that the process is planning on writing to the database file at some point in the future but that it is currently just reading from the file. Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock. RESERVED differs from PENDING in that new SHARED locks can be acquired while there is a RESERVED lock.
PENDING
    A PENDING lock means that the process holding the lock wants to write to the database as soon as possible and is just waiting on all current SHARED locks to clear so that it can get an EXCLUSIVE lock. No new SHARED locks are permitted against the database if a PENDING lock is active, though existing SHARED locks are allowed to continue.
EXCLUSIVE
    An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
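The incompatibility of EXCLUSIVE with every other lock can be observed from two connections to the same rollback-journal database. This is a sketch using Python's sqlite3 module (not part of this document); BEGIN EXCLUSIVE acquires an EXCLUSIVE lock immediately, so a second connection cannot even obtain a SHARED lock to read:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

# timeout=0 makes lock conflicts fail immediately instead of being retried.
writer = sqlite3.connect(db, timeout=0, isolation_level=None)
reader = sqlite3.connect(db, timeout=0)
writer.execute("CREATE TABLE t(x)")

writer.execute("BEGIN EXCLUSIVE")          # EXCLUSIVE: no other lock may coexist
writer.execute("INSERT INTO t VALUES (1)")
try:
    reader.execute("SELECT * FROM t").fetchall()  # needs a SHARED lock
    blocked = False
except sqlite3.OperationalError:                  # "database is locked"
    blocked = True

writer.execute("COMMIT")                   # locks released at commit
rows = reader.execute("SELECT x FROM t").fetchall()
print(blocked, rows)                       # True [(1,)]
```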

The operating system interface layer understands and tracks all five locking states described above. The pager module only tracks four of the five locking states. A PENDING lock is always just a temporary stepping stone on the path to an EXCLUSIVE lock and so the pager module does not track PENDING locks.

4.0 The Rollback Journal

When a process wants to change a database file (and it is not in WAL mode), it first records the original unchanged database content in a rollback journal. The rollback journal is an ordinary disk file that is always located in the same directory or folder as the database file and has the same name as the database file with the addition of a -journal suffix. The rollback journal also records the initial size of the database so that if the database file grows it can be truncated back to its original size on a rollback.
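The journal's lifecycle can be watched from outside. A sketch using Python's sqlite3 module (assuming the default rollback-journal mode, PRAGMA journal_mode=DELETE): the -journal file appears beside the database while a write transaction is open and is removed when the transaction commits:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")
journal = db + "-journal"              # same name as the database plus -journal

con = sqlite3.connect(db, isolation_level=None)
con.execute("PRAGMA journal_mode=DELETE")   # the rollback-journal mode described here
con.execute("CREATE TABLE t(x)")

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")     # original page content goes to the journal
during = os.path.exists(journal)
con.execute("COMMIT")                       # journal deleted at commit
after = os.path.exists(journal)
print(during, after)                        # True False
```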

If SQLite is working with multiple databases at the same time (using the ATTACH command) then each database has its own rollback journal. But there is also a separate aggregate journal called the master journal. The master journal does not contain page data used for rolling back changes. Instead, the master journal contains the names of the individual database rollback journals for each of the ATTACHed databases. Each of the individual database rollback journals also contains the name of the master journal. If there are no ATTACHed databases (or if none of the ATTACHed databases are participating in the current transaction) no master journal is created and the normal rollback journal contains an empty string in the place normally reserved for recording the name of the master journal.

A rollback journal is said to be hot if it needs to be rolled back in order to restore the integrity of its database. A hot journal is created when a process is in the middle of a database update and a program or operating system crash or power failure prevents the update from completing. Hot journals are an exception condition. Hot journals exist to recover from crashes and power failures. If everything is working correctly (that is, if there are no crashes or power failures) you will never get a hot journal.

If no master journal is involved, then a journal is hot if it exists and has a non-zero header and its corresponding database file does not have a RESERVED lock. If a master journal is named in the file journal, then the file journal is hot if its master journal exists and there is no RESERVED lock on the corresponding database file. It is important to understand when a journal is hot, so the preceding rules are repeated in bullets. A journal is hot if:

  - the journal exists, and
  - either it names no master journal and its header is non-zero, or it names a master journal and that master journal exists, and
  - there is no RESERVED lock on the corresponding database file.
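Reduced to a boolean predicate, the hot-journal test stated above looks like the following sketch (illustrative Python, not SQLite's actual code; the parameter names are invented for this example):

```python
def journal_is_hot(journal_exists, header_nonzero, db_has_reserved_lock,
                   master_name=None, master_exists=False):
    """Decide whether a rollback journal is hot, per the rules above."""
    if not journal_exists or db_has_reserved_lock:
        return False
    if master_name is None:          # no master journal involved
        return header_nonzero
    return master_exists             # a named master journal must exist

# A journal left behind by a crash: exists, non-zero header, no RESERVED lock.
print(journal_is_hot(True, True, False))   # True
# Same journal, but a live writer holds RESERVED: not hot.
print(journal_is_hot(True, True, True))    # False
```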

4.1 Dealing with hot journals

Before reading from a database file, SQLite always checks to see if that database file has a hot journal. If the file does have a hot journal, then the journal is rolled back before the file is read. In this way, we ensure that the database file is in a consistent state before it is read.

When a process wants to read from a database file, it follows this sequence of steps:

  1. Open the database file and obtain a SHARED lock. If the SHARED lock cannot be obtained, fail immediately and return SQLITE_BUSY.
  2. Check to see if the database file has a hot journal. If the file does not have a hot journal, we are done. Return immediately. If there is a hot journal, that journal must be rolled back by the subsequent steps of this algorithm.
  3. Acquire a PENDING lock then an EXCLUSIVE lock on the database file. (Note: Do not acquire a RESERVED lock because that would make other processes think the journal was no longer hot.) If we fail to acquire these locks it means another process is already trying to do the rollback. In that case, drop all locks, close the database, and return SQLITE_BUSY.
  4. Read the journal file and roll back the changes.
  5. Wait for the rolled back changes to be written onto persistent storage. This protects the integrity of the database in case another power failure or crash occurs.
  6. Delete the journal file (or truncate the journal to zero bytes in length if PRAGMA journal_mode=TRUNCATE is set, or zero the journal header if PRAGMA journal_mode=PERSIST is set).
  7. Delete the master journal file if it is safe to do so. This step is optional. It is here only to prevent stale master journals from cluttering up the disk drive. See the discussion below for details.
  8. Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.

After the algorithm above completes successfully, it is safe to read from the database file. Once all reading has completed, the SHARED lock is dropped.

4.2 Deleting stale master journals

A stale master journal is a master journal that is no longer being used for anything. There is no requirement that stale master journals be deleted. The only reason for doing so is to free up disk space.

A master journal is stale if no individual file journals are pointing to it. To figure out if a master journal is stale, we first read the master journal to obtain the names of all of its file journals. Then we check each of those file journals. If any of the file journals named in the master journal exists and points back to the master journal, then the master journal is not stale. If all file journals are either missing or refer to other master journals or no master journal at all, then the master journal we are testing is stale and can be safely deleted.

5.0 Writing to a database file

To write to a database, a process must first acquire a SHARED lock as described above (possibly rolling back incomplete changes if there is a hot journal). After a SHARED lock is obtained, a RESERVED lock must be acquired. The RESERVED lock signals that the process intends to write to the database at some point in the future. Only one process at a time can hold a RESERVED lock. But other processes can continue to read the database while the RESERVED lock is held.

If the process that wants to write is unable to obtain a RESERVED lock, it must mean that another process already has a RESERVED lock. In that case, the write attempt fails and returns SQLITE_BUSY.
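The single-RESERVED-lock rule is easy to observe. A sketch using Python's sqlite3 module (not part of this document): BEGIN IMMEDIATE acquires a RESERVED lock, so a second would-be writer fails with SQLITE_BUSY ("database is locked") while readers are still allowed through:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

w1 = sqlite3.connect(db, timeout=0, isolation_level=None)
w2 = sqlite3.connect(db, timeout=0, isolation_level=None)
reader = sqlite3.connect(db, timeout=0)
w1.execute("CREATE TABLE t(x)")

w1.execute("BEGIN IMMEDIATE")            # takes the single RESERVED lock
try:
    w2.execute("BEGIN IMMEDIATE")        # a second RESERVED is refused
    second_writer_ok = True
except sqlite3.OperationalError:
    second_writer_ok = False

rows = reader.execute("SELECT * FROM t").fetchall()  # SHARED coexists with RESERVED
w1.execute("COMMIT")
print(second_writer_ok, rows)            # False []
```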

After obtaining a RESERVED lock, the process that wants to write creates a rollback journal. The header of the journal is initialized with the original size of the database file. Space in the journal header is also reserved for a master journal name, though the master journal name is initially empty.

Before making changes to any page of the database, the process writes the original content of that page into the rollback journal. Changes to pages are held in memory at first and are not written to the disk. The original database file remains unaltered, which means that other processes can continue to read the database.

Eventually, the writing process will want to update the database file, either because its memory cache has filled up or because it is ready to commit its changes. Before this happens, the writer must make sure no other process is reading the database and that the rollback journal data is safely on the disk surface so that it can be used to roll back incomplete changes in the event of a power failure. The steps are as follows:

  1. Make sure all rollback journal data has actually been written to the surface of the disk (and is not just being held in the operating system's or disk controller's cache) so that if a power failure occurs the data will still be there after power is restored.
  2. Obtain a PENDING lock and then an EXCLUSIVE lock on the database file. If other processes still have SHARED locks, the writer might have to wait until those SHARED locks clear before it is able to obtain an EXCLUSIVE lock.
  3. Write all page modifications currently held in memory out to the original database disk file.

If the reason for writing to the database file is because the memory cache was full, then the writer will not commit right away. Instead, the writer might continue to make changes to other pages. Before subsequent changes are written to the database file, the rollback journal must be flushed to disk again. Note also that the EXCLUSIVE lock that the writer obtained in order to write to the database initially must be held until all changes are committed. That means that no other processes are able to access the database from the time the memory cache first spills to disk until the transaction commits.

When a writer is ready to commit its changes, it executes the following steps:

  1. Obtain an EXCLUSIVE lock on the database file and make sure all memory changes have been written to the database file using steps 1-3 of the algorithm above.
  2. Flush all database file changes to the disk. Wait for those changes to actually be written onto the disk surface.
  3. Delete the journal file. (Or, if the PRAGMA journal_mode is TRUNCATE or PERSIST, truncate the journal file or zero the header of the journal file, respectively.) This is the instant when the changes are committed. Prior to deleting the journal file, if a power failure or crash occurs, the next process to open the database will see that it has a hot journal and will roll the changes back. After the journal is deleted, there will no longer be a hot journal and the changes will persist.
  4. Drop the EXCLUSIVE and PENDING locks from the database file.
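The journal_mode variants in step 3 are observable from outside. A sketch using Python's sqlite3 module (not part of this document): with PRAGMA journal_mode=PERSIST, the commit leaves the journal file on disk but zeroes its header, so the file survives the commit with zero bytes where the journal magic would otherwise be:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")
journal = db + "-journal"

con = sqlite3.connect(db, isolation_level=None)
con.execute("PRAGMA journal_mode=PERSIST")
con.execute("CREATE TABLE t(x)")
con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")
con.execute("COMMIT")                    # commits by zeroing the journal header

persists = os.path.exists(journal)       # journal not deleted in PERSIST mode
with open(journal, "rb") as f:
    header_zeroed = f.read(8) == b"\x00" * 8   # no hot-journal magic remains
print(persists, header_zeroed)           # True True
```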

As soon as the PENDING lock is released from the database file, other processes can begin reading the database again. In the current implementation, the RESERVED lock is also released, but that is not essential for correct operation.

If a transaction involves multiple databases, then a more complex commit sequence is used, as follows:

  1. Make sure all individual database files have an EXCLUSIVE lock and a valid journal.
  2. Create a master journal. The name of the master journal is arbitrary. (The current implementation appends random suffixes to the name of the main database file until it finds a name that does not previously exist.) Fill the master journal with the names of all the individual journals and flush its contents to disk.
  3. Write the name of the master journal into all individual journals (in space set aside for that purpose in the headers of the individual journals), flush the contents of the individual journals to disk, and wait for those changes to reach the disk surface.
  4. Flush all database file changes to the disk. Wait for those changes to actually be written onto the disk surface.
  5. Delete the master journal file. This is the instant when the changes are committed. Prior to deleting the master journal file, if a power failure or crash occurs, the individual file journals will be considered hot and will be rolled back by the next process that attempts to read them. After the master journal has been deleted, the file journals will no longer be considered hot and the changes will persist.
  6. Delete all individual journal files.
  7. Drop the EXCLUSIVE and PENDING locks from all database files.
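From the SQL level, the multi-database commit sequence above runs behind a single COMMIT. A sketch using Python's sqlite3 module (not part of this document; the file and schema names are invented for the example): a transaction spanning a main database and an ATTACHed database commits atomically:

```python
import os
import sqlite3
import tempfile

d = tempfile.mkdtemp()
main_db = os.path.join(d, "main.db")
other_db = os.path.join(d, "other.db")

con = sqlite3.connect(main_db, isolation_level=None)
con.execute("ATTACH DATABASE ? AS other", (other_db,))
con.execute("CREATE TABLE main.a(x)")
con.execute("CREATE TABLE other.b(x)")

con.execute("BEGIN")
con.execute("INSERT INTO main.a VALUES (1)")   # each file gets its own journal
con.execute("INSERT INTO other.b VALUES (2)")
con.execute("COMMIT")            # master-journal protocol commits both files together

print(con.execute("SELECT x FROM main.a").fetchall(),
      con.execute("SELECT x FROM other.b").fetchall())   # [(1,)] [(2,)]
```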

5.1 Writer starvation

In SQLite version 2, if many processes are reading from the database, it might be the case that there is never a time when there are no active readers. And if there is always at least one read lock on the database, no process would ever be able to make changes to the database because it would be impossible to acquire a write lock. This situation is called writer starvation.

SQLite version 3 seeks to avoid writer starvation through the use of the PENDING lock. The PENDING lock allows existing readers to continue but prevents new readers from connecting to the database. So when a process wants to write to a busy database, it can set a PENDING lock which will prevent new readers from coming in. Assuming existing readers do eventually complete, all SHARED locks will eventually clear and the writer will be given a chance to make its changes.

6.0 How To Corrupt Your Database Files

The pager module is very robust but it can be subverted. This section attempts to identify and explain the risks. (See also the Things That Can Go Wrong section of the article on Atomic Commit.)

Clearly, a hardware or operating system fault that introduces incorrect data into the middle of the database file or journal will cause problems. Likewise, if a rogue process opens a database file or journal and writes malformed data into the middle of it, then the database will become corrupt. There is not much that can be done about these kinds of problems so they are given no further attention.

SQLite uses POSIX advisory locks to implement locking on Unix. On Windows it uses the LockFile(), LockFileEx(), and UnlockFile() system calls. SQLite assumes that these system calls all work as advertised. If that is not the case, then database corruption can result. One should note that POSIX advisory locking is known to be buggy or even unimplemented on many NFS implementations (including recent versions of Mac OS X) and that there are reports of locking problems for network filesystems under Windows. Your best defense is to not use SQLite for files on a network filesystem.

SQLite uses the fsync() system call to flush data to the disk under Unix and it uses FlushFileBuffers() to do the same under Windows. Once again, SQLite assumes that these operating system services function as advertised. But it has been reported that fsync() and FlushFileBuffers() do not always work correctly, especially with inexpensive IDE disks. Apparently some manufacturers of IDE disks have controller chips that report that data has reached the disk surface when in fact the data is still in volatile cache memory in the disk drive electronics. There are also reports that Windows sometimes chooses to ignore FlushFileBuffers() for unspecified reasons. The author cannot verify any of these reports. But if they are true, it means that database corruption is a possibility following an unexpected power loss. These are hardware and/or operating system bugs that SQLite is unable to defend against.

If a Linux ext3 filesystem is mounted without the "barrier=1" option in the /etc/fstab and the disk drive write cache is enabled then filesystem corruption can occur following a power loss or OS crash. Whether or not corruption can occur depends on the details of the disk control hardware; corruption is more likely with inexpensive consumer-grade disks and less of a problem for enterprise-class storage devices with advanced features such as non-volatile write caches. Various ext3 experts confirm this behavior. We are told that most Linux distributions do not use barrier=1 and do not disable the write cache so most Linux distributions are vulnerable to this problem. Note that this is an operating system and hardware issue and that there is nothing that SQLite can do to work around it. Other database engines have also run into this same problem.

If a crash or power failure occurs and results in a hot journal but that journal is deleted, the next process to open the database will not know that it contains changes that need to be rolled back. The rollback will not occur and the database will be left in an inconsistent state. Rollback journals might be deleted for any number of reasons:

  - An administrator might be cleaning up after an OS crash or power failure, see what looks like a leftover journal file, and delete it thinking it is junk.
  - The database file might be renamed or moved without also renaming or moving its journal.
  - If the database file has aliases (hard or soft links) and the file is opened by a different alias than the one used to create the journal, then the journal will not be found and the rollback will not occur.
  - Filesystem corruption following a power failure might cause the journal to be renamed or deleted.

The last (fourth) bullet above merits additional comment. When SQLite creates a journal file on Unix, it opens the directory that contains that file and calls fsync() on the directory, in an effort to push the directory information to disk. But suppose some other process is adding or removing unrelated files to the directory that contains the database and journal at the moment of a power failure. The supposedly unrelated actions of this other process might result in the journal file being dropped from the directory and moved into "lost+found". This is an unlikely scenario, but it could happen. The best defenses are to use a journaling filesystem or to keep the database and journal in a directory by themselves.

For a commit involving multiple databases and a master journal, if the various databases were on different disk volumes and a power failure occurs during the commit, then when the machine comes back up the disks might be remounted with different names. Or some disks might not be mounted at all. When this happens the individual file journals and the master journal might not be able to find each other. The worst outcome from this scenario is that the commit ceases to be atomic. Some databases might be rolled back and others might not. All databases will continue to be self-consistent. To defend against this problem, keep all databases on the same disk volume and/or remount disks using exactly the same names after a power failure.

7.0 Transaction Control At The SQL Level

The changes to locking and concurrency control in SQLite version 3 also introduce some subtle changes in the way transactions work at the SQL language level. By default, SQLite version 3 operates in autocommit mode. In autocommit mode, all changes to the database are committed as soon as all operations associated with the current database connection complete.

The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword is optional) is used to take SQLite out of autocommit mode. Note that the BEGIN command does not acquire any locks on the database. After a BEGIN command, a SHARED lock will be acquired when the first SELECT statement is executed. A RESERVED lock will be acquired when the first INSERT, UPDATE, or DELETE statement is executed. No EXCLUSIVE lock is acquired until either the memory cache fills up and must be spilled to disk or until the transaction commits. In this way, the system delays blocking read access to the database file until the last possible moment.
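The deferred locking of a plain BEGIN can be demonstrated directly. A sketch using Python's sqlite3 module (not part of this document; isolation_level=None passes the SQL through unchanged): after BEGIN, another connection can still complete an entire write transaction, and only the first read acquires the SHARED lock that then blocks further writers:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

c1 = sqlite3.connect(db, timeout=0, isolation_level=None)
c2 = sqlite3.connect(db, timeout=0, isolation_level=None)
c1.execute("CREATE TABLE t(x)")

c1.execute("BEGIN")                        # no lock acquired yet
c2.execute("INSERT INTO t VALUES (1)")     # another writer commits, unimpeded
c1.execute("SELECT * FROM t").fetchall()   # first read: c1 now holds SHARED
try:
    c2.execute("INSERT INTO t VALUES (2)") # cannot get EXCLUSIVE past c1's SHARED
    c2_second_write_ok = True
except sqlite3.OperationalError:           # "database is locked"
    c2_second_write_ok = False
c1.execute("COMMIT")
print(c2_second_write_ok)                  # False
```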

The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur. The SQL command "ROLLBACK" also operates by turning autocommit back on, but it also sets a flag that tells the autocommit logic to rollback rather than commit.

If the SQL COMMIT command turns autocommit on and the autocommit logic then tries to commit changes but fails because some other process is holding a SHARED lock, then autocommit is turned back off automatically. This allows the user to retry the COMMIT at a later time, after the SHARED lock has had an opportunity to clear.

If multiple commands are being executed against the same SQLite database connection at the same time, the autocommit is deferred until the very last command completes. For example, if a SELECT statement is being executed, the execution of the command will pause as each row of the result is returned. During this pause other INSERT, UPDATE, or DELETE commands can be executed against other tables in the database. But none of these changes will commit until the original SELECT statement finishes.