Incremental Backup Implementation

The incremental backup process creates a backup file that stores database pages in blocks delimited by a binary session header and footer structure (called a backup record). A single backup file can contain several independent backups, possibly of different types (full or incremental) or with different tags or timestamps.
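
As a rough picture, a session record of this kind might carry fields like the following. This struct is purely hypothetical, inferred only from the description above; the actual binary layout is internal to eXtremeDB and not documented here:

    /* Hypothetical sketch of a backup session record; the real on-disk
     * format is internal to eXtremeDB and may differ entirely. */
    #include <stdint.h>
    #include <time.h>

    typedef enum { BACKUP_FULL, BACKUP_INCREMENTAL } backup_type_t;

    typedef struct {
        backup_type_t type;        /* full or incremental */
        char          tag[32];     /* user-supplied label */
        time_t        timestamp;   /* when the session was written */
        uint64_t      page_count;  /* database pages stored in the session */
    } backup_record_t;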

For all-in-memory databases, only the backup file is created. For persistent databases, the database close operation additionally creates a Backup Map file.

Backup Map

This map file contains the information used to determine which pages were modified between one incremental backup and the next. For a persistent database, the eXtremeDB runtime maintains a set of page modification counters (the backup bitmap). When the application shuts the database down, the state of the counters must be saved, and then restored when the application opens the database again.

Without this map file, a full database snapshot would be needed after every application shutdown and restart, since the backup procedure would have no record of the modifications made to database pages. So, when an application opens a persistent database with incremental backup enabled, the eXtremeDB runtime writes the backup bitmap file on every database shutdown, whether the backup API is actually called or not.
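
The save-and-restore cycle can be pictured as below; serializing the counter array with fwrite/fread is only a conceptual stand-in for however the runtime actually encodes the backup bitmap file:

    /* Conceptual save/restore of the page-version counters across a
     * shutdown/restart cycle; not the actual eXtremeDB file format. */
    #include <stdio.h>

    static int save_bitmap(const char *path, const unsigned *c, size_t n)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        size_t written = fwrite(c, sizeof *c, n, f);
        fclose(f);
        return written == n ? 0 : -1;
    }

    static int load_bitmap(const char *path, unsigned *c, size_t n)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        size_t read_n = fread(c, sizeof *c, n, f);
        fclose(f);
        return read_n == n ? 0 : -1;
    }

    int main(void)
    {
        unsigned counters[4] = { 3, 0, 7, 1 };   /* state at shutdown */
        unsigned restored[4] = { 0 };
        save_bitmap("backup.map", counters, 4);  /* on database close */
        load_bitmap("backup.map", restored, 4);  /* on database open  */
        printf("restored counter[2] = %u\n", restored[2]);
        return 0;
    }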

The backup map keeps track of the database pages processed by the backup algorithm. To perform an incremental backup, the runtime needs to track modifications to all the database pages. This is done by using integer counters to indicate the pages' content versions: every time a database page is modified, the runtime increments the counter associated with that page.

It would be untenable to keep a separate counter for each database page, since a database could require several terabytes of storage. So the runtime instead keeps one counter per cluster of pages. The size of a cluster depends on the parameter backup_map_size, which determines the total number of counters for the database: the runtime calculates how many counters fit in the specified backup_map_size and, from that, how many pages each counter serves (a cluster). Consequently, a larger map means fewer pages per cluster, so the backup procedure writes fewer pages to the backup file for every modified page. The backup process is therefore faster and the backup file smaller. On the other hand, the map is allocated from the database memory buffer, so the buffer needs to be larger for larger maps.
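
For illustration, the sketch below derives the map geometry from a given backup_map_size and bumps the counter of the cluster containing a modified page. It is a simplified model of the mechanism just described, not the runtime's actual bitmap layout:

    /* Simplified model of per-cluster page-version counters; not the
     * actual eXtremeDB bitmap layout. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        unsigned *counters;          /* one counter per cluster of pages */
        size_t    num_counters;      /* how many counters fit in the map */
        size_t    pages_per_cluster; /* pages served by each counter */
    } backup_map;

    /* Derive the map geometry from backup_map_size (in bytes) and the
     * total number of database pages, rounding the cluster size up. */
    static backup_map map_create(size_t backup_map_size, size_t total_pages)
    {
        backup_map m;
        m.num_counters = backup_map_size / sizeof(unsigned);
        m.pages_per_cluster =
            (total_pages + m.num_counters - 1) / m.num_counters;
        m.counters = calloc(m.num_counters, sizeof(unsigned));
        return m;
    }

    /* Every page modification increments the counter of its cluster. */
    static void page_modified(backup_map *m, size_t page_no)
    {
        m->counters[page_no / m->pages_per_cluster]++;
    }

    int main(void)
    {
        /* A 1G database of 4K pages tracked by a 64K map (see the worked
         * example below): 16384 counters, 16 pages per cluster. */
        backup_map m = map_create(64 * 1024, (1024u * 1024 * 1024) / 4096);
        page_modified(&m, 12345);
        printf("%zu counters, %zu pages per cluster\n",
               m.num_counters, m.pages_per_cluster);
        free(m.counters);
        return 0;
    }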

The page modification tracking mechanism is similar for in-memory and persistent databases (and, for hybrid databases, for the in-memory and persistent parts of the database). The difference is the location of the page version counters: for a persistent database the counters reside in the backup bitmap, while for an in-memory database they are contained in the data pages themselves. The size of the memory allocated for the database is known from the database open call, so the backup_map_size parameter is not needed for the in-memory part; the runtime calculates it automatically. But for persistent databases, a reasonable size should be specified.

A reasonable map size can be calculated as follows:

    sizeof(int) * <expected-persistent-part-size-in-bytes> / <disk-page-size> / <disk-pages-per-backup-cluster>

where disk-pages-per-backup-cluster is the number of pages that will be written to the backup file if even a single page of that cluster is modified. For example, suppose that the disk-page-size is 4K and we would like to store modifications in bunches of 64K; then disk-pages-per-backup-cluster would be 64 / 4 = 16. Then to calculate backup_map_size for a 1G database on a 64-bit system (sizeof(int) = 4), the math would be as follows:

    4 * 1G / 4K / 16 = (4G / 4K) / 16 = 1M / 16 = 64K

If no backup_map_size is specified, then the eXtremeDB runtime calculates it using the maximum persistent database size, which is MCO_INFINITE_DATABASE_SIZE (1 terabyte for 64-bit databases and 4G for 32-bit databases); so the default backup_map_size is calculated as:

    64bit: 4 * 1T / 4K / 16 = (4T / 4K) / 16 = 1G / 16 = 64M
    32bit: 4 * 4G / 4K / 16 = (16G / 4K) / 16 = 4M / 16 = 256K
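
These calculations can be checked with a small helper; the function below (purely illustrative, not part of the eXtremeDB API) evaluates the formula above for both the worked example and the two defaults:

    /* Evaluate the recommended backup_map_size formula; an illustrative
     * helper, not part of the eXtremeDB API. */
    #include <stdio.h>

    static unsigned long long
    map_size(unsigned long long db_bytes,
             unsigned long long disk_page_size,
             unsigned long long pages_per_cluster)
    {
        return sizeof(int) * db_bytes / disk_page_size / pages_per_cluster;
    }

    int main(void)
    {
        printf("1G db:  %llu bytes\n", map_size(1ULL << 30, 4096, 16)); /* 64K  */
        printf("64-bit: %llu bytes\n", map_size(1ULL << 40, 4096, 16)); /* 64M  */
        printf("32-bit: %llu bytes\n", map_size(1ULL << 32, 4096, 16)); /* 256K */
        return 0;
    }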
     
     

Backup Process

The backup is an iterative process controlled by two conditions: backup_max_passes and backup_min_pages. The procedure periodically scans the database for modifications while running in parallel with other application activities that may interact with the database runtime. On each pass it counts the number of modified pages. When the count reaches the threshold determined by the parameter backup_min_pages, or the number of passes reaches backup_max_passes, the runtime locks the database for a final exclusive pass in which it writes the modified pages to the backup file. The parameter backup_max_passes limits the number of times the backup procedure scans the database. If database usage is such that so few pages are modified that the backup_min_pages threshold might never be reached, backup_max_passes can be set to a small number of passes so that, once that limit is reached, the runtime blocks database operations and initiates the final exclusive backup pass.
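
The control flow described above can be sketched as follows; the pass-simulation helper and the parameter values are hypothetical, and only the loop structure follows the documented behavior:

    /* Illustrative model of the iterative backup loop; the helper only
     * simulates runtime behavior, nothing here is eXtremeDB source code. */
    #include <stdio.h>

    #define BACKUP_MIN_PAGES  64  /* modified-page threshold (assumed value) */
    #define BACKUP_MAX_PASSES 5   /* limit on scan passes (assumed value) */

    /* Stand-in for scanning the page-version counters on one pass. */
    static unsigned count_modified_pages(unsigned pass)
    {
        return pass * 20;  /* pretend ~20 more pages change each pass */
    }

    int main(void)
    {
        unsigned pass = 0, modified = 0;

        /* Non-exclusive passes: run in parallel with the application. */
        do {
            pass++;
            modified = count_modified_pages(pass);
            printf("pass %u: %u modified pages\n", pass, modified);
        } while (modified < BACKUP_MIN_PAGES && pass < BACKUP_MAX_PASSES);

        /* Final exclusive pass: the runtime would lock the database here
         * and write the modified pages to the backup file. */
        printf("final exclusive pass after %u scans\n", pass);
        return 0;
    }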