The incremental backup process creates a backup file which stores database pages in blocks marked by a binary session header and footer structure (called a backup record). A single backup file can contain several independent backups, possibly of different types (full or incremental) or with different tags or timestamps.
For all-in-memory databases, only the backup file is created. For persistent databases, the database close operation causes an additional Backup Map file to be created.
This map file contains information used to determine the pages that were modified between one incremental backup and another. For a persistent database, the eXtremeDB runtime maintains counters (the backup bitmap). When the application shuts the database down, the state of the counters needs to be saved and then restored when the application opens the database again.
Without this map file, a full database snapshot would be needed after every application shutdown and restart, since the backup procedure would have no data about modifications made to the database pages. So, once an application opens a persistent database with incremental backup enabled, the eXtremeDB runtime creates the backup bitmap file on every database shutdown, whether or not the backup API is actually called.
The backup map keeps track of database pages processed by the backup algorithm. To perform an incremental backup, the runtime needs to track modifications of all the database pages. This is done by using integer counters to indicate the pages' content versions. Every time a database page is modified the runtime increases a counter associated with this page.
It would be untenable to keep a separate counter for each database page, since a database could require several terabytes of storage. So the runtime instead keeps a counter for a cluster of pages. The size of that cluster depends on the parameter backup_map_size, which determines the total number of counters for the database. The runtime calculates how many counters fit in the specified backup_map_size and then how many pages are served by each counter (a cluster). Consequently, the number of pages in a cluster is smaller for a larger map, which means that the backup procedure stores fewer pages to the backup file for every modified page. Therefore, the backup process is faster and the backup file smaller. On the other hand, the map is allocated from the database memory buffer, so the buffer needs to be bigger for bigger maps.
The page modification tracking mechanism is similar for in-memory and persistent databases (and, for hybrid databases, for the in-memory and persistent parts). The difference is the location of the page version counters: for a persistent database, the counters reside in the backup bitmap, while for an in-memory database they are contained in the data pages. The size of the memory allocated for the database is known from the database open call, so there is no need for the backup_map_size parameter for the in-memory part; the runtime calculates this automatically. But for persistent databases, a reasonable size should be specified.
A reasonable map size can be calculated as follows:

sizeof(int) * <expected-persistent-part-size-in-bytes> / <disk-page-size> / <disk-pages-per-backup-cluster>
Here disk-pages-per-backup-cluster is the number of pages that will be stored into the backup file if even a single page of that cluster is modified. For example, suppose that the disk-page-size is 4K and we would like to store modifications in bunches of 64K; then disk-pages-per-backup-cluster is 64 / 4 = 16. Then to calculate backup_map_size for a 1G database on a 64-bit system (sizeof(int) = 4), the math would be as follows:

4 * 1G / 4K / 16 = (4G / 4K) / 16 = 1M / 16 = 64K
If no backup_map_size is specified, then the eXtremeDB runtime calculates it using the maximum persistent database size, which is MCO_INFINITE_DATABASE_SIZE (1 terabyte for 64-bit databases and 4G for 32-bit databases); so the default backup_map_size is calculated as:

64-bit: 4 * 1T / 4K / 16 = (4T / 4K) / 16 = 1G / 16 = 64M
32-bit: 4 * 4G / 4K / 16 = (16G / 4K) / 16 = 4M / 16 = 256K
The backup is an iterative process controlled by the two parameters backup_min_pages and backup_max_passes. The procedure periodically scans the database for modifications while working in parallel with other application activities that may interact with the database runtime. On each pass it counts the number of modified pages and writes them to the backup file. When the number of pages modified during a pass falls below the threshold determined by the parameter backup_min_pages, the runtime locks the database for a final exclusive pass in which it writes the remaining modified pages to the backup file. The parameter backup_max_passes defines a limit on the number of backup procedure scans of the database. If the database is updated so intensively that the number of modified pages may never fall below the backup_min_pages threshold, backup_max_passes can be set to a small number of passes so that, when that limit is reached, the runtime blocks database operations and initiates the final exclusive backup pass.