Change Data Capture
Change Data Capture is central to any database management system.
Change Data Capture in eXtremeDB
Change Data Capture (CDC) is broadly defined as tracking changes in a database. The purposes of tracking changes are many and varied. eXtremeDB implements CDC in several ways: some are invisible to applications (i.e., they serve eXtremeDB’s own database management purposes), while others can be exploited by applications for data sharing, responding to events, and incremental backup.
The first, and possibly most obvious, implementation of CDC in eXtremeDB is part and parcel of implementing the ACID properties of transactions: Atomicity, Consistency, Isolation and Durability. The successful application of a transaction moves the database from one consistent state to a new consistent state. Conversely, an unsuccessful transaction must leave the database in the consistent state that existed just before the transaction was attempted. To meet these requirements, a database management system must keep track of the changes. Implementation details differ among a pure in-memory database, an in-memory database with persistence, and a database that is partially or fully persistent (a hybrid database).
In the case of a pure in-memory database, data changes are written directly to the database after a before-image of the affected data is saved in a buffer. If the transaction is subsequently aborted, the before-images are restored to the database. If the transaction is committed, which is the normal flow, the before-image buffer is simply discarded, which is very fast.
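The before-image mechanism can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not eXtremeDB's implementation or API: the `InMemoryTable` class and its method names are hypothetical, and the "database" is a plain dictionary.

```python
class InMemoryTable:
    """Toy key/value store with before-image rollback (illustrative only)."""

    _MISSING = object()  # marks "this key did not exist before the transaction"

    def __init__(self):
        self.rows = {}
        self._before = None  # before-image buffer; None when no transaction is open

    def begin(self):
        if self._before is not None:
            raise RuntimeError("transaction already open")
        self._before = {}

    def _save_before_image(self, key):
        if self._before is None:
            raise RuntimeError("no open transaction")
        # Keep only the first image per key: that is the pre-transaction state.
        if key not in self._before:
            self._before[key] = self.rows.get(key, self._MISSING)

    def put(self, key, value):
        self._save_before_image(key)
        self.rows[key] = value  # the change goes directly into the database

    def delete(self, key):
        self._save_before_image(key)
        self.rows.pop(key, None)

    def commit(self):
        if self._before is None:
            raise RuntimeError("no open transaction")
        self._before = None  # committing only discards the buffer

    def rollback(self):
        if self._before is None:
            raise RuntimeError("no open transaction")
        for key, old in self._before.items():
            if old is self._MISSING:
                self.rows.pop(key, None)  # key was inserted by this transaction
            else:
                self.rows[key] = old      # restore the saved before-image
        self._before = None
```

Note that `commit` does no per-row work at all, which mirrors the point above that the normal flow is the cheap one.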
For in-memory databases with persistence, all changes are also appended to a transaction log stored on persistent media; the log can be replayed to recover the database after a crash.
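As a rough sketch of append-and-replay logging, the following Python example writes one JSON line per committed transaction and rebuilds the in-memory state by replaying the file. The record layout, the `RedoLog` class, and the handling of a torn final record are all assumptions made for illustration; they do not describe eXtremeDB's log format.

```python
import json
import os
import tempfile


class RedoLog:
    """Append-only log of committed transactions (illustrative format)."""

    def __init__(self, path):
        self.path = path

    def append_commit(self, txn_id, changes):
        # changes is a list of [op, key, value] entries
        record = {"txn": txn_id, "changes": changes}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record onto persistent media

    def replay(self):
        """Rebuild state by re-applying every complete record in log order."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write: stop here
                for op, key, value in record["changes"]:
                    if op == "put":
                        state[key] = value
                    elif op == "delete":
                        state.pop(key, None)
        return state


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = RedoLog(os.path.join(d, "redo.log"))
        log.append_commit(1, [["put", "a", 1], ["put", "b", 2]])
        log.append_commit(2, [["delete", "b", None], ["put", "c", 3]])
        # Simulate a crash in the middle of writing a third record.
        with open(log.path, "a", encoding="utf-8") as f:
            f.write('{"txn": 3, "chan')
        # "Restart": a fresh object recovers the state from the file alone.
        return RedoLog(log.path).replay()
```

Calling `demo()` recovers only the two transactions that were completely written; the partial third record is ignored.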
For persistent (disk-based) databases, transaction logging is also utilized, both to optimize performance and to support recovery from a crash. In this case, eXtremeDB offers two forms of transaction logging: UNDO and REDO. REDO logging works as described above for in-memory databases with persistence: changes are appended to a log that can be replayed. UNDO logging is like the before-image mechanism described for pure in-memory databases, except that the data is maintained on persistent storage, not in memory. In the event of a crash, the UNDO log information is used to roll back an incomplete transaction (i.e., to return the database to the last consistent state).
Another internal use of CDC in eXtremeDB is in the implementation of optimistic concurrency control in the MVCC (Multi-Version Concurrency Control) transaction manager. Optimistic concurrency control means that applications do not have to acquire locks, so an application never has to wait for a lock held by another application. It does require eXtremeDB to detect when two applications attempt to modify the same database object at the same time, so eXtremeDB maintains version numbers that are checked when a transaction is committed. If an object’s version has changed between the time the application acquired a copy of the object and the time it commits a change to that object, another application modified the underlying object first, and this transaction must be aborted and retried. The theory behind this approach is that such conflicts are rare and that an occasional retry is more efficient overall than always acquiring locks and potentially blocking other applications with those locks.
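The version check at commit time can be illustrated with a small Python sketch. The `VersionedStore` class, `ConflictError`, and `update_with_retry` are hypothetical names introduced here; a real MVCC transaction manager tracks far more than one version number per key. The short internal lock only makes the validate-and-write step atomic; applications never hold it across a transaction.

```python
import threading


class ConflictError(Exception):
    """Raised when another writer committed first."""


class VersionedStore:
    def __init__(self):
        self._data = {}               # key -> (version, value)
        self._guard = threading.Lock()

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        with self._guard:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != read_version:
                # The object changed after this transaction read it.
                raise ConflictError(key)
            self._data[key] = (current_version + 1, new_value)


def update_with_retry(store, key, fn, max_attempts=5):
    """Read, compute, and commit; on conflict, re-read and try again."""
    for attempt in range(1, max_attempts + 1):
        version, value = store.read(key)
        try:
            store.commit(key, version, fn(value))
            return attempt
        except ConflictError:
            continue
    raise ConflictError(key)
```

A writer that loses the race pays for one extra read and recomputation, while writers that do not collide never wait on each other.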
High Availability and Cluster
CDC is also used in eXtremeDB High Availability (HA) and eXtremeDB Cluster, two distributed database options. eXtremeDB HA offers a solution for systems that require “five 9s” availability (i.e., 99.999% uptime) and is an Active/Passive architecture, meaning that replica databases are read-only; all insert/update/delete operations are carried out on the active node and replicated to passive nodes. eXtremeDB Cluster can also offer 99.999% uptime, but in this case in an Active/Active architecture that offers greater scalability at the cost of some flexibility and simplicity. The replication approach is very similar to REDO transaction logging: changes made within a transaction are accumulated in a buffer, which is transmitted upon transaction commit.
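The buffer-then-transmit pattern can be sketched for the Active/Passive case as follows. `ActiveNode` and `PassiveNode` are hypothetical classes, a method call stands in for the network, and, as a simplification, the active node also applies its own changes at commit rather than in place.

```python
def apply_batch(rows, batch):
    """Apply a list of (op, key, value) changes to a dict of rows."""
    for op, key, value in batch:
        if op == "put":
            rows[key] = value
        elif op == "delete":
            rows.pop(key, None)


class PassiveNode:
    """Read-only replica: its rows change only when a committed batch arrives."""

    def __init__(self):
        self.rows = {}

    def receive(self, batch):
        apply_batch(self.rows, batch)


class ActiveNode:
    def __init__(self, replicas):
        self.rows = {}
        self.replicas = replicas
        self._buffer = []  # changes made by the open transaction

    def put(self, key, value):
        self._buffer.append(("put", key, value))

    def delete(self, key):
        self._buffer.append(("delete", key, None))

    def commit(self):
        batch, self._buffer = self._buffer, []
        apply_batch(self.rows, batch)      # simplification: apply locally at commit
        for replica in self.replicas:
            replica.receive(list(batch))   # stand-in for a network send

    def abort(self):
        self._buffer = []  # nothing was transmitted, so replicas are unaffected
```

Because only whole, committed batches are sent, a replica never observes a partially applied transaction.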
Beyond providing persistence to in-memory databases
CDC is utilized in eXtremeDB for three key features that can be used within applications: Data Relay, Persistent Queue of Events, and Incremental Backup.
Data Relay is our open replication mechanism. While eXtremeDB HA and eXtremeDB Cluster make it very easy to replicate between eXtremeDB instances, they are not a solution when it is necessary to replicate from eXtremeDB to another destination. There are, of course, third-party products that attempt to fill this gap, such as Actian™ DataConnect and Oracle GoldenGate, though usually through an extract-transform-load (ETL) paradigm, which can be overkill and sensitive to changes in the source and/or destination databases. Data Relay is unidirectional (from eXtremeDB to an external system), and because it is an integral part of eXtremeDB, it is not sensitive to changes in eXtremeDB data structures. Data Relay works by providing an API (application programming interface) to traverse CDC information. Applications can detect transaction boundaries and inspect the contents of objects that have been inserted, deleted or updated so that those same actions can be reflected in an external system.
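To make the relay pattern concrete, the sketch below walks a stream of captured changes and mirrors each committed transaction into an external SQL database, here an in-memory SQLite database from Python's standard library. The tuple-based stream, the `relay` function, and the `readings` table are assumptions for illustration only; this is not the Data Relay API.

```python
import sqlite3


def relay(change_stream, conn):
    """Mirror captured changes into the external database, one transaction at a time."""
    committed = 0
    for record in change_stream:
        kind = record[0]
        if kind == "insert":
            _, key, value = record
            conn.execute("INSERT INTO readings (id, value) VALUES (?, ?)", (key, value))
        elif kind == "update":
            _, key, value = record
            conn.execute("UPDATE readings SET value = ? WHERE id = ?", (value, key))
        elif kind == "delete":
            _, key = record
            conn.execute("DELETE FROM readings WHERE id = ?", (key,))
        elif kind == "commit":
            conn.commit()  # transaction boundary: make the changes visible together
            committed += 1
        # "begin" needs no action: sqlite3 opens a transaction on the first change
    conn.rollback()  # discard a trailing transaction that never committed
    return committed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    stream = [
        ("begin",), ("insert", 1, 20.5), ("insert", 2, 21.0), ("commit",),
        ("begin",), ("update", 1, 19.0), ("delete", 2), ("commit",),
        ("begin",), ("insert", 3, 99.0),  # source crashed before committing
    ]
    committed = relay(stream, conn)
    rows = conn.execute("SELECT id, value FROM readings ORDER BY id").fetchall()
    conn.close()
    return committed, rows
```

Honoring transaction boundaries is the important design point: the external system sees either all of a source transaction or none of it.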
eXtremeDB’s Persistent Queue of Events is like Data Relay but uses events to determine what data is pushed. Applications define events on tables (row granularity for insert/delete, optional column granularity for update) and then ‘subscribe’ by instantiating a pipe and passing the pipe to eXtremeDB. In turn, eXtremeDB fills the pipe with event data while the application pulls event data out of the other end of the pipe. Any number of pipes can be established between eXtremeDB and consumers.
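The subscribe-and-pull pattern can be sketched with Python's standard `queue.Queue` standing in for a pipe. The `EventSource` class and its `subscribe`/`publish` methods are hypothetical, and the pipes here live only in memory, so the persistence of the real feature is not modeled.

```python
import queue


class EventSource:
    """Stand-in for the database side: routes change events into subscribed pipes."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, pipe, table, kinds, columns=None):
        # columns=None means "any column"; it only affects update events.
        cols = None if columns is None else frozenset(columns)
        self._subscriptions.append((pipe, table, frozenset(kinds), cols))

    def publish(self, table, kind, key, changed=None):
        changed = dict(changed or {})
        for pipe, sub_table, kinds, columns in self._subscriptions:
            if sub_table != table or kind not in kinds:
                continue
            if kind == "update" and columns is not None and columns.isdisjoint(changed):
                continue  # none of the watched columns were modified
            pipe.put((table, kind, key, changed))


def drain(pipe):
    """Consumer side: pull everything currently waiting in the pipe."""
    events = []
    while True:
        try:
            events.append(pipe.get_nowait())
        except queue.Empty:
            return events
```

For example, a consumer interested only in changes to a `status` column would call `subscribe(pipe, "orders", {"update"}, columns={"status"})` and would not be woken for updates to other columns.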
The last significant purpose of Change Data Capture in eXtremeDB is in the implementation of its incremental backup facility which, because eXtremeDB is an embedded database system, is controlled programmatically from within the application. Incremental backup starts with a full snapshot of the database at a moment in time. Thereafter, at appropriate intervals, the backup procedure appends so-called backup records to the backup image, where each such record consists of a header, database page images, and a footer. Naturally, to know which database pages to include in a backup record, eXtremeDB must track changes to database pages between the previous snapshot or backup record and the next (current) backup record.
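The page-tracking idea behind incremental backup can be sketched as follows. The `PagedDatabase` class, the dictionary-based backup image, and the contents of the header and footer are all assumptions chosen for illustration; the actual backup record layout is not described here.

```python
class PagedDatabase:
    """Toy page store that remembers which pages changed since the last backup."""

    def __init__(self, num_pages, page_size=16):
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        self._dirty = set()

    def write_page(self, page_no, data):
        self.pages[page_no] = data
        self._dirty.add(page_no)  # change tracking: note the modified page

    def full_snapshot(self):
        """Start a new backup image with a copy of every page."""
        self._dirty.clear()
        return {"snapshot": list(self.pages), "records": []}

    def incremental_backup(self, image):
        """Append one backup record holding only the pages changed since last time."""
        record = {
            "header": {"seq": len(image["records"]) + 1,
                       "page_count": len(self._dirty)},
            "pages": {no: self.pages[no] for no in sorted(self._dirty)},
            "footer": {"complete": True},
        }
        image["records"].append(record)
        self._dirty.clear()
        return record


def restore(image):
    """Rebuild the pages: start from the snapshot, then apply records in order."""
    pages = list(image["snapshot"])
    for record in image["records"]:
        for page_no, data in record["pages"].items():
            pages[page_no] = data
    return pages
```

A page written several times between two backups appears only once in the next record, with its latest contents, so the size of each record depends on how many distinct pages changed rather than on how many writes occurred.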
Change Data Capture is central to any database management system. It is instrumental in enforcing the ACID properties of transactions and in implementing concurrency control, durability, replication, triggers/event notifications, and backup and restore.