The eXtremeDB MVCC Transaction Manager

The Multi-Versioning Concurrency Control (MVCC) transaction manager enhances applications’ database concurrency management options. With the MVCC model, while querying a database each transaction sees a snapshot of committed data, regardless of any in-progress transactions belonging to other tasks. This protects the transaction from viewing inconsistent data that could be caused by other transaction updates on the same set of objects or indexes, thus providing transaction isolation for each transaction. The MVCC manager allows applications to choose how transactions are isolated from each other by setting the transaction isolation level at runtime.

Isolation Level

When the transaction start API is called without specifying an isolation level, the transaction isolation level is set to MCO_DEFAULT_ISOLATION_LEVEL, which is by default MCO_REPEATABLE_READ for MVCC.

Conflict Management

When MVCC is used with other than MCO_SERIALIZABLE, then MCO_READ_WRITE transactions are executed concurrently. Sometimes concurrent transactions modify the same objects, thus creating transaction conflicts. The transaction manager resolves those conflicts by aborting one of the conflicting transactions and letting the other one commit its updates to the database. When a transaction is aborted, the application receives the MCO_E_CONFLICT error code. It is the application’s responsibility to manage this possibility with logic similar to the following:

    do {
        mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t);
        ...<update database>...
        rc = mco_trans_commit(t);
    } while ( rc == MCO_E_CONFLICT );

Note that when MVCC is used, the application must be able to tolerate transaction rollbacks due to conflicts as described above.

If the number of conflicts is too high, it could lead to sharp performance degradation due to the need to retry transactions. When this occurs, the transaction manager temporarily changes the isolation level to MCO_SERIALIZABLE. The application can set the conflicts threshold over which the optimistic control is disabled. This is done in C applications by calling mco_trans_optimistic_threshold(). (This functionality is only available in the C API.)

Also in C applications, if the percentage of transactions that have been aborted because of the transaction conflict exceeds the max_conflicts_percent, the transaction isolation level is changed to MCO_SERIALIZABLE for disable_period successive transactions. MCO_SERIALIZABLE permits a single MCO_READ_WRITE transaction at a time (eliminating the potential for conflicts), that can run in parallel with MCO_READ_ONLY transactions. By default the optimistic threshold is set to 100 (which means “never disable optimistic mode no matter how many conflicts occur”).

Object Versions and Cleanup

With MVCC every transaction that changes objects in the database spawns versions of those objects. The versions are visible depending on transaction order and active isolation level. When a version becomes not needed by any transaction it should be deleted as it consumes memory. On the other hand, all object versions are included in the indexes defined for that class, so excluding them requires re-balancing which requires a lock on the indexes and a decrease in performance. This object version cleanup procedure is referred to as Garbage Collection.

Garbage Collection Policies

The Garbage Collection (GC) policy can be set a runtime with the C API mco_trans_set_gc_policy(). The possible policies are:

The default behavior is MCO_GC_ON_MODIFICATION, which is generally the best practice. This means that, for each MCO_READ_WRITE transaction, the eXtremeDB runtime tries to collect all "garbage" immediately on completion of the transaction. But if there are long living transactions, then old object versions will not be removed for a long time. In this case GC at each transaction commit will just waste time and CPU. For this reason it is also possible, and can be optimal, to spawn one or more separate GC threads. (It is possible to specify the number of GC threads in the configuration file when starting xSQL as server.)

But consider the case where there are large number of MCO_READ_ONLY transactions and rare update (MCO_READ_WRITE) transactions. The update transaction will not be able to perform cleanup because old versions may still be accessed by MCO_READ_ONLY transactions which in turn do not perform cleanup because of the MCO_GC_ON_MODIFICATION policy. This means that all versions will persist until the next portion of updates which will cause degradation of performance. In this case the MCO_GC_ALWAYS policy is preferable.

The MCO_GC_DISABLED policy is intended to be used to limit garbage collection to dedicated sessions (connections). In this scenario, the worker sessions will have this policy, whereas a dedicated session (or multiple sessions) with policy MCO_GC_ALWAYS will collect garbage in separate threads.

Special Case for Persistent Databases

A special case is persistent (disk) objects when one or more GC threads use policy MCO_GC_ALWAYS. If a MCO_READ_ONLY transaction tries to perform GC and deletes some version(s) then it actually becomes a MCO_READ_WRITE transaction and requires a write to the transaction log. This is why the default policy is MCO_GC_ON_MODIFICATION; so that MCO_READ_ONLY transactions do not try to perform GC.

Recovering unused data storage space

When the MVCC transaction manager is used, in the case of a crash, a persistent database can contain undeleted old versions and working copies. Their presence will not break the consistency of the database and doesn't prevent the normal working of an application, but does unnecessarily consume space. Detecting these stale object versions requires a complete scan of the database. For this reason the recovery process doesn't perform this function automatically. Instead, the removal of the unused versions is performed explicitly in C/C++ applications by calling the mco_disk_database_vacuum() function:

    MCO_RET mco_disk_database_vacuum(mco_db_h con);

Note that mco_disk_database_vacuum() requires exclusive access to the database, so no operations can be performed on the database until the vacuum operation is complete and the function has returned control back to the application.

Alternatively, the application can enable the repair process by setting the MCO_DB_MODE_MVCC_AUTO_VACUUM mode mask in the mco_db_params_t when calling mco_db_open_dev().