eXtremeDB HA Theory of Operation

Computing environments configured to provide nearly full-time availability are known as high availability systems. Typically they have redundant hardware and software and do not have single “points-of-failure”, i.e. hardware or software components that could fail. When failures occur, the failover process moves processing performed by the failed component to the backup component. This requires that highly available systems have accurate instance monitoring or heartbeat mechanisms, and they must be able to quickly and accurately synchronize resources during failover. As a part of high availability systems, databases often must be highly available.

Consider a few highly available, database-equipped systems that provide service with absolutely no down time:

Industrial controllers incorporate an in-memory database that holds measurements made throughout the factory by sensors attached to machinery

Internet IP routers use in-memory databases to maintain their internal storage subsystems – the routing tables

Aircraft control and navigation systems. An aircraft has multiple data sources such as antennas, pilot tubes, gyros, etc. It must collect data from all these devices in addition to monitoring its own infrastructure -- power, telemetry, etc.

Operating room theater, organ transport and other medical equipment

All of these systems require reliable failover processes and high availability data stores. To achieve high availability, a data store will normally offer a means of maintaining copies of data. The traditional mechanism for implementing a high availability database application is called database replication (as illustrated below). In this solution fail-over procedures allow the system to continue using the database.

A replicated database is one in which transactions are replicated at different failure-independent nodes. The node at which a transaction was initiated is referred to as a master or a primary node. The node to which the transaction is replicated is referred to as a replica or a secondary node. Database changes are propagated from the master to the

secondary nodes within database transactions.

Replica control mechanisms configured at the Master node facilitate data propagation between the Master site and Replica copies of the database. These mechanisms can be categorized according to when updates – changes introduced by transactions – are propagated to all database copies.

Update propagation can be done within or outside transaction boundaries. In the first case, replication is termed eager or synchronous. If it occurs outside the boundaries of a transaction, the replication is termed lazy or asynchronous. Regardless of when the propagation occurs, the unit of propagation is a transaction (i.e. all insert/update/delete operations within a transaction are applied together, and succeed or fail together).

Eager Replication

Eager (synchronous) replication (illustrated below) provides data consistency in a straightforward way, and the quickest application recovery time when a fault occurs. No transactions are ever lost in the eager replication scheme, there is no overhead associated with node synchronization during fail-over, and the replica database is available immediately. At the same time, synchronous replication has a higher processing cost and higher communications overhead that can increase the response times during normal use.

Lazy Replication

In contrast to eager replication, lazy replication schemes propagate updates to replica nodes asynchronously and after the transaction is committed on the master node (illustrated below). These updates are applied to replica nodes as separate transactions. Compared to eager propagation, lazy update propagation can improve transaction responsiveness by saving on the message round-trip within the transaction. However, since updates are applied to replica nodes asynchronously, replica transactions run the risk of operating with stale data or falsely reporting certain data as available if a sequence of updates and lookups occurs. For example, a replica application can read a data element that has been removed by a master node transaction, if the read occurs before this transaction is propagated to the replica node.

Time-cognizant eager replication

To facilitate predictable fail-over and other response times in the face of unpredictable network latency, time-cognizant eager replication protocols can be used (illustrated below) . Because embedded and other real-time systems frequently impose strict processing deadlines, the time-cognizant approach ensures on-time delivery of the transaction data from master to replica sites. As with any synchronous replication scheme, fail-over procedures are extremely short.

Terminology

Master database The main database instance
Master application (or simply “master”) The process that creates and maintains the master database. In some cases (usually shared memory applications) there can be more than one application acting as master. In these cases one application will be the “main master” while all other applications acting as master will be referred to as “secondary master”.
Standby database or replica database A database that holds a copy of the master database
Replica application (or simply “replica”) A process that interacts with the replica database. There may be more than one process acting as replica. Replicas should be mirror images of masters, differing only in the role they play a given moment in time.
Synchronization The process of initializing a newly attached replica by sending a copy of the master’s database to the replica
Replication The process of transmitting committed transactions from the master to the replica(s)