Distributed Database Management
With eXtremeDB for High Performance Computing
Which to choose? Review our chart of distributed database options and use cases.
eXtremeDB for HPC delivers the benefits of distributed database management via distributed query processing, clustering and high availability options.
Distributed Query Processing
eXtremeDB for HPC partitions, or shards, a database and distributes query processing across multiple servers, CPUs and/or CPU cores. Performance is accelerated — dramatically, in some cases — via parallel execution of database operations and by harnessing the capabilities of many host computers rather than just one.
The benefits of distributed query processing are evident in McObject's recent STAC-M3 benchmarks, conducted in partnership with E8 Storage, IBM, and Lucera Financial Infrastructures, among others. In these tests, the eXtremeDB database was partitioned horizontally across as many as 128 shards, resulting in record-setting performance managing tick data. Use the following link to review a summary of the benchmark records.
eXtremeDB offers different distributed database options to address different objectives. Learn about Sharding with eXtremeDB, or review the table below that lists different distributed database uses and options.
Sharding
eXtremeDB offers ultra-fast, elastically scalable data management with sharding. Databases are partitioned ("sharded"), with each partition/shard managed by its own instance of the DBMS server. Shards are typically distributed on a storage array (which may be a SAN), with each server keeping a CPU core busy, or across different physical servers with their own storage systems.
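To illustrate the concept, here is a minimal sketch of hash-based sharding with a scatter-gather query. The names and the modulo placement scheme are hypothetical illustrations, not the eXtremeDB API; eXtremeDB's SQL engine performs this routing and merging transparently.

```python
NUM_SHARDS = 4
# Each dict stands in for one DBMS server instance managing one shard.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key):
    """Route a key to a shard (simple hash/modulo placement)."""
    return hash(key) % NUM_SHARDS

def insert(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    """Point lookup: only the owning shard is touched."""
    return shards[shard_for(key)].get(key)

def query_all(predicate):
    """Scatter-gather: run the predicate on every shard
    (in parallel in a real system; sequentially here for clarity)
    and merge the partial results into one answer."""
    results = []
    for shard in shards:
        results.extend(v for v in shard.values() if predicate(v))
    return results
```

The key property is that point lookups touch a single shard while full queries fan out to all shards at once, which is where the near-linear speedup from added shards comes from.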
Read about using the eXtremeSQL distributed SQL engine in our on-line documentation
Learn more about eXtremeDB distributed query processing
Learn about Sharding for elastic scalability
High Availability
High availability enables deployment of a master database and one or more synchronized replica databases. Replication is between separate hardware instances and features application-directed fail-over with strategies that include 2-safe (synchronous) and 1-safe (asynchronous). It delivers “five nines” (99.999% uptime) reliability, or better, with eXtremeDB for HPC’s unsurpassed performance. In addition, read-only replicas are available to support distribution/load-balancing of database query/analysis/reporting requirements.
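The 2-safe/1-safe distinction can be sketched in a few lines. This is an illustrative model, not eXtremeDB code: the class and method names are hypothetical, and a real implementation involves transaction logs, timeouts, and fail-over election.

```python
class Replica:
    def __init__(self):
        self.log = []

    def apply(self, txn):
        self.log.append(txn)
        return True  # acknowledge receipt to the master

class Master:
    def __init__(self, replicas):
        self.replicas = replicas
        self.committed = []
        self.pending = []  # txns committed locally but not yet shipped (1-safe)

    def commit_2safe(self, txn):
        # 2-safe (synchronous): commit only after every replica acknowledges,
        # so a fail-over never loses a committed transaction.
        if all(r.apply(txn) for r in self.replicas):
            self.committed.append(txn)
            return True
        return False

    def commit_1safe(self, txn):
        # 1-safe (asynchronous): commit locally first, ship to replicas later;
        # lower latency, but a master crash can lose the tail of the log.
        self.committed.append(txn)
        self.pending.append(txn)

    def flush(self):
        # Asynchronous shipping of pending transactions to the replicas.
        for txn in self.pending:
            for r in self.replicas:
                r.apply(txn)
        self.pending.clear()
```

The trade-off shown here is the classic one: 2-safe pays replication latency on every commit in exchange for zero data loss on fail-over, while 1-safe commits at local speed and accepts a small window of potential loss.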
Read about High Availability in our on-line documentation
Learn more about eXtremeDB time-cognizant eager replication
Clustering
In clustered deployments, every eXtremeDB HPC database instance serves as a master: changes made on one node are efficiently replicated to the others. eXtremeDB was the first clustering database system to offer an embedded architecture, in which the database system runs within the application process on every node, eliminating the need for separate client and server modules.
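Cluster write concurrency rests on optimistic concurrency control (OCC): readers never block, and a write commits only if nothing it read has changed in the meantime. The following is a minimal, hypothetical sketch of the validate-then-write idea, not the eXtremeDB cluster API.

```python
class Record:
    """A value tagged with a version that increments on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

db = {"x": Record(10)}

def read(key):
    # Reads are local and never block; remember the version we saw.
    rec = db[key]
    return rec.value, rec.version

def try_commit(key, new_value, read_version):
    """Validate-then-write: abort if another node committed in between."""
    rec = db[key]
    if rec.version != read_version:
        return False  # conflict detected; the caller must re-read and retry
    rec.value = new_value
    rec.version += 1
    return True
```

A transaction that loses the race simply retries with fresh data; because conflicts are detected at commit time rather than prevented with locks, concurrent readers and non-conflicting writers proceed without blocking each other.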
Read about clustering in our on-line documentation
Learn more about eXtremeDB's independently audited speed records
Distributed Database Options – Contrast and Compare
Which distributed database option best fits your needs?
eXtremeDB gives professional developers flexible tools to tailor data management to their needs. The table below summarizes the primary purpose and characteristics of the different distributed database options (some of which may be combined, e.g. Sharding, Cluster and High Availability).
Option | Sharding | High Availability | Cluster | IoT
Primary purpose | Scalable database applications that require maximum CPU, memory and storage utilization to serve large data sets with a high degree of resource efficiency | Database applications that require five-nines availability and instant switch-over. Supports distribution of read-only workloads (read load balancing) | Applications that require distributed, cooperative computing and a resilient topology with five-nines availability. Cluster supports distribution of all workloads (read and write load balancing) on modest-sized networks | Data aggregation from a large number of data collection points. Smart data containers to support sporadic connectivity. Advanced server-side analytics for aggregated data
Replication | When combined with HA | Master-slave replication; synchronous or asynchronous | Multi-master replication; synchronous | On-demand, based on connection state, data modification events, timers, and more
Scalability | Elastic, near-linear scalability with added shards | Near-linear read scalability; read requests can be distributed across multiple nodes | Near-linear read scalability; overall scalability is a function of the workload (percentage of read-only versus read-write transactions) | Server-side performance can be increased with added cores and sharding
Reliability and fault tolerance | When combined with HA | Fault tolerant | Fault tolerant | Containers are durable even with sparse connectivity; the server side can be made reliable through the usual means, clustering and HA
Concept and distribution topology | A logical database is horizontally partitioned (physically split) into multiple smaller parts called shards; shards may reside on separate servers to spread the load, or on the same server to better utilize multiple CPU cores. eXtremeDB's SQL engine handles query distribution and presents the distributed database as a single logical database | A single master database receives all modifications (insert/update/delete operations) and replicates transactions to replicas. In the event of a failure, one replica is elected as the new master | Multi-master architecture in which each node can apply modifications (insert/update/delete). Each transaction is synchronously propagated to all nodes, keeping copies of the database identical (consistent). Database reads are always local (and fast); writes take longer but do not block the database, and high concurrency is achieved through optimistic concurrency control | Data is pushed from the IoT edge to aggregation points (gateways and/or servers) for analytics, and pushed down to the edge, usually for device configuration/provisioning. Exchange is controlled through push/pull interfaces and/or automatic data exchange between collection points and servers
We want to help with your next project. Contact us to discuss your distributed database options and objectives.