Distributed Database Contrast and Compare

eXtremeDB offers several distributed database options, each addressing a different objective. The summary below gives the primary purpose and characteristics of each option. Some options may be combined, for example Sharding, Cluster and High Availability.


Sharding 

Primary purpose: Scalable database applications that require maximum CPU, memory and storage utilization to serve large data sets with a high degree of resource efficiency.

Replication: Provided when Sharding is combined with High Availability (HA).

Scalability: Elastic, near-linear scalability with added shards.

Reliability and Fault-tolerance: Provided when Sharding is combined with HA.

Concept and Distribution Topology: A logical database is horizontally partitioned: it is physically split into multiple smaller parts called shards. Shards may reside on separate servers to spread the load, or on the same server to make better use of multiple CPU cores. eXtremeDB’s SQL engine handles query distribution and presents the distributed database as a single logical database.
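The partitioning idea above can be sketched in a few lines of Python. This is only an illustration of hash-based horizontal partitioning with a scatter-gather query, not eXtremeDB's API or its actual partitioning scheme; the class `ShardedTable` and its methods are hypothetical names chosen for this sketch.

```python
import hashlib


class ShardedTable:
    """Toy horizontally partitioned table (hypothetical, not the eXtremeDB API)."""

    def __init__(self, shard_count):
        # Each shard is a plain dict here. In a real deployment each shard
        # would be a separate database instance, possibly on its own server.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Assumption: keys are routed by a stable hash, so the same key
        # always lands on the same shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, key, row):
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        # Point lookup: only one shard is consulted.
        return self.shards[self._shard_for(key)].get(key)

    def select(self, predicate):
        # Scatter-gather: apply the filter on every shard and merge the
        # results, so the caller sees a single logical table.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results


table = ShardedTable(shard_count=4)
for i in range(100):
    table.insert(i, {"id": i, "value": i * 10})

print(table.get(42))
print(len(table.select(lambda row: row["value"] >= 500)))
```

The caller never names a shard: routing and result merging happen inside the table, which mirrors how the query layer presents the shards as one logical database.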

  

High Availability

Primary purpose: Database applications that require five 9s (99.999%) availability and instant switch-over. Supports distributing read-only workloads across nodes (read load balancing).

Replication: Master-slave replication, either synchronous or asynchronous.

Scalability: Near-linear read scalability. Read requests can be distributed across multiple nodes.

Reliability and Fault-tolerance:  Fault tolerant

Concept and Distribution Topology: A single master database receives all modifications (insert/update/delete operations) and replicates transactions to replicas. In the event of a failure, one replica is elected as the new master.
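The master/replica flow described above can be modeled with a small Python sketch. It is a simplified illustration, not eXtremeDB's HA API: `Node` and `HAGroup` are hypothetical names, replication is modeled as synchronous, and the election rule (promote the live replica that has applied the most transactions) is an assumption, since the text does not say how the new master is chosen.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.last_applied = 0  # sequence number of the last applied transaction
        self.alive = True


class HAGroup:
    """Toy master/replica group (hypothetical, not the eXtremeDB API)."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.next_seq = 1

    @staticmethod
    def _apply(node, seq, key, value):
        node.data[key] = value
        node.last_applied = seq

    def write(self, key, value):
        # All modifications go through the single master.
        if not self.master.alive:
            raise RuntimeError("master is down; call fail_over() first")
        seq = self.next_seq
        self.next_seq += 1
        self._apply(self.master, seq, key, value)
        # Synchronous replication to every reachable replica. A real system
        # would also let a returning replica catch up on what it missed.
        for replica in self.replicas:
            if replica.alive:
                self._apply(replica, seq, key, value)
        return seq

    def read(self, key, replica_index):
        # Read load balancing: the caller picks a replica, e.g. round-robin.
        nodes = self.replicas or [self.master]
        return nodes[replica_index % len(nodes)].data.get(key)

    def fail_over(self):
        # Assumed election rule: promote the most up-to-date live replica.
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("no live replica to promote")
        new_master = max(candidates, key=lambda r: r.last_applied)
        self.replicas.remove(new_master)
        self.master = new_master
        return new_master


group = HAGroup(Node("a"), [Node("b"), Node("c")])
group.write("x", 1)
group.replicas[1].alive = False   # replica "c" misses the next write
group.write("x", 2)
group.replicas[1].alive = True
group.master.alive = False        # the master fails
new_master = group.fail_over()
group.write("x", 3)               # writes continue on the new master
print(new_master.name, group.read("x", 0))
```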

 

Cluster

Primary purpose: Applications that require distributed, cooperative computing and a resilient topology with five 9s availability. Supports distribution of all workloads (read and write load balancing) on modest-sized networks.

Replication: Synchronous multi-master replication.

Scalability: Near-linear read scalability. Overall scalability depends on the workload mix (the percentage of read-only versus read-write transactions).
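To make the dependence on workload mix concrete, here is a deliberately simple back-of-the-envelope model. It is an assumption for illustration only, not a published eXtremeDB formula. Suppose every read is served by one node while every write must be applied on all N nodes. Then the throughput relative to a single node is at most N / (r + (1 - r) * N), where r is the fraction of read-only transactions. The model ignores network latency and conflict retries, and the function name `cluster_speedup` is hypothetical.

```python
def cluster_speedup(nodes, read_fraction):
    """Upper bound on throughput relative to one node under the toy model.

    Assumption: a read costs one unit of work on one node; a write costs
    one unit of work on every node because it is replicated synchronously.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    return nodes / (read_fraction + write_fraction * nodes)


for r in (1.0, 0.9, 0.5, 0.0):
    print(f"4 nodes, {r:.0%} reads: {cluster_speedup(4, r):.2f}x")
```

Under this model a read-only workload scales linearly with the node count, while a write-only workload does not scale at all, which is the intuition behind "scalability is a function of the workload".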

Reliability and Fault-tolerance:  Fault tolerant

Concept and Distribution Topology: Multi-master architecture in which each node can apply modifications (insert/update/delete). Each transaction is synchronously propagated to all nodes, keeping all copies of the database identical (consistent). Database reads are always local (and fast). Writes take longer but do not block the database; high concurrency is achieved through Optimistic Concurrency Control.
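The combination of local reads, synchronous propagation and Optimistic Concurrency Control can be illustrated with a toy, single-process Python model. It is a sketch of the general technique (version-based validation at commit time), not eXtremeDB Cluster's protocol or API; `Cluster`, `ClusterNode`, `Transaction` and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when validation fails and the transaction must be retried."""


class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))


class Transaction:
    def __init__(self, cluster, local_node):
        self.cluster = cluster
        self.local = local_node
        self.read_versions = {}  # key -> version seen when first touched
        self.writes = {}         # key -> new value, buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.local.read(key)  # reads are always local
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        # No locks are taken; only the version the write is based on is noted.
        if key not in self.read_versions:
            self.read_versions[key] = self.local.read(key)[0]
        self.writes[key] = value

    def commit(self):
        # Validation: every key touched must still be at the version seen.
        for key, seen in self.read_versions.items():
            for node in self.cluster.nodes:
                if node.read(key)[0] != seen:
                    raise ConflictError(key)
        # Synchronous propagation: apply the writes on every node.
        for key, value in self.writes.items():
            new_version = self.read_versions[key] + 1
            for node in self.cluster.nodes:
                node.store[key] = (new_version, value)


class Cluster:
    def __init__(self, names):
        self.nodes = [ClusterNode(n) for n in names]

    def begin(self, node_index):
        return Transaction(self, self.nodes[node_index])


cluster = Cluster(["n1", "n2", "n3"])
t1 = cluster.begin(0)
t2 = cluster.begin(1)          # a concurrent transaction on another node
t1.put("balance", 100)
t2.put("balance", 200)
t1.commit()
try:
    t2.commit()
    conflicted = False
except ConflictError:
    conflicted = True          # t2 lost the race and must be retried
print(conflicted, [n.read("balance") for n in cluster.nodes])
```

Neither transaction blocks the other while it runs. The conflict is detected only at commit, and the losing transaction is simply retried against the new state.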

 

IoT

Primary purpose: Data aggregation from a large number of data collection points. Smart data containers to support sporadic connectivity. Advanced server-side analytics for aggregated data.

Replication:  On-demand, based on connection state, data modification events, timers, and more.

Scalability: Server-side performance can be increased with added cores and sharding.

Reliability and Fault-tolerance:  

  • Containers are durable even with sparse connectivity.
  • Server-side can be made reliable through the normal means: clustering and HA.

Concept and Distribution Topology:

  • Push data from IoT Edge to aggregation points (Gateways and/or Servers) for analytics.
  • Push data down to the edge, usually for new device configuration/provisioning.
  • Controlled through push/pull interfaces and/or automatic data exchange between collection points and servers.
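The exchange pattern in the list above can be sketched as a small Python model: an edge container buffers readings while offline, pushes them to an aggregation point when a connection exists, and pulls down any pending configuration in the same exchange. This is an illustration under stated assumptions, not the eXtremeDB IoT API. `EdgeContainer` and `AggregationServer` are hypothetical names, and the buffer is held in memory here, whereas a durable container would persist it.

```python
from collections import deque


class AggregationServer:
    """Toy aggregation point (hypothetical, not the eXtremeDB API)."""

    def __init__(self):
        self.readings = []        # (device_id, reading) pairs for analytics
        self.pending_config = {}  # device_id -> configuration to push down

    def receive(self, device_id, batch):
        self.readings.extend((device_id, r) for r in batch)
        return len(batch)  # number of readings acknowledged

    def take_config(self, device_id):
        return self.pending_config.pop(device_id, None)


class EdgeContainer:
    """Toy edge-side data container with store-and-forward behavior."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.buffer = deque()  # readings not yet acknowledged by the server
        self.config = {}
        self.connected = False

    def record(self, reading):
        # Recording never depends on connectivity.
        self.buffer.append(reading)

    def sync(self, server):
        # Exchange happens only while a connection is available.
        if not self.connected:
            return 0
        acked = server.receive(self.device_id, list(self.buffer))
        for _ in range(acked):
            self.buffer.popleft()  # drop only what the server acknowledged
        new_config = server.take_config(self.device_id)
        if new_config is not None:
            self.config.update(new_config)  # configuration pushed to the edge
        return acked


server = AggregationServer()
edge = EdgeContainer("sensor-1")
edge.record(20.5)
edge.record(21.0)
offline_result = edge.sync(server)   # no connection yet: nothing is sent
server.pending_config["sensor-1"] = {"interval_s": 30}
edge.connected = True
online_result = edge.sync(server)    # buffered readings go up, config comes down
print(offline_result, online_result, edge.config)
```

In this sketch the exchange is triggered explicitly by calling `sync`. The connection-state, data-modification and timer triggers mentioned under Replication would call the same routine automatically.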