A distributed database system is one in which the data belonging to a single logical database is distributed across two or more physical databases.

Beyond that simple definition, there are a confusing number of possibilities for when, how, and why the data is distributed. Some are applicable to edge and/or fog computing, others to fog and/or cloud computing, and some apply across the entire spectrum of edge, fog, and cloud computing. See the comparison of options farther down this page for more information.

eXtremeDB Distributed Database System & Distributed Query Processing

With this capability, eXtremeDB for HPC horizontally partitions, or shards, a database. Each shard is a physical database, and together the shards represent a single logical database. In eXtremeDB, each shard is managed by its own instance of the eXtremeDB database server. The shards can reside on a single server or be spread across multiple servers: the first arrangement takes advantage of the multiple CPUs, CPU cores, and I/O channels within a single chassis, while the second extends that same advantage across multiple machines.
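For illustration only, the short Python sketch below shows one common way rows can be routed to shards by hashing a key. The shard count, key choice, and helper functions are assumptions for this example, not eXtremeDB's API.

```python
import hashlib

NUM_SHARDS = 4  # assumption: four physical databases backing one logical database

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard with a stable hash (hypothetical routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each "shard" stands in for a separate physical database instance.
shards = [dict() for _ in range(NUM_SHARDS)]

def insert(symbol: str, row: dict) -> None:
    """Route the row to whichever shard owns this key."""
    shards[shard_for(symbol)][symbol] = row

insert("AAPL", {"last": 189.95})
insert("MSFT", {"last": 411.22})
print(shard_for("AAPL"), shard_for("MSFT"))  # the rows may land on different physical shards
```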

When a logical database is manifested as multiple physical databases, the database system needs to isolate the physical topology from client applications. In other words, an application should not be concerned with which physical database a given piece of data resides in; it should be able to simply query the logical database and receive the result set. eXtremeDB accomplishes this through its distributed query processing engine.

What is Distributed Query Processing?

The distributed query engine accepts an SQL query from the client application and distributes the query to all of the database server instances managing the database shards. Performance is accelerated — dramatically, in some cases — via parallel execution of database operations and by harnessing the capabilities of many host computers rather than just one.
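The scatter-gather idea behind distributed query processing can be sketched generically: send the same query to every shard in parallel, then merge the partial results. The Python below is a conceptual illustration of that pattern, not the eXtremeDB engine; the in-memory shards and the query_shard helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for shard connections; each holds part of the logical table.
shards = [
    [{"symbol": "AAPL", "price": 189.95}, {"symbol": "IBM", "price": 186.10}],
    [{"symbol": "MSFT", "price": 411.22}],
    [{"symbol": "GOOG", "price": 171.05}, {"symbol": "AMZN", "price": 178.30}],
]

def query_shard(shard, predicate):
    """Run the same query against one shard; stands in for one server instance."""
    return [row for row in shard if predicate(row)]

def distributed_query(predicate):
    """Scatter the query to every shard in parallel, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, predicate), shards))
    return [row for partial in partials for row in partial]

# The application queries the logical database; it never names a physical shard.
print(distributed_query(lambda row: row["price"] > 185))
```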

The benefits of distributed query processing are evident in McObject’s recent STAC-M3 benchmarks with partners E8 Storage, IBM and Lucera Financial Infrastructures; in these tests, the eXtremeDB database was partitioned horizontally across up to 128 shards, resulting in record-setting performance managing tick data.

 

High Availability

High availability enables deployment of a master database and one or more synchronized replica databases on separate hardware instances, with application-directed failover. Replication strategies include 2-safe (synchronous) and 1-safe (asynchronous). This feature delivers “five nines” (99.999% uptime) reliability, or better, together with eXtremeDB for HPC’s unsurpassed performance. Replica database instances are available for read-only operations, supporting the distribution and load balancing of query, analysis, and reporting workloads.
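The difference between the two strategies can be illustrated with a minimal sketch: a 2-safe master waits for the replica to apply a transaction before acknowledging the commit, while a 1-safe master acknowledges immediately and replicates in the background. The Python classes below are conceptual only and do not represent the eXtremeDB HA API.

```python
import queue
import threading

class Replica:
    def __init__(self):
        self.log = []
    def apply(self, txn):
        self.log.append(txn)  # persist the replicated transaction

class TwoSafeMaster:
    """2-safe: the commit returns only after the replica has applied the transaction."""
    def __init__(self, replica):
        self.replica = replica
    def commit(self, txn):
        self.replica.apply(txn)  # wait for the replica before acknowledging
        return "committed on master and replica"

class OneSafeMaster:
    """1-safe: the commit returns at once; replication happens in the background."""
    def __init__(self, replica):
        self.replica = replica
        self.outbox = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()
    def _ship(self):
        while True:
            self.replica.apply(self.outbox.get())  # replicate asynchronously
    def commit(self, txn):
        self.outbox.put(txn)
        return "committed on master; replication pending"

print(TwoSafeMaster(Replica()).commit({"id": 1}))
print(OneSafeMaster(Replica()).commit({"id": 2}))
```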

Clustering

In clustered deployments, every eXtremeDB for HPC database instance serves as a master. Changes made on one node are efficiently replicated to the others.

eXtremeDB offers different distributed database options to address different objectives.

The information below summarizes the primary purpose and characteristics of each distributed database option (some of which may be combined, e.g. Sharding and High Availability).


Sharding 

Primary purpose: Scalable database applications that require maximum CPU, memory and storage utilization to serve large data sets with a high degree of resource efficiency.

Replication:  Available when sharding is combined with High Availability (HA)

Scalability:  Elastic, near-linear scalability with added shards

Reliability and Fault-tolerance:  Provided when sharding is combined with High Availability (HA)

Concept and Distribution Topology:  A logical database is horizontally partitioned — physically split into multiple (smaller) parts called shards; shards may reside on separate servers to spread the load or on the same server to better utilize multiple CPU cores. eXtremeDB’s SQL engine handles query distribution and presents the distributed database as a single logical database.

  

High Availability

Primary purpose:  Database applications that require five 9s availability and instant switch-over. Supports the distribution of read-only workloads (read load balancing).

Replication:  Master-slave replication, either synchronous (2-safe) or asynchronous (1-safe)

Scalability:  Near linear read scalability. Read requests can be distributed across multiple nodes.

Reliability and Fault-tolerance:  Fault tolerant

Concept and Distribution Topology:  A single master database receives all modifications (insert/update/delete operations) and replicates transactions to replicas. In the event of a failure, one replica is elected as the new master.
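As a rough illustration of the failover step, the sketch below promotes the surviving replica that has applied the most transactions. It is a generic Python fragment, not eXtremeDB's election mechanism; the applied_lsn field is an assumption for this example.

```python
from dataclasses import dataclass

@dataclass
class ReplicaState:
    name: str
    alive: bool
    applied_lsn: int  # assumption: how far this replica has applied the master's transaction log

def elect_new_master(replicas):
    """Promote the live replica that has applied the most transactions."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise RuntimeError("no surviving replica to promote")
    return max(candidates, key=lambda r: r.applied_lsn)

replicas = [
    ReplicaState("replica-1", alive=True, applied_lsn=1041),
    ReplicaState("replica-2", alive=True, applied_lsn=1056),
    ReplicaState("replica-3", alive=False, applied_lsn=1056),
]
print(elect_new_master(replicas).name)  # replica-2 becomes the new master
```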

 

Cluster

Primary purpose:  Applications that require distributed, cooperative computing and a resilient topology with five 9s availability. Supports distribution of all workloads (read and write load balancing) on modest-sized networks.

Replication:  Multi-master replication. Synchronous

Scalability:  Near linear read scalability. Overall scalability is a function of the workload (% read-only versus read-write transactions).

Reliability and Fault-tolerance:  Fault tolerant

Concept and Distribution Topology:  Multi-master architecture in which each node can apply modifications (insert/update/delete). Each transaction is synchronously propagated to all nodes, keeping copies of the database identical (consistent). Database reads are always local (and fast). Writes take longer, but do not block the database; high concurrency is achieved through optimistic concurrency control (OCC).
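Optimistic concurrency control can be illustrated with a simple version check: a transaction reads a record and its version, works without taking locks, and commits only if the version is unchanged. The sketch below is a generic Python illustration of that idea, not eXtremeDB's cluster implementation.

```python
class VersionedStore:
    """One node's copy of a record, with a version number used for conflict detection."""
    def __init__(self, value):
        self.value, self.version = value, 0

    def read(self):
        return self.value, self.version

    def try_commit(self, new_value, read_version):
        # Accept the write only if no other transaction committed since we read.
        if self.version != read_version:
            return False  # conflict: the caller must retry its transaction
        self.value, self.version = new_value, self.version + 1
        return True

node = VersionedStore(value=100)

# Two concurrent transactions read the same version without blocking each other.
v1, ver1 = node.read()
v2, ver2 = node.read()

print(node.try_commit(v1 + 10, ver1))  # True: first writer wins
print(node.try_commit(v2 + 20, ver2))  # False: stale version, so this transaction retries
```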

 

IoT

Primary purpose:  Data aggregation from a large number of data collection points. Smart data containers to support sporadic connectivity. Advanced server-side analytics for aggregated data

Replication:  On-demand, based on connection state, data modification events, timers, and more.

Scalability:  Server-side performance can be increased with added cores & sharding

Reliability and Fault-tolerance:  

  • Containers are durable even with sparse connectivity.
  • The server side can be made reliable through the normal means: clustering and HA

Concept and Distribution Topology:

  • Push data from IoT Edge to aggregation points (Gateways and/or Servers) for analytics.
  • Push data down to the edge, usually for new device configuration/provisioning.
  • Controlled through push/pull interfaces and/or automatic data exchange between collection points and servers.
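
The push model described above can be sketched as an edge-side buffer that tolerates sporadic connectivity: readings accumulate locally and are flushed to the aggregation point whenever a connection is available. The Python below is a conceptual sketch of that pattern; the send_to_gateway callback and the buffering policy are assumptions, not part of any eXtremeDB API.

```python
from collections import deque

class EdgeCollector:
    """Buffers readings at the edge and pushes them upstream when connected."""
    def __init__(self, send_to_gateway):
        self.send_to_gateway = send_to_gateway  # hypothetical upstream transport
        self.buffer = deque()
        self.connected = False

    def record(self, reading):
        self.buffer.append(reading)  # always stored locally first
        if self.connected:
            self.flush()

    def flush(self):
        while self.buffer:
            self.send_to_gateway(self.buffer.popleft())

    def on_connection_change(self, connected):
        self.connected = connected
        if connected:
            self.flush()  # drain the backlog on reconnect

received = []
collector = EdgeCollector(send_to_gateway=received.append)
collector.record({"sensor": "temp", "value": 21.5})  # offline: buffered locally
collector.on_connection_change(True)                 # reconnect: backlog pushed
collector.record({"sensor": "temp", "value": 21.7})  # online: pushed immediately
print(received)
```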