eXtremeDB for HPC
The eXtremeDB for HPC database management system is available for use on the following operating systems:
- 32-bit and 64-bit versions of Windows and Linux
- Linux on POWER8
- Solaris SPARC
- Solaris x86_64
Does eXtremeDB scale?
There are no practical limits to the size of an eXtremeDB database. 32-bit operating environments limit the memory a process can address to 2GB or 3GB, but eXtremeDB’s 64-bit support removes this constraint, so in-memory databases can scale to terabytes and beyond. Databases that use eXtremeDB’s optional persistent storage are constrained only by available file system space (in both 32-bit and 64-bit implementations). Multi-version concurrency control (MVCC) boosts scalability by allowing any number of threads or processes to insert, update and delete data simultaneously without being queued by the transaction manager; this benefit is especially marked on multi-core hardware.
eXtremeDB offers distributed computing options (clustering and distributed query processing) that enhance scalability by leveraging the memory, bandwidth and processing power of multiple hardware nodes, and that reduce the cost of scaling by permitting system expansion via low-cost servers.
How is eXtremeDB similar to, and different from, a key-value store?
A key-value pair (KVP) is a set of two linked data items: an identifier, called the key, and the content, called the value. A KVP database system manages such pairs and can provide flexibility and scalability, but with limitations. An important difference is that a KVP database has no data definition language (DDL); it is schema-less. Without a DDL, there is no way to generate different views of the stored data for different users (applications, systems or people). Say the key is a customer ID and the value is everything about that customer. All the KVP store can do is return “everything about that customer,” leaving the user responsible for any further sorting or filtering. If a different view is needed, the application must create it, and possibly store it back in the database. In contrast, a database system with a DDL (such as eXtremeDB for Cloud Analytics Financial) understands the difference between queries such as:
SELECT cust_name, cust_addr_1, cust_addr_2, cust_city, cust_st, cust_zip FROM customer WHERE cust_zip LIKE '98%';
SELECT cust_name FROM customer WHERE cust_bal > 10000;
SELECT AVG(cust_bal), cust_st FROM customer WHERE cust_bal > 10000 GROUP BY cust_st;
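To make the contrast concrete, here is a minimal sketch in Python. It uses the standard library’s sqlite3 module purely as a stand-in for a DBMS with a DDL (this is not eXtremeDB’s API). The same hypothetical customer rows are held once as opaque key-value pairs and once in a declared table, so the engine itself can run the three queries above (written with SQL’s standard % wildcard), while the key-value version needs hand-written filtering code.

```python
import sqlite3

# Hypothetical sample rows: (id, name, addr1, addr2, city, state, zip, balance)
CUSTOMERS = [
    (1, "Acme", "1 Main St", "", "Seattle", "WA", "98101", 25000.0),
    (2, "Birch", "2 Oak Ave", "Suite 5", "Portland", "OR", "97201", 15000.0),
    (3, "Cedar", "3 Pine Rd", "", "Tacoma", "WA", "98402", 5000.0),
]

# Key-value style: the store only maps key -> opaque value,
# so filtering by balance must be written in application code.
kv_store = {row[0]: row[1:] for row in CUSTOMERS}
rich_names_kv = sorted(v[0] for v in kv_store.values() if v[6] > 10000)

# Schema style: the DDL tells the engine what each column means,
# so it can filter, project and aggregate by itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT, "
    "cust_addr_1 TEXT, cust_addr_2 TEXT, cust_city TEXT, cust_st TEXT, "
    "cust_zip TEXT, cust_bal REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?, ?, ?)", CUSTOMERS)

zip_98_rows = conn.execute(
    "SELECT cust_name, cust_addr_1, cust_addr_2, cust_city, cust_st, cust_zip "
    "FROM customer WHERE cust_zip LIKE '98%'"
).fetchall()
rich_names_sql = sorted(
    r[0] for r in conn.execute("SELECT cust_name FROM customer WHERE cust_bal > 10000")
)
# Sorted in Python so the printed order is deterministic.
avg_by_state = sorted(
    conn.execute(
        "SELECT AVG(cust_bal), cust_st FROM customer WHERE cust_bal > 10000 GROUP BY cust_st"
    ).fetchall(),
    key=lambda r: r[1],
)
print(zip_98_rows)
print(rich_names_kv == rich_names_sql, avg_by_state)
```

Both approaches find the same customers, but only the schema-aware engine can answer the projection and the grouped aggregate without extra application code.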
Lack of a schema means that the DBMS “knows” nothing about the data and cannot support features that work with specific data types. For example, a schema-less store could not offer eXtremeDB’s library of vector-based statistical analysis functions, or the ability to pipeline them, because these functions rely on the product’s sequence data type to apply special handling to market data.
With its sequence data type, eXtremeDB provides one of the key features of column-oriented database systems: the ability to store and manage data by columns rather than rows. This is a boon when working with naturally columnar data such as time series (including market data). A key difference is that eXtremeDB’s column-based handling of data is optional and selective: if the sequence data type is not used, eXtremeDB operates as a conventional row-based DBMS. The sequence data type is a hybrid approach: tables must contain at least one conventional (non-sequence) column, which is used to look up the row for subsequent access to the columnar (sequence) data. Column-based handling accelerates analytical processing on columns of data, while row-based management is generally faster elsewhere, that is, for general data processing that operates on more than one column of values. By supporting a combination of row- and column-based management, eXtremeDB lets database designers apply the most efficient approach to all of their data.
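One rough way to picture the hybrid layout: each row keeps a conventional key column used for lookup, while its time-series values live in per-column arrays. The Python sketch below is a conceptual illustration only; the class and field names are hypothetical, and it does not reproduce eXtremeDB’s layout manager. The standard array module stands in for a “sequence” so that each column is a contiguous block of same-typed values.

```python
from array import array


class SymbolRow:
    """One conventional row: a lookup key plus columnar 'sequence' fields (hypothetical)."""

    def __init__(self, symbol):
        self.symbol = symbol        # conventional column used to find the row
        self.close = array("d")     # sequence-like column: contiguous doubles
        self.volume = array("q")    # sequence-like column: contiguous 64-bit ints

    def append_tick(self, close, volume):
        self.close.append(close)
        self.volume.append(volume)


# A dict plays the role of an index on the conventional column.
table = {}


def get_row(symbol):
    if symbol not in table:
        table[symbol] = SymbolRow(symbol)
    return table[symbol]


row = get_row("IBM")
for c, v in [(10.0, 100), (11.0, 200), (12.0, 300)]:
    row.append_tick(c, v)

# Column-wise analytics scan only the arrays they need.
avg_close = sum(row.close) / len(row.close)
vwap = sum(c * v for c, v in zip(row.close, row.volume)) / sum(row.volume)
print(avg_close, vwap)
```

The key lookup is a row-style operation; the averages are column-style scans over one or two arrays, which is the kind of workload the text says benefits from columnar storage.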
Does eXtremeDB support transactions?
Yes. eXtremeDB transactions enforce the ACID properties (Atomicity, Consistency, Isolation and Durability). As an in-memory database, eXtremeDB offers durability in the sense that committed transactions persist as long as the in-memory database itself is not discarded. Durability can be strengthened by periodically saving the in-memory database and, optionally, by enabling transaction logging.
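The atomicity guarantee and the “periodically save the in-memory database” idea can be sketched with Python’s sqlite3 module standing in for an in-memory DBMS (this is not eXtremeDB’s API). A transfer that violates a constraint is rolled back as a whole, and `Connection.backup` copies the in-memory database to a file as a point-in-time snapshot. The table, the transfer function and the file name are hypothetical.

```python
import os
import sqlite3
import tempfile

mem = sqlite3.connect(":memory:")
mem.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
mem.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
mem.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer(mem, 1, 2, 30)    # succeeds: balances become 70 / 80
second_ok = transfer(mem, 2, 1, 500)  # CHECK fails, so the credit to id 1 is undone too

# Periodically save a snapshot of the in-memory database to disk.
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(snapshot_path)
mem.backup(disk)
disk.close()

# After a restart, the last snapshot can be reloaded.
restored = sqlite3.connect(snapshot_path)
balances = dict(restored.execute("SELECT id, balance FROM account"))
restored.close()
print(first_ok, second_ok, balances)
```

A snapshot only protects work up to the moment it was taken; transactions committed after it are lost on failure, which is the gap that transaction logging closes.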
Doesn’t transaction logging just re-introduce the overhead of disk storage, eliminating any performance benefit of using an in-memory database?
McObject’s benchmark tests show that an in-memory database system (IMDS) with transaction logging active still outperforms a traditional “on-disk” DBMS performing the same tasks, even when the on-disk database is fully cached. This performance edge ranges from 2x to 3x when a hard disk drive stores the on-disk DBMS’s records and the IMDS’s transaction log, to more than 23x when using state-of-the-art memory-tier storage such as Fusion-io’s ioDrive2 or EMC’s XtremSF.
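Conceptually, transaction logging appends a record of each committed transaction to a sequential log that can be replayed after a restart to rebuild the in-memory state. The Python sketch below shows that append-on-commit, replay-on-recovery cycle under stated assumptions: a dict is the in-memory database, the log is a JSON-lines file, and `None` marks a delete. The class and format are hypothetical and are not eXtremeDB’s log format.

```python
import json
import os
import tempfile


class LoggedStore:
    """Hypothetical in-memory key-value store made durable by a redo log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Replay every committed transaction in the log, in commit order.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, changes):
        for key, value in changes.items():
            if value is None:
                self.data.pop(key, None)  # None marks a delete
            else:
                self.data[key] = value

    def commit(self, changes):
        # Write the transaction's changes to the log and force them to disk
        # before applying them in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(changes) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(changes)


path = os.path.join(tempfile.mkdtemp(), "redo.log")
store = LoggedStore(path)
store.commit({"a": 1, "b": 2})
store.commit({"a": 10, "b": None})

recovered = LoggedStore(path)  # simulates a restart: state is rebuilt from the log
print(recovered.data)
```

Each commit costs one sequential append plus a flush, rather than updates to data pages and indexes on disk.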
What language will I use to create applications with eXtremeDB?
Developers can access eXtremeDB’s most powerful capabilities, including pipelining of vector-based statistical functions, from industry-standard SQL and from the Python scripting language. Using these languages alongside eXtremeDB’s dynamic data definition language (DDL) capability, developers can implement their ideas quickly and optimize rapidly by testing changes to code, database tables and indexes.
Application development is supported in C/C++, Java, C# (.NET) and Python. The native, navigational C/C++ API is type-safe and specific to a given project/database design. When coding in C or C++, applications can access the database system through this API as well as through standard SQL and ODBC. eXtremeDB’s SQL implementation also provides a Type 3, JDBC 4 driver for development in Java. eXtremeDB’s Java Native Interface (JNI) and C# native interface provide the fastest possible database access from those languages, and let developers work entirely with “plain old” Java and C# objects.
Does eXtremeDB have a data definition language?
Yes. eXtremeDB has a data definition language (DDL) and a database dictionary. A database schema is written in a text editor using either standard SQL DDL or the product’s native DDL (which looks very much like C++), and is then processed by the eXtremeDB schema processor. Java, C# and Python can also serve as the schema language, and a schema created in any of these languages can be translated to any other, facilitating simultaneous database access from applications written in different languages. eXtremeDB’s dynamic DDL capability allows new tables and indexes to be added at run time, without re-processing the schema.
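eXtremeDB’s own DDL syntax is not reproduced here. As a stand-in, the Python sketch below uses sqlite3 and standard SQL to show the general idea of dynamic DDL: an index and a new table are added to a live database that already holds data, and the query planner immediately uses the new index. Table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO trade (symbol, price) VALUES (?, ?)",
    [("IBM", 120.5), ("AAPL", 180.25), ("IBM", 121.0)],
)

# Later, at run time: add an index on an existing column...
conn.execute("CREATE INDEX trade_symbol_idx ON trade (symbol)")
# ...and a brand-new table, while the existing data stays in place.
conn.execute("CREATE TABLE quote (symbol TEXT, bid REAL, ask REAL)")

# The planner can use the new index right away; the last column of each
# EXPLAIN QUERY PLAN row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT price FROM trade WHERE symbol = 'IBM'"
).fetchall()
uses_index = any("trade_symbol_idx" in row[-1] for row in plan)
tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(uses_index, tables)
```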
Does eXtremeDB support multi-threaded concurrent access (simultaneous readers and writers)?
Yes. eXtremeDB is fully re-entrant and thread-safe, and it offers two transaction managers. The multiple-reader, single-writer (MURSIW) transaction manager coordinates concurrent access; concurrency requires no explicit action from the developer beyond using the database system’s programming interface to start, commit and abort transactions. The multi-version concurrency control (MVCC) transaction manager eliminates database locking and can dramatically improve scalability and performance when many tasks or processes modify the database concurrently (as opposed to only reading it), especially on multi-core hardware.
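To illustrate why MVCC lets writers proceed without queuing, here is a toy multi-version store in Python. Each commit creates new versions stamped with a commit number; a transaction reads the versions visible at its starting snapshot, buffers its writes privately, and a write-write conflict is detected at commit time (first committer wins), so the losing transaction aborts and can retry. This is a simplified illustration of the general MVCC technique under those assumptions, not eXtremeDB’s transaction manager; all names are hypothetical.

```python
import threading


class ConflictError(Exception):
    pass


class MVCCStore:
    def __init__(self):
        self._versions = {}            # key -> list of (commit_ts, value), oldest first
        self._commit_ts = 0
        self._lock = threading.Lock()  # guards the commit counter and version lists only

    def begin(self):
        with self._lock:
            return Transaction(self, self._commit_ts)

    def _read(self, key, snapshot_ts):
        # Newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def _commit(self, txn):
        with self._lock:
            for key in txn.writes:
                history = self._versions.get(key, [])
                if history and history[-1][0] > txn.snapshot_ts:
                    raise ConflictError(key)  # another transaction committed this key first
            self._commit_ts += 1
            for key, value in txn.writes.items():
                self._versions.setdefault(key, []).append((self._commit_ts, value))


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store, self.snapshot_ts, self.writes = store, snapshot_ts, {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store._read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value  # buffered privately; nothing is locked

    def commit(self):
        self.store._commit(self)


store = MVCCStore()
t0 = store.begin(); t0.put("x", 1); t0.commit()

reader = store.begin()            # its snapshot sees x == 1 for its whole lifetime
w1, w2 = store.begin(), store.begin()
w1.put("x", 2); w2.put("y", 3)    # disjoint keys: both commit without waiting
w1.commit(); w2.commit()

loser, winner = store.begin(), store.begin()
winner.put("x", 5); loser.put("x", 6)
winner.commit()
try:
    loser.commit()
    loser_aborted = False
except ConflictError:
    loser_aborted = True

print(reader.get("x"), store.begin().get("x"), loser_aborted)
```

Readers never block writers, and writers touching different keys never block each other; only genuinely conflicting updates pay a cost, at commit time.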
Are duplicate/unique/compound/descending keys (indexes) supported?
Yes. eXtremeDB supports all of these variations on B-tree indexes. It also offers hash indexes (which can likewise be unique or duplicate, and/or compound), user-defined object identifiers (OIDs), system-generated Auto-Ids (similar to the auto-increment or serial data types in other DBMSs), and Trigram and KD-Tree indexes.
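The ordered-index variations can be mimicked with a sorted list and Python’s bisect module: compound keys are tuples, a duplicate index tolerates repeated keys by pairing each key with an object ID, a unique index rejects a repeated key, and a descending scan here simply walks the ordered entries backwards. A dict plays the role of a unique hash index. This is a conceptual sketch with hypothetical names (object IDs are assumed to be integers), not how eXtremeDB implements its indexes.

```python
import bisect


class OrderedIndex:
    """Sorted-list stand-in for a B-tree index; object IDs are assumed to be ints."""

    def __init__(self, unique=False):
        self.unique = unique
        self.entries = []  # sorted list of (key, oid)

    def insert(self, key, oid):
        if self.unique:
            # (key,) sorts just before every (key, oid) entry.
            i = bisect.bisect_left(self.entries, (key,))
            if i < len(self.entries) and self.entries[i][0] == key:
                raise KeyError(f"duplicate key {key!r}")
        bisect.insort(self.entries, (key, oid))

    def range(self, lo, hi, descending=False):
        """Object IDs whose keys fall in [lo, hi]; tuple keys compare lexicographically."""
        start = bisect.bisect_left(self.entries, (lo,))
        end = bisect.bisect_right(self.entries, (hi, float("inf")))
        hits = self.entries[start:end]
        return [oid for _, oid in (reversed(hits) if descending else hits)]


# Duplicate, compound index on (state, zip).
by_state_zip = OrderedIndex()
for oid, key in [(1, ("WA", "98101")), (2, ("OR", "97201")),
                 (3, ("WA", "98402")), (4, ("WA", "98101"))]:
    by_state_zip.insert(key, oid)

wa_asc = by_state_zip.range(("WA", ""), ("WA", "99999"))
wa_desc = by_state_zip.range(("WA", ""), ("WA", "99999"), descending=True)

# Unique index: a second insert of the same key is rejected.
by_id = OrderedIndex(unique=True)
by_id.insert(100, 1)
try:
    by_id.insert(100, 2)
    rejected = False
except KeyError:
    rejected = True

# Unique hash index: exact-match lookups only, no ordering or range scans.
hash_by_name = {"Acme": 1, "Birch": 2}
print(wa_asc, wa_desc, rejected, hash_by_name["Birch"])
```

The trade-off the sketch makes visible: the ordered structure supports range and prefix scans in either direction, while the hash structure answers only exact-match lookups.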
Does eXtremeDB support complex data structures?
Yes. There is no theoretical limit to the complexity of the data you can define with eXtremeDB’s DDL. eXtremeDB supports C/C++ structs, nested structs, fixed and variable length arrays of any supported data type, sequences, and BLOBs.
The differences between a sequence and either a fixed-length array or a vector are:
- Unlike a fixed-length array, sequences and vectors are variable-length arrays.
- Vectors and arrays are limited to 64K elements, whereas sequences are unbounded.
- Arrays and vectors are stored with the same storage layout manager as other “normal” data, whereas sequences are stored with a specialized layout manager that implements a columnar layout.
- Sequences can be used with eXtremeDB’s extensive library of statistical functions, while arrays and vectors cannot.
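Pipelining statistical functions means feeding the output of one vector operation directly into the next instead of materializing every intermediate result. The Python sketch below imitates that with generators over a price sequence: a threshold stage produces a True/False mask, the mask selects elements, and a sliding-window average consumes them lazily. The stage names and the data are hypothetical and do not correspond to eXtremeDB’s statistical function library.

```python
from collections import deque
from itertools import compress


def above(values, threshold):
    """Yield one True/False flag per element (a 'filter mask' stage)."""
    for v in values:
        yield v > threshold


def moving_avg(values, window):
    """Yield the mean of each full sliding window, pulling input one element at a time."""
    buf, total = deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            yield total / window


prices = [10.0, 12.0, 9.0, 14.0, 16.0, 8.0, 18.0]
# Stages are chained; each element flows through the whole pipeline before the next.
selected = compress(prices, above(prices, 9.5))
result = list(moving_avg(selected, 3))
print(result)
```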
What capabilities are present in eXtremeDB for application/database debugging?
The database system is incorporated into an application in one of two versions, “debug” or “release.” In debug mode, eXtremeDB provides extensive debugging support, with traps throughout the eXtremeDB code that help detect application errors. For example, if an invalid object handle is passed to eXtremeDB, the eXtremeDB library raises a fatal exception before the handle is used, making it possible to examine the call stack and find the source of the corruption. In release mode, these traps are removed from the eXtremeDB runtime code, and many functions inside the runtime are inlined, which can significantly increase the performance of an eXtremeDB-based application.
In addition, eXtremeDB for HPC’s native, application-specific C/C++ API is type-safe. Methods generated to provide access to a given object type always take a reference to that data type as a parameter, so a data-typing mistake produces a compiler error. (The software quality benefits of a type-safe API are discussed in a Linux Journal article by McObject’s co-founder and CEO.)
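Python’s `assert` statement offers a loose analogy to the debug/release split: assertions run normally but are stripped when the interpreter is started with `-O`, much as a release build drops its traps. The sketch below validates an object handle before using it, so a stale handle fails at the call site instead of silently returning the wrong data. The handle scheme (slot index plus generation counter) is a hypothetical illustration, not eXtremeDB’s handle format.

```python
class HandleTable:
    """Hands out (slot, generation) handles and checks them in debug runs (hypothetical)."""

    def __init__(self):
        self._objects = []
        self._generations = []

    def create(self, obj):
        self._objects.append(obj)
        self._generations.append(0)
        return (len(self._objects) - 1, 0)

    def free(self, handle):
        slot, _ = handle
        self._objects[slot] = None
        self._generations[slot] += 1  # invalidates every outstanding handle to this slot

    def get(self, handle):
        slot, gen = handle
        # Debug trap: removed entirely under `python -O`, like a release build.
        assert 0 <= slot < len(self._objects) and self._generations[slot] == gen, \
            f"invalid or stale handle {handle}"
        return self._objects[slot]


table = HandleTable()
h = table.create({"name": "Acme"})
table.free(h)
try:
    table.get(h)      # use-after-free of the handle
    trapped = False
except AssertionError:
    trapped = True
print(trapped)
```

Without `-O` the stale handle is trapped immediately; with `-O` the check disappears and `get` quietly returns `None`, which is why such bugs are best caught while the traps are still compiled in.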
What options does eXtremeDB provide for database replication?
Two eXtremeDB capabilities, High Availability (HA) and Clustering, implement database replication. The HA implementation offers master-slave replication, with one main (master) node and one or more synchronized replicas, plus failover. eXtremeDB HA is a popular choice when the principal goal of database replication is to maximize system uptime and/or to distribute the query processing load.
eXtremeDB’s distributed query processing also implements replication, through the concept of “k-safety”: k=1 means each shard/server has one fully synchronized copy of the database (in addition to the main or “master” database), k=2 means each shard has two copies, and so on. K-safety levels are specified in each shard/server’s configuration file (a JSON document).
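The arithmetic behind k-safety: with k synchronized copies per shard, each shard has k + 1 copies in total, so its data survives the loss of any k servers as long as every copy sits on a different server. The Python sketch below assigns copies round-robin and checks that property by brute force. The placement rule and all names are assumptions for illustration; they do not describe how eXtremeDB assigns replicas.

```python
from itertools import combinations


def place_shards(num_shards, servers, k):
    """Round-robin placement: shard i gets k + 1 copies on consecutive, distinct servers."""
    if k + 1 > len(servers):
        raise ValueError("k-safety needs at least k + 1 servers")
    return {shard: [servers[(shard + j) % len(servers)] for j in range(k + 1)]
            for shard in range(num_shards)}


def survives(placement, failed):
    """True if every shard still has at least one copy on a live server."""
    return all(any(s not in failed for s in copies) for copies in placement.values())


servers = ["node1", "node2", "node3", "node4"]
placement = place_shards(num_shards=4, servers=servers, k=1)

# k=1: any single failure is tolerated, but some pairs of failures are not.
tolerates_any_1 = all(survives(placement, set(f)) for f in combinations(servers, 1))
tolerates_any_2 = all(survives(placement, set(f)) for f in combinations(servers, 2))
print(placement[0], tolerates_any_1, tolerates_any_2)
```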
In clustered deployments, every node serves as a master, and the node originating a transaction replicates it to every other node. eXtremeDB Clustering also has a “local tables” option that exempts specified tables from cluster-wide replication, to accelerate processing at the local node; when desired, local tables can be shared through a scatter/gather mechanism. eXtremeDB Clustering provides fault tolerance. It also dramatically increases net available processing power, reduces costs by enabling system expansion via low-cost (i.e. “commodity”) servers, allows workloads to be balanced more evenly, and delivers more scalable data management.
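The clustering behavior described above (any node may originate a transaction, which is then applied on every other node, except for tables marked local, which can still be read cluster-wide by scatter/gather) can be modeled in a few lines of Python. This is a single-process simulation with hypothetical names; it deliberately ignores networking, ordering and failure handling, all of which a real cluster must address.

```python
class Node:
    """Hypothetical cluster node holding tables as {table: {key: value}}."""

    def __init__(self, name, local_tables):
        self.name = name
        self.local_tables = set(local_tables)
        self.tables = {}
        self.peers = []

    def apply(self, changes):
        for table, key, value in changes:
            self.tables.setdefault(table, {})[key] = value

    def commit(self, changes):
        """Apply locally, then replicate every change outside local tables to all peers."""
        self.apply(changes)
        shared = [c for c in changes if c[0] not in self.local_tables]
        for peer in self.peers:
            peer.apply(shared)


def gather(nodes, table):
    """Scatter a read to every node and merge the local-table results by (node, key)."""
    merged = {}
    for node in nodes:
        for key, value in node.tables.get(table, {}).items():
            merged[(node.name, key)] = value
    return merged


nodes = [Node(n, local_tables=["ticks"]) for n in ("a", "b", "c")]
for node in nodes:
    node.peers = [p for p in nodes if p is not node]

nodes[0].commit([("orders", 1, "buy"), ("ticks", 1, 10.5)])
nodes[2].commit([("orders", 2, "sell"), ("ticks", 1, 11.0)])

orders_everywhere = all(n.tables["orders"] == {1: "buy", 2: "sell"} for n in nodes)
print(orders_everywhere, nodes[1].tables.get("ticks"), gather(nodes, "ticks"))
```

The replicated table ends up identical on every node, while each node’s local table stays private until a gather collects it.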
Another replication option is eXtremeDB Data Relay, a technology that is part of eXtremeDB’s transaction logging capability. Data Relay enables selective data sharing between real-time systems based on eXtremeDB and external systems such as enterprise DBMSs. It “opens up” eXtremeDB’s transaction buffer: for every object affected by a transaction, the buffer provides a coded value indicating whether the operation was an insert, update or delete. Data Relay provides a database cursor that iterates over the objects in the buffer and extracts each coded value. Based on that value, the application can drill down into the object, analyze the changes and decide whether to replicate them to an enterprise system.
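The Data Relay pattern described above (iterate over the objects a transaction touched, read each one’s operation code, then decide what to forward) can be sketched in Python as follows. The `OpCode` values, the buffer entry layout and the table-based forwarding rule are hypothetical stand-ins; they are not the Data Relay API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OpCode(Enum):
    """Hypothetical coded values for the operation recorded per object."""
    INSERT = 1
    UPDATE = 2
    DELETE = 3


@dataclass
class BufferEntry:
    op: OpCode
    table: str
    key: int
    values: Optional[dict]  # None for deletes


def relay(transaction_buffer, replicated_tables):
    """Iterate over the buffer like a cursor and build change records for an external DBMS."""
    outbound = []
    for entry in transaction_buffer:
        if entry.table not in replicated_tables:
            continue  # selective sharing: this table stays inside the real-time system
        if entry.op is OpCode.DELETE:
            outbound.append(("DELETE", entry.table, entry.key, None))
        else:
            # Drill down into the object's fields only for inserts and updates.
            outbound.append((entry.op.name, entry.table, entry.key, dict(entry.values)))
    return outbound


buffer = [
    BufferEntry(OpCode.INSERT, "trade", 1, {"symbol": "IBM", "qty": 100}),
    BufferEntry(OpCode.UPDATE, "position", 7, {"qty": 300}),
    BufferEntry(OpCode.DELETE, "trade", 2, None),
    BufferEntry(OpCode.UPDATE, "scratch", 3, {"tmp": True}),
]
changes = relay(buffer, replicated_tables={"trade", "position"})
for change in changes:
    print(change)
```

The change to the `scratch` table is skipped, showing how an application can share only the subset of data an enterprise system needs.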
Does eXtremeDB Cluster offer high performance networking support to accelerate communication between database cluster nodes?
eXtremeDB Cluster on Linux supports the InfiniBand switched-fabric interconnect as well as the Message Passing Interface (MPI) standard. MPI is open and widely used, and offers higher speed and scalability along with the benefits of developer familiarity and more portable code. In McObject’s tests, eXtremeDB Cluster using InfiniBand and MPI performed more than 7.5x faster than the same system implemented with “plain vanilla” gigabit networking (TCP/IP over Ethernet). MPI can also run over TCP/IP and Ethernet; in McObject’s tests, this combination performed almost 1.5x faster than plain TCP/IP over Ethernet.