Vertical Data Storage

The eXtremeDB sequence data type provides the capability to store data in vertical or columnar form. A sequence is an unbounded array of eXtremeDB-supported scalar data elements (i.e. [u]int[1|2|3|4], time, date, float, double,char and datetime). Sequences are dynamic; they don't require layout placement through the dictionary and can be created and removed at runtime. The database schema defines the sequence fields, and, for ordered sequences, whether the sequence is ascending or descending.

Applications access the sequence through its handle, the nature of which is similar to an eXtremeDB object handle. Multiple sequences within a class can be treated as a group, which effectively forms a time series. See page Using Sequences for implementation details. Also a library of analytics functions is also provided which perform a variety of operations and statistical calculations on sequences.

Consider the following example:

    class Quote {
        char<8> symbol;
        char<8> exchange;
        sequence<time asc> timestamp;
        sequence<float> bid;
        sequence<float> ask;
        sequence<uint4> volume;
        hash<symbol> by_sym[1000];

This example defines a class Quote with a field named symbol that occurs once per object of this class, and for which unique values are enforced through the hash index named by_sym. The class includes a time series consisting of six sequences. The time series should be treated in ascending order by the timestamp.

Note: eXtremeDB does not sort the elements of an ascending or descending sequence. The ascending / descending modifier causes eXtremeDB to ensure that the data is inserted in the proper order, and returns an error if it is not, and causes the schema compiler to generate additional functions for ordered sequences.

Column versus row layout

Conceptually, sequences can be considered to be organized in a columnar fashion.

Thus, with sequences, eXtremeDB is a hybrid of a conventional row-oriented database, and a columnar database. “Normal” fields/columns in a class/table are stored in the normal row-oriented manner, which is natural and efficient for usual data processing needs. Sequences, however, are stored with a vertical, or columnar, layout that is far more efficient when the data processing pattern requires iterating over a large number of values of one or more columns. With a columnar layout, a database page (the unit of storage in database systems) only contains values for that one column and no other columns. The advantage is that the database system is not using processing cycles to retrieve data that is not needed, which would be the case with a row-based layout. In other words, if a page contains 40 rows of a table, and each row has 15 columns, but only 1 column is needed for a processing task, then for each page read by the DBMS, only about 6.7% of the data transferred with the page read is useful; the rest is discarded. (The page has 40 X 15 = 600 values, of which only 40 are needed for the processing, which equates to about 6.7%.) Conversely, with a columnar layout, a database page contains only values for that column, and 100% of data transferred with the page read is useful and there is no wastage.

A sequence can be ordered (timestamp above) or unordered (all other sequences above). An ordered sequence can be searched by value, otherwise only by index (meaning, only by position within the sequence). The ascending attribute does not instruct the runtime to sort the input data. Instead, the runtime checks that the data is inserted in the proper order and returns an error if it is not.

Parallel Processing of Sequences

Further performance optimization can be achieved in multi-processor environments by thread pool management in eXtremeSQL. (Please see page Parallel Processing with Sequences for further details.)