Optimizing Data Access with Sequences

Sequences

A sequence is an unbounded array of eXtremeDB-supported scalar data elements (i.e. [u]int[1234], time, date, float and double). Sequences are dynamic; they don't require layout placement through the dictionary and can be created and removed at runtime. The database schema defines the sequence type, and possibly specifies whether the sequence is ascending or descending.

Applications access the sequence through its handle, the nature of which is similar to an eXtremeDB object handle. Multiple sequences within a class can be treated as a group, which effectively forms a time series.

Consider the following example:

 
    class Quote {
        char<8> symbol;
        sequence<time asc> timestamp;
        sequence<float> high;
        sequence<float> low;
        sequence<float> open;
        sequence<float> close;
        sequence<uint4> volume;
         
        hash<symbol> by_sym[1000];
    };
         

This example defines a class Quote with a field named symbol that occurs once per object of this class, and for which unique values are enforced through the hash index named by_sym. The class includes a time series consisting of six sequences. The time series should be treated in ascending order by the timestamp.

Note: eXtremeDB does not sort the elements of an ascending or descending sequence. The ascending / descending modifier causes eXtremeDB to ensure that the data is inserted in the proper order, and returns an error if it is not, and causes the schema compiler to generate additional functions for ordered sequences.

Column Versus Row Layout

Conceptually, sequences can be considered to be organized in a columnar fashion.

Thus, with sequences, eXtremeDB is a hybrid of a conventional row-oriented database, and a columnar database. “Normal” fields/columns in a class/table are stored in the normal row-oriented manner, which is natural and efficient for usual data processing needs. Sequences, however, are stored with a vertical, or columnar, layout that is far more efficient when the data processing pattern requires iterating over a large number of values of one or more columns. With a columnar layout, a database page (the unit of storage in database systems) only contains values for that one column and no other columns. The advantage is that the database system is not using processing cycles to retrieve data that is not needed, which would be the case with a row-based layout. In other words, if a page contains 40 rows of a table, and each row has 15 columns, but only 1 column is needed for a processing task, then for each page read by the DBMS, only about 6.7% of the data transferred with the page read is useful; the rest is discarded. (The page has 40 X 15 = 600 values, of which only 40 are needed for the processing, which equates to about 6.7%.) Conversely, with a columnar layout, a database page contains only values for that column, and 100% of data transferred with the page read is useful and there is no wastage.

A sequence can be ordered (timestamp above) or unordered (all other sequences above). An ordered sequence can be searched by value, otherwise only by index (meaning, only by position within the sequence). The ascending attribute does not instruct the runtime to sort the input data. Instead, the runtime checks that the data is inserted in the proper order and returns an error if it is not.

(Please see page Using Sequences for details on usage and implementation of sequences.)