Step 7: Vector-based Analytics Functions

A powerful feature of eXtremeDB is the sequence data type. Apart from the memory and processing efficiency enabled by the possibility to store data in this "columnar" format, sequences are particularly suited to vector-based statistical functions. Typically, these SQL statistical functions are used to perform analytical operations on “time series”; i.e. a group of related sequence fields.

The following examples will perform some statistical functions via queries on a simple database class:

 
    class Quote {
        char<MAX_SYMBOL_LEN> symbol;
        sequence<date asc> day;
        sequence<float> low;
        sequence<float> high;
        sequence<float> open;
        sequence<float> close;
        sequence<uint4> volume;
        unique tree<symbol> by_sym;
    };
 

The data for these examples is taken from historical values from 2013 for IBM. This sample data and a group of script and shell command files that perform the select statements are included in directory eXtremeDB/samples/native/sql/xsql/scripts/financial. To create table QuoteIBM and load the sample data, open xSQL in this directory with the following command:

 
    xsql -i -size 100m -f IBM-q1-2013.sql
     

Sequence Interval

The first example will extract data belonging to a specified interval (January 2013). The ordered sequence day is used to select the interval and then this interval is projected to the other (unordered) sequences open and close with the following query:

 
    SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013,
        open@Jan_2013 as "open", close@Jan_2013 as "close"
    FROM Quote WHERE symbol='IBM';
     
    symbol  Jan_2013        open            close
    --------------------------------------------------
    IBM     20130102        194.089996      196.350006
    IBM     20130103        195.669998      195.270004
    IBM     20130104        194.190002      193.990005
    IBM     20130107        193.399994      193.139999
    IBM     20130108        192.919998      192.869995
    IBM     20130109        193.479996      192.320007
    IBM     20130110        192.649994      192.880005
    IBM     20130111        194.149994      194.449997
    IBM     20130114        192.820007      192.619995
    IBM     20130115        191.309998      192.500000
    IBM     20130116        192.000000      192.589996
    IBM     20130117        193.850006      193.649994
    IBM     20130118        194.029999      194.470001
    IBM     20130122        194.360001      196.080002
    IBM     20130123        203.500000      204.720001
    IBM     20130124        203.910004      204.419998
    IBM     20130125        204.449997      204.970001
    IBM     20130128        204.850006      204.929993
    IBM     20130129        204.339996      203.899994
    IBM     20130130        203.690002      203.520004
    IBM     20130131        203.320007      203.070007
     
    Selected records: 21
         

Note that this query can be run from the script file x3.sql:

 
    XSQL>script x3.sql
 

Note the use of key word flattened to transform the sequence fields into a table so that the result set appears as individual rows for each set of corresponding sequence elements with the scalar field symbol value repeated on each row. (Also note that the bounds specified for seq_search() are “inclusive”.)

Arithmetic on Sequences

We can perform arithmetic operations on sequence elements. In the following query the two sequences high and low are averaged by adding each element of high to the corresponding element of low and dividing by 2 to produce a result sequence:

 
    SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013,
        high@Jan_2013 as "high", low@Jan_2013 as "low",
        (high@Jan_2013 + low@Jan_2013) / 2 as "average"
    FROM Quote WHERE symbol='IBM';
         
    symbol  Jan_2013        high            low             average
    ------------------------------------------------------------------
    IBM     20130102        196.350006      193.800003      195.075012
    IBM     20130103        196.289993      194.440002      195.364990
    IBM     20130104        194.460007      192.779999      193.619995
    IBM     20130107        193.779999      192.339996      193.059998
    IBM     20130108        193.300003      191.610001      192.455002
    IBM     20130109        193.490005      191.649994      192.570007
    IBM     20130110        192.960007      191.279999      192.119995
    IBM     20130111        195.000000      192.899994      193.949997
    IBM     20130114        193.279999      191.750000      192.514999
    IBM     20130115        192.729996      190.389999      191.559998
    IBM     20130116        193.179993      191.350006      192.264999
    IBM     20130117        194.460007      193.240005      193.850006
    IBM     20130118        195.000000      193.800003      194.399994
    IBM     20130122        196.080002      194.009995      195.044998
    IBM     20130123        208.580002      203.360001      205.970001
    IBM     20130124        205.059998      203.080002      204.070007
    IBM     20130125        205.179993      204.130005      204.654999
    IBM     20130128        206.220001      204.289993      205.255005
    IBM     20130129        205.729996      203.639999      204.684998
    IBM     20130130        204.880005      203.190002      204.035004
    IBM     20130131        204.470001      202.960007      203.714996
     
    Selected records: 21
     

And this query can be run from the script file x4.sql.

Filtering Sequence Elements

The following query will extract the days of February 2013 when the close value was greater than open:

 
    SELECT symbol, seq_search(day, 20130201, 20130228) as Feb_2013,
        close@Feb_2013 as Feb_Close, open@Feb_2013 as Feb_Open,
        seq_filter(Feb_Close > Feb_Open, Feb_2013) as "up_Feb",
        seq_filter(Feb_Close > Feb_Open, Feb_Open) as "up_Open",
        seq_filter(Feb_Close > Feb_Open, Feb_Close) as "up_Close"
    FROM Quote WHERE symbol='IBM';
     
    symbol
    Feb_2013{}
    Feb_Close{}
    Feb_Open{}
    up_Feb{}
    up_Open{}
    up_Close{}
    ------------------------------------------------------------------------------
    IBM
    {20130201, 20130204, 20130205, 20130206, 20130207, 20130208,
    20130211, 20130212, 20130213, 20130214, 20130215, 20130219,
    20130220, 20130221, 20130222, 20130225, 20130226, 20130227,
    20130228}
    {205.179993, 203.789993, 202.789993, 201.020004, 199.740005, 201.679993,
    200.160004, 200.039993, 200.089996, 199.649994, 200.979996, 200.320007,
    199.309998, 198.330002, 201.089996, 197.509995, 199.139999, 202.330002,
    200.830002}
    {204.649994, 204.190002, 204.309998, 200.389999, 200.619995, 199.970001,
    200.979996, 200.009995, 200.649994, 199.729996, 199.979996, 200.600006,
    200.619995, 198.630005, 199.229996, 201.669998, 198.630005, 198.889999,
    202.179993}
    {20130201, 20130206, 20130208, 20130212, 20130215, 20130222,
    20130226, 20130227}
    {204.649994, 200.389999, 199.970001, 200.009995, 199.979996, 199.229996,
    198.630005, 198.889999}
    {205.179993, 201.020004, 201.679993, 200.039993, 200.979996, 201.089996,
    199.139999, 202.330002}
     
    Selected records: 1
     

The up_Feb sequence contains the dates when the close value was greater than the open for February of 2013. To illustrate we have highlighted the date value for February 8, 2013 in the up_Feb sequence which corresponds with the 3rd element of the up_Open and up_Close sequences and that satisfy the filter condition; i.e. 201.68 > 199.97.

Note that the repeated calls of function seq_filter() are necessary to produce the filtered open and close sequences which make the output more easily understandable, as the Feb_2013, Feb_Open and Feb_Close sequences (which cannot be suppressed) somewhat obscure the essential information.

Please view the Analytics Function Library page for many more examples and further explanation of these powerful analytic functions.