A powerful feature of eXtremeDB is the
sequence
data type. Apart from the memory and processing efficiency enabled by the possibility to store data in this "columnar" format, sequences are particularly suited to vector-based statistical functions. Typically, these SQL statistical functions are used to perform analytical operations on “time series”; i.e. a group of related sequence fields.The following examples will perform some statistical functions via queries on a simple database class:
class Quote { char<MAX_SYMBOL_LEN> symbol; sequence<date asc> day; sequence<float> low; sequence<float> high; sequence<float> open; sequence<float> close; sequence<uint4> volume; unique tree<symbol> by_sym; };The data for these examples is taken from historical values from 2013 for IBM. This sample data and a group of script and shell command files that perform the select statements are included in directory
eXtremeDB/samples/native/sql/xsql/scripts/financial
. To create table QuoteIBM and load the sample data, open xSQL in this directory with the following command:xsql -i -size 100m -f IBM-q1-2013.sqlSequence Interval
The first example will extract data belonging to a specified interval (January 2013). The ordered sequence
day
is used to select the interval and then this interval is projected to the other (unordered) sequencesopen
andclose
with the following query:SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013, open@Jan_2013 as "open", close@Jan_2013 as "close" FROM Quote WHERE symbol='IBM'; symbol Jan_2013 open close -------------------------------------------------- IBM 20130102 194.089996 196.350006 IBM 20130103 195.669998 195.270004 IBM 20130104 194.190002 193.990005 IBM 20130107 193.399994 193.139999 IBM 20130108 192.919998 192.869995 IBM 20130109 193.479996 192.320007 IBM 20130110 192.649994 192.880005 IBM 20130111 194.149994 194.449997 IBM 20130114 192.820007 192.619995 IBM 20130115 191.309998 192.500000 IBM 20130116 192.000000 192.589996 IBM 20130117 193.850006 193.649994 IBM 20130118 194.029999 194.470001 IBM 20130122 194.360001 196.080002 IBM 20130123 203.500000 204.720001 IBM 20130124 203.910004 204.419998 IBM 20130125 204.449997 204.970001 IBM 20130128 204.850006 204.929993 IBM 20130129 204.339996 203.899994 IBM 20130130 203.690002 203.520004 IBM 20130131 203.320007 203.070007 Selected records: 21Note that this query can be run from the script file
x3.sql
:XSQL>script x3.sqlNote the use of key word
flattened
to transform the sequence fields into a table so that the result set appears as individual rows for each set of corresponding sequence elements with the scalar fieldsymbol
value repeated on each row. (Also note that the bounds specified forseq_search()
are “inclusive”.)Arithmetic on Sequences
We can perform arithmetic operations on sequence elements. In the following query the two sequences
high
andlow
are averaged by adding each element of high to the corresponding element of low and dividing by 2 to produce a result sequence:SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013, high@Jan_2013 as "high", low@Jan_2013 as "low", (high@Jan_2013 + low@Jan_2013) / 2 as "average" FROM Quote WHERE symbol='IBM'; symbol Jan_2013 high low average ------------------------------------------------------------------ IBM 20130102 196.350006 193.800003 195.075012 IBM 20130103 196.289993 194.440002 195.364990 IBM 20130104 194.460007 192.779999 193.619995 IBM 20130107 193.779999 192.339996 193.059998 IBM 20130108 193.300003 191.610001 192.455002 IBM 20130109 193.490005 191.649994 192.570007 IBM 20130110 192.960007 191.279999 192.119995 IBM 20130111 195.000000 192.899994 193.949997 IBM 20130114 193.279999 191.750000 192.514999 IBM 20130115 192.729996 190.389999 191.559998 IBM 20130116 193.179993 191.350006 192.264999 IBM 20130117 194.460007 193.240005 193.850006 IBM 20130118 195.000000 193.800003 194.399994 IBM 20130122 196.080002 194.009995 195.044998 IBM 20130123 208.580002 203.360001 205.970001 IBM 20130124 205.059998 203.080002 204.070007 IBM 20130125 205.179993 204.130005 204.654999 IBM 20130128 206.220001 204.289993 205.255005 IBM 20130129 205.729996 203.639999 204.684998 IBM 20130130 204.880005 203.190002 204.035004 IBM 20130131 204.470001 202.960007 203.714996 Selected records: 21And this query can be run from the script file
x4.sql
.Filtering Sequence Elements
The following query will extract the days of February 2013 when the
close
value was greater thanopen
:SELECT symbol, seq_search(day, 20130201, 20130228) as Feb_2013, close@Feb_2013 as Feb_Close, open@Feb_2013 as Feb_Open, seq_filter(Feb_Close > Feb_Open, Feb_2013) as "up_Feb", seq_filter(Feb_Close > Feb_Open, Feb_Open) as "up_Open", seq_filter(Feb_Close > Feb_Open, Feb_Close) as "up_Close" FROM Quote WHERE symbol='IBM'; symbol Feb_2013{} Feb_Close{} Feb_Open{} up_Feb{} up_Open{} up_Close{} ------------------------------------------------------------------------------ IBM {20130201, 20130204, 20130205, 20130206, 20130207, 20130208, 20130211, 20130212, 20130213, 20130214, 20130215, 20130219, 20130220, 20130221, 20130222, 20130225, 20130226, 20130227, 20130228} {205.179993, 203.789993, 202.789993, 201.020004, 199.740005, 201.679993, 200.160004, 200.039993, 200.089996, 199.649994, 200.979996, 200.320007, 199.309998, 198.330002, 201.089996, 197.509995, 199.139999, 202.330002, 200.830002} {204.649994, 204.190002, 204.309998, 200.389999, 200.619995, 199.970001, 200.979996, 200.009995, 200.649994, 199.729996, 199.979996, 200.600006, 200.619995, 198.630005, 199.229996, 201.669998, 198.630005, 198.889999, 202.179993} {20130201, 20130206, 20130208, 20130212, 20130215, 20130222, 20130226, 20130227} {204.649994, 200.389999, 199.970001, 200.009995, 199.979996, 199.229996, 198.630005, 198.889999} {205.179993, 201.020004, 201.679993, 200.039993, 200.979996, 201.089996, 199.139999, 202.330002} Selected records: 1The
up_Feb
sequence contains the dates when the close value was greater than the open for February of 2013. To illustrate we have highlighted thedate
value for February 8, 2013 in theup_Feb
sequence which corresponds with the 3rd element of theup_Open
andup_Close
sequences and that satisfy the filter condition; i.e. 201.68 > 199.97.Note that the repeated calls of function
seq_filter()
are necessary to produce the filteredopen
andclose
sequences which make the output more easily understandable, as the Feb_2013,Feb_Open
andFeb_Close
sequences (which cannot be suppressed) somewhat obscure the essential information.Please view the Analytics Function Library page for many more examples and further explanation of these powerful analytic functions.