RLE Compression

Another form of optimization with sequences is to use the implementation of RLE (Run Length Encoding) compression. Here instead of storing “raw” sequences, repeating values are stored with a value and repeat count. On sequences with many duplicate elements RLE exhibits better performance and reduces storage requirements. For example, if x = {1,1,1,1,1,1,1,1,1,1}, and y = {2,2,2,2,2,2,2,2,2,2}, to calculate x+y it is only necessary to do one addition instead of 10 as when the raw sequence is stored.

A strong advantage of RLE is better use of the runtime stack. With the non-compressed implementation the number of elements in a tile is predefined and is the same for all data types – eg. 100 integers or 100 doubles. This is necessary to avoid extra verifications and load on demand while calculating binary expressions (operators). With compression enabled these verifications must be performed regardless, so the tile can be loaded with 200 integers or doubles.

The downside of using RLE is that the implementation requires more complicated logic, which negatively affects performance when calculating expressions, and requires some unavoidable extra space to keep the counters (an additional byte for each element). So, if the sequence contains predominately unique elements, then the non-compressed implementation performs better.

For applications that make use of sequences with duplicates, especially securities data, RLE will be beneficial. Virtually all vertical data storage systems (Vertica, SciDB, Vectorwise, etc.) use compression since one or another implementation of a compression is much more effective over a vertical data layout than it is over horizontal. While working with various benchmarks, it has been our experience that intraday prices don't change very often; it’s very likely that there will be many duplicates.

The RLE implementation is in a different library (target/mcoseqrle) than the non-compression implementation. So, the application is almost completely independent of the sequence storage implementation. It is "almost" independent because, though the API is the same, the tiles returned by the RLE function calls contain extra counters; so if the application makes use of these tiles this has to be accounted for. Specifically, in the C API, the non-compression tile processing code looks something like this:

 
    while ( close_iterator.next(&close_iterator) == MCO_S_OK ) 
    {
        mco_size_t i, tile_size = close_iterator.tile_size;
        for (i = 0; i < tile_size; i++) 
        {
            close_sum += close_iterator.tile.arr_float[i];
        }
    }
     

When RLE is used, the same code would look like:

 
    while (close_iterator.next(&close_iterator) == MCO_S_OK) 
    {
        mco_size_t i, tile_size = close_iterator.tile_size;
        for (i = 0; i < tile_size; i++) 
        {
            close_sum +=  close_iterator.tile.arr_float[i] * MCO_SEQ_RLE_COUNT(iterators.close.tile, i);
        }
    }
     

Note that in actual practice both of the code fragments above would rarely be used, as the mco_seq_add() function calculates the sum more conveniently and many other aggregate functions operate directly on sequence data (see Statistical Function Library for examples), applications will seldom (if ever) use tiles directly.

 


Send feedback on this topic to McObject.