Data Compression

For large databases it can be important to conserve memory and/or disk space by using data compression techniques. As explained in the following sections, eXtremeDB provides this capability for in-memory and persistent databases.

It is also possible to compress the data sent between server and device in eXtremeDB Active Replication Fabric networks. Operating in low-bandwidth networks requires compressing the network traffic. To solve this problem for all eXtremeDB network components at once, compression is implemented at the SAL layer as a wrapper around the mco_socket_t abstraction. This allows compression to be applied to any supported socket type (plain TCP, SSL, local-domain, UDP, etc.). The actual work is done by the third-party stream compression library Zlib.
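The socket wrapper itself is internal to eXtremeDB, but the underlying technique is ordinary Zlib stream compression applied to each chunk of traffic. The following sketch (plain Python, not eXtremeDB code; all names are illustrative) shows how a stream compressor wraps a sequence of payload chunks the way a compressing socket wrapper would:

```python
import zlib

def compress_stream(chunks, level=6):
    """Compress an iterable of byte chunks as a single zlib stream,
    the way a compressing socket wrapper would treat outgoing traffic."""
    c = zlib.compressobj(level)
    out = [c.compress(chunk) for chunk in chunks]
    out.append(c.flush())
    return b"".join(out)

def decompress_stream(data):
    """Reverse side of the wrapper: inflate the received stream."""
    d = zlib.decompressobj()
    return d.decompress(data) + d.flush()

# Repetitive telemetry-style payloads compress well.
payload = [b"sensor-reading:0001;" * 50, b"sensor-reading:0002;" * 50]
wire = compress_stream(payload)
assert decompress_stream(wire) == b"".join(payload)
assert len(wire) < len(b"".join(payload))
```

Because the compression state spans the whole stream rather than individual chunks, repeated patterns across messages are also exploited, which is what makes this effective for low-bandwidth links.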

Note that database compression is not compatible with encryption; you must choose either encryption or compression, but not both.

In-Memory Database Compression

In-memory compression may be useful in cases like the following:

1. The data is very sparse; i.e. there are many fixed-size zero-padded strings, mostly empty arrays, scalar fields containing mostly zeros, etc.

2. The database doesn't fit in main memory without compression.

3. Persistence is not needed or not possible (no disk).

When in-memory compression is enabled, all data is compressed; i.e. pages are stored in memory in compressed format and are decompressed when accessed by the application.
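The page-level compression eXtremeDB performs is internal, but the first case above (sparse, zero-padded data) can be illustrated with a small stand-alone experiment, using Zlib as a stand-in compressor on a page-sized buffer of fixed-size records:

```python
import zlib

# Illustrative only: a fixed-size record with a short key and zero padding,
# repeated to fill a 4 KB "page" of sparse data.
record = b"key1".ljust(64, b"\x00")   # 64-byte field, mostly zeros
page = record * 64                    # 4096 bytes
packed = zlib.compress(page)

# Sparse, repetitive pages shrink dramatically.
assert zlib.decompress(packed) == page
assert len(packed) < len(page) // 10
```

This is why compression pays off most for databases dominated by padded strings, empty arrays, and zero-valued scalars: the page content is highly redundant, so the compressed form can be an order of magnitude smaller.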

Note that the database parameters compression_mask and expected_compression_ratio allow source code licensees to modify the data compression process for in-memory pages. But additional configuration settings are also required. Please contact McObject Technical Support for implementation details if in-memory page compression is desired.

Note also that in-memory compression cannot be used for hybrid databases (containing both transient and persistent classes) because this would require substituting the in-memory page manager with the disk page manager. These page managers are implemented in two separate libraries and only one page manager library can be linked into the application.

Persistent Database Compression

Persistent database compression is available only on Unix-like systems (Linux, macOS, Solaris, etc.). It uses the LZW (Lempel-Ziv-Welch) compression algorithm, which builds a dictionary mapping integer codes to unique sequences of characters. (For a more detailed description of the algorithm, see https://en.wikipedia.org/wiki/Lempel-Ziv-Welch.)



Note also that CRC checking cannot be enabled simultaneously with compression, because the compression implementation performs its own CRC checking.

RLE Compression for Sequences

For sequence data it is possible to optimize space using Run Length Encoding (RLE) compression. Please see page RLE Compression for implementation details.

Data Compression for IoT Communications

In eXtremeDB Active Replication Fabric networks, data transferred within the network can be compressed. Communications between an IoT server-device pair are compressed only if both sides (server and device) have appropriate compression level settings; all communicating components must use the same compression level value. However, the IoT server can define two connections (two listen ports), one with and one without compression, and devices then connect to whichever port matches their compression settings.

The actual efficiency of the compression can be evaluated by examining the compression ratio parameters, which show the relationship between the original and compressed size of the transmitted data, for both incoming and outgoing communications. The formula is ratio = (original_size - compressed_size) * 100 / original_size.

So 0% means the worst case, no compression: the original size equals the compressed size. 100% means ideal compression: the compressed size is negligible compared to the original. For example, if data was compressed from 100 KB to 30 KB, the compression ratio is 70%.
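The ratio formula above can be checked with a few lines of plain Python (the function name is illustrative, not part of any eXtremeDB API):

```python
def compression_ratio(original_size, compressed_size):
    """Percentage ratio as defined above: 0% = no gain, 100% = ideal."""
    return (original_size - compressed_size) * 100 / original_size

assert compression_ratio(100, 100) == 0     # worst case: no compression
assert compression_ratio(100, 30) == 70.0   # the 100 KB -> 30 KB example
```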

Native Language APIs

The APIs for data compression are specific to the programming language used. Please use the links below to view detailed explanations and examples for your development environment:

C Data Compression in C
C++ Data Compression in C++
xSQL Data Compression using xSQL
Java Data Compression in Java
Python Data Compression in Python
C# Data Compression in C#

Send feedback on this topic to McObject.