Data Compression in C

Please refer to the Data Compression page for a discussion of the advantages of in-memory and persistent database compression.

In-Memory Database Compression

In-memory database compression is enabled by setting the MCO_DB_INMEMORY_COMPRESSION flag in the mco_db_params_t.mode_mask parameter passed to mco_db_open_dev(). This flag does not apply to disk-based or “direct pointers” databases.

There are three parameters controlling compression which are also set in the mco_db_params_t structure:

In-memory compression and encryption use the same page hash table, whose parameters are controlled by mco_db_params_t.max_active_pages and mco_db_params_t.page_hash_bundles. To decompress accessed pages, the compressor allocates buffers totaling max_active_pages * page_hash_bundles * db_max_connections * mem_page_size bytes.

Note that in-memory compression can be used only for pure in-memory databases (i.e. no disk manager is used). Consequently, if this mode is specified when the application is linked with the mcovtdsk library, mco_db_open_dev() and mco_db_load() return the error code MCO_E_ILLEGAL_PARAM.

Persistent Database Compression

To enable compression, set the MCO_FILE_OPEN_COMPRESSED flag in the dev.file.flags field of the file device specification. For example:

 
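    /* Device 2: the persistent database file, stored compressed on disk */
    dev[2].type       = MCO_MEMORY_FILE;
    dev[2].assignment = MCO_MEMORY_ASSIGN_PERSISTENT;
    sprintf( dev[2].dev.file.name, FILE_PREFIX "%s.dbs", dbName );
    dev[2].dev.file.flags = MCO_FILE_OPEN_COMPRESSED;   /* enable compression for this file */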

If this flag is not set, the file is not compressed and functions as a normal file. Compression can be enabled for some databases or files and left disabled for others.

Two additional parameters can be used to modify the compression process. These are the compression_level and disk_max_database_size elements of the mco_db_params_t structure passed to mco_db_open_dev(). The disk_max_database_size value determines the size of the memory allocation bitmap. If it is not set (0 or INFINITE_DATABASE_SIZE), then the file size limit is 1Tb (which requires a 2Gb bitmap).

The compression_level is an integer value of 0..9 that specifies the level of compression. A value of 0 disables compression. Values of 1 and above specify different compression strategies as follows:

1: The default strategy, which performs conditional compression based on the following criteria:

  • Require at least a 25% reduction in size; otherwise the data is not worth compressing
  • Give up if no compression is achieved in the first 1Kb
  • Stop history lookup if a match of 128 bytes is found
  • Lower the ‘good match’ size by 10% at each iteration

2 – 9: Correspond to increasingly complex unconditional compression (always compress) strategies which have the following criteria:

  • Continue even if there is a low rate of compression
  • Stop history lookup if a match of 128 bytes is found
  • Make more attempts to find a good match as the compression level increases

Note that eXtremeDB implements LZ compression in two special file system libraries, mcofu98zip and mcofu98ziplog, which are available only on Unix-like systems such as Linux, macOS and Solaris. The application developer chooses one of these implementations by linking the corresponding library in place of the standard file system library (mcofuni or mcofu98).

The mcofu98zip implementation applies compression only to random-access I/O operations, so it is used only to compress the database file, not the transaction log file. It compresses the main database file (dbname.dbs) and creates two additional files. Because an updated page may compress to a different size than before, pages must be reallocated. A page map file (dbname.pagemap) therefore provides indirect access to the pages, and an allocation bitmap file (dbname.bitmap) manages free space within the compressed file.

The mcofu98ziplog implementation, by contrast, compresses transaction log files. It uses the same compression algorithm as mcofu98zip but overrides only the sequential I/O methods. Because compressed data is only appended and pages are never overwritten, no page map or allocation bitmap is needed, so only the compressed log file is produced.

Note that it is not possible to provide compression both for database and transaction log files in a single application, because only one compression implementation (either mcofu98zip or mcofu98ziplog) can be linked into the application.

Further, note that CRC checking cannot be enabled together with compression, because the compression layer performs its own CRC.

Data Compression for IoT Communications

Data compression for IoT communications is enabled by setting the compression_level element of the mco_iot_comm_params_t structure. The default level of 0 means no compression. As explained in the Data Compression page, communication between each server-device pair is compressed only if both sides (server and device) have compression_level greater than 0, and all communicating components must use the same compression level value.

The Data Compression page also gives the formula used to compute the compression ratio. The mco_iot_conn_get_stat() API can be called to retrieve the statistics for a connection; the sent_compression_ratio and recv_compression_ratio elements of the returned mco_iot_connection_stat_t structure show the compression efficiency for outgoing and incoming communications, respectively.