ALGORITHM FOR COMPRESSION OF FLOATING-POINT DATA IN SCIENTIFIC RESEARCH SUPPORT SYSTEMS
Keywords:
Data compression, source compression coding, floating-point arithmetic, data streams
Abstract
The paper presents an original algorithm, and its implementation, for single-pass real-time
compression of streams of numeric floating-point data. The purpose of the research is to develop
and formalize a single-pass algorithm for compressing streams of floating-point data that increases
the performance of both encoding and decoding, because existing implementations
compress too slowly, demand too much of the hardware resources, and are limited
in their applicability to real-time compression of floating-point data streams.
To that end, the following issues have been addressed. The developed mathematical model and the
algorithm for compression of scalar floating-point data are described, together with the results of
experiments in which the compression method was applied to one-dimensional and two-dimensional
scientific data. The model is based upon the commonly used binary64 representation
of the IEEE 754 standard, onto which extended real-line values are mapped. The algorithm
can be implemented as part of high-performance distributed systems in which the performance of
input-output operations and of internetwork communication is critical to overall efficiency.
The performance and applicability of the algorithm in data stream compression result from its
single-pass behaviour and from its relatively low memory requirements: the compression history,
on which the predictor used in compression and decompression is based, occupies memory whose
size is known a priori and statically defined. The measured compression ratios are comparable to those
obtained with more resource-intensive universal coders, while the latency is significantly lower.
Provided that the parameters of the compressor and the decompressor are synchronized when they are applied to
a stream of vector values, and assuming a correlation between the absolute values of scalars of the
same dimension within the vectors, the performance of the predictor can be improved further
by means of SIMD-class parallelism. This, in turn, benefits the overall performance of
compression and decompression, provided that the underlying hardware is capable of addressing
random-access memory by offsets held in a vector register, for example by means of the
VGATHER class of instructions on Intel processors. In order to reduce the bottlenecks associated
with input-output, the authors employ an implementation of the algorithm as part of a
network-distributed computing system used for parallel simulation of wave fields.
The experiments described in the paper demonstrate a significant performance increase of the proposed
coder over the well-known universal compressors RAR, ZIP and 7Z, while the achieved
compression factors remain comparable.
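
The general idea of single-pass predictive compression over the binary64 representation can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose model and predictor are not specified in this abstract; it is a generic scheme in the spirit of hash-based predictors, written under stated assumptions. The names `TABLE_BITS`, `Predictor`, `compress` and `decompress`, the hash function, and the byte-granular residual encoding are all hypothetical choices made for illustration. What the sketch does show is a history table of fixed, statically known size, a predictor shared by encoder and decoder, and lossless handling of extended real-line values such as infinities.

```python
import struct

TABLE_BITS = 10  # assumed history size: 2**10 entries, fixed before the stream starts


def to_bits(x: float) -> int:
    """Reinterpret a binary64 value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


class Predictor:
    """Fixed-size history table indexed by a hash of recently seen values."""

    def __init__(self, table_bits: int = TABLE_BITS):
        self.mask = (1 << table_bits) - 1
        self.table = [0] * (1 << table_bits)
        self.hash = 0

    def predict(self) -> int:
        return self.table[self.hash]

    def update(self, actual: int) -> None:
        # Remember the value under the current context, then move the context
        # forward using the top 16 bits (sign, exponent, leading mantissa bits).
        self.table[self.hash] = actual
        self.hash = ((self.hash << 6) ^ (actual >> 48)) & self.mask


def compress(values) -> bytes:
    """Single pass: XOR each value with its prediction, drop leading zero bytes."""
    pred = Predictor()
    out = bytearray()
    for x in values:
        bits = to_bits(x)
        residual = bits ^ pred.predict()
        nbytes = (residual.bit_length() + 7) // 8  # 0..8 significant bytes
        out.append(nbytes)
        out += residual.to_bytes(nbytes, "little")
        pred.update(bits)
    return bytes(out)


def decompress(data: bytes) -> list:
    """Mirror of compress(): the predictor is rebuilt from the decoded values."""
    pred = Predictor()
    values = []
    i = 0
    while i < len(data):
        nbytes = data[i]
        residual = int.from_bytes(data[i + 1:i + 1 + nbytes], "little")
        i += 1 + nbytes
        bits = residual ^ pred.predict()
        values.append(from_bits(bits))
        pred.update(bits)
    return values
```

Because the decoder updates its table with the same values in the same order as the encoder, the two predictors stay synchronized without any side information, which is what makes single-pass streaming possible in a scheme of this kind. A production coder would typically pack the residual lengths more tightly than one byte per value.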








