ALGORITHM FOR STREAM COMPRESSION OF FLOATING-POINT DATA IN INFORMATION SYSTEMS SUPPORTING SCIENTIFIC EXPERIMENTS

A.A. Chusov; M.A. Kopaeva

A.A. Chusov, Far Eastern Federal University
M.A. Kopaeva, Far Eastern Federal University

Keywords: Data compression, source compression coding, floating-point arithmetic, data streams

Abstract

The paper presents an original algorithm, together with its implementation, for single-pass
real-time compression of streams of numeric floating-point data. The purpose of the research is to
develop and formalize a single-pass algorithm for stream compression of floating-point data that
increases the performance of both encoding and decoding, because existing implementations
compress too slowly, are too demanding of hardware resources, and are of limited applicability to
real-time stream compression of floating-point data.
To that end, the paper describes the developed mathematical model and the algorithm for
compression of scalar floating-point data, together with the results of experimental research in
which the compression method is applied to one-dimensional and two-dimensional scientific data.
The model is based on the commonly used binary64 representation of the IEEE 754 standard, onto
which values of the extended real line are mapped. The algorithm can be implemented as part of
high-performance distributed systems in which the performance of input-output operations and of
internetwork communication is critical to overall efficiency.
The performance of the algorithm and its applicability to data stream compression result from its
single-pass behaviour and from the relatively small, statically defined amount of memory, known
a priori, that is required for the compression history on which the predictor used in both
compression and decompression is based. The measured compression ratios are comparable to those
of more resource-intensive universal coders, while latency is significantly lower. Provided that
the parameters of the compressor and the decompressor are synchronized when they are applied to
a stream of vector values, and assuming a correlation between the absolute values of scalars of
the same dimension within the vectors, predictor performance can be improved further by means of
SIMD-class parallelism. This, in turn, benefits the overall performance of compression and
decompression, provided that the underlying hardware can address random-access memory by offsets
held in a vector register, for example with the VGATHER class of instructions of Intel
processors. To reduce the bottlenecks associated with input-output, the authors employ an
implementation of the algorithm as part of a network-distributed computing system used for
parallel simulation of wave fields.
The experiments described in the paper demonstrate a significant performance increase of the
proposed coder over the well-known universal compressors RAR, ZIP and 7Z, while the achieved
compression factors remain comparable.
