Search Results
-
OPTIMIZATION OF THE COMPUTATIONAL SCHEME FOR THE INTERPOLATION OF DECADAL METEOROLOGICAL DATA BY INVERSE DISTANCE WEIGHTING WITH PARALLEL PROCESSING OF MULTIPLE TIME SLICES
О.М. Golozubov, А.V. Kozlovskiy, E.V. Melnik, Y.E. Melnik, А.N. Samoylov | pp. 22-32 | 2025-12-30

Abstract: The present study is devoted to solving the problem of computational inefficiency in the spatial interpolation of large arrays of decadal meteorological data using the inverse distance weighting method. Traditional approaches, which process each time slice sequentially and independently, demonstrate a linear increase in execution time and significant RAM consumption, which becomes a critical barrier to the rapid construction of detailed, georeferenced raster fields in GeoTIFF format. This significantly limits the use of the method in tasks requiring rapid processing of long-term data archives. The purpose of this work is to develop and validate an optimized computational scheme that radically reduces time costs while maintaining the completeness and accuracy of the results. The key scientific novelty of the proposed approach lies in a fundamental rethinking of the computational process. Instead of repeating identical operations many times, a scheme is proposed based on a single calculation of the full vector of geodetic distances from each grid cell to all weather stations. This most resource-intensive operation is performed only once. The resulting distance vector is then applied to all time slices (decades) to calculate the interpolated values, which eliminates the main computational redundancy and ensures a sublinear dependence of processing time on the number of decades. To further improve performance, a parallel processing mechanism is used at the CPU level, implemented by dynamically dividing the raster into independent computing units (batches). The batch size is adaptively adjusted according to the available RAM, which guarantees the stability and scalability of the solution on systems of various capacities.
Testing the method on real meteorological data for the period 2015-2024 demonstrated a radical reduction in execution time. In particular, processing ten decadal time slices on a standard laptop takes less than 3.5 minutes, and on a server platform about 3 minutes, a several-fold speed-up compared to traditional implementations. The developed solution thus makes the operational processing of large spatio-temporal meteorological arrays feasible for a wide range of researchers, opening up new opportunities for climate monitoring, agrometeorology, and geoinformation analysis without the need for specialized, expensive equipment.
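The compute-distances-once scheme described in the abstract can be sketched as follows. This is a minimal illustration with invented coordinates and values, using planar distances and a single matrix product instead of the paper's geodetic distances and adaptive batching:

```python
import numpy as np

def idw_all_decades(grid_xy, station_xy, station_values, power=2.0, eps=1e-12):
    """Interpolate many time slices with one distance computation.

    grid_xy        : (M, 2) grid-cell coordinates
    station_xy     : (N, 2) station coordinates
    station_values : (T, N) observations, one row per decade (time slice)
    Returns a (T, M) array of interpolated rasters.
    """
    # The expensive step: distances from every cell to every station,
    # computed exactly once (planar here for simplicity; the paper
    # uses geodetic distances).
    d = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)          # (M, N) inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)     # normalize weights per grid cell
    # Reuse the same weights for all T decades in a single matrix product.
    return station_values @ w.T           # (T, M)

# Toy usage: two grid cells, two stations, two decades of data.
grid = np.array([[0.0, 0.0], [1.0, 1.0]])
stations = np.array([[0.0, 1.0], [1.0, 0.0]])
values = np.array([[10.0, 20.0], [30.0, 40.0]])
rasters = idw_all_decades(grid, stations, values)
print(rasters)  # each cell is equidistant from both stations -> simple means
```

In the paper's scheme the grid would additionally be split into RAM-sized batches processed in parallel across CPU cores; the sketch keeps only the core idea of reusing one weight matrix across all time slices.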
-
TEXT VECTORIZATION USING DATA MINING METHODS
Ali Mahmoud Mansour, Juman Hussain Mohammad, Y.A. Kravchenko | 2021-07-18

Abstract: In text mining tasks, the textual representation should be not only efficient but also interpretable, as this enables an understanding of the operational logic underlying the data mining models. Traditional text vectorization methods such as TF-IDF and bag-of-words are effective and intuitively interpretable, but they suffer from the "curse of dimensionality" and are unable to capture the meanings of words. Modern distributed methods, on the other hand, effectively capture the hidden semantics, but they are computationally intensive, time-consuming, and uninterpretable. This article proposes a new text vectorization method called Bag of Weighted Concepts (BoWC) that represents a document according to the concept information it contains. The proposed method creates concepts by clustering word vectors (i.e., word embeddings), then uses the frequencies of these concept clusters to represent document vectors. To enrich the resulting document representation, a new modified weighting function is proposed for weighting concepts based on statistics extracted from the word embeddings. The generated vectors are characterized by interpretability, low dimensionality, high accuracy, and low computational cost when used in data mining tasks. The proposed method has been tested on five different benchmark datasets in two data mining tasks, document clustering and classification, and compared with several baselines, including bag-of-words, TF-IDF, averaged GloVe, Bag-of-Concepts, and VLAC. The results indicate that BoWC outperforms most baselines and gives 7% better accuracy on average.
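The core of the BoWC representation — mapping each word, via its embedding, to the nearest concept cluster and counting concept frequencies — can be sketched as below. The tiny 2-D embeddings and the two concept centroids are invented for illustration, and the paper's modified concept-weighting function is omitted:

```python
import numpy as np

def bowc_vector(doc_tokens, embeddings, centroids):
    """Represent a document as a vector of concept frequencies.

    doc_tokens : list of words in the document
    embeddings : dict mapping word -> embedding vector (np.ndarray)
    centroids  : (K, D) array of concept cluster centres
    Returns a length-K vector of concept counts.
    """
    counts = np.zeros(len(centroids))
    for word in doc_tokens:
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        # Assign the word to its nearest concept cluster.
        dists = np.linalg.norm(centroids - embeddings[word], axis=1)
        counts[np.argmin(dists)] += 1
    return counts

# Toy usage: a 2-D embedding space with two concepts.
emb = {"rain": np.array([0.0, 1.0]),
       "snow": np.array([0.1, 0.9]),
       "bank": np.array([1.0, 0.0])}
cents = np.array([[0.0, 1.0],   # "weather" concept centre
                  [1.0, 0.0]])  # "finance" concept centre
vec = bowc_vector(["rain", "snow", "bank", "rain"], emb, cents)
print(vec)  # [3. 1.]
```

The dimensionality of the resulting vector equals the number of concepts K, not the vocabulary size, which is how the method avoids the dimensionality problem of bag-of-words while each component remains interpretable as a concept frequency.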