ALGORITHMIC COMPLEXITY OF CALCULATING EXACT APPROXIMATIONS OF PROBABILITY DISTRIBUTIONS OF STATISTICAL VALUES BY SOLVING THE EQUATION OF THE FIRST MULTIPLICITY OF TYPES

  • A.K. Melnikov
Keywords: Probability, statistics, exact distribution, an accurate approximation of the vector of multiplicity of types, the linear equation algorithmic complexity

Abstract

We consider the algorithmic complexity of calculating the exact probability distributions of
statistical values and their exact approximations by solving the first multiplicity equation. As exact
approximations of probability distributions of statistical values, we consider their Δ−exact distributions
that differ from the exact distributions by no more than a predetermined, arbitrarily small
valueΔ. It is shown that the basis of the method for calculating the exact probability distributions
of statistical values is the enumeration of elements of the search area for solutions to a linear
equation of multiplicity of types, composed of vectors of multiplicity of types, each element of
which is the number of occurrences of elements of a certain type (any sign of the alphabet) in the
sample under consideration. At the same time, it is shown that the method of limiting the search
area for solutions is used to calculate exact approximations of the probability distribution of statistical
values. An expression is given that defines the algorithmic complexity of calculating exact
distributions by solving the first multiplicity equation. The given expression is finite and allows for
each value of the alphabet power to determine the maximum sample size for which, using a limited
computational resource, exact distributions can be calculated by solving the first multiplicity
equation. The range of parameters represented by the sample size and alphabet power for which
exact distributions can be calculated with a limited computing resource is defined. To estimate the
algorithmic complexity of calculating exact approximations of distributions, we present an expression
for the first time obtained for the number of solutions to the equation of the first multiplicity
with a restriction on the coordinate values of the solution vectors. An expression is given that defines
the algorithmic complexity of calculating exact approximations by solving the first multiplicity
equation with a restriction on the coordinate values of the solution vectors. As a parameter for
limiting the coordinates of solution vectors, the maximum frequency statistic value is used, the
probability of exceeding it is less than a pre-set, arbitrarily small valueΔ, which allows calculating
exact approximations of distributions that differ from their exact distributions by no more than the
selected value Δ. The given expression is finite and allows for each value of the alphabet to determine
the maximum sample size for which, when using a limited computational resource, exact
approximations can be calculated by solving the equation of the first multiplicity under the restrictions
set using the valueΔ. The results of calculations of the maximum sample volumes for
which exact approximations can be calculated are presented. It is shown that the algorithmiccomplexity of calculating exact distributions exceeds the complexity of calculating their exact approximations
by many orders of magnitude. It is shown that the use of the first multiplicity method
for calculating exact approximations allows for the same values of the alphabet power to increase
the sample volume by two or more times compared to the calculation of exact distributions.

References

1. Mel'nikov A.K. Sravnenie effektivnosti obrabotki tekstov pri primenenii v statisticheskikh
kriteriyakh tochnykh i predel'nykh priblizheniy bazovykh raspredeleniy veroyatnostey
znacheniy testovykh statistik [Comparison of the efficiency of text processing when applying
exact and marginal approximations of the basic probability distributions of test statistics in statistical
criteria], Obozrenie prikladnoy i promyshlennoy matematiki [Review of Applied and
Industrial Mathematics], 2018, Vol. 25, Issue 4, pp. 375-378. ISSN 0869-8325. Available at:
https://tvp.ru/conferen/vsppmXIX/repso114.pdf (accessed 14 January 2019).
2. Zelyukin N.B., Mel'nikov A.K. Slozhnost' rascheta tochnykh raspredeleniy veroyatnosti
znacheniy statistik i oblast' primeneniya predel'nykh raspredeleniy [The complexity of calculating
accurate probability distributions of statistical values and the scope of limit distributions],
Elektronnye sredstva i sistemy upravleniya: Mater. dokladov XIII Mezhdunar. nauch.-
prakt. konf. (29 noyabrya – 1 dekabrya 2017 g.) [Electronic means and control systems: Materials
of reports of the XIII International Scientific and Practical Conference (November 29 –
December 1, 2017)]: in 2 parts. Part 2. Tomsk: V-Spektr, 2017, pp. 84-90. Available at:
https://storage.tusur.ru/files/115115/2017-2.pdf (accessed 13 July 2018).
3. Mel'nikov A.K. Slozhnost' rascheta tochnykh raspredeleniy veroyatnosti simmetrichnykh
additivno razdelyaemykh statistik i oblast' primeneniya predel'nykh raspredeleniy [The complexity
of calculating the exact probability distributions of symmetric additive-separated statistics
and the application of limit distributions], Doklady TUSUR [Proceedings of Tomsk State
University]. Tomsk, 2017, Vol. 20, No. 4, pp. 126-130. ISSN 1818-0442.
4. Chepovskiy A.M. Informatsionnye modeli v zadachakh obrabotki tekstov na estestvennykh
yazykakh [Information models in tasks of processing of natural language texts]. Moscow:
Natsional'nyy otkrytyy universitet «INTUIT», 2015, 228 p. ISBN 978-5-9556-0176-2.
5. Mel'nikov A.K. Analiz tochnykh i predel'nykh priblizheniy raspredeleniy veroyatnostey
znacheniy statistik [Analysis of exact and marginal approximations of probability distributions
of statistical values], Superkomp'yuternye tekhnologii (SKT-2018): Mater. 5-y Vserossiyskoy
nauchno-tekhnicheskoy konferentsii [Supercomputer Technologies (SCT-2018): Proceedings
of the 5th All-Russian Scientific and Technical Conference]: in 2 vol. Vol. 1. Rostov-on-Don;
Taganrog: Izd-vo YuFU, 2018, pp. 100-104. ISBN 978-9275-2834-9.
6. Mel'nikov A.K. Primenenie tochnykh raspredeleniy v protsedure dvukhetapnoy obrabotki
tekstov [Application of exact distributions in the procedure of two-step text processing],
Obozrenie prikladnoy i promyshlennoy matematiki [Review of applied and industrial mathematics],
2018, Vol. 25, Issue 2, pp. 89-95. ISSN 0869-8325. Available at:
https://tvp.ru/conferen/vsppmXIX/repso051.pdf (accessed 24 January 2019).
7. Mel'nikov A.K., Ronzhin A.F. Obobshchennyy statisticheskiy metod analiza tekstov, osnovannyy na
raschete raspredeleniy veroyatnosti znacheniy statistik [Generalized statistical method of text analysis
based on calculation of probability distribution of statistical values] Informatika i ee primeneniya
[Informatics and its applications], 2016, Vol. 10, Issue 4, pp. 91-97. ISSN 1992-2264.
8. Mel'nikov A.K. Primenenie tochnykh i predel'nykh priblizheniy raspredeleniy veroyatnostey
znacheniy statistik pri reshenii zadachi obrabotke tekstov [Application of exact and limit approximations
of statistics probability distributions for the problem of text processing], Izvestiya YuFU.
Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2018, No. 8 (202), pp. 114-135.
9. Mel'nikov A.K. Otnositel'naya algoritmicheskaya slozhnost' rascheta tochnykh priblizheniy
raspredeleniy veroyatnostey znacheniy statistik [Relative algorithmic complexity of exact approximations
for probability distributions of statistics values], Izvestiya YuFU. Tekhnicheskie
nauki [Izvestiya SFedU. Engineering Sciences], 2019, No. 7 (209), pp. 35-45.
10. Mel'nikov A.K. Napravlenie modernizatsii chastotnogo metoda rascheta tochnykh
raspredeleniy veroyatnostey znacheniy statistik [Direction of modernization of the frequency
method of exact probability distributions for statistics values], Obozrenie prikladnoy i
promyshlennoy matematiki [Review of applied and industrial mathematics], 2017, Vol. 24, Issue
5, pp. 324-325. Available at: https://tvp.ru/conferen/vsppmXVIII/kisso073.pdf (accessed
12 July 2018).
11. Zuev Yu.A. Sovremennaya diskretnaya matematika: Ot perechislitel'noy kombinatoriki do
kriptografii XXI veka [Modern discrete mathematics: From enumerative combinatorics to
cryptography of the XXI century]. Moscow: LENARD, 2019, 720 p. (Fundamentals of
Information security No. 17). ISBN 978-5-9710-5660-7.
12. Mel'nikov A.K. Metodika rascheta raspredeleniya veroyatnostey znacheniy simmetrichnykh
additivno razdelyaemykh statistik, priblizhennykh k ikh tochnomu raspredeleniyu [A method
for calculating the probability distribution of values of symmetric additively separated statistics
that are close to their exact distribution], Nauchnyy vestnik NGTU [Science bulle-tin of the
Novosibirsk state technical university], 2018, No. 1 (70), pp. 153-166. ISBN 1814-1196.
Doi: 10.17212/1814-1196-2018-1-153-166.
13. Kramer G. Matematicheskie metody statistiki [Mathematical methods of statistics]. Moscow:
Mir, 1975, 648 p.
14. Knut D.E. Iskusstvo programmirovaniya dlya EVM. T. 3. Sortirovka i poisk [The art of computer
programming. Vol. 3. Sorting and Searching]. Moscow: Mir, 1978, 844 p.
15. Spisok TOR 500 55-y redaktsiya ot 22 iyunya 2020 goda [TOP 500 list 55 edition of june 22,
2020].
16. Voevodin V.V., Voevodin Vl.V. Parallel'nye vychisleniya [Parallel computing]. St. Petersburg:
BKhV-Peterburg, 2002, 608 p. ISBN 5-94157-160-7.
17. Murav'ev S.A. Istoriya otechestvennoy elektronnoy vychislitel'noy tekhniki [History of Russian
electronic computer technology], under the General editorship of the Director of the Department
of radio-electronic industry of the Ministry of industry and trade of Russia S.V.
Khokhlov. Moscow: OOO "Izdatel'skiy dom "Stolichnaya entsiklopediya", 2017, 680 p. ISBN
978-5-903989-34-8.
18. Guzik V.F., Kalyaev I.A., Levin I.I. Rekonfiguriruemye vychislitel'nye sistemy [Reconfigurable
computing systems], under the General ed. of I.A. Kalyayev. Taganrog: Izd-vo YuFU, 2016,
472 p. ISBN 978-5-9275-1918-7.
19. Sachkov V.N. Vvedenie v kombinatornye metody diskretnoy matematiki [Introduction to combinatorial
methods of discrete mathematics]. Moscow: Nauka. Gl. red. fiz.-mat. lit., 1982, 384 p.
20. Svidetel'stvo o gosudarstvennoy registratsii programmy dlya EVM № 2018664980. Raschet
veroyatnostey znacheniy statistiki maksimal'noy chastoty. Pravoobladatel' Mel'nikov A.K. Avtor:
Mel'nikov A.K. i Zelyukin N.B. Zayavka № 2018662123. Data postupleniya 01 noyabrya 2018
g. Data gosudarstvennoy registratsii v Reestre programm dlya EVM 27 noyabrya 2018 g. [Certificate
of state registration of computer software № 2018664980. Calculation of probabilities of
maximum frequency statistics. Proprietor – A.K. Melnikov. Authors: A.K. Melnikov and
N.B. Zeliukin. Application № 2018662123. Date of filing – November 01, 2018. Date of state
registration in the Register of computer software – November 27, 2018].
21. Mel'nikov A.K. Otnositel'naya algoritmicheskaya slozhnost' rascheta tochnykh priblizheniy
raspredeleniy veroyatnostey znacheniy statistik [Relative algorithmic complexity of exact approximations
for probability distributions of statistics values], Modeli, metody i tekhnologii
intellektual'nogo upravleniya (IU-2019): Mater. 12-y Mul'tikonferentsii po problemam
upravleniya (MKPU-2019) [Models, methods and technologies of intelligent management (IU-
2019): Materials of the 12th Multi-Conference on Management Problems (ICPU-2019)]: in 4 vol.
Vol. 1. Rostov-on-Don; Taganrog: Izd-vo YuFU, 2019, pp. 108-112. ISBN 978-5-9275-3189-9.
Published
2021-02-25
Section
SECTION II. MATHEMATICAL AND SYSTEM SOFTWARE FOR SUPERCOMPUTERS