MATHEMATICAL MODEL FOR CALCULATING RELIABILITY INDICES OF SCALABLE COMPUTER SYSTEMS WITH SWITCHING TIME

  • V.A. Pavsky, Kemerovo State University
  • K.V. Pavsky, Rzhanov Institute of Semiconductor Physics, Siberian Branch of the Russian Academy of Sciences; Siberian State University of Telecommunications and Informatics
Keywords: Computer systems, scalability, failures, switching time, mathematical model, analysis, reliability and robustness indices, analytical solutions

Abstract

The main feature of scalable computer systems is modularity. Performance in such systems is
increased by adding elements of the same type, elementary machines (EM; for example, computing
nodes). Failures, in turn, change the system performance. Thus, scaling a computer system (CS)
increases performance on the one hand, but on the other hand the growth of computing resources
exacerbates the reliability problem and makes effective operation more complex to organize.
Analysis of the reliability and potential capabilities of computing systems therefore remains an
urgent problem. For quantitative analysis of the functioning of scalable computing systems,
robustness indices related to reliability are used. For example, indices of the potential
robustness of a CS take into account that all operable elementary machines are used in solving
tasks, and that the number of operable EMs changes over time as a result of failures and recoveries.
In the theory of computing systems, reliability analysis commonly relies on models based on the
theory of Markov processes and queueing theory (QT). Most analytical QT models do not treat the
switching (reconfiguration) time as a separate parameter because of the complexity of the
solution. Usually, the models are simplified by combining the recovery time and the switching time
into a single parameter. Analytical solutions of a system of differential equations with three
parameters (failure, recovery, and switching) are obtained for calculating reliability and
potential robustness indices, using a QT model as an example. These solutions allow the user to
determine whether the switching time should be taken into account. It is also shown that the
solutions of the three-parameter model reduce to those of the two-parameter model when the
switching time is not taken into consideration.
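
The reduction from three parameters to two can be illustrated with a minimal Python sketch. It is not the authors' model, whose state structure is not given in this abstract. It assumes a single elementary machine that cycles through three states (operable, under recovery, switching) with exponential rates; the names lam (failure rate), mu (recovery rate) and nu (switching rate), as well as the function names, are hypothetical. The sketch compares steady-state availability with and without a separate switching state and checks the closed form against a simple Euler integration of the Kolmogorov forward equations.

```python
def availability_two_param(lam, mu):
    """Steady-state availability for the assumed chain: up --lam--> recovery --mu--> up."""
    return mu / (lam + mu)


def availability_three_param(lam, mu, nu):
    """Steady-state availability for the assumed cycle:
    up --lam--> recovery --mu--> switching --nu--> up.
    In a cyclic chain each state's probability is proportional to its mean sojourn time."""
    mean_up = 1.0 / lam
    mean_recovery = 1.0 / mu
    mean_switching = 1.0 / nu
    return mean_up / (mean_up + mean_recovery + mean_switching)


def transient_three_param(lam, mu, nu, t_end, steps=100_000):
    """Explicit Euler integration of the Kolmogorov forward equations for the
    assumed three-state cycle, starting in the operable state.
    Returns (p_up, p_recovery, p_switching) at time t_end."""
    p_up, p_rec, p_sw = 1.0, 0.0, 0.0
    h = t_end / steps  # step size; must be small relative to 1 / max(lam, mu, nu)
    for _ in range(steps):
        d_up = -lam * p_up + nu * p_sw
        d_rec = lam * p_up - mu * p_rec
        d_sw = mu * p_rec - nu * p_sw
        p_up += h * d_up
        p_rec += h * d_rec
        p_sw += h * d_sw
    return p_up, p_rec, p_sw


if __name__ == "__main__":
    lam, mu, nu = 0.01, 1.0, 10.0  # illustrative rates, not taken from the paper
    print("two-parameter availability:  ", availability_two_param(lam, mu))
    print("three-parameter availability:", availability_three_param(lam, mu, nu))
    print("transient p_up at t = 50:    ", transient_three_param(lam, mu, nu, 50.0)[0])
```

Under these assumptions, a finite switching rate always lowers availability relative to the two-parameter value, and the gap vanishes as nu grows, which is one simple way to judge whether the switching time matters for a given set of rates.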

Published: 2020-07-20
Section: SECTION II. COMPUTING AND INFORMATION AND CONTROL SYSTEMS