IMPLEMENTATION OF THE DNA ASSEMBLY PROBLEM ON RECONFIGURABLE COMPUTER SYSTEMS

  • A.I. Levina Supercomputers and Neurocomputers Research Center
  • E.E. Semernikova Supercomputers and Neurocomputers Research Center
  • D.A. Sorokin Supercomputers and Neurocomputers Research Center
Keywords: Field-programmable gate array, reconfigurable computing structure, de novo sequencing assembly, computing acceleration

Abstract

The paper deals with research of methods and tools for the problem of DNA assembly, which provide considerable reducing of the processing time for the specified accuracy in compari-son with other methods and tools. We are considering using of reconfigurable computer systems for the assembly problems. As an example we use implementation of a key procedure of the algo-rithm of genome assembly Velvet Assembler – a procedure of contigs generation VelvetH. The base of the Velvet Assembler is a new generation method which implies generation of a de Bruijn graph, and, as a result, causes considerably variable density of data flows in the nondeterministic polynomial time complete problem of DNA assembly. That is why, in addition to the technology of structural-procedural organization of calculations, which is traditional for reconfigurable com-puter systems, we used special methods of synthesis of parallel-pipeline applications to provide a possibility in principle of implementation of such problems on reconfigurable computer systems. For evaluation of efficiency of reconfigurable computer systems use, we have developed, using a procedure VelvetН, a parallel-pipeline application which assembles a genome from short reads of a DNA Staphylococcus aureus. We have taken data from the database Sequence Read Archive from the website National Center for Biotechnology Information. The parallel-pipeline application was tested on a reconfigurable computer system “Tertius”, designed on the base of four Xilinx Kintex UltraScale XCKU 095 FPGAs. Use of this reconfigurable computer system provides 24-fold (and more) reduction of the execution time of the contig generation procedure for the DNA assembly problem against the existing analogs. Due to this we can conclude, that use of reconfigu-rable computer systems for implementation of the DNA assembly problem is a promising direction, which requires further scientific and technical research.

References

1. Ohashi H., Hesegawa M., Wakimoto K., Miyamoto - Sato E. Next - generation technologies for multiomics approaches including interactome sequencing, BioMed Research Internat ional, 2015, Vol. 2015. Article No.104209.
2. Zerbino D., Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Research, 2008, Vol. 18, No. 5, pp. 821-829.
3. Ohashi H., Hesegawa M., Wakimoto K., Miyamoto - Sato E. Next - generation technologies for multiomics approaches including interactome sequencing, BioMed Research Internat ional, 2015, Vol. 2015. Article No.104209.
4. Chapman J.A., Ho I., Sunkara S., Luo S., Schroth G.P., et al. Meraculous: De Novo Genome Assembly with Short Paired-End Reads. PLoS ONE 6(8)e23501, 2011. Doi:10.1371/journal.pone.0023501.
5. Romanenko K.V., Sal'nikov A.H., Alekseevskiy A.V. Parallel'nyy metod ob"edineniya rezul'tatov raboty programm po sborke genoma [Parallel method of combining the results of genome assembly programs], Vestnik YuUrGU. Seriya: Vychislitel'naya matematika i informatika [Vestnik SUSU. Series: Computational Mathematics and Computer Science], 2016, Vol. 5, No. 1, pp. 24-34. DIO: 10.14529/cmse160103.
6. Sychev A.A. Analiz genomnykh dannykh s ispol'zovaniem graficheskikh uskoriteley: magisterskaya dissertatsiya, po spetsial'nosti 09.04.01.01 «Vysokoproizvoditel'nye vychislitel'nye sistemy». Nauchnyy rukovoditel' Kuz'min D.A [Analysis of genomic data using graphics accelerators: master's thesis: Мaster thesis specialty 09.04.01.01 “High-performance computing systems”. Scientific advisor Kuzmin D.А.]. Krasnoyarsk, 2016, 42 p. Available at: http://elib.sfu-kras.ru/bitstream/handle/2311/29230/sychev_zashifrovan.pdf?sequence= 2&isAllowed=y (accessed 15 November 2018).
7. Kirilova A.A. Razrabotka algoritma netochnogo poiska chteniy v genome s primeneniem vychisleniy na videokartakh: magisterskaya dissertatsiya, po spetsial'nosti 01.03.02 «Prikladnaya matematika i informatika». Nauchnyy rukovoditel' Shalyto A.A [Development fuzzy string search algorithm of readings in the genome using computing on video cards: Mas-ter thesis, specialty 01.03.02 “Applied Mathematics and Computer Science”, scientific advisor Shalyto A.A.]. Sankt-Peterburg, 2014. pp. 45. Available at: http://is.ifmo.ru/diploma-theses/2015/master/kirillova/kirillova.pdf (accessed 15 November 2018).
8. Nazipova N.N., Isaev E.A., Kornilov V.V., Pervukhin D.V., Morozova A.A., Gorbunov A.A., Ustinin M.N. Bol'shie dannye v bioinformatike [Big data in bioinformatics], Matematicheskaya biologiya i bioinformatika [Mathematical biology and bioinformatics], 2017, No. 12:1, pp. 102-119.
9. Kalyaev A.V., Levin I.I. Modul'no-narashchivaemye mnogoprotsessornye sistemy so strukturno-protsedurnoy organizatsiey vychisleniy [Modular-scalable multiprocessor system with structural-procedural organization of computing]. Moscow: Yanus-K, 2003, 380 p.
10. Sorokin D.A., Dordopulo A.I. Metodika sokrashcheniya apparatnykh zatrat v slozhnykh sistemakh pri reshenii zadach s sushchestvenno-peremennoy intensivnost'yu potokov dannykh [Technique of reducing hardware costs in complex systems in solution with greatly-varying in-tensity of data flows], Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2012, No. 4 (129), pp. 213-219.
11. Levin I.I., Sorokin D.A., Mel'nikov A.K., Dordopulo A.I. Reshenie zadach s sushchestvenno-peremennoy intensivnost'yu potokov dannykh na rekonfiguriruemykh vy-chislitel'nykh sistemakh [Solving problems with essentially variable intensity of data flows on reconfigura-ble computing systems], Vestnik komp'yuternykh i informatsionnykh tekhnologiy [Bulletin of computer and information technologies], 2012, No. 2, pp. 49-56.
12. Kalyaev A.V., Levin I.I., Semernikov E.A., Shmoylov V.I. Rekonfiguriruemye mul'tikonveyrnye vychislitel'nye struktury [Multicompany reconfigurable computing structures]. 2nd ed., by general ed. of I.A. Kalyaeva. Rostov-on-Don: Izd-vo YUNTS RAN, 2009, 344 p. ISBNN 978-5-902982-61-6.
13. Dordopulo A.I. Kalyaev I.A., Levin I.I., Semernikov E.A. Semeystvo mnogoprotsessornykh vychislitel'nykh sistem s dinamicheski perestraivaemoy arkhitekturoy [Family of multiproces-sor computer systems with dynamically reconfigurable architecture], Mnogoprotsessornye vychislitel'nye i upravlyayushchie sistemy: Materialy nauchno-tekhnicheskoy konferentsii [Multiprocessor computing and control systems. Materials of scientific and technical conference]. Taganrog, 2007, pp. 11-17.
14. Kalyaev I.A., Levin I.I., Semernikov E.A., Dordopulo A.I. Rekonfiguriruemye vychislitel'nye sistemy na osnove PLIS semeystva VERTEX-6 [Reconfigurable computing system based on FPGA family of VERTEX-6], Parallel'nye vychislitel'nye tekhnologii (PAVT’2011): Trudy mezhdunarodnoy nauchnoy konferentsii [Parallel computational technologies (PCT ' ’2011). Proceedings of the international scientific conference], 2011, pp. 203-211.
15. Raskladkin M.K. Biblioteka masshtabiruemykh interfeysov dlya rekonfiguriruemykh vychislitel'nykh sistem na osnove PLIS [Library of scalable interfaces for reconfigurable com-puting systems based on FPGA], Vysokoproizvoditel'nye parallel'nye vychisleniya na klasternykh sistemakh. Materialy devyatoy mezhdunarodnoy konferentsii-seminara [High-performance parallel computing on cluster systems: Proceedings of the ninth international con-ference seminar]. Vladimir, 2009, 438 p. ISBN 978-5-89368-958-7.
16. Dasgupta S., Papadimitriu Kh., Vazirani U. Algoritmy [Algorithms]: transl. from engl., ed. by A. Shenya. Moscow: MTSNMO, 2014, 320 p. ISBN 978-5-4439-0236-4.
17. Illumina whole genome shotgun sequencing of genomic DNA library 'Solexa-1123' containing sample ROAD: SEQUENCING_SAMPLE:15760.0. Available at: https://www.ncbi.nlm.nih.gov/ sra/SRR022823 (accessed 10 November 2018).
18. Boyko V.A. Razrabotka algoritma sborki i analiza bol'shikh genomov [Development algorithm for the assembly and analysis of big genomes], Molodoy uchenyy [Young Scientist], 2017, No. 3, pp. 27-28. Available at: https://moluch.ru/archive/137/38530/ (accessed 10 November 2018).
19. Zerbino D.R., McEwen G.K., Margulies E.H., Birney E. Pebble and rock band: heuristic reso-lution of repeats and scaffolding in the velvet short-read de novo assembler, PLoS One, 2009, No. 4 (12): e8407.
20. Alekseeva A.E., Brusnigina N.F. Vozmozhnosti i perspektivy primeneniya metodov massivnogo parallel'nogo sekvenirovaniya v diagnostike i epidemiologicheskom nadzore za infektsionnymi zabolevaniyami [Opportunities and prospects of application of methods of massive parallel sequencing in diagnostics and epidemiological surveillance of infectious dis-eases], MediAl' [Medial], 2014, No. 2 (12).
21. Levina A.I. Apparatnaya realizatsii sborki genoma iz korotkikh chteniy na osnove grafa de Bryuyna [Hardware implementation of genome Assembly from short readings based on de Bruijn graph], ХIV ezhegodnaya molodezhnaya nauchnaya konferentsiya studentov, aspirantov i molodykh uchenykh «Dostizheniya i perspektivy molodykh uchenykh v interesakh razvitiya Yuga Rossii» [XIV annual youth scientific conference of students, postgraduates and young scientists "Achievements and prospects of young scientists in the interests of development of the South of Russia"]. Rostov-on-Don, 2018, pp. 75.
22. Varma B.S.C., Paul K., Balakrishnan M. and Lavenier D. Hardware acceleration of de novo genome assembly, Int. J. Embedded Systems, 2017, Vol. 9, No. 1, pp. 74-89.
Published
2019-04-04
Section
SECTION IV. RECONFIGURABLE AND NEURAL NETWORK COMPUTING SYSTEMS