EMOTION DETECTION AND CLASSIFICATION SYSTEM BASED ON AUDIO STREAM DATA
Abstract
In today's rapidly changing and demanding work environment, the ability to quickly and accurately
assess an employee's emotional state is crucial for protecting human lives and reducing material losses.
Emotional well-being plays an important role in workplace safety, productivity, and overall mental health.
Therefore, developing effective tools for monitoring negative emotions and responding to them is
a pressing task. The purpose of this study is to develop an algorithm capable of classifying
emotions from audio data recorded by a user's smartphone. Such a tool is especially useful when integrated
into a broader health monitoring system that evaluates human health indicators in real time
using non-invasive methods. This article presents a new solution that uses acoustic signals picked up by a
smartphone microphone to detect and classify user emotions. Using convolutional neural networks
(CNNs), a class of deep learning models known for their effectiveness in processing audio and visual data,
the proposed system can determine the user's emotional state. The CNN model is trained to recognize
patterns in audio data corresponding to various emotional manifestations, focusing on detecting negative
emotions such as anger or sadness. The results of the study demonstrate the effectiveness of the system:
in detecting negative emotions, the false positive rate (type I error) is 19.5% and the false negative
rate (type II error) is 20.1%. These results indicate the system's potential for
practical application in real-world conditions. By integrating this solution into existing biomedical monitoring
systems, organizations can expand their ability to monitor the emotional well-being of employees, potentially
preventing negative consequences such as industrial accidents or mental health crises. The integration
of smartphone-based emotion recognition into health monitoring systems represents significant progress
in non-invasive biomedical monitoring, leveraging the ubiquity of smartphones and the capabilities
of machine learning.
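
To make the described pipeline concrete, the following Python sketch illustrates one plausible realization: fixed-length audio clips are converted to log-mel spectrograms, a small CNN is trained to discriminate negative from non-negative emotions, and the type I and type II error rates are computed in the sense used above. This is a minimal illustration under stated assumptions, not the authors' implementation; the sampling rate, clip length, network architecture, and the helper names (clip_to_logmel, build_model, error_rates) are hypothetical, and the librosa, numpy, and tensorflow packages are assumed to be available.

import numpy as np
import librosa
import tensorflow as tf

SR = 16000           # assumed sampling rate for smartphone recordings
CLIP_SECONDS = 3     # assumed fixed clip length; shorter clips are zero-padded
N_MELS = 64          # number of mel bands per spectrogram frame

def clip_to_logmel(path):
    # Load one clip and convert it to a log-mel spectrogram (mels x frames).
    y, _ = librosa.load(path, sr=SR, duration=CLIP_SECONDS)
    y = np.pad(y, (0, max(0, SR * CLIP_SECONDS - len(y))))
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=N_MELS)
    return librosa.power_to_db(mel, ref=np.max)

def build_model(input_shape):
    # Small 2D CNN: spectrogram in, probability of a negative emotion out.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

def error_rates(y_true, y_pred):
    # Type I (false positive) and type II (false negative) error rates.
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    return fp / (fp + tn), fn / (fn + tp)

# Hypothetical usage: paths/labels would come from a labeled emotional-speech
# corpus, with label 1 = negative emotion (e.g. anger, sadness), 0 = other.
# X = np.stack([clip_to_logmel(p) for p in paths])[..., np.newaxis]
# y = np.asarray(labels)
# model = build_model(X.shape[1:])
# model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32)
# preds = (model.predict(X_test) > 0.5).astype(int).ravel()
# fpr, fnr = error_rates(y_test, preds)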