Volume 22 (2021): Issue 4 (November 2021)
Journal Details
Format: Journal
eISSN: 1407-6179
First published: 20 Mar 2000
Publication frequency: 4 issues per year
Languages: English
Access type: Open Access

Aviation Profiling Method Based on Deep Learning Technology for Emotion Recognition by Speech Signal

Published online: 20 Nov 2021
Page range: 471 - 481
Abstract

This paper proposes a method for automatic, speaker-independent recognition of human psycho-emotional states from the speech signal, based on Deep Learning technology, to address problems in aviation profiling. For this purpose, an algorithm was developed to classify seven human psycho-emotional states: anger, joy, fear, surprise, disgust, sadness, and a neutral state. The algorithm uses Mel-frequency cepstral coefficients and Mel spectrograms as informative features of audio recordings of speech. These features are used to train two deep convolutional neural networks on the generated dataset. Testing the developed classifier on a held-out validation dataset gave a multiclass accuracy (the fraction of correct answers) of 0.93. The proposed solution may be useful in the development of human-machine interfaces, in medicine, in marketing, and in air transportation.
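The reported metric, multiclass accuracy, is the fraction of samples whose predicted class matches the true class. The following is a minimal sketch of that computation over the seven classes named in the abstract. The function name `multiclass_accuracy`, the `EMOTIONS` tuple, and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from typing import Sequence

# The seven classes named in the abstract (label spellings are an assumption).
EMOTIONS = ("anger", "joy", "fear", "surprise", "disgust", "sadness", "neutral")


def multiclass_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    # Count exact label matches, position by position.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels invented for illustration only; not results from the paper.
y_true = ["anger", "joy", "fear", "neutral", "sadness"]
y_pred = ["anger", "joy", "surprise", "neutral", "sadness"]
accuracy = multiclass_accuracy(y_true, y_pred)  # 4 of 5 labels match
```

Because accuracy weights every sample equally, it can hide weak performance on rare classes when the class distribution is unbalanced; a per-class breakdown would show that, but the abstract reports only the overall figure.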

