International Core Journal of Engineering 2020-26

2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
978-1-7281-4691-1/19/$31.00 ©2019 IEEE. DOI 10.1109/AIAM48774.2019.00029

Sound Event Recognition Based in Feature Combination with Low SNR

Guolin Yan
Guilin University of Technology, College of Information Science and Engineering, Guilin, China
e-mail: [email protected]

*Mei Wang
Guilin University of Technology, College of Information Science and Engineering, Guilin, China
*e-mail: [email protected]

X. Liu
Guilin University of Technology, Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System, Guilin, China
e-mail: [email protected]

Xiyu Song
Guilin University of Electronic Technology, Key Laboratory of Cognitive Radio and Information Processing, Guilin, China
e-mail: [email protected]

Abstract—In sound environments with unstable noise, the recognition rate of sound events drops when a model is trained on public sound data using a single sound feature. To solve this problem, we propose a new method of sound event recognition based on feature combination with low SNR. It improves the recognition of sound events in real scenes by adding noise to public sound data sets. In addition, to overcome the inaccuracy of single features in noisy recognition and to significantly improve the recognition rate, we propose a new combination method that combines multiple features and draws on the advantages of image recognition. Actual tests show that the public sound data set with added noise generalizes well to the actual scene. The designed combined features further improve classification performance, and the recognition rate of real environmental acoustic events can reach 75%, meeting the requirements of the actual system.

Keywords—sound recognition; ambient acoustic event; combined feature; convolutional neural network

I. INTRODUCTION

Low signal-to-noise ratio (SNR) sound event recognition aims to detect and recognize sound events mixed with various noises in actual scenes. At present, the main research directions of acoustic event recognition include: features of sound events and their improvement [1-3]; selection or improvement of sound event classifiers [4, 5]; detection and classification of indoor sound scenes [6]; and detection of sound events in specific environments [7, 8]. These studies are of great significance for sound event recognition. However, for some real environmental acoustic events, the recognition rate is affected to varying degrees by noise instability. A major factor in judging sound categories is that real sounds contain too many types of sound information, and the characteristics of the event sound may not be the most prominent [9]. To improve the recognition rate of the required events in environmental sound, we mainly improve it from two aspects.

On the one hand, we improve the selection of features and related aspects. At this stage, the sound features in use are not designed specifically for sound event recognition; they are basically features proposed for sound recognition in general. Therefore, existing features need to be improved to some extent.

On the other hand, traditional classification methods use SVM and HMM, and nowadays many scholars also use various machine learning algorithms for classification. For sounds that differ clearly from the actual scene sound, the recognition rate of some machine learning classifiers is not greatly affected, but for sounds similar to the actual environmental sound, the recognition rate drops by about 20% under the influence of the environmental sound. Apart from the effects of the training set and the network construction, the features used also affect the recognition rate. Different single features yield different recognition rates in different environments, so combining features is one way to solve this problem.

In this paper, we propose a new method, sound event recognition based on feature combination with low SNR, to address the influence of ambient sound on event recognition in the actual scenes described above. First, noise is added to the public sound data set according to the signal-to-noise ratio, so that the training data and the test data match more closely. In practical application scenarios, the recognition rate of the model trained with added noise is higher than that of the model trained on the raw data. Owing to the influence of noise and quantity, the noise-added sound data set can improve the generalization ability of the network. In terms of feature processing, we select representative features from the time domain and the frequency domain, namely short-term energy, short-time zero-crossing rate, MFCC, and Log-Mel spectrogram, and combine them into a new feature instead of using a single feature. The feature combination method draws on the advantages of convolutional neural networks and of image recognition.

Two sets of 2D features are extracted from one sound sample, and the dimensions and frame counts of the two sets are the same. The two sets of two-dimensional feature arrays are then stacked along the channel axis to obtain a
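The noise-adding step described in the introduction, in which public recordings are mixed with noise according to a signal-to-noise ratio, might be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the function name `mix_at_snr`, the tiling of short noise clips, and the power-based definition SNR = 10 log10(P_clean / P_noise) are all assumptions made for the example.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Return `clean` plus `noise`, with the noise scaled to a target SNR in dB.

    Hypothetical helper for illustration; the paper does not specify
    how its noise-adding step is implemented.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Tile or trim the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")

    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)); solve for gain.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

Calling `mix_at_snr(event, background, 0.0)` would give a mixture in which the event and the scaled background have equal average power. Lower `snr_db` values produce noisier training examples.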
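The channel-wise combination of two equally sized 2D feature maps, as described in the introduction, can be illustrated with a short NumPy sketch. The function name `stack_feature_maps`, the channels-last layout, and the `float32` cast are assumptions for the example. Which features make up each of the two maps is not stated in this part of the paper, so the inputs are left generic.

```python
import numpy as np

def stack_feature_maps(feat_a, feat_b):
    """Stack two (n_bins, n_frames) feature maps into one (n_bins, n_frames, 2) array.

    Hypothetical helper: the result resembles a two-channel image that a
    convolutional network could consume. Channels-last is an assumed layout.
    """
    feat_a = np.asarray(feat_a, dtype=np.float32)
    feat_b = np.asarray(feat_b, dtype=np.float32)

    # Both maps must be 2D and share the same dimensions and frame count.
    if feat_a.ndim != 2 or feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must be 2D and have identical shapes")

    # A new trailing axis acts as the channel axis.
    return np.stack([feat_a, feat_b], axis=-1)
```

For two maps of shape `(40, 100)`, the result has shape `(40, 100, 2)`. A framework that expects channels first would need `axis=0` instead.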
