International Core Journal of Engineering 2020-26 | Page 131
2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Sound Event Recognition Based in Feature
Combination with Low SNR
Guolin Yan
*Mei Wang
Guilin University of Technology
College of Information Science and Engineering
Guilin, China
e-mail:[email protected]
Guilin University of Technology
College of Information Science and Engineering
Guilin, China
*e-mail: [email protected]
X. Liu
Xiyu Song
Guilin University of Technology
Guangxi key Laboratory Fund of Embedded Technology
and Intelligent System
Guilin, China
e-mail:[email protected]
Guilin University of Electronic Technology
Key Laboratory of Cognitive Radio and Information
Processing
Guilin, China
e-mail:[email protected]
Abstract—In environments with non-stationary noise, the
recognition rate of sound events drops when a model is trained
on public sound data using a single sound feature. To address
this problem, we propose a sound event recognition method
based on feature combination under low SNR. First, the method
improves the recognition of sound events in real scenes by
adding noise to public sound data sets. Second, to overcome
the inaccuracy of single features under noise and to
significantly improve the recognition rate, we propose a
combination method that merges multiple features and draws
on the strengths of image recognition. Tests in real scenes
show that a public sound data set with added noise generalizes
well to the actual scene. The combined features further improve
classification performance, and the recognition rate for real
environmental acoustic events reaches 75%, meeting the
requirements of the actual system.
of the selection and the related aspects. At this stage, the
sound features we use are not designed specifically for
sound event recognition; they are essentially the features
proposed for sound recognition in general. Existing features
therefore need to be improved to some extent. On the other
hand, traditional classification methods use SVM and HMM,
and many researchers now also use various machine learning
algorithms for classification. For sounds that differ clearly
from the sound of the actual scene, the recognition rate of
machine learning classification is not greatly affected, but
for sounds similar to the actual environmental sound, the
recognition rate drops by about 20% under the influence of
the environmental sound. Apart from the training set and the
network construction, the features used are also a factor
that affects the recognition rate. Different single features
yield different recognition rates in different environments,
so combining features is one way to address this problem.
Keywords—sound recognition; ambient acoustic event;
combined feature; convolutional neural network
In this paper, we propose a method, sound event recognition
based on feature combination under low SNR, to address the
influence of ambient sound on event recognition in the real
scenes described above. First, noise is added to the public
sound data set at a specified signal-to-noise ratio, so that
the training data more closely match the test data. In
practical application scenarios, the model trained on the
noise-added data achieves a higher recognition rate than the
model trained on the raw data.
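As a minimal sketch of the noise-adding step, the following Python snippet scales a noise recording so that the mixture reaches a chosen SNR. The function names `mix_at_snr` and `measure_snr_db`, the synthetic 440 Hz tone, the Gaussian noise, and the 0 dB target are illustrative assumptions, not details taken from this paper.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Return signal plus scaled noise so the mixture has the requested SNR (dB)."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_signal / P_scaled_noise); solve for the noise gain.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


def measure_snr_db(signal, mixture):
    """Measure the SNR (dB) of a mixture, given the clean signal it contains."""
    residual = [m - s for s, m in zip(signal, mixture)]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(r * r for r in residual)
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical inputs: one second of a 440 Hz tone at 16 kHz and white noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(clean, noise, snr_db=0.0)  # 0 dB is an assumed example value
```

In practice the noise would come from recordings of the target environment rather than a random generator, and each clip could be mixed at several SNR values to enlarge the training set.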
Owing to the added noise and the larger amount of data, the
noise-added sound data set can improve the generalization
ability of the network. For feature processing, we select
representative features from the time domain and the
frequency domain, namely short-term energy, short-time
zero-crossing rate, MFCC, and the Log-Mel spectrogram, and
combine them into a new feature instead of using a single
feature. The feature combination method draws on the
strengths of convolutional neural networks and of image
recognition. Two sets of 2D features are extracted from each
sound sample, and the two sets have the same feature
dimension and number of frames. The two two-dimensional
feature arrays are then stacked along the channel axis to obtain a
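The time-domain features named above and the channel-wise stacking can be sketched as follows. This is a hedged illustration only: the helper names, the frame length and hop, and the placeholder feature maps are assumptions, and this section does not state which features go into which of the two arrays.

```python
def split_frames(signal, frame_len, hop):
    """Cut a 1D signal into frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def stack_channels(feature_a, feature_b):
    """Stack two (frames x dims) maps into a (2 x frames x dims) nested list.

    Assumes both maps are non-empty and rectangular.
    """
    shape_a = (len(feature_a), len(feature_a[0]))
    shape_b = (len(feature_b), len(feature_b[0]))
    if shape_a != shape_b:
        raise ValueError(f"shape mismatch: {shape_a} vs {shape_b}")
    return [[list(row) for row in feature_a], [list(row) for row in feature_b]]


# Toy signal with assumed frame length 4 and hop 4 (two non-overlapping frames).
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
frame_list = split_frames(signal, frame_len=4, hop=4)
energies = [short_time_energy(f) for f in frame_list]
zcrs = [zero_crossing_rate(f) for f in frame_list]

# Placeholder 2D feature maps (2 frames x 3 dims) standing in for real features.
map_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
map_b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
combined = stack_channels(map_a, map_b)  # shape: 2 channels x 2 frames x 3 dims
```

The stacked result has the same layout as a two-channel image, which is what lets a convolutional network consume it directly.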
I. INTRODUCTION
Low signal-to-noise ratio (SNR) sound event recognition
aims to detect and recognize sound events mixed with various
noises in real scenes. At present, the main research
directions of acoustic event recognition include: features of
sound events and their improvement [1-3]; selection or
improvement of sound event classifiers [4, 5]; detection and
classification of indoor sound scenes [6]; and detection of
sound events in specific environments [7, 8].
These studies are of great significance for sound event
recognition. However, for some real environmental acoustic
events, the recognition rate is affected to varying degrees
by unstable noise. A major factor in deciding sound
categories is that real sounds contain many types of sound
information, and the characteristics of the event sound may
not be the most prominent [9].
In order to improve the recognition rate of target events
in environmental sound, we pursue improvements in two
aspects. On the one hand, it is to improve the characteristics
978-1-7281-4691-1/19/$31.00 ©2019 IEEE
DOI 10.1109/AIAM48774.2019.00029