International Core Journal of Engineering 2020-26 | Page 135
verification. The recognition rate for children playing is reduced because of its similarity to the ambient sound. However, under the actual-scene test condition, where the test set contains environmental noise, the network trained with added noise performs markedly better: the recognition rate for engine idling is 26% higher than that of the network trained without noise, and the recognition rates for sounds recorded in the real environment are generally higher than those of the network without noise processing.
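The noise-added training discussed above amounts to mixing environmental noise into clean training clips at a controlled level. The paper does not specify its mixing procedure, so the following is only a minimal NumPy sketch of one common approach: scale the noise clip so the mixture hits a target signal-to-noise ratio in dB.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise clip into a clean clip at a target SNR (in dB).

    The noise is tiled/trimmed to the clean signal's length and scaled
    so that 10*log10(P_signal / P_noise) equals snr_db.
    """
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Illustrative example: a 440 Hz tone mixed with white noise at 5 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).normal(size=sr // 2)
noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
```

Sweeping `snr_db` over a range of values during training is one way to expose the network to the unstable noise levels the real-scene test exhibits.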
B. Data noise recognition test under multi-feature combination

Unlike the previous section, this section compares the multi-feature network against a network trained on a single feature without added noise; after noise-added training, the combined features and score fusion are more effective. Table II compares the results with noise addition under multi-dimensional features. The recognition rate increases, and testing in the actual scene also shows a better effect: compared with the single-feature network without noise addition, the test recognition rate rises to 76.5%.
TABLE II. MULTI-DIMENSIONAL FEATURES AFTER NOISE ADDITION COMPARED TO A SINGLE-FEATURE NETWORK WITHOUT NOISE ADDITION

Sound category   | Single feature, | Multi-dim. feature, | Real recording, single | Real recording, multi-dim.
                 | pre-noise test  | pre-noise test      | feature before noise   | feature after noise
-----------------+-----------------+---------------------+------------------------+---------------------------
air conditioner  | 0.52            | 0.55                | 0.44                   | 0.58
car horn         | 0.71            | 0.77                | 0.61                   | 0.76
children playing | 0.70            | 0.66                | 0.60                   | 0.77
dog bark         | 0.77            | 0.81                | 0.65                   | 0.76
drilling         | 0.63            | 0.67                | 0.51                   | 0.66
engine idling    | 0.62            | 0.67                | 0.32                   | 0.67
gun shot         | 0.79            | 0.83                | 0.74                   | 0.81
jackhammer       | 0.60            | 0.62                | 0.55                   | 0.63
police siren     | 0.70            | 0.77                | 0.68                   | 0.78
street music     | 0.72            | 0.71                | 0.58                   | 0.73
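Reading the two real-scene columns of Table II as (single feature before noise addition, multi-dimensional feature after noise addition) pairs, the per-class gains can be tabulated directly. The class names and rates below come from the table; the pairing itself reflects our reading of the flattened layout.

```python
# Per-class real-scene recognition rates from Table II:
# (single feature before noise addition, multi-dim feature after noise addition)
rates = {
    "air_conditioner":  (0.44, 0.58),
    "car_horn":         (0.61, 0.76),
    "children_playing": (0.60, 0.77),
    "dog_bark":         (0.65, 0.76),
    "drilling":         (0.51, 0.66),
    "engine_idling":    (0.32, 0.67),
    "gun_shot":         (0.74, 0.81),
    "jackhammer":       (0.55, 0.63),
    "police_siren":     (0.68, 0.78),
    "street_music":     (0.58, 0.73),
}
gains = {k: round(after - before, 2) for k, (before, after) in rates.items()}
best = max(gains, key=gains.get)  # largest gain: engine_idling, +0.35
```

Under this reading every class improves in the real scene, with engine idling gaining the most, which matches the discussion of sounds that are easily masked by ambient noise.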
Compared with the single-feature case with no noise in the training set, the recognition rate increases significantly once the network is trained and tested with the noise-added training set; the effect is especially pronounced for air_conditioner, engine_idling, and street_music. These sounds share a common trait: in the actual scene they are not prominent, and some data samples themselves resemble the ambient sound, so environmental sound has a large impact on them. Because noise-added training increases the generalization ability of the network, and the combined features describe these sounds more accurately, the recognition rate increases.
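The "combined features and scoring" idea can be read as late fusion: each feature stream produces its own class scores, which are averaged before the final decision. The exact fusion rule is not spelled out in the text, so the averaging below, like the example score vectors, is an illustrative assumption.

```python
import numpy as np

# Hypothetical per-feature softmax scores for one clip over the ten
# classes of Table II (index 5 = engine_idling). In the paper, each
# vector would come from a network fed a different feature.
scores_feat_a = np.array([0.05, 0.02, 0.10, 0.03, 0.05, 0.40, 0.05, 0.05, 0.05, 0.20])
scores_feat_b = np.array([0.10, 0.05, 0.05, 0.05, 0.05, 0.35, 0.05, 0.10, 0.05, 0.15])

fused = (scores_feat_a + scores_feat_b) / 2.0  # simple average fusion
predicted_class = int(np.argmax(fused))        # class with highest fused score
```

Averaging keeps a confident stream from being vetoed by a weak one; weighted averaging or product fusion are common alternatives when one feature is known to be more reliable.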
V. CONCLUSIONS

In this paper, we addressed a challenging and critical problem in sound event classification: sound event recognition based on feature combination can achieve good performance at low SNR, even under unstable noise conditions. At the same time, combining multiple features affects different sound categories differently: some combinations clearly improve the recognition rate, and although some effects are not obvious, the overall performance improves. The next step is to test the combined features further in order to obtain a higher recognition rate.
ACKNOWLEDGMENTS

This research work was supported by the National Natural Science Foundation of China (No. 61961010), the Guangxi Key Research and Development (R&D) Program (Guike AB17292058), the Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System under Grant No. 2018B-1, and the Key Laboratory Fund of Cognitive Radio and Information Processing, Ministry of Education (Guilin University of Electronic Technology) under Grant No. CRKL180203.