International Core Journal of Engineering 2020-26 | Page 135

verification. The recognition rate of children playing is reduced because of its similarity to the ambient sound. However, under the actual-scene test condition the test set contains environmental noise, so the benefit of training with added noise is obvious: the recognition rate of engine idling is 26% higher than that of the network trained without noise, and the recognition rates for sounds recorded in the real environment are all higher than those of the network without noise processing.

B. Data noise recognition test under multi-feature combination

This section, unlike the previous one, uses combined features: it compares against the single-feature network trained without noise, and shows that the combined features and scoring after training are more effective. Table II compares the noise-added results under multi-dimensional features. The recognition rate increases, and testing in the actual scene also shows a better effect; the overall test recognition rate, relative to the single-feature network without added noise, rises to 76.5%.

TABLE II. MULTI-DIMENSIONAL FEATURES COMPARED TO A SINGLE UN-NOISE-ADDED NETWORK AFTER NOISE ADDITION

Sound category    | Single feature, pre-noise test | Multi-dimensional feature, pre-noise test | Real recording, single feature before noise | Real recording, multi-dimensional feature after noise
air conditioner   | 0.52 | 0.55 | 0.44 | 0.58
car horn          | 0.71 | 0.77 | 0.61 | 0.76
children playing  | 0.70 | 0.66 | 0.60 | 0.77
dog bark          | 0.77 | 0.81 | 0.65 | 0.76
drilling          | 0.63 | 0.67 | 0.51 | 0.66
engine idling     | 0.62 | 0.67 | 0.32 | 0.67
gun shot          | 0.79 | 0.83 | 0.74 | 0.81
jackhammer        | 0.60 | 0.62 | 0.55 | 0.63
police siren      | 0.70 | 0.77 | 0.68 | 0.78
street music      | 0.72 | 0.71 | 0.58 | 0.73
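The noise-addition step described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes raw waveforms as NumPy arrays and mixes a background-noise recording into a clean sample at a chosen signal-to-noise ratio in dB.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix background noise into a clean signal at a target SNR (dB)."""
    # Tile or trim the noise so it covers the clean signal's length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) == snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

Training on such low-SNR mixtures is what gives the noise-trained network its advantage on real-scene recordings in Table II.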
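The multi-dimensional (combined) feature idea can likewise be sketched. The exact feature set is not listed in this section, so the feature names in the usage note below are placeholders; the point is frame-wise concatenation of several per-frame feature matrices into one wider matrix fed to the network.

```python
import numpy as np

def combine_features(*feature_matrices):
    """Frame-wise concatenation of per-frame feature matrices.

    Each input has shape (n_frames_i, dim_i); extractors may disagree
    slightly on frame count, so crop to the shortest, then concatenate
    along the feature axis.
    """
    n_frames = min(m.shape[0] for m in feature_matrices)
    return np.concatenate([m[:n_frames] for m in feature_matrices], axis=1)
```

For example, a 13-dimensional MFCC matrix and a 12-dimensional chroma matrix would combine into a 25-dimensional per-frame feature.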
Comparing with the single-feature case with no noise in the training set, the recognition rate increases significantly after training and testing with the noise-added training set; the effect on air_conditioner, engine_idling, and street_music is especially significant. The common property of these sounds is that they are not prominent in the actual scene, and some data samples are themselves similar to the ambient sound, so environmental noise affects them more strongly. Adding noise increases the generalization ability of the network, and the more accurate description given by the combined features raises the recognition rate.

V. CONCLUSIONS

In this paper we discussed a challenging and critical problem in sound event classification: sound event recognition based on feature combination at low SNR can achieve good performance under unstable noise conditions. At the same time, the combination of multiple features affects different sound classes differently; for some it improves the recognition rate, and although some effects are not obvious, the overall effect is improved. The next step is to further test the combined features to obtain a higher recognition rate.

ACKNOWLEDGMENTS

This research work was supported by the National Natural Science Foundation of China (No. 61961010), the Guangxi Key Research and Development (R&D) Program (Guike AB17292058), the Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System under Grant No.
2018B-1, and the Key Laboratory Fund of Cognitive Radio and Information Processing, Ministry of Education (Guilin University of Electronic Technology), under Grant No. CRKL180203.