International Core Journal of Engineering 2020-26 | Page 133

of neural networks.

Figure 3. Dog bark waveform before (a) and after (b) adding noise

B. Feature selection and combination

1) Log-Mel spectrogram

The calculation process of the Log-Mel spectrogram is shown in Figure 4. The short-time Fourier transform of the sound frames usually produces a large spectrogram that mixes many kinds of information. To obtain sound features of a suitable size, this spectrum is converted to a Mel spectrum by a Mel-scale filter bank [11]. The Mel filter bank mimics the way the human ear processes audio: the Mel frequency is related nonlinearly to the frequency in Hz, approximately as

$$\mathrm{Mel}(f) = 2595 \lg\left(1 + \frac{f}{700}\right) \tag{2}$$

The human ear becomes less and less sensitive to a sound as its loudness increases, so a change in loudness is perceived only once it becomes large. This auditory property is called the "logarithmic" characteristic of the human ear. To simulate it, a logarithm is applied to the Mel spectrogram [12].

Figure 5. Log-Mel spectrogram of a dog bark before (a) and after (b) adding noise

2) MFCC

The MFCC feature shares all preceding steps with the Log-Mel spectrogram: the Mel spectrum obtained at the end is passed through a logarithm:

$$\log X[k] = \log(\text{Mel Spectrum}) \tag{3}$$

Spectral analysis then treats the spectrum as the product $X[k] = H[k]E[k]$ of a spectral envelope $H[k]$ and spectral details $E[k]$. Taking the logarithm turns the product into a sum:

$$\log X[k] = \log H[k] + \log E[k] \tag{4}$$

Applying an inverse transform gives

$$x[k] = h[k] + e[k] \tag{5}$$

The cepstral coefficients $h[k]$ obtained from the Mel spectrum in this way are called Mel Frequency Cepstral Coefficients (MFCC). The inverse transform is generally implemented with the DCT (Discrete Cosine Transform).

3) Short-term energy and short-time zero-crossing rate

The short-term average energy is the average energy of a frame.
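Once the signal has been split into frames, the short-term energy and zero-crossing rate of this subsection (formulas (6) to (8)) reduce to simple array operations. The following is a minimal NumPy sketch, not the authors' implementation. The helper names (`frame_signal`, `short_time_energy`, `zero_crossing_rate`), the frame length and hop, and the choice to count sign changes only between samples of the same frame (skipping the n = 0 term, which would need the last sample of the previous frame) are all assumptions.

```python
import numpy as np


def frame_signal(y: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Hypothetical helper: split a 1-D signal into frames of length frame_len.

    Consecutive frames start hop samples apart; trailing samples that do not
    fill a whole frame are dropped.
    """
    n_frames = max(0, 1 + (len(y) - frame_len) // hop)
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]


def sgn(x: np.ndarray) -> np.ndarray:
    # Sign function of formula (8): +1 for x >= 0, -1 for x < 0.
    return np.where(x >= 0, 1, -1)


def short_time_energy(frames: np.ndarray) -> np.ndarray:
    # Formula (6): sum of squared samples in each frame (cast to float to avoid integer overflow).
    return np.sum(frames.astype(float) ** 2, axis=1)


def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    # Formula (7): half the summed absolute sign changes between consecutive samples.
    # Assumption: only sample pairs inside the same frame are compared (n = 1 .. L-1).
    s = sgn(frames)
    return 0.5 * np.sum(np.abs(np.diff(s, axis=1)), axis=1)


if __name__ == "__main__":
    # Toy signal: two non-overlapping frames of length 4 (illustrative values only).
    y = np.array([1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5])
    frames = frame_signal(y, frame_len=4, hop=4)
    print(short_time_energy(frames))   # [4. 1.]
    print(zero_crossing_rate(frames))  # [3. 0.]
```

Building frames from an index matrix avoids a Python loop over frames, and the same helper handles overlapping frames (hop smaller than frame_len) without changes.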
The short-term energy of the sound signal in frame i is

$$E(i) = \sum_{n=0}^{L-1} y_i^2(n), \quad 1 \le i \le f_n \tag{6}$$

where $y_i(n)$ is the n-th sample of frame i, $L$ is the frame length, and $f_n$ is the number of frames.

The short-time average zero-crossing rate counts how often the signal changes sign within a frame:

$$Z(i) = \frac{1}{2} \sum_{n=0}^{L-1} \left| \operatorname{sgn}[y_i(n)] - \operatorname{sgn}[y_i(n-1)] \right|, \quad 1 \le i \le f_n \tag{7}$$

where $\operatorname{sgn}[x]$ is the sign function:

$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases} \tag{8}$$

Figure 4. Log-Mel spectrogram extraction process (continuous speech → pre-emphasis → framing → windowing → FFT → Mel filter banks → logarithmic operation → log-Mel spectrogram)

Taking a dog bark from the training set as an example, Fig. 5 compares the spectrum of the sound before and after noise is added: (a) is the Log-Mel spectrogram of the dog bark before noise is added, and (b) is the Log-Mel spectrogram of the same bark after noise is added. In (a), the characteristic contour is distinct at low frequencies and only slightly blurred at high frequencies, so it would make an ideal classification feature. For a dog bark recorded in a real scene, as in (b), the noisy spectrum in the middle and low frequency bands is not sufficient to classify the event sound accurately. It is therefore necessary to add noise to the public dataset used in this project, which increases the generalization ability and noise immunity of the subsequent CNN.

4) Feature combination

The individual features used in this paper are described above; the combined feature LMCE consists of two groups of two-dimensional arrays. The first group contains the Log-Mel spectrogram and the short-time energy. From the Log-Mel spectrogram, 13-dimensional features are extracted over 200 frames, and first-order and second-order differences are computed to obtain 39-dimensional 200-