International Core Journal of Engineering 2020-26 | Page 133
of neural networks.
Figure.3 Dog bark waveform before and after noise, panels (a) and (b)
2) MFCC
The MFCC feature is computed in the same way as the Mel
spectrogram in the previous steps, and the logarithm of the
resulting Mel spectrum is then taken, as given in Eq. (3).
B. Feature selection and combination
1) Log-Mel spectrogram
The calculation process of the Log-Mel spectrogram is
shown in Figure 4. The short-time Fourier transform of the
sound frames usually yields a large spectrogram covering a
wide range of information. To obtain sound features of an
appropriate size, it is converted to the Mel spectrum by a
Mel-scale filter bank [11]. The Mel filter bank simulates how
the human ear processes audio. The Mel frequency has a
nonlinear relationship with the frequency in Hz, which can
be approximated by the following formula:
$$\mathrm{Mel}(f) = 2595 \lg\left(1 + \frac{f}{700}\right) \tag{2}$$
Figure.5 Log-Mel spectrogram before and after noise, panels (a) and (b)
$$\log X[k] = \log(\text{Mel Spectrum}) \tag{3}$$
Then proceed to the spectrum analysis. Take the logarithm:
$$\log X[k] = \log H[k] + \log E[k] \tag{4}$$
Perform an inverse transformation:
$$x[k] = h[k] + e[k] \tag{5}$$
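As a quick numerical check of the Hz-to-Mel mapping in Eq. (2), the following minimal Python sketch evaluates it for a few frequencies. The function names hz_to_mel and mel_to_hz are illustrative and do not come from the original work; the inverse is obtained simply by solving Eq. (2) for f.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Eq. (2): Mel(f) = 2595 * lg(1 + f / 700), with lg the base-10 logarithm."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of Eq. (2), derived algebraically (not stated in the source)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Print the mapping for a few example frequencies (values chosen arbitrarily).
for f in (100.0, 700.0, 1000.0, 4000.0, 8000.0):
    print(f"{f:7.1f} Hz -> {hz_to_mel(f):8.2f} Mel")
```

The mapping is close to linear at low frequencies and compresses high frequencies, which is the nonlinear relationship described above.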
The human ear becomes less sensitive to sound as the
loudness increases, and a change is not perceived until the
loudness has increased greatly. This auditory characteristic
of the human ear is called a "logarithmic" characteristic.
Therefore, to simulate this "logarithmic" property of the
human ear, a logarithmic operation is performed on the Mel
spectrogram [12].
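The Mel filtering and logarithmic steps described above can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the triangular filter shape, the natural logarithm, the small constant eps, and the parameter names (n_filters, n_fft, sample_rate) are choices made here and are not specified in the original text.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale.

    Returns n_filters rows, each with n_fft // 2 + 1 weights (one per FFT bin).
    The triangular shape is an assumption, not taken from the source.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: left edge, filter centres, right edge.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_frame(power_spectrum, bank, eps=1e-10):
    """Filter one frame's power spectrum and take the logarithm.

    eps avoids log(0); its value is an arbitrary choice for this sketch.
    """
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

For example, log_mel_frame(spectrum, mel_filter_bank(8, 256, 16000)) reduces a 129-bin power spectrum to 8 log-Mel values, which shows how the filter bank brings the feature down to an appropriate size.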
[Figure 4 flowchart blocks: continuous speech, pre-emphasis, framing]
The cepstral coefficient h[k] obtained in this way from the
Mel spectrum is called the Mel Frequency Cepstral
Coefficient (MFCC). The inverse transform is generally
implemented with the DCT (Discrete Cosine Transform).
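The DCT step can be illustrated with a minimal sketch that turns one frame of log-Mel values into cepstral coefficients. The unnormalised DCT-II variant and the default of 13 retained coefficients are assumptions made for this example; the source only states that a DCT is generally used.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N)."""
    n_total = len(values)
    return [
        sum(v * math.cos(math.pi * k * (n + 0.5) / n_total)
            for n, v in enumerate(values))
        for k in range(n_total)
    ]


def mfcc_from_log_mel(log_mel_values, n_coeffs=13):
    """Keep the first n_coeffs DCT coefficients of one log-Mel frame.

    n_coeffs=13 is a common choice and an assumption of this sketch.
    """
    return dct_ii(log_mel_values)[:n_coeffs]
```

A perfectly flat log-Mel frame yields a single nonzero coefficient (index 0), which reflects how the low-order coefficients capture the smooth spectral envelope.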
3) Short-term energy and short-time zero-crossing rate
The short-term average energy is the average energy of a
frame. The short-term energy of the sound signal in
frame i is given by:
$$E(i) = \sum_{n=0}^{L-1} y_i^2(n), \quad 1 \le i \le \mathit{fn} \tag{6}$$
[Figure 4 flowchart blocks: windowing, FFT, Mel filter banks, logarithmic operation, Log-Mel spectrogram]
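Equation (6) can be sketched in Python as follows. The helper frame_signal, the frame length, and the hop size are hypothetical and chosen only for illustration; no window is applied here, whereas the frames y_i(n) in the original work may already be windowed.

```python
def frame_signal(samples, frame_len, hop):
    """Split a signal into frames of frame_len samples, advancing by hop.

    Trailing samples that do not fill a whole frame are dropped
    (a simplification made for this sketch).
    """
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frames):
    """Eq. (6): E(i) is the sum of squared samples of frame i."""
    return [sum(y * y for y in frame) for frame in frames]


# Toy signal with two non-overlapping frames of length 4.
signal = [1, -1, 2, -2, 0, 0, 3, 3]
energies = short_time_energy(frame_signal(signal, frame_len=4, hop=4))
```

Dividing each E(i) by the frame length L would give the average energy per sample; Eq. (6) as printed uses the plain sum.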
The short-time average zero-crossing rate is the average
zero-crossing rate within a frame. It is given by Eq. (7).
Figure.4 Log-Mel spectrogram extraction process
Taking a dog bark from the training set as an example, the
spectrum of the sound before and after noise is added is
analyzed, as shown in Fig. 5: (a) is the Log-Mel spectrogram
of the dog bark before the noise is added, and (b) is the
Log-Mel spectrogram of the dog bark after the noise is added.
Comparing the two spectra, in (a) the characteristic contour
is clear at low frequencies and only slightly blurred at high
frequencies, so it works well as a classification feature. For
the real-scene dog bark in (b), however, the noisy spectrum in
the middle and low frequency bands is not sufficient to
classify the event sound accurately. Therefore, it is
necessary to perform noise processing on the public data set
used in the project, which increases the generalization
ability and noise immunity of the subsequent CNN.
$$Z(i) = \frac{1}{2} \sum_{n=0}^{L-1} \left| \operatorname{sgn}[y_i(n)] - \operatorname{sgn}[y_i(n-1)] \right|, \quad 1 \le i \le \mathit{fn} \tag{7}$$
where sgn[x] is the sign function, namely:
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases} \tag{8}$$
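Equations (7) and (8) can be sketched for a single frame as follows. One assumption is made here: the sample before the first one in a frame is not defined in the text, so this sketch starts the sum at the second sample instead of n = 0. The function names are illustrative.

```python
def sgn(x):
    """Eq. (8): 1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Eq. (7) for one frame.

    Each sign change between neighbouring samples contributes |2| / 2 = 1.
    Assumption: the sum starts at the second sample, because the sample
    preceding the frame is not defined in the source.
    """
    return 0.5 * sum(abs(sgn(frame[n]) - sgn(frame[n - 1]))
                     for n in range(1, len(frame)))


def short_time_zcr(frames):
    """Apply Eq. (7) to every frame i."""
    return [zero_crossing_rate(frame) for frame in frames]
```

Because sgn[0] is defined as 1 in Eq. (8), a run of zeros followed by positive samples does not count as a crossing.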
4) Feature combination
The individual features required here are as described
above, and the combined feature LMCE consists of two groups
of two-dimensional arrays. The first group of
two-dimensional data comprises the Log-Mel spec and the
short-time energy: the Log-Mel spec extracts 13-dimensional
features over 200 frames, and takes first-order and
second-order differences to obtain 39-dimensional 200-