2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Improving LSTM Based Acoustic Model with
Dropout Method
Zhuoshu He
International Department, Zhengzhou Foreign Language School
Zhengzhou, China
[email protected]
Abstract—The deep neural network model applied to speech recognition has become one of the most successful applications of deep learning. Speech recognition can also be used, for example, for surgical records. Among all kinds of neural networks, the recurrent neural network (RNN) is the most suitable for sequence modeling because of its capacity to model long-term dependencies. As a specific type of RNN unit, the long short-term memory (LSTM) has been widely used for speech recognition, especially for acoustic modeling. In addition, when training a neural network, dropout is often used to prevent overfitting and improve the model's generalization capacity. This paper explores the application of dropout to LSTM-based acoustic modeling. By combining the per-element and per-frame dropout methods, the accuracy of speech recognition is improved by more than 5% relative. The experiments were done on the THCHS-30 corpus.
Keywords—LSTM, speech recognition, RNN, dropout, THCHS-30
I. INTRODUCTION
In recent years, deep learning [1] has become one of the most popular research directions in academic circles. As an important part of artificial intelligence, automatic speech recognition (ASR) has developed a lot in the last decade, and deep learning-based speech recognition has made great progress [2,3,4,5,6]. Artificial neural networks (ANN), including the deep neural network (DNN), the convolutional neural network (CNN) and the recurrent neural network (RNN), have been widely used for automatic speech recognition, especially the RNN.
However, the vanilla RNN is difficult to train due to the problems of vanishing and exploding gradients [7]. Thus, more complicated units with gate mechanisms were proposed, such as the long short-term memory (LSTM) [8,9,10] and the gated recurrent unit (GRU) [11]. With its gate mechanism, which consists of an input gate, an output gate and a forget gate, the LSTM cell can control the flow of information and gradients, and thus models long-term dependencies well. In recent years, LSTM-based speech recognition has become a popular and strong baseline for researchers.
Meanwhile, there are still works focusing on the training of ANNs, such as speeding up convergence, improving generalization capacity and preventing overfitting. Among these techniques, dropout is a simple and flexible method [12]. The main idea of dropout is to randomly set part of the activations of each layer to zero during training, which is equivalent to introducing noise into the training data. Since a complicated LSTM-based acoustic model has a powerful learning capacity and is prone to overfitting, dropout can make the model more robust and thus give better results on unseen data at test time.
There have been some works combining dropout and LSTM. In [12], dropout is used on simple DNNs and CNNs. In [13] and [14], the use of dropout with RNNs is explored, and in [15] dropout is combined with LSTM to give better performance. In this work, we explore two kinds of dropout methods with LSTM: one is per-element dropout, and the other is per-frame dropout. The experiments done on the THCHS-30 [16] corpus show that both methods can improve the performance of the LSTM-based acoustic model. Moreover, when the two methods are combined, we gain about a 5% relative improvement over the baseline LSTM-based acoustic model.
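As an illustration of the difference between the two schemes, the following minimal NumPy sketch applies them to the output activations of a single LSTM layer. The function names, the dropout rate and the inverted-scaling convention are illustrative assumptions rather than the exact configuration used in our experiments.

import numpy as np

def per_element_dropout(h, p, rng):
    # Zero each activation independently with probability p and scale the
    # survivors by 1/(1-p), so the expected activation is unchanged.
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

def per_frame_dropout(h, p, rng):
    # Zero all activations of a whole frame (time step) at once:
    # one Bernoulli draw per frame, broadcast over the hidden dimension.
    mask = (rng.random((h.shape[0], 1)) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
h = rng.standard_normal((300, 512))        # 300 frames of 512 LSTM activations
h_elem = per_element_dropout(h, 0.15, rng)
h_frame = per_frame_dropout(h, 0.15, rng)
# The two kinds of masks can also be applied together; the implementation
# used in this work is described in Section III.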
The paper starts by describing prior work in Section II, including LSTM and dropout. Then the implementation of dropout with LSTM will be described in Section III, and the experimental setup will be presented in Section IV. The results will be discussed in Section V, followed by the conclusions in Section VI.

II. PRIOR WORK

This section gives a brief introduction to LSTM and dropout.

A. LSTM

LSTM was first proposed in [8], and then developed in [9] and [10]. A standard LSTM can be described by Equations (1) to (6):
i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + W_{ic} c_{t-1} + b_i)    (1)
f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + W_{fc} c_{t-1} + b_f)    (2)
g_t = \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g)    (3)
c_t = f_t \odot c_{t-1} + i_t \odot g_t    (4)
o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + W_{oc} c_t + b_o)    (5)
h_t = o_t \odot \tanh(c_t)    (6)
where the W_* terms are weight matrices, W_{ic}, W_{fc} and W_{oc} are diagonal (peephole) matrices, and the b_* terms are bias vectors. For example, W_{ic} is the matrix of weights from the cell activation vector to the input gate, and b_i is the bias vector of the input gate. i_t, f_t and o_t are the input gate, forget gate and output gate respectively, while c_t and h_t are the cell activation vector and the output; all of i_t, f_t, o_t, c_t and h_t have the same dimension. Besides, g_t is the input candidate, and \odot stands for element-wise multiplication.
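For concreteness, one time step of Equations (1) to (6) can be sketched in NumPy as below. The dictionary-based parameter layout and the storage of the diagonal peephole matrices W_{ic}, W_{fc}, W_{oc} as vectors acting element-wise are illustrative choices, not the exact implementation used in our experiments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # One step of Equations (1)-(6); p holds the weight matrices W_*, the
    # peephole diagonals stored as vectors w_*, and the bias vectors b_*.
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["w_ic"] * c_prev + p["b_i"])  # (1)
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["w_fc"] * c_prev + p["b_f"])  # (2)
    g_t = np.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])                       # (3)
    c_t = f_t * c_prev + i_t * g_t                                                       # (4)
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["w_oc"] * c_t + p["b_o"])     # (5)
    h_t = o_t * np.tanh(c_t)                                                             # (6)
    return h_t, c_t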