2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Improving LSTM Based Acoustic Model with
Dropout Method
Zhuoshu He
International Department, Zhengzhou Foreign Language School
Zhengzhou, China
[email protected]
Abstract—The deep neural network model applied to speech recognition has become one of the most successful applications of deep learning. Speech recognition can also be used, for example, for surgical records. Among all kinds of neural networks, the recurrent neural network (RNN) is the most suitable for sequence modeling because of its capacity to model long-term dependencies. As a specific type of RNN unit, the long short-term memory (LSTM) has been widely used for speech recognition, especially for acoustic modeling. In addition, when training a neural network, dropout is often used to prevent overfitting and improve the model's generalization capacity. This paper explores the application of dropout to LSTM-based acoustic modeling. By combining the per-element and per-frame dropout methods, the accuracy of speech recognition is improved by more than 5% relative. The experiments were done on the THCHS-30 corpus.
Keywords—LSTM, speech recognition, RNN, dropout, THCHS-30
I. INTRODUCTION
In recent years, deep learning [1] has become one of the most popular research directions in academic circles. As an important part of artificial intelligence, automatic speech recognition (ASR) has developed a lot in the last decade, and deep learning-based speech recognition has made great progress [2,3,4,5,6]. Artificial neural networks (ANN), including the deep neural network (DNN), the convolutional neural network (CNN) and the recurrent neural network (RNN), have been widely used for automatic speech recognition, especially the RNN.
However, the vanilla RNN is difficult to train due to the problems of vanishing and exploding gradients [7]. Thus, more complicated units with gate mechanisms were proposed, such as the long short-term memory (LSTM) [8,9,10] and the gated recurrent unit (GRU) [11]. With its gate mechanism, which consists of an input gate, an output gate and a forget gate, the LSTM cell can control the flow of information and gradients, and thus models long-term dependencies well. In recent years, LSTM-based speech recognition has become a popular and strong baseline for researchers.
Meanwhile, there are still works focusing on the training of ANNs, such as speeding up convergence, improving generalization capacity and preventing overfitting. Among these techniques, dropout is a simple and flexible method [12]. The main idea of dropout is to randomly set part of the activations of each layer to zero during training, which is equivalent to introducing noise into the training data. Since a complicated LSTM-based acoustic model has a powerful learning capacity and is prone to overfitting, dropout can make the model more robust and thus give better results on unseen data at test time.
There have been some works combining dropout and LSTM. In [12], dropout is used on simple DNNs and CNNs. In [13] and [14], the use of dropout with RNNs is explored, and in [15] dropout is combined with LSTM to give better performance. In this work, we explore two kinds of dropout methods with LSTM: one is per-element dropout, and the other is per-frame dropout. The experiments done on the THCHS-30 [16] corpus show that both methods can improve the performance of the LSTM-based acoustic model. Moreover, when the two methods are combined, we gain about a 5% relative improvement over the baseline LSTM-based acoustic model.
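As an illustration of the difference between the two schemes, the following minimal NumPy sketch applies them to the output activations of a single LSTM layer. The function names, the dropout rate and the inverted-scaling convention are illustrative assumptions rather than the exact configuration used in our experiments.

import numpy as np

def per_element_dropout(h, p, rng):
    # Zero each activation independently with probability p and scale the
    # survivors by 1/(1-p), so the expected activation is unchanged.
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

def per_frame_dropout(h, p, rng):
    # Zero all activations of a whole frame (time step) at once:
    # one Bernoulli draw per frame, broadcast over the hidden dimension.
    mask = (rng.random((h.shape[0], 1)) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
h = rng.standard_normal((300, 512))        # 300 frames of 512 LSTM activations
h_elem = per_element_dropout(h, 0.15, rng)
h_frame = per_frame_dropout(h, 0.15, rng)
# The two kinds of masks can also be applied together; the implementation
# used in this work is described in Section III.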
The paper starts by describing prior work in Section II, including LSTM and dropout. Then the implementation of dropout with LSTM will be described in Section III, and the experimental setup will be presented in Section IV. The results will be discussed in Section V, followed by the conclusions in Section VI.

II. PRIOR WORK

This section gives a brief introduction to LSTM and dropout.

A. LSTM

LSTM was first proposed in [8], and then developed in [9] and [10]. A standard LSTM can be described by Equations (1) to (6):
i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + W_{ic} c_{t-1} + b_i)    (1)
f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + W_{fc} c_{t-1} + b_f)    (2)
g_t = \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g)    (3)
c_t = f_t \odot c_{t-1} + i_t \odot g_t    (4)
o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + W_{oc} c_t + b_o)    (5)
h_t = o_t \odot \tanh(c_t)    (6)
where the W_* terms are weight matrices, W_{ic}, W_{fc} and W_{oc} are diagonal (peephole) matrices, and the b_* terms are bias vectors. For example, W_{ic} is the matrix of weights from the cell activation vector to the input gate, and b_i is the bias vector of the input gate. i_t, f_t and o_t are the input gate, forget gate and output gate respectively, while c_t and h_t are the cell activation vector and the output; all of i_t, f_t, o_t, c_t and h_t have the same dimension. Besides, g_t is the input candidate, and \odot stands for element-wise multiplication.
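For concreteness, one time step of Equations (1) to (6) can be sketched in NumPy as below. The dictionary-based parameter layout and the storage of the diagonal peephole matrices W_{ic}, W_{fc}, W_{oc} as vectors acting element-wise are illustrative choices, not the exact implementation used in our experiments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # One step of Equations (1)-(6); p holds the weight matrices W_*, the
    # peephole diagonals stored as vectors w_*, and the bias vectors b_*.
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["w_ic"] * c_prev + p["b_i"])  # (1)
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["w_fc"] * c_prev + p["b_f"])  # (2)
    g_t = np.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])                       # (3)
    c_t = f_t * c_prev + i_t * g_t                                                       # (4)
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["w_oc"] * c_t + p["b_o"])     # (5)
    h_t = o_t * np.tanh(c_t)                                                             # (6)
    return h_t, c_t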