
The baseline LSTMP model achieves a CER of 22.02% and a PER of 8.91%. With per-element dropout, the model achieves a CER of 21.77% and a PER of 8.17%, relative reductions of 1% and 8% respectively. Per-frame dropout performs better than per-element dropout, giving a CER of 20.85% and a PER of 7.92%, relative reductions of 5% and 11% respectively over the baseline LSTMP. Combining the two dropout methods gives the best performance, with a CER of 20.74% and a PER of 7.76%; compared with the baseline LSTMP, this is a 5.8% relative reduction in CER and a 9.8% relative reduction in PER.

These experiments show that dropout can improve the performance of neural-network-based acoustic models. Applied to LSTMP, it yields our best result on the THCHS-30 task.
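To make the distinction between the two dropout variants concrete, a minimal NumPy sketch of the two masking schemes is given below. It is an illustration only: the dropout rate p, the inverted-dropout rescaling by 1/(1-p), and the array shapes are assumptions for the sketch, not the exact recipe used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_element_dropout(h, p):
    """Per-element dropout: every activation in every frame is dropped
    independently with probability p (inverted-dropout rescaling assumed)."""
    mask = (rng.random(h.shape) >= p).astype(h.dtype)          # one draw per element
    return h * mask / (1.0 - p)

def per_frame_dropout(h, p):
    """Per-frame dropout: a single Bernoulli draw per frame, shared across
    all hidden dimensions of that frame."""
    mask = (rng.random((h.shape[0], 1)) >= p).astype(h.dtype)  # one draw per frame
    return h * mask / (1.0 - p)

# h: hypothetical LSTMP projected outputs, 5 frames x 4 hidden dimensions
h = np.ones((5, 4))
print(per_element_dropout(h, p=0.2))  # zeros scattered element-wise
print(per_frame_dropout(h, p=0.2))    # whole frames zeroed or kept
```

The only difference between the two functions is the shape of the random mask: an independent value per element versus one shared value per frame.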
"Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958. Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in neural information processing systems. 2016. Pham, Vu, et al. "Dropout improves recurrent neural networks for handwriting recognition." Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, 2014. Cheng, Gaofeng, et al. "An exploration of dropout with LSTMs." Proc. Interspeech. 2017. Wang, Dong, and Xuewei Zhang. "Thchs-30: A free chinese speech corpus." arXiv preprint arXiv:1512.01882 (2015). Povey, Daniel, et al. "The Kaldi speech recognition toolkit." IEEE 2011 workshop on automatic speech recognition and understanding. No. EPFL-CONF-192584. IEEE Signal Processing Society, 2011. Ko, Tom, et al. "Audio augmentation for speech recognition." Sixteenth Annual Conference of the International Speech Communication Association. 2015. Saon, George, et al. "Speaker adaptation of neural network acoustic models using i-vectors." ASRU. 2013. Young, Steve J., Julian J. Odell, and Philip C. Woodland. "Tree-based state tying for high accuracy acoustic modelling." Proceedings of the workshop on Human Language Technology. Association for Computational Linguistics, 1994. Huang, Xuedong D., Yasuo Ariki, and Mervyn A. Jack. "Hidden Markov models for speech recognition." (1990): 60-80. Povey, Daniel, Xiaohui Zhang, and Sanjeev Khudanpur. "Parallel training of DNNs with natural gradient and parameter averaging." arXiv preprint arXiv:1410.7455 (2014). Chen, Kai, and Qiang Huo. "Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24.7 (2016): 1185-1193. Graves, Alex, and Jürgen Schmidhuber. "Framewise phoneme classification with bidirectional LSTM and other neural network architectures." Neural Networks 18.5-6 (2005): 602-610. Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.