International Core Journal of Engineering 2020-26 | Page 42
Fig. 4. The architecture of the proposed network.
As shown in Fig. 5, the Gradient Boosting Decision
Tree (GBDT) classifier achieves the best classification
performance among the five traditional classifiers. However,
the CNN achieves higher accuracy than GBDT, so the CNN
outperforms the traditional machine learning methods.
Loss = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} w_k \, y_{ik} \log p(c_k)    (2)

where N is the number of training samples, y_{ik} is the k-th label
of the i-th sample, w_k is the weight of the k-th class, and p(c_k)
is the predicted softmax probability of class c_k.
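The weighted loss in equation (2) can be sketched in plain NumPy. This is an illustrative implementation, not the paper's Caffe layer; the function and variable names are our own.

```python
import numpy as np

def weighted_softmax_loss(logits, labels, class_weights):
    """Weighted softmax cross-entropy as in equation (2).

    logits: (N, K) raw scores; labels: (N,) integer class ids;
    class_weights: (K,) per-class loss weights w_k.
    """
    # Numerically stable softmax to obtain p(c_k) for each sample.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = logits.shape[0]
    # Each sample contributes -w_k * log p(c_k) for its true class k;
    # the total is averaged over the N samples.
    per_sample = -np.log(probs[np.arange(n), labels]) * class_weights[labels]
    return per_sample.mean()
```

With `class_weights = np.array([1.5, 1.0])`, mistakes on class 0 (positive samples) are penalized 1.5 times as heavily as mistakes on class 1, matching the weighting described below.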
In the actual application scenario, the risk of
misclassifying positive samples is greater than the risk of
misclassifying negative samples, so we use the
weighted softmax with loss to improve performance. In
our experiment, the data set has two categories, so K = 2 in
equation (2). We then vary the loss weight for positive
samples from 1.0 to 2.0 while keeping it at 1.0 for negative
samples. The results of different loss weights are shown in Fig. 6.
IV. EXPERIMENT
We conduct experiments on an English handwriting data set
collected from the homework of junior high school and high
school students, and compare the traditional machine learning
methods with the CNN on the same data set.
A. Data set
The data set contains about 24,000 images in total. We divided
the samples into two categories, award-winning and
non-award-winning, with about 4,800 and 19,200 images
respectively. We then split the data set into training,
validation, and test sets in a ratio of about 4:1:1.
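A 4:1:1 split of this kind can be produced with two successive calls to scikit-learn's `train_test_split`. The paper does not state how the split was performed; this sketch additionally assumes stratified sampling so both categories keep their 1:4 ratio in every subset, and uses an index array as a stand-in for the image files.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical index array standing in for the ~24,000 images.
indices = np.arange(24000)
labels = np.array([0] * 4800 + [1] * 19200)  # award-winning / non-award-winning

# First carve off the test set (1/6 of the data), then the validation
# set (1/5 of the remainder), yielding roughly 4:1:1 train/val/test.
train_val_idx, test_idx, y_train_val, y_test = train_test_split(
    indices, labels, test_size=1/6, stratify=labels, random_state=0)
train_idx, val_idx, y_train, y_val = train_test_split(
    train_val_idx, y_train_val, test_size=0.2, stratify=y_train_val,
    random_state=0)
```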
B. Evaluation
We mainly compare the CNN method with conventional
machine learning methods. For the traditional machine learning
methods, we extract fourteen statistical features, covering the
number of rows, the number of words, word spacing, word
width, word height, word slope (the average slope of all the
nearest vertical lines in the word), and the number of words
deviating from the baseline. We then apply a tree-based
feature selection method: by comparing the importance of
the features, the two least important features are removed.
Finally, we adopt five classifiers, Logistic Regression (LR),
Random Forest (RF), Decision Tree (DT), Support Vector
Machine (SVM), and Gradient Boosting Decision Tree (GBDT),
to evaluate the classification results. The results of the
machine learning methods and the CNN method are shown in
Fig. 5.
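The feature-selection and classification pipeline above can be sketched with scikit-learn. The synthetic data, the choice of a random forest as the importance estimator, and all hyperparameters here are our assumptions, used only to illustrate the tree-based selection of twelve of the fourteen features and the five classifiers named in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the fourteen handwriting statistics.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Tree-based feature selection: rank features by impurity importance
# and keep the twelve most important, dropping the two least important.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    max_features=12, threshold=-np.inf).fit(X, y)
X_sel = selector.transform(X)

# The five traditional classifiers compared in the paper.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "GBDT": GradientBoostingClassifier(random_state=0),
}
scores = {name: clf.fit(X_sel, y).score(X_sel, y)
          for name, clf in classifiers.items()}
```

Setting `threshold=-np.inf` with `max_features=12` makes the selector keep exactly the top twelve features by importance, rather than applying an importance cutoff.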
Fig. 6. The results of different loss weights.
As we can see in Fig. 6, the model with a loss weight of 1.0
is simply the original model. Compared to the original model,
recall increases for most models using the weighted softmax
with loss, which meets our expectations. Among all the models,
the model with a loss weight of 1.5 performs best,
achieving the highest accuracy of 96.0% and an F1 score of
95.6%. Therefore, we set the loss weight to 1.5 for positive
samples and 1.0 for negative samples, that is, w_0 = 1.5 and
w_1 = 1.0 in equation (2). The specific results are shown in
Table I.
TABLE I. THE RESULTS OF DIFFERENT LOSS FUNCTIONS

Loss function                 Recall   Precision   Accuracy
Softmax with loss             91.8%    96.0%       94.5%
Weighted softmax with loss    94.0%    97.2%       96.0%
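As a sanity check, the reported F1 score follows from the table's precision and recall via F1 = 2PR / (P + R):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, in percent."""
    return 2 * precision * recall / (precision + recall)

baseline = f1(96.0, 91.8)  # softmax with loss
weighted = f1(97.2, 94.0)  # weighted softmax with loss: ~95.6%
```

Rounded to one decimal place, the weighted model's F1 is 95.6%, matching the value quoted in the text.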
V. CONCLUSION
To meet the requirement of automatic evaluation of
English handwriting quality in English teaching, this paper
has proposed an automatic evaluation algorithm for offline
Fig. 5. The results of machine learning methods and CNN method. The
horizontal axis is a variety of models, and the vertical axis is the
classification accuracy.