International Core Journal of Engineering 2020-26 | Page 42
Fig. 4. The architecture of the proposed network.
As shown in Fig. 5, the Gradient Boosting Decision
Tree (GBDT) classifier achieves the best classification
performance among the five traditional classifiers. However,
the CNN achieves higher accuracy than GBDT, so the CNN
outperforms the traditional machine learning methods.
Loss = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} w_k \, y_{ik} \log p(c_k)    (2)

where N is the number of training samples, y_{ik} is the k-th label
of the i-th sample, w_k is the weight of the k-th class, and p(c_k)
is the predicted softmax probability of class c_k.
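The weighted loss in equation (2) can be sketched in plain NumPy. This is an illustrative implementation, not the paper's Caffe layer; the function and variable names are our own.

```python
import numpy as np

def weighted_softmax_loss(logits, labels, class_weights):
    """Weighted softmax cross-entropy as in equation (2).

    logits: (N, K) raw scores; labels: (N,) integer class ids;
    class_weights: (K,) per-class loss weights w_k.
    """
    # Numerically stable softmax to obtain p(c_k) for each sample.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = logits.shape[0]
    # Each sample contributes -w_k * log p(c_k) for its true class k;
    # the total is averaged over the N samples.
    per_sample = -np.log(probs[np.arange(n), labels]) * class_weights[labels]
    return per_sample.mean()
```

With `class_weights = np.array([1.5, 1.0])`, mistakes on class 0 (positive samples) are penalized 1.5 times as heavily as mistakes on class 1, matching the weighting described below.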
In the actual application scenario, the risk of
misclassifying positive samples is greater than the risk of
misclassifying negative samples, so we use the
weighted softmax with loss to improve performance. In
our experiment, the data set has two categories, so K = 2 in
equation (2). We then vary the loss weight for positive
samples from 1.0 to 2.0 while keeping it at 1.0 for negative
samples. The results of different loss weights are shown in Fig. 6.
IV. EXPERIMENT
We conduct experiments on an English handwriting data set
collected from the homework of junior high school and high
school students, and compare the traditional machine learning
methods with the CNN on the same data set.
A. Data set
The data set contains about 24,000 images in total. We divided
the samples into two categories, award-winning and
non-award-winning, with about 4,800 and 19,200 images
respectively. We then split the data set into training,
validation, and test sets in a ratio of about 4:1:1.
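A 4:1:1 split of this kind can be produced with two successive calls to scikit-learn's `train_test_split`. The paper does not state how the split was performed; this sketch additionally assumes stratified sampling so both categories keep their 1:4 ratio in every subset, and uses an index array as a stand-in for the image files.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical index array standing in for the ~24,000 images.
indices = np.arange(24000)
labels = np.array([0] * 4800 + [1] * 19200)  # award-winning / non-award-winning

# First carve off the test set (1/6 of the data), then the validation
# set (1/5 of the remainder), yielding roughly 4:1:1 train/val/test.
train_val_idx, test_idx, y_train_val, y_test = train_test_split(
    indices, labels, test_size=1/6, stratify=labels, random_state=0)
train_idx, val_idx, y_train, y_val = train_test_split(
    train_val_idx, y_train_val, test_size=0.2, stratify=y_train_val,
    random_state=0)
```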
B. Evaluation
We mainly compare the CNN method with conventional
machine learning methods. For the traditional machine learning
methods, we extract fourteen statistical features, covering the
number of rows, the number of words, word spacing, word
width, word height, word slope (the average slope of all the
nearest vertical lines in the word), and the number of words
deviating from the baseline. We then apply a tree-based
feature selection method: by comparing the importance of
the features, the two least important features are removed.
Finally, we adopt five classifiers, Logistic Regression (LR),
Random Forest (RF), Decision Tree (DT), Support Vector
Machine (SVM), and Gradient Boosting Decision Tree (GBDT),
to evaluate the classification results. The results of the
machine learning methods and the CNN method are shown in
Fig. 5.
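The feature-selection and classification pipeline above can be sketched with scikit-learn. The synthetic data, the choice of a random forest as the importance estimator, and all hyperparameters here are our assumptions, used only to illustrate the tree-based selection of twelve of the fourteen features and the five classifiers named in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the fourteen handwriting statistics.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 14))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Tree-based feature selection: rank features by impurity importance
# and keep the twelve most important, dropping the two least important.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    max_features=12, threshold=-np.inf).fit(X, y)
X_sel = selector.transform(X)

# The five traditional classifiers compared in the paper.
classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "GBDT": GradientBoostingClassifier(random_state=0),
}
scores = {name: clf.fit(X_sel, y).score(X_sel, y)
          for name, clf in classifiers.items()}
```

Setting `threshold=-np.inf` with `max_features=12` makes the selector keep exactly the top twelve features by importance, rather than applying an importance cutoff.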
Fig. 6. The results of different loss weights.
As we can see in Fig. 6, the model with a loss weight of 1.0
is simply the original model. Compared to the original model,
recall increases for most models using the weighted softmax
with loss, which meets our expectations. Among all the models,
the model with a loss weight of 1.5 performs best,
achieving the highest accuracy of 96.0% and an F1 score of
95.6%. Therefore, we set the loss weight to 1.5 for positive
samples and 1.0 for negative samples, that is, w_0 = 1.5 and
w_1 = 1.0 in equation (2). The specific results are shown in
Table I.
TABLE I. THE RESULTS OF DIFFERENT LOSS FUNCTIONS

Loss function                 Recall   Precision   Accuracy
Softmax with loss             91.8%    96.0%       94.5%
Weighted softmax with loss    94.0%    97.2%       96.0%
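As a sanity check, the reported F1 score follows from the table's precision and recall via F1 = 2PR / (P + R):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, in percent."""
    return 2 * precision * recall / (precision + recall)

baseline = f1(96.0, 91.8)  # softmax with loss
weighted = f1(97.2, 94.0)  # weighted softmax with loss: ~95.6%
```

Rounded to one decimal place, the weighted model's F1 is 95.6%, matching the value quoted in the text.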
V. CONCLUSION
To meet the requirement of automatic evaluation of
English handwriting quality in English teaching, this paper
has proposed an automatic evaluation algorithm for offline
Fig. 5. The results of machine learning methods and CNN method. The
horizontal axis is a variety of models, and the vertical axis is the
classification accuracy.