International Core Journal of Engineering 2020-26 | Page 111
C. Downscaling Forecasting
By changing the scale of the urban air quality grade labels in
the training dataset during the initial model training stage,
the model can perform downscaling prediction of the urban air
quality grade in both the time domain and the space domain.
For temporal downscaling, the air quality grade can be released
at an hourly rather than a daily level. When the spatial scale
is changed from the city scale to the station scale, the air
quality grade forecast is downscaled regionally.
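The label rescaling described above can be illustrated with a minimal Python sketch. It assumes hypothetical records of the form (station, city, day, hour, AQI), maps AQI to China's six-level AQI grading, and obtains a coarser label by averaging the AQI values that share a space and time key. The record layout, the averaging rule, the station IDs, and the helper names `aqi_to_grade` and `build_labels` are illustrative assumptions, not the procedure used in this paper; an official daily AQI is computed from pollutant concentrations rather than from averaged hourly AQI.

```python
from collections import defaultdict
from statistics import mean

# Upper AQI bound of grades 1-5 in China's six-level AQI grading; above 300 is grade 6.
GRADE_UPPER_BOUNDS = [(50, 1), (100, 2), (150, 3), (200, 4), (300, 5)]


def aqi_to_grade(aqi):
    """Map an AQI value to an air quality grade (1 = excellent, 6 = severely polluted)."""
    for upper, grade in GRADE_UPPER_BOUNDS:
        if aqi <= upper:
            return grade
    return 6


def build_labels(records, time_scale="hour", space_scale="station"):
    """Build grade labels at a chosen scale (hypothetical helper).

    records: iterable of (station, city, day, hour, aqi) tuples.
    time_scale: "day" or "hour"; space_scale: "city" or "station".
    AQI values sharing a (space key, time key) are averaged before grading,
    a simplifying assumption made only for illustration.
    """
    groups = defaultdict(list)
    for station, city, day, hour, aqi in records:
        space_key = station if space_scale == "station" else city
        time_key = (day, hour) if time_scale == "hour" else (day,)
        groups[(space_key, time_key)].append(aqi)
    return {key: aqi_to_grade(mean(values)) for key, values in groups.items()}


# Hypothetical station records for two hours in Beijing.
records = [
    ("S1", "Beijing", "2019-03-27", 22, 40),
    ("S2", "Beijing", "2019-03-27", 22, 80),
    ("S1", "Beijing", "2019-03-27", 23, 120),
    ("S2", "Beijing", "2019-03-27", 23, 160),
]
print(build_labels(records, "day", "city"))      # one coarse city-day label
print(build_labels(records, "hour", "station"))  # four fine station-hour labels
```

Switching from `("day", "city")` to `("hour", "station")` keeps the same records but produces finer-grained labels, which is the sense in which the training set is downscaled.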
IV. EXPERIMENTS
A. Settings and Datasets
In the comparison experiments, the ensemble learning
algorithm Leveraging Bagging (LB) is applied. Its base
classifier is the Hoeffding Tree with default parameters, and
the ensemble consists of 10 base classifiers. All experiments
are carried out in Massive Online Analysis (MOA) [9], a
framework designed for learning from online data streams, on
a machine with an Intel i7-6700HQ 2.60 GHz CPU, 8 GB RAM,
and Windows 10.
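The core of Leveraging Bagging can be sketched as online bagging in which each incoming instance is given a weight k drawn from a Poisson distribution with λ = 6 (standard online bagging uses λ = 1), with the ensemble predicting by majority vote. This sketch omits the ADWIN change detectors and the output-code randomization of the full algorithm, and a tiny counting Naive Bayes learner stands in for the Hoeffding Tree so that the example stays short; the class names are hypothetical, and this is not MOA's implementation.

```python
import math
import random
from collections import Counter, defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CountingNB:
    """Tiny weighted categorical Naive Bayes; a hypothetical stand-in for a Hoeffding Tree."""

    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (feature index, value) -> class counts

    def learn(self, x, y, weight=1):
        self.class_counts[y] += weight
        for i, v in enumerate(x):
            self.value_counts[(i, v)][y] += weight

    def predict(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())

        def log_score(c):
            n_c = self.class_counts[c]
            score = math.log(n_c / total)
            for i, v in enumerate(x):
                # Add-one style smoothing keeps unseen values from zeroing out a class.
                score += math.log((self.value_counts[(i, v)][c] + 1) / (n_c + 2))
            return score

        return max(self.class_counts, key=log_score)


class LeveragingBaggingSketch:
    """Online bagging with Poisson(lam) instance weights and majority voting.

    Omits the ADWIN change detectors and output-code randomization of full LB.
    """

    def __init__(self, make_learner, n_models=10, lam=6.0, seed=0):
        self.rng = random.Random(seed)
        self.models = [make_learner() for _ in range(n_models)]
        self.lam = lam

    def learn(self, x, y):
        for model in self.models:
            k = poisson(self.lam, self.rng)  # this member sees the instance with weight k
            if k > 0:
                model.learn(x, y, weight=k)

    def predict(self, x):
        votes = Counter(model.predict(x) for model in self.models)
        votes.pop(None, None)  # ignore members that have not learned anything yet
        return votes.most_common(1)[0][0] if votes else None


ensemble = LeveragingBaggingSketch(CountingNB, n_models=10, lam=6.0, seed=1)
for i in range(200):
    ensemble.learn((i % 2, i % 3), i % 2)  # toy stream: label equals the first feature
print(ensemble.predict((0, 1)), ensemble.predict((1, 2)))
```

The larger λ makes each member see most instances several times, which increases input diversity and resampling strength compared with λ = 1; this is the main idea the sketch isolates.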
For the datasets, urban air quality data from Beijing and
Changsha are used. The training dataset covers 3165 hours,
from 20:00 on November 1, 2018, to 21:00 on March 27, 2019.
The predicting dataset covers 136 hours, from 22:00 on March
27, 2019, to 14:00 on April 2, 2019. The Beijing training
dataset has 3029 instances and the Changsha training dataset
has 3030 instances; the predicting datasets of both cities
have 136 instances each.
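Because the predicting period immediately follows the training period, the split should preserve time order rather than shuffle instances. The sketch below assumes hypothetical (timestamp, grade) records and half-open time windows; the function name `chronological_split` is illustrative. With synthetic hourly timestamps, the window from 22:00 on March 27, 2019, up to 14:00 on April 2, 2019, contains 136 hours, the size of the predicting dataset.

```python
from datetime import datetime, timedelta


def chronological_split(records, cutoff, predict_end):
    """Split time-stamped records without shuffling (hypothetical helper).

    Training set:   timestamp < cutoff.
    Predicting set: cutoff <= timestamp < predict_end (half-open window, assumed).
    """
    train = [r for r in records if r[0] < cutoff]
    predict = [r for r in records if cutoff <= r[0] < predict_end]
    return train, predict


# Synthetic hourly records (timestamp, grade) from 00:00 March 27 to 23:00 April 2, 2019.
start = datetime(2019, 3, 27, 0)
records = [(start + timedelta(hours=h), 1) for h in range(24 * 7)]
train, predict = chronological_split(
    records, cutoff=datetime(2019, 3, 27, 22), predict_end=datetime(2019, 4, 2, 14)
)
print(len(train), len(predict))  # 22 136
```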
B. Comparison of Initial Forecasting Model
In this section, four representative online learning methods
are compared: LB, OB, Hoeffding Tree (HT), and Naïve Bayes
(NB). Both OB and LB use 10 HT models as base classifiers,
and all base classifiers are initialized with 50 instances.
Each comparison experiment is repeated 10 times and the
results are averaged. The two single-classifier algorithms, HT
and NB, are included to verify that the ensemble algorithms
have an advantage over single models, and comparing the two
on the same dataset identifies the better base classifier.
Because the algorithms learn the datasets in a test-then-train
manner, their accuracy can be reported throughout the learning
process.
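The test-then-train protocol, together with the 50-instance initialization and the averaging over 10 runs, can be sketched as follows. Treating the first 50 instances as unscored warm-up is an assumption, as is what varies between runs (here, the seed passed to a run function). `MajorityClass` and `make_stream` are hypothetical placeholders used only to make the loop runnable; in the experiments the learners are LB, OB, HT, and NB in MOA.

```python
import random
from collections import Counter
from statistics import mean


class MajorityClass:
    """Hypothetical placeholder learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


def prequential_accuracy(make_learner, stream, warmup=50):
    """Test-then-train accuracy.

    The first `warmup` instances only initialize the model (assumed unscored);
    each later instance is predicted first, scored, and only then learned.
    """
    model = make_learner()
    correct = tested = 0
    for i, (x, y) in enumerate(stream):
        if i >= warmup:
            correct += int(model.predict(x) == y)
            tested += 1
        model.learn(x, y)
    return correct / tested if tested else 0.0


def averaged_accuracy(run_once, runs=10):
    """Average the accuracy of `runs` independent runs; run_once(seed) returns an accuracy."""
    return mean(run_once(seed) for seed in range(runs))


def make_stream(seed, n=500):
    """Synthetic stream: one random feature, grade 2 with probability 0.8, else grade 1."""
    rng = random.Random(seed)
    return [((rng.random(),), 2 if rng.random() < 0.8 else 1) for _ in range(n)]


acc = averaged_accuracy(lambda seed: prequential_accuracy(MajorityClass, make_stream(seed)))
print(f"average test-then-train accuracy: {acc:.3f}")
```

Predicting before learning means every scored instance is unseen at prediction time, so no separate hold-out set is needed to report accuracy during training.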
First, the compared algorithms are run on the city-scale
datasets of Beijing and Changsha. Figure 1 shows the accuracy
curves for the two cities. The accuracy of the LB algorithm is
much higher than that of the other three algorithms. Among the
single classifiers, HT performs better than NB, which indicates
that LB with HT as the base classifier is the best
configuration. In addition, the trained models have already
reached a high accuracy when the learning procedure ends: in
both cities it is close to 90%. This shows that the trained
models are ready for the online prediction stage.
Fig. 1. Accuracy curves of Beijing and Changsha
Second, the compared algorithms are run on the station-scale
datasets of the Beijing Olympic Center and Changsha Southern
Train Station, as shown in Figure 2. Overall, LB again
outperforms the other methods.
In the comparison of initial forecasting models at the city
scale, all algorithms obtain good forecasting performance. On
the Changsha training dataset, the average prediction accuracy
of all four algorithms exceeds 80%, and the accuracy rises
steadily as the number of training instances grows. The LB
algorithm performs best: its prediction accuracy on the last
set of instances in the training set reaches 86%, which is
above 85% and thus meets the accuracy requirement for air
quality prediction models. At the station scale, the forecast
accuracy of all algorithms is generally lower than at the city
scale. Since the accuracy of the training models still rises as
instances accumulate, a better-performing initial model may be
obtained as the training dataset continues to expand.
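The accuracy curves and the 85% check discussed above can be computed from the per-instance outcomes of the test-then-train loop. The sketch assumes that outcomes are recorded as 1 (correct) or 0 (wrong) and that each point on the curve is the accuracy of one consecutive chunk of instances; the chunk size, the synthetic outcomes, and the function names are hypothetical.

```python
def accuracy_curve(hits, chunk_size):
    """Accuracy of each consecutive chunk of test-then-train outcomes (1 = correct, 0 = wrong)."""
    curve = []
    for start in range(0, len(hits), chunk_size):
        chunk = hits[start:start + chunk_size]
        curve.append(sum(chunk) / len(chunk))
    return curve


def meets_requirement(curve, threshold=0.85):
    """True if the accuracy of the last chunk exceeds the required level."""
    return bool(curve) and curve[-1] > threshold


hits = [0] * 3 + [1] * 16 + [0]  # 20 synthetic outcomes
curve = accuracy_curve(hits, chunk_size=10)
print(curve, meets_requirement(curve))  # [0.7, 0.9] True
```

Judging the model on its last chunk rather than on the cumulative average reflects the text's focus on the accuracy reached for the last set of training instances, which is less diluted by early, poorly trained predictions.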