International Core Journal of Engineering 2020-26 | Page 111

C. Downscaling Forecasting

By changing the scale of the urban air quality grade in the training dataset during the initial model training stage, the model can downscale its prediction of the urban air quality grade in both the time domain and the space domain. In the time domain, the air quality grade can be released at the hourly level instead of the daily level. In the space domain, changing the scale from the urban scale to the station scale gives a regional downscaling of the air quality grade forecast.

IV. EXPERIMENTS

A. Settings and Datasets

In the comparison experiments, the ensemble learning algorithm Leveraging Bagging (LB) is applied. The base classifier of the ensemble is the Hoeffding Tree with default parameters, and the ensemble consists of 10 base classifiers. All experiments are carried out in Massive Online Analysis (MOA) [9], a framework designed for online data stream learning, on a machine with an i7-6700HQ 2.60 GHz CPU, 8 GB of RAM, and Windows 10.

The datasets are the urban air quality data of Beijing and Changsha. The training dataset covers 3165 hours, from 20:00 on November 1, 2018 to 21:00 on March 27, 2019. The predicting dataset covers 136 hours, from 22:00 on March 27, 2019 to 14:00 on April 2, 2019. The training dataset of Beijing has 3029 instances and the training dataset of Changsha has 3030 instances. The predicting datasets of Beijing and Changsha each have 136 instances.

B. Comparison of Initial Forecasting Model

In this section, four representative online learning methods are compared: LB, OB, Hoeffding Tree (HT), and Naïve Bayes (NB). Both OB and LB use 10 HTs as base models. All base classifiers are initialized with 50 instances. Each comparison experiment is run 10 times and the results are averaged.
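To make the ensemble setup above more concrete, the following is a minimal Python sketch of the resampling idea behind Leveraging Bagging: each incoming instance is shown to every base learner with a weight drawn from a Poisson distribution, and the ensemble predicts by majority vote. It is an illustration under stated assumptions, not the MOA implementation used in the experiments. The class names `CountingNB` and `LeveragingBaggingSketch` are hypothetical, a small categorical Naive Bayes stands in for the Hoeffding Tree base learner, the Poisson parameter of 6 is the value commonly associated with Leveraging Bagging rather than a setting reported here, and the ADWIN change detector of the full algorithm is omitted.

```python
import math
import random
from collections import Counter, defaultdict


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CountingNB:
    """Small categorical Naive Bayes (hypothetical stand-in for a Hoeffding Tree)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(int)  # (index, value, label) -> weight
        self.values = defaultdict(set)          # index -> distinct values seen

    def learn(self, x, y, weight=1):
        self.class_counts[y] += weight
        for i, v in enumerate(x):
            self.feature_counts[(i, v, y)] += weight
            self.values[i].add(v)

    def predict(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(x):
                seen = self.feature_counts.get((i, v, label), 0)
                n_values = len(self.values.get(i, ()))
                # Add-one smoothing keeps unseen values from zeroing the score.
                score += math.log((seen + 1) / (n + n_values))
            if score > best_score:
                best, best_score = label, score
        return best


class LeveragingBaggingSketch:
    """Poisson-weighted online bagging with majority voting (no ADWIN)."""

    def __init__(self, n_learners=10, lam=6.0, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.learners = [CountingNB() for _ in range(n_learners)]

    def learn(self, x, y):
        for learner in self.learners:
            k = poisson(self.lam, self.rng)
            if k > 0:  # a weight of zero means this learner skips the instance
                learner.learn(x, y, weight=k)

    def predict(self, x):
        votes = Counter()
        for learner in self.learners:
            label = learner.predict(x)
            if label is not None:
                votes[label] += 1
        return votes.most_common(1)[0][0] if votes else None
```

Because every learner sees a differently weighted version of the same stream, the base models diverge even though they share one algorithm, which is what gives the vote its benefit over a single classifier.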
The two single-classifier algorithms, HT and NB, are included to verify that the ensemble algorithms have an advantage over single-model algorithms. Comparing the prediction results of these two algorithms on the same dataset also identifies the better base classifier. Because the algorithms learn the datasets in a test-then-train manner, their accuracy can be reported as learning proceeds.

First, the compared algorithms are run on the city-scale datasets, Beijing and Changsha. Figure 1 shows the accuracy curves of Beijing and Changsha. The accuracy of the LB algorithm is much higher than that of the other three algorithms. Among the single classifiers, HT performs better than NB, which indicates that LB with HT as the base classifier is the best of the compared configurations. At the same time, the accuracy of the training model has reached a high level by the end of the learning procedure: the accuracy of the models in both cities is close to 90%. This shows that the training model is ready for the online prediction stage.

Fig. 1. Accuracy curves of Beijing and Changsha

Second, the compared algorithms are run on the station-scale datasets, Beijing Olympic Center and Changsha Southern Train Station; the results are shown in Figure 2. In sum, LB still achieves better performance than the other methods.

In the comparison of initial forecasting models on the city scale, all the algorithms obtain good forecasting performance. On the training dataset of Changsha, the average prediction accuracy of the four algorithms exceeds 80%, and the accuracy improves steadily as the number of training instances increases. The LB algorithm performs best: its prediction accuracy on the last set of instances in the training set reaches 86%. This is higher than 85% and meets the accuracy requirement for air quality prediction models. In the comparison on the station scale, the forecast accuracy of all algorithms is generally lower than on the city scale. The accuracy of the training model increases with the number of instances, so a better-performing initial training model may be obtained as the training dataset continues to expand.
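The test-then-train protocol behind the accuracy curves can be sketched as follows. This is a minimal illustration, not the MOA evaluation task itself: the function name `prequential_accuracy`, the `predict`/`learn` model interface, and the `MajorityClass` stand-in model are hypothetical, and cumulative accuracy is assumed as the reported metric. The `warmup` default of 50 mirrors the 50 instances used to initialize the base classifiers.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical stand-in model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream, warmup=50):
    """Test-then-train: predict each instance first, then learn from it.

    The first `warmup` instances only train the model and are not scored.
    Returns the cumulative accuracy after each scored instance.
    """
    correct = 0
    scored = 0
    curve = []
    for i, (x, y) in enumerate(stream):
        if i >= warmup:
            correct += int(model.predict(x) == y)
            scored += 1
            curve.append(correct / scored)
        model.learn(x, y)  # the label is revealed only after the prediction
    return curve
```

Since every instance is used for testing before it is used for training, the curve is an honest estimate of performance on unseen data without holding out a separate test set. Averaging such curves over repeated runs gives the kind of averaged result described in the comparison setup.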