International Core Journal of Engineering 2020-26 | Page 110

diversity of the weights and modifying the input space of the classifiers inside the ensemble. Second, LB adds randomization at the output of the ensemble using output codes. In standard ensemble methods, all classifiers try to predict the same function. With output codes, however, each classifier predicts a different function. This may reduce the effects of correlations between the classifiers and increase the diversity of the ensemble.

The main pollutants that the AQI indicators use in air quality assessment are PM2.5, PM10, SO2, NO2, O3, and CO. According to the AQI, urban air quality is divided into six grades, as shown in Table I.

TABLE I. AQ GRADE

AQ Index    AQ Grade    AQ Description        Color
0~50        0           Excellent             Green
51~100      1           Good                  Yellow
101~150     2           Mild pollution        Orange
151~200     3           Moderate pollution    Red
201~300     4           Severe pollution      Purple
>300        5           Serious pollution     Maroon

B. Data Acquisition and Preprocessing

The datasets used in the experiments consist of time information, weather conditions, future air quality forecast results, and air quality at the city and monitoring sites. These data come from the public weather forecast websites PM2.5.in and PM.kksk.org. All data are obtained automatically by a web crawler. The main attributes of the datasets are shown in Table II.

B. Forecasting Process

The downscaling air quality grade forecasting process has two stages: first, model initialization; second, online forecasting and incremental learning.

In the model initialization stage, the ensemble learning algorithm uses the training samples to generate a forecasting model. A training instance combines the current urban air quality data with the measured urban air quality grade of the next hour. The ensemble learning algorithm LB is used to train an initial forecasting model on the training datasets. Then, the performance of the initial forecasting model is evaluated by the test-then-train method, and the results are reported.
Finally, the model is used as the initial forecasting model in the next stage.

In the online forecasting and incremental learning stage, the web crawler keeps collecting data from the website and forms training instances. At T_0, the web crawler gets the urban air quality data at T_0, and the initial forecasting model uses these data to predict the urban air quality grade at T_1. At T_1, the web crawler gets the real urban air quality grade and the urban air quality data at T_1. The urban air quality data at T_0 and the real urban air quality grade at T_1 are combined to generate a training instance, which the forecasting model learns incrementally. Then, the updated forecasting model predicts the air quality grade at T_2 based on the urban air quality data at T_1. The online prediction task and the incremental learning of the model continue in the same way for each subsequent time step. Algorithm 1 shows the process of the online forecasting and incremental learning algorithm.

TABLE II. ATTRIBUTES OF DATASETS

Data Types    Data Attributes
Time          year, month, day
Weather       temperature, precipitation, wind direction, wind speed, relative humidity, comfort, body temperature
AQI           AQI value, AQ grade, major pollutant, PM2.5, PM10, CO, NO2, O3_1, O3_8, SO2

The preprocessing of the dataset mainly includes the following steps. First, missing items in the datasets are filled in. Then, anomalous data in the datasets are overwritten with adjacent normal data. Finally, the datasets are transformed into a form that the learning algorithm can handle.

A. LeveragingBagging Ensemble Learning Algorithm

Online ensemble learning is an incremental learning method that processes arriving instances one by one and gradually evolves the learning model. Oza and Russell [8] adapted traditional bagging to the online learning setting and proposed Online Bagging (OB). OB contains an ensemble E of M base models h_m (m = 1, …, M).
When a new instance arrives, the number of times each base model is trained on it is drawn from Poisson(1). The diversity of the base models therefore comes from the different numbers of training repetitions. Finally, all the base models make a joint prediction by voting.

Bifet et al. [7] improved OB and proposed the online Leveraging Bagging (LB) algorithm. They leveraged the performance of bagging with two randomization improvements: increasing resampling and using output detection codes. First, LB increases the weights of this resampling by using a larger value λ to compute the value of the Poisson distribution. Using a value λ>1 will increase the

III. AIR QUALITY GRADE FORECASTING BASED ON ENSEMBLE LEARNING

Algorithm 1: OnlineForecastingAndIncrementalLearning
Input:
  1. Model: the initial forecasting model
  2. Stream: the air quality dataset
  3. Resfile: output file of prediction results
Output:
  Model: the updated forecasting model
Process:
  while (Stream.hasNext()) do
    testInst = Stream.nextInstance()
    if testInst.classIsMissing() then
      prediction = Model.getPrediction(testInst)
      Resfile.println(prediction)
    else
      Model.TrainOnInstance(testInst)
    end if
  end while
  return Model

While the air quality dataset still has instances, the model keeps fetching instances from it. If an instance has an urban air quality grade, the model trains on it incrementally. Otherwise, the model makes a prediction for it, and the result is stored in a file for analysis. Once all the instances in the dataset have been processed, the updated model is returned.
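As a rough illustration of Algorithm 1 and of Poisson-based online resampling, the following Python sketch runs the same loop over a small in-memory stream: instances without a grade are predicted, and labelled instances are trained on. It is a sketch under stated assumptions, not the authors' implementation. The names `poisson`, `CentroidLearner`, `OnlineBagging`, and `run_stream` are hypothetical, the nearest-centroid base learner is a stand-in chosen only for brevity, `lam=6.0` is an illustrative value (the text only requires λ > 1), and the output codes of LB are omitted.

```python
import math
import random
from collections import Counter, defaultdict


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) using Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CentroidLearner:
    """Toy base learner (an assumption): weighted mean feature vector per grade."""

    def __init__(self):
        self.sums = {}
        self.weights = defaultdict(float)

    def train(self, x, y, weight):
        if weight <= 0:
            return
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + weight * v for s, v in zip(self.sums[y], x)]
        self.weights[y] += weight

    def predict(self, x):
        # Return the grade whose centroid is closest, or None if untrained.
        best, best_dist = None, float("inf")
        for y, sums in self.sums.items():
            w = self.weights[y]
            dist = sum((v - s / w) ** 2 for v, s in zip(x, sums))
            if dist < best_dist:
                best, best_dist = y, dist
        return best


class OnlineBagging:
    """Online bagging with Poisson(lam) instance weights; lam=1 mirrors OB."""

    def __init__(self, n_models=10, lam=6.0, seed=0):
        self.rng = random.Random(seed)
        self.lam = lam
        self.models = [CentroidLearner() for _ in range(n_models)]

    def train_on_instance(self, x, y):
        # Each base model sees the instance k times, with k drawn independently.
        for model in self.models:
            k = poisson(self.lam, self.rng)
            if k > 0:
                model.train(x, y, k)

    def get_prediction(self, x):
        # Majority vote over base models that can already predict.
        votes = Counter(m.predict(x) for m in self.models)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None


def run_stream(model, stream):
    """Algorithm 1 as a loop: predict when the grade is missing, else train."""
    predictions = []
    for x, y in stream:
        if y is None:
            predictions.append(model.get_prediction(x))
        else:
            model.train_on_instance(x, y)
    return model, predictions
```

Each stream element here is a pair of a feature list and a grade, with `None` standing in for `classIsMissing()`. The predictions are collected in a list rather than written to a result file, which keeps the sketch free of I/O.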