International Core Journal of Engineering 2020-26 | Page 110
diversity of the weights and modify the input space of the
classifiers inside the ensemble. Second, LB adds
randomization at the output of the ensemble using output
codes. In standard ensemble methods, all classifiers try to
predict the same function. With output codes, however, each
classifier predicts a different function. This may reduce the
effects of correlations between the classifiers and increase
the diversity of the ensemble.
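The output-code idea can be illustrated with a minimal Python sketch. It assumes one common scheme, in which each classifier receives a random, balanced binary code over the classes, learns a two-class problem, and votes for every class whose code bit matches its binary prediction. The function names and the balancing rule are assumptions made for illustration, not details taken from this paper.

```python
import random


def make_output_codes(n_classifiers, n_classes, seed=0):
    """Give each classifier a random, roughly balanced 0/1 code over the classes."""
    rng = random.Random(seed)
    codes = []
    for _ in range(n_classifiers):
        bits = [i % 2 for i in range(n_classes)]  # about half zeros, half ones
        rng.shuffle(bits)
        codes.append(bits)
    return codes


def binary_target(code, label):
    """The two-class label a classifier with this code is trained on."""
    return code[label]


def decode(codes, binary_predictions):
    """Each classifier votes for all classes whose code bit equals its prediction."""
    n_classes = len(codes[0])
    votes = [0] * n_classes
    for code, bit in zip(codes, binary_predictions):
        for c in range(n_classes):
            if code[c] == bit:
                votes[c] += 1
    return max(range(n_classes), key=votes.__getitem__)
```

Because every classifier sees a different split of the six grades into two groups, the binary functions they learn differ, which is the source of the extra diversity.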
the AQI indicators, the main pollutants involved in air quality
assessment are PM2.5, PM10, SO2, NO2, O3, and CO.
According to the AQI, urban air quality can be divided into six
grades, as shown in Table I.
TABLE I. AQ GRADE
AQ Index    AQ Grade    AQ Description        Color
0~50        0           Excellent             Green
51~100      1           Good                  Yellow
101~150     2           Mild pollution        Orange
151~200     3           Moderate pollution    Red
201~300     4           Severe pollution      Purple
>300        5           Serious pollution     Maroon
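The grade boundaries in Table I can be expressed as a small lookup. The sketch below is illustrative only; the function and constant names are assumptions, and it assumes integer AQI values as in the table.

```python
from bisect import bisect_left

# Upper bounds of grades 0..4 from Table I; any AQI above 300 is grade 5.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQ_DESCRIPTIONS = [
    "Excellent",
    "Good",
    "Mild pollution",
    "Moderate pollution",
    "Severe pollution",
    "Serious pollution",
]


def aqi_to_grade(aqi):
    """Map an AQI value to its AQ grade (0..5) according to Table I."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left returns the index of the first upper bound >= aqi.
    return bisect_left(AQI_UPPER_BOUNDS, aqi)
```

For example, `AQ_DESCRIPTIONS[aqi_to_grade(120)]` gives the description of the grade that contains an AQI of 120.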
B. Forecasting Process
The downscaling air quality grade forecasting includes
two stages. The first is the model initialization stage. The second
is the online forecasting and incremental learning stage.
B. Data Acquisition and Preprocessing
The datasets used in the experiments consist of time
information, weather conditions, future air quality forecast
results, and air quality at the city and monitoring sites. These
data come from the public weather forecast websites PM2.5.in
and PM.kksk.org. All data are obtained automatically by a web
crawler. The main attributes of the datasets are shown in
Table II.
In the model initialization stage, the ensemble learning
algorithm uses the training samples to generate a forecasting
model. A training instance combines the current urban air
quality data with the measured urban air quality grade of the
next hour. The ensemble learning algorithm LB trains an
initial forecasting model on the training datasets. The
performance of this model is then evaluated with the
test-then-train method, and the results are reported. Finally,
the model is used as the initial forecasting model in the next
stage.
TABLE II. ATTRIBUTES OF DATASETS
Data Types    Data attributes
Time          year, month, day
Weather       temperature, precipitation, wind direction, wind speed, relative humidity, comfort, body temperature
AQI           AQI value, AQ grade, major pollutant, PM2.5, PM10, CO, NO2, O3_1, O3_8, SO2
In the online forecasting and incremental learning stage,
the web crawler keeps collecting data from the website and
forming training instances. At T0, the web crawler gets the
urban air quality data of T0, and the initial forecasting model
uses these data to predict the urban air quality grade of T1. At
T1, the web crawler gets the measured urban air quality grade
and the urban air quality data of T1. The urban air quality data
of T0 and the measured grade of T1 are then combined into a
training instance, which the forecasting model learns
incrementally. The updated forecasting model then predicts
the air quality grade of T2 from the urban air quality data of
T1. Repeating these steps completes the online prediction task
and the incremental learning process of the model. Algorithm 1
shows the online forecasting and incremental learning
algorithm.
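The pairing of the data observed at one hour with the grade measured at the next hour can be sketched as below. The record layout and the field name `aq_grade` are assumptions made for illustration.

```python
def make_training_instances(hourly_records):
    """Pair the data observed at hour t with the grade measured at hour t+1.

    Each record is assumed to be a dict of attributes that contains at
    least an "aq_grade" entry (hypothetical field name).
    """
    instances = []
    for current, following in zip(hourly_records, hourly_records[1:]):
        features = dict(current)  # data of T0, T1, ...
        label = following["aq_grade"]  # measured grade of T1, T2, ...
        instances.append((features, label))
    return instances
```

The last record has no successor yet, so it yields no training instance; in the online setting it is the record the model predicts from.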
The preprocessing of the datasets mainly includes the
following work. First, the missing items in the datasets are
filled in. Then, anomalous data are overwritten with the
adjacent normal data. Finally, the datasets are converted into
a form that the learning algorithm can process.
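A minimal sketch of the filling step is given below. The paper does not state which neighbour is used, so the sketch assumes the previous normal value is preferred and the next normal value is used only for leading gaps. The `is_normal` predicate is a hypothetical validity check supplied by the caller.

```python
def fill_with_adjacent(values, is_normal):
    """Replace missing (None) or anomalous values with an adjacent normal value."""
    filled = list(values)

    # Forward pass: copy the most recent normal value over bad entries.
    last_good = None
    for i, value in enumerate(filled):
        if value is not None and is_normal(value):
            last_good = value
        elif last_good is not None:
            filled[i] = last_good

    # Backward pass: leading bad entries take the next normal value.
    next_good = None
    for i in range(len(filled) - 1, -1, -1):
        value = filled[i]
        if value is not None and is_normal(value):
            next_good = value
        elif next_good is not None:
            filled[i] = next_good

    return filled
```

If the series contains no normal value at all, it is returned unchanged.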
A. LeveragingBagging Ensemble Learning Algorithm
Online ensemble learning is an incremental learning
method that processes arriving instances one by one and
gradually evolves the learning model. Oza and Russell [8]
adapted traditional bagging to the online learning setting
and proposed Online Bagging (OB). OB maintains an ensemble
E of M base models h_m (m = 1, …, M). When a new
instance arrives, the number of times each base model is
trained on it is drawn from a Poisson(1) distribution. The
diversity of the base models therefore comes from the different
numbers of training repetitions. Finally, all the base models
make a joint prediction by voting.
Algorithm 1: OnlineForecastingAndIncrementalLearning
Input:
1. Model: the initial forecasting model
2. Stream: the air quality dataset
3. Resfile: output file of prediction result
Output: Model: the updated forecasting model
Process:
while (Stream.hasNext()) do
    testInst = Stream.nextInstance()
    if testInst.classIsMissing() then
        prediction = Model.getPrediction(testInst)
        Resfile.println(prediction)
    else
        Model.TrainOnInstance(testInst)
    end if
end while
return Model
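The Online Bagging update of Oza and Russell described above can be sketched in Python as follows. This is an illustration under assumptions, not the implementation used in the paper: the base learner `MajorityClassModel` is a hypothetical stand-in, the Poisson sampler uses Knuth's multiplication method, and the parameter `lam` is exposed so that values other than 1 can be tried.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class MajorityClassModel:
    """Hypothetical stand-in base learner: predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, features, label):
        self.counts[label] += 1


class OnlineBagging:
    """Ensemble of M base models, each trained Poisson(lam) times per instance."""

    def __init__(self, make_base_model, n_models=10, lam=1.0, seed=0):
        self.models = [make_base_model() for _ in range(n_models)]
        self.lam = lam
        self.rng = random.Random(seed)

    def learn(self, features, label):
        for model in self.models:
            # The random repetition count is what makes the models differ.
            for _ in range(poisson(self.lam, self.rng)):
                model.learn(features, label)

    def predict(self, features):
        votes = Counter(model.predict(features) for model in self.models)
        votes.pop(None, None)  # ignore models that have not learned anything
        return votes.most_common(1)[0][0] if votes else None
```

With `lam=1.0` a base model skips roughly 37% of the instances (the probability of drawing zero), which corresponds to the share of instances left out of a bootstrap sample in batch bagging.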
Bifet et al. [7] improved OB and proposed the online
Leveraging Bagging (LB) algorithm. They leveraged the
performance of bagging with two randomization
improvements: increasing resampling and using output
detection codes. First, LB increases the weights of the
resampling by using a larger value of λ as the parameter of the
Poisson distribution. Using a value λ > 1 will increase the

While the air quality dataset still has instances, the model
keeps reading instances from it. If an instance carries the
urban air quality grade, the model trains on it incrementally.
Otherwise, the model makes a prediction for it, and the
result is stored in a file for later analysis. Once all the
instances in the dataset have been processed, the updated
model is returned.
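The loop of Algorithm 1 can be written as a short Python sketch. The stream is assumed to yield `(features, grade)` pairs with `grade` set to `None` when the class is missing, and results are collected in a list rather than written to a file. `MajorityClassModel` is again a hypothetical stand-in for the forecasting model.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical stand-in model: predicts the most frequent grade seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, features, grade):
        self.counts[grade] += 1


def online_forecast_and_learn(model, stream, results):
    """Predict for unlabeled instances, train on labeled ones, return the model."""
    for features, grade in stream:
        if grade is None:
            # Class is missing: forecast and record the prediction.
            results.append(model.predict(features))
        else:
            # Grade is known: learn the instance incrementally.
            model.learn(features, grade)
    return model
```

A caller would pass an empty list as `results` and inspect it after the stream is exhausted.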
III. AIR QUALITY GRADE FORECASTING BASED ON ENSEMBLE LEARNING