Historical Data Preparation
1. Historical data should include a set of characteristics and a
target variable. All scorecard development methods quantify
the relationship between the characteristics (input columns)
and “Good/Bad” performance (target column).
2. Examples of borrower characteristics. Scorecard characteristics
are similar to those used in subjective expert judgment.
Credit Product
• Amount
• Term
• Document Goal
Financial
• Assets
• Debts
• Monthly Income
• Monthly Expenses
Credit History
• In Current Bank
• In Other Banks
• Credit Bureau Data
Social
• Work experience
• Time of residence at current address
• Marital status
3. Characteristics whose use is not reasonable are excluded.
For example, the picture shows that the “Good/Bad” distribution
does not depend on the Home Ownership characteristic.
4. All borrowers should be marked in the target column as
“Good” or “Bad” according to a certain rule. For example,
borrowers who pay within 30 days are marked as “Good”, while
borrowers with a delay of more than 90 days are marked as “Bad”.
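One simple way to check point 3 (whether the “Good/Bad” distribution depends on a characteristic) is to compare the share of “Bad” borrowers across the categories of that characteristic. The sketch below is only an illustration: the column names home_ownership and status, the sample records, and the function bad_rate_by_category are assumptions, not part of any particular tool.

```python
from collections import defaultdict

def bad_rate_by_category(records, characteristic, target="status"):
    """Return {category: share of "Bad" borrowers} for one characteristic."""
    counts = defaultdict(lambda: {"Good": 0, "Bad": 0})
    for row in records:
        label = row[target]
        if label not in ("Good", "Bad"):
            continue  # skip rows without a clear Good/Bad label
        counts[row[characteristic]][label] += 1
    return {
        category: c["Bad"] / (c["Good"] + c["Bad"])
        for category, c in counts.items()
    }

# Hypothetical sample data; column names are illustrative only.
sample = [
    {"home_ownership": "Own", "status": "Good"},
    {"home_ownership": "Own", "status": "Good"},
    {"home_ownership": "Own", "status": "Good"},
    {"home_ownership": "Own", "status": "Bad"},
    {"home_ownership": "Rent", "status": "Good"},
    {"home_ownership": "Rent", "status": "Good"},
    {"home_ownership": "Rent", "status": "Good"},
    {"home_ownership": "Rent", "status": "Bad"},
]

rates = bad_rate_by_category(sample, "home_ownership")
# Difference between the highest and lowest bad rate across categories.
spread = max(rates.values()) - min(rates.values())
print(rates, spread)
```

If the bad rates are nearly identical across categories, as in this made-up sample, the characteristic carries little information about performance and is a candidate for exclusion.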
[Figure: timeline of payment delay in days (0, 30, 60, 90, 120, 180+), with borrowers labeled “Good”, “Intermediate”, or “Bad”]
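The labeling rule from point 4 can be sketched as a small function. The 30-day and 90-day thresholds come from the example in the text. Treating delays between them as “Intermediate” is an assumption based on the figure labels, and the name label_borrower and its parameters are hypothetical.

```python
def label_borrower(days_past_due, good_max=30, bad_min=90):
    """Assign a target label from the longest payment delay in days.

    Thresholds follow the example rule: up to 30 days is "Good",
    more than 90 days is "Bad". Everything in between is treated
    as "Intermediate" (an assumption, not a fixed standard).
    """
    if days_past_due <= good_max:
        return "Good"
    if days_past_due > bad_min:
        return "Bad"
    return "Intermediate"

# Label a few hypothetical delays.
delays = [0, 30, 60, 90, 91, 180]
labels = [label_borrower(d) for d in delays]
print(list(zip(delays, labels)))
```

In practice the thresholds depend on the product and the lender's own definition of default, so they are kept as parameters here.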
Exclusions
Certain types of accounts need to be excluded from the
dataset. For example, records of bank employees or VIP clients
could be excluded.
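A minimal sketch of such an exclusion step is shown below. It assumes each record carries a segment field with values such as bank_employee or vip. Both the field name and the values are hypothetical and would need to match the actual data.

```python
def apply_exclusions(records, excluded_segments=("bank_employee", "vip")):
    """Split records into those kept for modeling and those excluded."""
    kept, excluded = [], []
    for row in records:
        if row.get("segment") in excluded_segments:
            excluded.append(row)
        else:
            kept.append(row)
    return kept, excluded

# Hypothetical accounts; 'segment' is an illustrative field name.
accounts = [
    {"id": 1, "segment": "regular"},
    {"id": 2, "segment": "bank_employee"},
    {"id": 3, "segment": "vip"},
    {"id": 4, "segment": "regular"},
]

kept, excluded = apply_exclusions(accounts)
print([a["id"] for a in kept], [a["id"] for a in excluded])
```

Returning the excluded records as well makes it easy to report how many accounts were removed and why.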
Data Cleansing
Borrower portfolio data can contain the following
anomalies, which should be replaced or deleted:
• Outliers - values that lie far outside the main body of the data
• Data entry errors
• Missing values
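The three kinds of anomalies above can be flagged with a short routine like the following. This is only a sketch under stated assumptions: missing values are represented as None, a data entry error is taken to be a value below a valid minimum (for example, a negative income), and outliers are detected with the common 1.5 × IQR rule, which is one possible choice rather than a prescribed method. The function name find_anomalies is hypothetical.

```python
import statistics

def find_anomalies(values, valid_min=0.0):
    """Flag each value as 'missing', 'entry_error', 'outlier', or 'ok'."""
    # Compute quartiles only from values that are present and valid.
    present = [v for v in values if v is not None and v >= valid_min]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    flags = []
    for v in values:
        if v is None:
            flags.append("missing")
        elif v < valid_min:
            flags.append("entry_error")  # assumed rule: impossible value
        elif v < low or v > high:
            flags.append("outlier")
        else:
            flags.append("ok")
    return flags

# Hypothetical monthly incomes with one of each anomaly.
incomes = [2500, 3100, None, 2800, -400, 3000, 2700, 95000, 2900]
flags = find_anomalies(incomes)
print(list(zip(incomes, flags)))
```

Flagged values can then be deleted or replaced, for instance with the median of the valid values, depending on how much data can be spared.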
www.plug-n-score.com