most successful businesses do both, just
like the Oakland A’s. Whether knee-deep
in a big data implementation or just starting
to explore the options, companies should
consider some tips, pitfalls and best practices for getting the maximum value from
their data. A good way to start is to make
data analytics decisions with eyes wide
open about what is truly required for setup, which tools are most effective for the
organization, and how to maximize always-limited resources.
NOTE TO SELF: DATA IS NOT PERFECT
Anyone who has ever worked with
data understands that no data set is ever
“clean.” The situation becomes even more
complicated when organizations are pulling
data from multiple production applications.
A few examples highlight the enormous,
unavoidable challenges associated with
data inconsistencies.
Consider an international company
looking to identify fraud in offices worldwide. The company may start with a database of countries with the highest risk of
corruption, and then evaluate transactions
for those countries. Across production
applications, however, the same country may be recorded in
several different ways, depending on the
system, the purpose for which the information was captured, and the individual who
entered the data. For example, South Korea may be entered as the standard two-letter
abbreviation “KR” in one system
and spelled out in others as “South Korea,” “Korea,
South” or “Republic of Korea.”
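
One common way to handle this kind of inconsistency is to map every known spelling to a single canonical code before any analysis runs. The sketch below is a minimal illustration in Python, not a method described in this article. The alias table, the helper names (`COUNTRY_ALIASES`, `_key`, `normalize_country`) and the matching rule (ignore case, punctuation and spacing) are all assumptions chosen for the example.

```python
# Hypothetical alias table: the variants listed are the ones from the
# South Korea example; a real table would cover every country of interest.
COUNTRY_ALIASES = {
    "KR": ["KR", "South Korea", "Korea, South", "Republic of Korea"],
}


def _key(text):
    """Drop case, punctuation and spacing so near-identical spellings match."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


# Reverse lookup built once: simplified spelling -> canonical two-letter code.
_LOOKUP = {
    _key(alias): code
    for code, aliases in COUNTRY_ALIASES.items()
    for alias in aliases
}


def normalize_country(raw):
    """Return the canonical code, or None when the value is not recognized."""
    if raw is None:
        return None
    return _LOOKUP.get(_key(raw))


if __name__ == "__main__":
    for value in ["KR", "South Korea", " korea, south ", "REPUBLIC OF KOREA", "Korea"]:
        print(repr(value), "->", normalize_country(value))
```

Returning `None` for an unrecognized value such as “Korea” is a deliberate choice in this sketch: an ambiguous entry is better routed to a person for review than silently guessed, particularly when the result feeds a fraud analysis.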
ANALYTICS | MARCH/APRIL 2014