Why Data Catalogs Should Be the Linchpin in Your Cloud Data Strategy
Joey Jablonski and Neal Matthews
Effective data-driven organizations are using data catalogs to provide total visibility into available data in an easily consumable and centrally managed location.
Data lakes have become a foundation for many organizations' data environments. While these data lakes provide new capabilities, many enterprises are struggling to derive full value from them because of the operational overhead of managing multiple new interfaces, tools, data sets and integration points. These data lakes often become "data swamps": large amounts of data are ingested with no clear method to find data sets, separate them as needed and identify the core elements of value to the business.
The Value of Data Catalogs
Why should you care about deploying a data catalog capability? Because while many organizations now grasp the importance of centralizing their enterprise data, they often have not yet grappled with how difficult it is to access that data efficiently and securely. This difficulty arises because the data is ingested from many different places, with varying amounts of structure.
Data catalogs are a critical element of every data lake deployment, ensuring that data sets are tracked, identifiable by business terms, governed and managed. Forbes contributor Dan Woods cautions organizations against using tribal knowledge as a strategy, because it cannot scale [1]. Data catalogs crystallize corporate data governance policies into practice, becoming the engine for enforcement and the tool for auditing compliance. The inclusive nature of the data catalog enables it to be used for collaboration and centralized sharing of information in a known location, accessible across the organization.
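To make the idea of data sets that are "tracked" and "identifiable by business terms" concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular catalog product works: the `CatalogEntry` and `DataCatalog` classes, their fields, and the sample data sets and storage paths are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One tracked data set. All field names are illustrative assumptions."""
    name: str
    location: str          # where the data lives in the lake (hypothetical path)
    owner: str             # team accountable for the data set
    description: str       # business description, written for humans
    business_terms: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory catalog: register data sets, then find them by business term."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse duplicates so each data set has exactly one tracked record.
        if entry.name in self._entries:
            raise ValueError(f"data set already registered: {entry.name}")
        self._entries[entry.name] = entry

    def find_by_term(self, term: str) -> list[CatalogEntry]:
        # Case-insensitive match on business terms, returned in name order.
        wanted = term.strip().lower()
        hits = [
            e for e in self._entries.values()
            if wanted in {t.lower() for t in e.business_terms}
        ]
        return sorted(hits, key=lambda e: e.name)


# Hypothetical sample data sets, for illustration only.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="web_clickstream",
    location="s3://example-lake/raw/clickstream/",
    owner="digital-analytics",
    description="Raw page-view events from the public website.",
    business_terms={"Customer", "Web Traffic"},
))
catalog.register(CatalogEntry(
    name="crm_accounts",
    location="s3://example-lake/curated/crm/accounts/",
    owner="sales-operations",
    description="Curated account records exported from the CRM.",
    business_terms={"Customer", "Account"},
))

matches = catalog.find_by_term("customer")
for entry in matches:
    print(entry.name, "->", entry.location)
```

Searching by the business term rather than by storage path is the point of the sketch: an analyst does not need to know where a data set lives, or who built it, in order to find it. A real catalog would add persistence, access controls and lineage on top of this lookup.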
Data catalogs become the entry point for data scientists and other analytical users across the organization, who reach the data through the data engineers (Figure 1) focused on creating enriched data sets for analytical use. Data catalogs ensure these dispersed teams can collaborate on data set quality, usage, and business descriptions.
FALL 2017 | THE DOPPLER | 19