The Technology Headlines DEMAND FORECASTING & AI | Page 32
EXPERT ANALYSIS
DATA STORAGE IN A MODERN ANALYTICS ARCHITECTURE
By Meagan Longoria, consultant at Denny Cherry & Associates Consulting
AUGUST 2019
In classic business intelligence environments, we
integrate, transform, and summarize data to monitor
business operations and conditions, helping decision
makers determine what actions to take. Where we were once
limited to a data warehouse and some reports, we now have
more choices for how to store, process, and analyze our data.
Organizations want to reduce the “time to value”: the
amount of time it takes to acquire data, perform necessary
transformations, and deliver the results to consumers.
They also want to expand their analytical capabilities to
include more types and sizes of data.
Modern analytics architectures often follow a polyglot
persistence strategy to help achieve those goals. This means
we use multiple types of data persistence layers, each
selected because it is the optimal choice for the type of data
and how it will be used. Common data storage services in
a modern analytics architecture include file storage, MPP
(Massively Parallel Processing) databases, SMP (Symmetric
Multi-Processing) databases, and analytical databases.
In the Microsoft Azure data platform, this could translate
to Data Lake Storage Gen 2, SQL Data Warehouse, SQL
Database, and Analysis Services.
While introducing more components increases development
complexity, it can more efficiently provide data to the people
who need it.
Three common mistakes to avoid when deciding where to
store your data are:
1) Trying to store data where it doesn’t fit
2) Transforming and standardizing data before its
value has been determined
3) Losing track of sensitive data
Store Data Where It Best Fits
Data warehouses built in relational databases expect
standardized tabular data with a common schema. The
schema is imposed when the data is written to the table.
Although images can be stored in a relational database as
binary objects, it is not optimal to store terabytes of images
there.
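As a rough illustration of a schema being imposed at write time, the
sketch below uses Python's built-in sqlite3 module as a stand-in for a
relational data warehouse. The orders table, its columns, and the sample
rows are hypothetical and were chosen only for this example; rows that
do not match the declared schema are rejected when they are inserted.

```python
import sqlite3

def load_orders(rows):
    """Insert rows into a table with a fixed schema; return (accepted, rejected)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the schema is declared before any data arrives.
    conn.execute(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            amount   REAL NOT NULL CHECK (typeof(amount) IN ('real', 'integer'))
        )
        """
    )
    accepted, rejected = 0, []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
                row,
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # The row violates the declared schema, so the write itself fails.
            rejected.append(row)
    conn.close()
    return accepted, rejected

# Hypothetical sample rows: one valid, one missing a customer, one with a bad amount.
sample_rows = [
    (1, "Contoso", 19.99),
    (2, None, 5.0),
    (3, "Fabrikam", "n/a"),
]
accepted, rejected = load_orders(sample_rows)
```

Only the first sample row is accepted here. The other two never reach
the table, which is the trade-off of a schema enforced on write: the
stored data is consistent, but anything that does not fit must be fixed
or stored elsewhere.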
Data lakes are a good low-cost solution for storing data in a
variety of formats. They don’t require or impose an up-front
schema definition. Schema-on-read techniques are used to
impose structure and meaning at query time. This can be
useful when you have files that vary in format and number of
columns. Data lakes easily store images and video files.
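A minimal sketch of schema-on-read, assuming two hypothetical CSV files
whose columns differ. The file names, the column names, and the SCHEMA
mapping are illustrative assumptions, not part of any particular data
lake product. The raw text is stored unchanged, and types and defaults
are applied only when the data is read.

```python
import csv
import io

# Hypothetical raw files as they might sit in a data lake; the columns differ.
RAW_FILES = {
    "sales_2018.csv": "id,amount\n1,10.50\n2,4.25\n",
    "sales_2019.csv": "id,amount,region\n3,7.00,West\n",
}

# The schema lives with the reader, not the storage:
# column name -> (converter, default used when the column is missing).
SCHEMA = {
    "id": (int, None),
    "amount": (float, None),
    "region": (str, "unknown"),
}

def read_with_schema(raw_text, schema):
    """Parse raw CSV text and impose the schema on each row at read time."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        row = {}
        for column, (convert, default) in schema.items():
            value = record.get(column)
            row[column] = convert(value) if value not in (None, "") else default
        rows.append(row)
    return rows

all_rows = [
    row
    for name in sorted(RAW_FILES)
    for row in read_with_schema(RAW_FILES[name], SCHEMA)
]
```

Both files load into the same row shape even though the older file has
no region column. A different consumer could read the same raw files
with a different schema without rewriting anything in storage.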
Data Transformation for Business Value
Data lakes can be useful archives of transactional systems
and reference data. They often contain staging areas for
data waiting to be loaded into a data warehouse or other
application.
Data lakes can also serve as an exploration area for
data scientists to review new data without requiring