The Technology Headlines DEMAND FORECASTING & AI | Page 32

EXPERT ANALYSIS

DATA STORAGE IN A MODERN ANALYTICS ARCHITECTURE

By Meagan Longoria, consultant at Denny Cherry & Associates Consulting

AUGUST 2019

In classic business intelligence environments, we integrate, transform, and summarize data to monitor business operations and conditions, helping decision makers determine what actions to take. Where we were once limited to a data warehouse and some reports, we now have more choices for how to store, process, and analyze our data.

Organizations want to reduce the “time to value”: the amount of time it takes to acquire, perform necessary transformations, and deliver data to consumers. They also want to expand their analytical capabilities to include more types and sizes of data.

Modern analytics architectures often follow a polyglot persistence strategy to help achieve those goals. This means we use multiple types of data persistence layers, each selected because it is the optimal choice for the type of data and how it will be used. Common data storage services in a modern analytics architecture include file storage, MPP (Massively Parallel Processing) databases, SMP (Symmetric Multi-Processing) databases, and analytical databases. In the Microsoft Azure data platform, this could translate to Data Lake Storage Gen 2, SQL Data Warehouse, SQL Database, and Analysis Services. While introducing more components increases development complexity, it can more efficiently provide data to the people who need it.

Three common mistakes to avoid when deciding where to store your data are:

1) Trying to store data where it doesn’t fit
2) Transforming and standardizing data before its value has been determined
3) Losing track of sensitive data

Store Data Where It Best Fits

Data warehouses built in relational databases expect standardized tabular data with a common schema. The schema is imposed when the data is written to the table.
Although images can be stored in a relational database as binary objects, it is not optimal to store terabytes of images there.

Data lakes are a good low-cost solution for storing data in a variety of formats. They don’t require or impose an up-front schema definition. Schema-on-read techniques are used to impose structure and meaning at query time. This can be useful when you have files that vary in format and number of columns. Data lakes easily store images and video files.

Data Transformation for Business Value

Data lakes can be useful archives of transactional systems and reference data. They often contain staging areas for data waiting to be loaded to a data warehouse or other application. Data lakes can also serve as an exploration area for data scientists to review new data without requiring