applying the same classification and protection model across all your different data assets will create unnecessary risk or expense. Things get interesting when you start tagging data with those classifications, which is not necessarily an easy task. It can be a challenge to classify chunks of data, since they have likely developed a life of their own in your environment and may no longer be clearly organized. Or, as is very common, many copies of the data have proliferated, so it is difficult to determine which is the source of truth. Migrating to the cloud, and considering whether to relocate this data, is a prime opportunity to address the organization, classification and tagging of data.

Many companies have a data classification strategy, but an incomplete or inconsistent deployment of that strategy. Typically this is caused by a limited capability, toolset or process for applying the rules to their data. In some instances the only way data is identified by a classification is at the platform level: for example, a “Confidential” MSSQL database, or all “Super Secret” files on a particular server.

Making sure data classification is defined and applied via tagging is an important step to take, whether or not data is moving to the cloud. Furthermore, enforcing tagging through automation when data is generated, or when it is ingested into the cloud estate, aids with compliance and scalability (a sketch of this appears at the end of this section). This helps identify the location of your confidential data, and provides guidance on how to protect it.

Organizations are leveraging their data to its fullest extent in an effort to improve operations and gain market share. Protecting data is therefore a mission-critical priority.

How Will People Use Your Data?

Once you classify your data, you need to analyze how people use it. Where do they put it? Is it file system data, database data, data streams or a data lake?

As you consider whether an application should move to the cloud, determine what data that application requires. If the app needs to span the chasm between an on-premises and a cloud-based environment, then data gravity will matter. You will have to determine where your biggest chunks of data are and whether you need to do analytics near that data. If you are pulling data from one environment into another, you are apt to run up transfer costs or introduce latency across that chasm. If the data and the application sit on the same side of the chasm, the data gravity issue is less pressing.

If people do not know where certain data resides, they might institute a blanket rule to encrypt all data of certain types. Or they may plan ETL work with that data, unaware of the transfer costs or protection requirements. As a result, you may lose visibility into where confidential data is located. In the cloud you want to define and tag your data so you know where it is and how you want to encrypt it.

Characteristics That Drive Data Protection Decisions

Now you have an inventory with the key data characteristics (classification, identification and location) needed to develop your protection strategy. Next, you must determine how to protect the data and who should have access to it. But why develop this inventory for data protection purposes? First, you will likely have different requirements for different levels of risk, and the approach for each level will have different costs.
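To make this first point concrete, one simple approach is to map each classification tier in the inventory to the controls and relative cost it implies. The sketch below is a minimal illustration in Python; the tier names, controls and relative costs are assumptions for the example, not a prescribed model, and would need to reflect your own classification strategy.

```python
# Illustrative mapping of classification tiers to protection requirements.
# Tier names, controls and relative costs are assumptions for this sketch.
PROTECTION_POLICY = {
    "public": {
        "encrypt_at_rest": False,
        "access": "any authenticated user",
        "relative_cost": 1,
    },
    "internal": {
        "encrypt_at_rest": True,
        "access": "employees only",
        "relative_cost": 2,
    },
    "confidential": {
        "encrypt_at_rest": True,
        "encrypt_in_transit": True,
        "access": "named groups, access audited",
        "relative_cost": 4,
    },
}


def controls_for(asset: dict) -> dict:
    """Look up protection requirements for one inventoried data asset.

    The asset record carries the three key characteristics from the
    inventory: classification, identification and location.
    """
    return PROTECTION_POLICY[asset["classification"]]


# Hypothetical inventory entry.
asset = {
    "identification": "customer_orders",
    "classification": "confidential",
    "location": "s3://example-bucket/orders/",
}
print(controls_for(asset))
```

A table like this also makes the cost conversation explicit: the controls attached to the highest tier are the most expensive, which is exactly why a single model applied to every asset creates unnecessary expense.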
Second, the tools for data protection on-premises, in the cloud and between cloud providers are unlikely to have feature parity, so it is often necessary to implement different controls in each environment. For example, consider the need to shift the security focus in the cloud from perimeter protections to workloads. Many on-premises data protection tools are perimeter oriented, or simply not optimized for public cloud workloads.
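Returning to the earlier point about enforcing classification tags automatically when data is ingested into the cloud estate, the sketch below shows one possible shape for such a hook, using Python and the AWS S3 object tagging API via boto3. The bucket name, object key, tag name and classification rule are all hypothetical, and other cloud providers offer equivalent tagging mechanisms.

```python
import boto3

s3 = boto3.client("s3")


def classify(key: str) -> str:
    """Hypothetical classification rule for this sketch; in practice it
    would come from your data classification strategy, not the key alone."""
    return "confidential" if key.startswith("hr/") else "internal"


def tag_on_ingest(bucket: str, key: str) -> None:
    """Apply a classification tag to an object as it lands in the bucket.

    In a real estate this would typically run from an event-driven hook
    (for example, on object creation) so tagging is enforced automatically
    rather than left to individual teams.
    """
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [
            {"Key": "classification", "Value": classify(key)},
        ]},
    )


# Hypothetical usage for a newly ingested object.
tag_on_ingest("example-ingest-bucket", "hr/payroll-2019.csv")
```

Tagging at ingestion, rather than retroactively, is what makes the approach scale: the classification travels with the data from the moment it enters the estate, and downstream encryption and access controls can key off the tag.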