Applying the same classification and protection model across all of your data assets will create unnecessary risk or expense.
Things get interesting when tagging data with those classifications, which is not necessarily an easy task. Classifying chunks of data can be a challenge, since they have likely developed a life of their own in your environment and may no longer be clearly organized. Or, as is very common, many copies of the data have proliferated, making it difficult to determine which is the source of truth. Migrating to the cloud and deciding whether to relocate this data is a prime opportunity to address the organization, classification and tagging of that data.
Many companies have a data classification strategy, but an incomplete or inconsistent deployment of that strategy. Typically this is caused by limited capabilities, toolsets or processes for applying classification rules to their data. In some instances, the only way data is identified by a classification is at the platform level: for example, a “Confidential” MSSQL database, or all “Super Secret” files on a particular server.
Making sure data classification is defined and applied via tagging is an important step to take, whether or not data is moving to the cloud. Furthermore, enforcing tagging through automation when data is generated, or when it is ingested into the cloud estate, aids compliance and scalability. Consistent tags identify the location of your confidential data and provide guidance on how to protect it.
Organizations are leveraging their data to its fullest extent in an effort to improve operations and gain market share. Protecting data is therefore a mission-critical priority.
How Will People Use Your Data?
Once you classify your data, you need to analyze how people use it. Where do they put it? Is it file system data, database data, data streams or a data lake?
As you consider whether an application should move to the cloud, determine what data that application requires. If the app needs to span the chasm between an on-premises and a cloud-based environment, then data gravity will matter. That is because you will have to determine where your biggest chunks of data are and whether you need to do analytics near that data. If you are pulling data from one environment into another, you are apt to run up transfer costs or introduce latency across that chasm. If the data and the application sit on the same side of the chasm, the data gravity issue is less pressing.
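A back-of-the-envelope calculation makes the gravity argument concrete. The rate and volumes below are assumptions for illustration, not published prices; plug in your own provider's egress pricing.

```python
# Rough data gravity check: what does it cost to keep pulling a dataset
# across the on-premises/cloud chasm? All figures are assumptions.
EGRESS_USD_PER_GB = 0.09      # assumed per-GB egress rate
dataset_gb = 5_000            # assumed working set of ~5 TB
pulls_per_month = 4           # analytics jobs that re-read the remote data

monthly_cost = dataset_gb * EGRESS_USD_PER_GB * pulls_per_month
print(f"~${monthly_cost:,.0f}/month to move the data")  # ~$1,800/month
```

At that rate, co-locating the analytics with the data, rather than repeatedly pulling the data to the analytics, is usually the cheaper and lower-latency choice.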
If people do not know where certain data resides, they might institute a blanket rule to encrypt all data of certain types. Or they may plan ETL work with that data, unaware of the transfer costs or protection requirements, and in the process lose visibility into where confidential data is located. In the cloud, you want to define and tag your data so you know where it is and how you want to encrypt it.
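One way to connect tags to encryption decisions is to let the classification select the key at write time. The sketch below assumes S3 with KMS; the bucket name, tag scheme and key alias are hypothetical placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical mapping from classification to a customer-managed KMS key;
# anything unlisted falls back to the bucket's default encryption.
KMS_KEY_BY_CLASSIFICATION = {
    "confidential": "alias/confidential-data",
}

def upload(bucket: str, key: str, body: bytes, classification: str) -> None:
    """Write an object with its classification tag and a matching KMS key."""
    kwargs = dict(
        Bucket=bucket,
        Key=key,
        Body=body,
        Tagging=f"classification={classification}",  # URL-encoded key=value
    )
    kms_key = KMS_KEY_BY_CLASSIFICATION.get(classification)
    if kms_key:
        kwargs.update(ServerSideEncryption="aws:kms", SSEKMSKeyId=kms_key)
    s3.put_object(**kwargs)

upload("example-data-lake", "finance/report.csv", b"...", "confidential")
```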
Characteristics That Drive Data Protection Decisions
Now you have an inventory with the key data characteristics (classification, identification and location) needed to develop your protection strategy. Next, you must determine how to protect the data and who should have access to it. But why develop this inventory for data protection purposes? First, you will likely have different requirements for different levels of risk, and the approach for each level will have different costs.
Second, the tools for data protection on-premises, in the cloud and between cloud providers are unlikely to have feature parity, so it is often necessary to implement different controls in each environment. For example, consider the need to shift the security focus in the cloud from perimeter protections to workloads. Many on-premises data protection tools are perimeter oriented, or simply not optimized for public cloud workloads.
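Workload-oriented control often means attaching the rule to the resource itself rather than to a network boundary. As an illustrative sketch, the policy below denies any S3 upload that does not request KMS encryption; the bucket name is hypothetical, and the same idea applies to other providers' resource policies.

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny PutObject requests that do not ask for SSE-KMS, so the control
# travels with the workload (the bucket) instead of a perimeter device.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RequireKmsEncryption",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::example-data-lake/*",
        "Condition": {
            "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
        },
    }],
}
s3.put_bucket_policy(Bucket="example-data-lake", Policy=json.dumps(policy))
```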