demonstrate a connection when a change to one is reflected
in the other. In the data world, data entanglement means
that when two data stores share common information, a
change in one is reflected in the other.
Yes, we are fundamentally talking about data replication, but
there is a benefit to thinking about its challenges from the
perspective of entangled systems. This encourages you,
when considering use case scenarios, to make sure you are
taking all the potential impacts into account. Let us look at
some high-level scenarios and why we should consider
entanglements.
Living on the Edge in the World of IoT
With IoT devices, sensors of all types generate massive
amounts of data. These devices live out on the edge of the
cloud and create multiple data entanglement impact sce-
narios, as follows:
• Real-time system scenarios – In manufacturing,
data from IoT devices frequently demands real-time
acquisition and processing. Immediate response
is required, so manufacturers cannot wait for the data
to make it to the cloud and back. This means the data
store needs to be on or near the edge of the services
and applications doing the data processing. For
example, when sensors in a manufacturing line report
an issue in one part of the system, the response must
be immediate. Or, when a new smart car senses an
adverse driving condition, the analysis and response
cannot be delayed by latencies back to the cloud.
• Long-term analysis scenarios – It is often beneficial
to analyze IoT device data over the long term, for
example, when doing predictive maintenance. Such
applications and services do not need real-time data
access and capabilities, so inherent latency is not an
issue. The original data at the edge is entangled with
the data stores used by the long-term analytical
applications, which can be in an entirely different
location within the cloud.
• Feedback/updates to devices scenarios – Based on
the various analyses done on the data, it may be
important to send feedback/updates to the IoT
76 | THE DOPPLER |
SPRING 2019
devices. (In the smart car example, the analysis may
provide data that improves the performance of the
smart car features, so you would want that data to
upgrade all cars in the fleet.) The devices and back-end
systems are inextricably entangled: changes are
propagated, replicated, modified and returned to the
devices involved.
As you can see from these high-level scenarios, data is
indeed entangled between systems, so we need to ensure
the right data is in the right place at the right time.
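To make the three IoT scenarios concrete, here is a minimal sketch of how an edge store and a cloud store might be entangled. Everything in it is an assumption made for illustration: the `EdgeStore` and `CloudStore` classes, the `threshold` field and the rule that derives a new threshold from the historical mean are hypothetical, not the API of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeStore:
    """Hypothetical store living next to the device."""
    threshold: float                              # current alert threshold
    readings: list = field(default_factory=list)
    outbox: list = field(default_factory=list)    # changes not yet replicated

    def ingest(self, value: float) -> bool:
        """Real-time path: decide locally, with no cloud round trip."""
        self.readings.append(value)
        self.outbox.append(value)
        return value > self.threshold             # True means "respond now"


@dataclass
class CloudStore:
    """Hypothetical long-term analytical store."""
    history: list = field(default_factory=list)

    def pull_from(self, edge: EdgeStore) -> None:
        """Long-term path: latency is acceptable, so replicate in batches."""
        self.history.extend(edge.outbox)
        edge.outbox.clear()

    def push_update(self, edge: EdgeStore) -> None:
        """Feedback path: derive a new threshold and send it to the device.

        The 1.5 x mean rule is an arbitrary illustrative choice.
        """
        if self.history:
            edge.threshold = 1.5 * sum(self.history) / len(self.history)


edge = EdgeStore(threshold=100.0)
cloud = CloudStore()

first_alert = edge.ingest(80.0)     # below threshold, no immediate response
second_alert = edge.ingest(120.0)   # above threshold, respond at the edge
cloud.pull_from(edge)               # edge data is now reflected in the cloud
cloud.push_update(edge)             # cloud analysis is now reflected at the edge
```

The point of the sketch is that a change on either side eventually shows up on the other, while only the real-time decision in `ingest` has to happen at the edge.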
Data Replication Scenarios
Most replication needs fall into a small set of scenarios
(although, as usual, there are exceptions outside these
scenarios):
• Data synchronization scenarios – One of the best-
known examples is the full synchronization of data
between two or more databases, typically in a near
real-time fashion. Most database systems have some
level of this capability built in. The potential downsides
are cost, resource consumption and performance.
In this model, all systems perform reads and updates
and are kept in sync. Race conditions are always a
risk in this scenario, so a large amount of resources is
required to ensure data integrity. While conceptually
the simplest solution, this can be overkill for most
needs.
• Snapshot scenarios – The snapshot replication of
one or more data tables at predetermined time inter-
vals can be useful when there is no real-time need for
data access, and destinations only require read
access. For example, in the IoT examples above, feed-
back/updates could potentially be accomplished
using the snapshot technique.
• Transactional scenarios – Transactional replication
is a step down from pure data synchronization. Data
is copied from a master system to slave systems at or
near real time. This is usually thought of as an incre-
mental update from a snapshot, and it is frequently
used as a mechanism for backup and passive system
availability.
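The difference between the snapshot and transactional scenarios can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular database implements replication: the `Primary` and `Replica` classes, the in-memory change log and the method names are all hypothetical, with `Primary` standing in for the master system and `Replica` for a read-only destination.

```python
import copy


class Primary:
    """Hypothetical source system that records every change in a log."""

    def __init__(self):
        self.table = {}
        self.log = []              # ordered (key, value) changes

    def write(self, key, value):
        self.table[key] = value
        self.log.append((key, value))


class Replica:
    """Hypothetical read-only destination."""

    def __init__(self):
        self.table = {}
        self.applied = 0           # number of log entries already applied

    def load_snapshot(self, primary):
        """Snapshot replication: copy the whole table at a point in time."""
        self.table = copy.deepcopy(primary.table)
        self.applied = len(primary.log)

    def apply_increment(self, primary):
        """Transactional replication: replay only changes since the last sync."""
        for key, value in primary.log[self.applied:]:
            self.table[key] = value
        self.applied = len(primary.log)


primary = Primary()
replica = Replica()

primary.write("a", 1)
replica.load_snapshot(primary)             # full copy at a scheduled interval
primary.write("b", 2)
stale_view = dict(replica.table)           # replica has not seen "b" yet
replica.apply_increment(primary)           # incremental update from the snapshot
```

Between snapshots the replica serves stale reads, which is acceptable when there is no real-time need. The incremental step closes that gap by shipping only the changes made since the snapshot.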