The Doppler Quarterly Spring 2019

demonstrate a connection when a change to one is reflected in the other. In the data world, data entanglement means that when two data stores share common information, a change in one is reflected in the other. Yes, we are fundamentally talking about data replication, but there is a benefit to thinking about its challenges from the perspective of entangled systems. This encourages you, when considering use case scenarios, to make sure you are taking all the potential impacts into account. Let us look at some high-level scenarios and why we should consider entanglements.

Living on the Edge in the World of IoT

With IoT devices, sensors of all types generate massive amounts of data. These devices live out on the edge of the cloud and create multiple data entanglement impact scenarios, as follows:

• Real-time system scenarios – In manufacturing, data from IoT devices frequently demands real-time acquisition and processing. Immediate response is required, so manufacturers cannot wait for the data to make it to the cloud and back. This means the data store needs to be on or near the edge, alongside the services and applications doing the data processing. For example, when sensors in a manufacturing line report an issue in one part of the system, the response must be immediate. Or, when a new smart car senses an adverse driving condition, the analysis and response cannot be delayed by latencies back to the cloud.
• Long-term analysis scenarios – It is often beneficial to analyze IoT device data over the long term, for example, when doing predictive maintenance. Such applications and services do not need real-time data access and capabilities, so inherent latency is not an issue. The original data at the edge is entangled with the data stores used by the long-term analytical applications, which can be in an entirely different location within the cloud.
• Feedback/updates to devices scenarios – Based on the various analyses done on the data, it may be important to send feedback/updates to the IoT devices. (In the smart car example, the analysis may provide data that improves the performance of the smart car features, so you would want that data to upgrade all cars in the fleet.) The devices and back-end systems are inextricably entangled: changes propagate, replicate, get modified and return to the devices involved.

As you can see from these high-level scenarios, data is indeed entangled between systems, so we need to ensure the right data is in the right place at the right time.

Data Replication Scenarios

Most replication needs fall into a small set of scenarios (although, as usual, there are exceptions outside these scenarios):

• Data synchronization scenarios – One of the best-known examples is the full synchronization of data between two or more databases, typically in a near real-time fashion. Most database systems have some level of this capability built in. In this model, all systems perform reads and updates and are kept in sync. Race conditions are always a risk in this scenario, so a large amount of resources is required to ensure data integrity, which can hurt both cost and performance. While conceptually the simplest solution, this can be overkill for most needs.
• Snapshot scenarios – The snapshot replication of one or more data tables at predetermined time intervals can be useful when there is no real-time need for data access, and destinations only require read access. For example, in the IoT examples above, feedback/updates could potentially be accomplished using the snapshot technique.
• Transactional scenarios – Transactional replication is a step down from pure data synchronization. Data is copied from a master system to slave systems at or near real time. This is usually thought of as an incremental update from a snapshot, and it is frequently used as a mechanism for backup and passive system availability.
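
To make the difference between the snapshot and transactional scenarios above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular database implements replication: the Store class, its change log and both replicate functions are hypothetical names invented for this example, and the "data stores" are simply in-memory dictionaries.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Store:
    """A toy key-value data store (hypothetical, for illustration only)."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered change log of (key, value)

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


def snapshot_replicate(source, replica):
    """Snapshot replication: copy the whole table, e.g. on a schedule."""
    replica.rows = copy.deepcopy(source.rows)
    return len(source.log)  # log position covered by this snapshot


def transactional_replicate(source, replica, position):
    """Transactional replication: apply only the changes made after `position`."""
    for key, value in source.log[position:]:
        replica.rows[key] = value
    return len(source.log)


edge = Store()   # assumed source system, e.g. a store near the IoT devices
cloud = Store()  # assumed read-only destination used for long-term analysis

edge.write("sensor-1", 20.5)
edge.write("sensor-2", 18.0)
pos = snapshot_replicate(edge, cloud)            # full copy at a point in time

edge.write("sensor-1", 21.0)                     # change made after the snapshot
stale = cloud.rows["sensor-1"]                   # replica still has the old value

pos = transactional_replicate(edge, cloud, pos)  # incremental update
fresh = cloud.rows["sensor-1"]                   # replica has caught up
```

In this sketch the snapshot leaves the destination stale until the next scheduled copy, while the transactional step ships only the changes recorded since the snapshot. Full synchronization would additionally have to let both stores accept writes and resolve conflicting updates, which is where the race conditions mentioned above come from.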
