
Design and Implementation of a Digital Twin for Live Petroleum Production Optimization
COMPUTATIONAL WORKFLOW: LIVE SENSOR DATA PROCESSING & SIMULATION
Live sensor data processing
Live sensor data processing starts with sensor data being added to an on-cloud source location, where it can be read by a monitoring workflow system. The monitoring workflow system copies the newly changed files and starts the ETL (extract, transform, load) process. The system typically starts with a scheduler such as Apache Airflow or Spotify's Luigi, which allows workflows to be written as DAGs (directed acyclic graphs) of tasks. The scheduler executes these tasks on multiple workers, following the specified dependencies between tasks, and can be scaled elastically depending on load.
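A minimal sketch of what such a scheduled DAG might look like in Airflow follows; the DAG name, task names, five-minute schedule, and file-detection logic are illustrative assumptions, not the authors' actual implementation:

    # Minimal Airflow DAG sketching the monitoring workflow described above.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def detect_new_files(**context):
        ...  # list the cloud source location, return paths of newly changed files

    def run_etl(**context):
        ...  # extract, transform, and publish sensor values to the message queue

    with DAG(
        dag_id="sensor_ingest",            # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval=timedelta(minutes=5),
        catchup=False,
    ) as dag:
        detect = PythonOperator(task_id="detect_new_files",
                                python_callable=detect_new_files)
        etl = PythonOperator(task_id="run_etl", python_callable=run_etl)
        detect >> etl  # explicit dependency: ETL runs only after new files are found

The `>>` operator is how Airflow expresses the edges of the DAG; the scheduler uses these edges to decide which tasks are ready to dispatch to workers.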
The ETL processes act as producers in a common messaging-system workflow: each ETL process adds encoded sensor values as messages to a queue, to be read later by consumers that store the data in a time series database. Systems such as Apache Kafka provide distributed, durable queues, while stream processors such as Apache Flink can consume and transform the queued data. Individual queue consumers can implement purpose-built functionality for persisting sensor streams or for computing new, derived, or calculated sensor values. The durable queues provide a significant buffer for incoming messages when a spike in demand leaves consumers temporarily unable to keep up with producers.
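The producer/consumer pattern can be sketched as follows, assuming Apache Kafka with the kafka-python client; the topic name, message schema, and sensor identifier are hypothetical:

    # Producer/consumer sketch for the queue stage, using kafka-python.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    def persist_to_tsdb(reading):
        ...  # insert into the time series store (see the TimescaleDB sketch below)

    # ETL side: publish one encoded sensor reading per message
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("sensor-readings",
                  {"sensor_id": "WH-101-PT", "ts": "2021-06-01T12:00:00Z",
                   "value": 1893.4})
    producer.flush()

    # Consumer side: read messages, persist them, or compute derived sensors
    consumer = KafkaConsumer(
        "sensor-readings",
        bootstrap_servers="localhost:9092",
        group_id="tsdb-writer",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for msg in consumer:
        persist_to_tsdb(msg.value)

Because the topic is durable and partitioned, messages accumulate safely during demand spikes, and throughput can be raised by adding consumers within the group.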
Eventually, the time series sensor stream needs to be persisted in a time-series-aware storage system such as OpenTSDB or TimescaleDB. These time series storage solutions are purpose-built data stores for storing and querying temporal data, and some of them can scale to millions of operations per second. Having a time series, or temporal, query engine becomes critical to processing sensor streams effectively.
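As a sketch of the write and query path, assuming TimescaleDB (PostgreSQL plus the timescaledb extension) accessed through psycopg2; the table schema, connection string, and sensor identifier are assumptions for illustration:

    # Persisting and querying readings with TimescaleDB via psycopg2.
    import psycopg2

    conn = psycopg2.connect("dbname=sensors user=twin")  # hypothetical DSN
    cur = conn.cursor()

    # One-time setup: a hypertable transparently partitions rows by time
    cur.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            ts        TIMESTAMPTZ NOT NULL,
            sensor_id TEXT        NOT NULL,
            value     DOUBLE PRECISION
        );
    """)
    cur.execute("SELECT create_hypertable('readings', 'ts', if_not_exists => TRUE);")

    # Consumer write path: one row per sensor message
    cur.execute(
        "INSERT INTO readings (ts, sensor_id, value) VALUES (%s, %s, %s)",
        ("2021-06-01T12:00:00Z", "WH-101-PT", 1893.4),
    )
    conn.commit()

    # Temporal query: 15-minute averages over the last day, the kind of
    # aggregation a time series engine makes cheap
    cur.execute("""
        SELECT time_bucket('15 minutes', ts) AS bucket, avg(value)
        FROM readings
        WHERE sensor_id = %s AND ts > now() - interval '1 day'
        GROUP BY bucket ORDER BY bucket;
    """, ("WH-101-PT",))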
The design of the system allows data to be reprocessed when needed, and the repeatable transformation process makes recovery from errors and bugs easier. The system is also performant: it is not uncommon to process 600,000 sensor readings per minute, and the system can scale to higher throughput by adding workers, queue partitions, or database nodes.
Simulation Workflow
After data processing, we are ready to use the data to generate simulation cases. The data is analyzed for parameter ranges to explore in simulation. Simulation cases are partitioned by field data and queued in a document database collection. Cloud instances, configured with the commercial simulator software, run a Python process that consumes the queue and processes cases. Our commercial simulator can be invoked through a command-line interface that takes a JSON file as input. The output from the simulator is saved to the same document as the case, and the status is marked as completed.
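One way such a consumer loop could look is sketched below, assuming MongoDB as the document database and a hypothetical `simulator` command-line wrapper; the collection name, field names, and CLI flags are illustrative, not the actual commercial simulator's interface:

    # Case-consumer sketch: claim queued cases, run the simulator, store results.
    import json
    import subprocess
    import tempfile
    from pymongo import MongoClient

    cases = MongoClient()["twin"]["simulation_cases"]  # hypothetical collection

    while True:
        # Atomically claim one queued case so parallel workers never collide
        case = cases.find_one_and_update(
            {"status": "queued"},
            {"$set": {"status": "running"}},
        )
        if case is None:
            break  # queue drained

        # Write the case parameters to a JSON file for the simulator CLI
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(case["parameters"], f)
            input_path = f.name

        result = subprocess.run(
            ["simulator", "--input", input_path],  # hypothetical CLI invocation
            capture_output=True, text=True,
        )

        # Save output back onto the same document and mark the case completed
        cases.update_one(
            {"_id": case["_id"]},
            {"$set": {"status": "completed", "output": result.stdout}},
        )

Using an atomic find-and-update as the dequeue step is what lets many cloud instances share one collection as a work queue without a separate message broker.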