BMTA Newsletter - Spring 2020 | Page 10

PREDICTIVE MODEL DEVELOPMENT

Dr Yanfeng Liang, Mathematician, TÜV SÜD National Engineering Laboratory

UNLOCKING THE POTENTIAL IN LARGE HISTORICAL DATA SETS

In a world where data arrives in large volumes and at high speed, its full value cannot be realised without advanced modelling techniques such as machine learning. In flow measurement, flow meters output digital data sets that indicate how the meters perform under different operating conditions. Predictive models can use these data to predict errors such as improper installation, deposition and the presence of a second phase. This enables condition-based monitoring and makes fault diagnosis more efficient, which in turn reduces operational costs.

As the world moves towards digitalisation, it has become increasingly important to use predictive models and machine learning algorithms to extract valuable information and obtain new insights from the available data. In flow measurement, vast amounts of data are generated and stored by flow meters whose built-in digital transmitters output at high frequency. This output contains valuable information on both the performance of the meters and their operating conditions. In other words, big data can enable predictive maintenance and condition-based monitoring, which reduce operating costs and improve decision-making. However, storing data alone is not enough to unlock these opportunities: data is only useful if appropriate modelling strategies are used to extract its underlying value and obtain new insights.

TÜV SÜD National Engineering Laboratory holds the UK’s national standard for measurement in density and flow.
Over the years, its data acquisition systems have logged and archived 20 years’ worth of data detailing the performance of various flow meters, test facility configurations and operating conditions. Multiple research projects conducted over that period have shown that errors such as improper meter installation, deposition (for example, wax) and the presence of a second phase manifest as drifts in a meter’s diagnostic variables. However, interpreting the data can be challenging because different errors can induce the same drift patterns in the same diagnostic variables. Consequently, it is extremely difficult for end-users to distinguish between errors using basic visual inspection tools. This lengthens fault diagnosis and can delay corrective action, ultimately affecting the reliability and accuracy of the primary measurements generated by the flow meters.

As more data becomes available, further complications arise, such as high-dimensional data (data sets with a large number of variables) and poorly structured databases with missing data labels. Diagnostic variables in flow measurement are often interrelated, which makes the underlying relationships between them harder to analyse. Furthermore, certain flow measurement experiments are expensive to conduct, so only a limited amount of data is available. For example, failure data and erosion data for flow meters are scarce because such tests are extremely costly and cause significant damage to the meters.
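To illustrate the point that different errors can induce the same drift in a single diagnostic variable, the following minimal Python sketch uses a nearest-centroid classifier. This is a deliberately simple stand-in chosen for illustration, not the modelling approach used at TÜV SÜD National Engineering Laboratory. The variable names (`diag_1` to `diag_3`), the fault labels and every number in `TRAINING` are hypothetical, invented for this example rather than taken from measured data. With `diag_1` alone, the two fault classes cannot be told apart; adding the other diagnostic variables separates them.

```python
"""Illustrative sketch: separating faults that share a drift pattern.

All variable names, fault labels and numbers below are hypothetical,
made up for illustration; they are not measured flow-meter data.
"""
from math import dist
from statistics import mean

# Hypothetical drift signatures: each row is the change in three
# diagnostic variables (diag_1, diag_2, diag_3) relative to a healthy
# baseline, recorded during labelled test runs.
TRAINING = {
    "healthy": [(0.0, 0.1, 0.0), (0.1, -0.1, 0.1), (-0.1, 0.0, -0.1)],
    # Both fault classes push diag_1 up by the same amount...
    "installation_error": [(2.0, 1.5, 0.1), (2.1, 1.4, 0.0), (1.9, 1.6, -0.1)],
    # ...and differ only in diag_2 and diag_3.
    "second_phase": [(2.0, 0.0, 1.2), (2.1, 0.1, 1.1), (1.9, -0.1, 1.3)],
}


def centroids(training, features):
    """Mean drift vector per class, using only the chosen feature indices."""
    return {
        label: tuple(mean(row[i] for row in rows) for i in features)
        for label, rows in training.items()
    }


def classify(sample, training, features):
    """Return the class(es) whose centroid is nearest to the sample.

    Every class tied for the minimum distance is returned, so an
    ambiguous diagnosis shows up as more than one label.
    """
    cents = centroids(training, features)
    point = tuple(sample[i] for i in features)
    distances = {label: dist(point, c) for label, c in cents.items()}
    best = min(distances.values())
    return sorted(label for label, d in distances.items() if abs(d - best) < 1e-9)


if __name__ == "__main__":
    new_reading = (2.05, 0.05, 1.15)  # hypothetical drift vector
    print(classify(new_reading, TRAINING, features=(0,)))
    # ['installation_error', 'second_phase']  (diag_1 alone is ambiguous)
    print(classify(new_reading, TRAINING, features=(0, 1, 2)))
    # ['second_phase']  (all three variables resolve the fault)
```

In practice, diagnostic variables have different units and scales, so they would normally be standardised before distances are computed; the hypothetical drifts here are already on a comparable scale, which keeps the sketch short.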