Making Factories Smarter Through Machine Learning
In terms of machine learning methodology, it is important to identify the variables that are relevant to the specific question being asked. Restricting the analysis to those variables reduces noise, overfitting and bandwidth usage. This technique is called “feature subset selection.”
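As a rough illustration of feature subset selection, the sketch below ranks candidate variables by the absolute value of their Pearson correlation with a target and keeps the top k. This is only one simple filter-style approach, not necessarily the method used by the authors; the variable names (`temperature`, `vibration`, `ambient_noise`, `shift_id`, `tool_wear`) and the synthetic data are assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k):
    """Return the names of the k variables most correlated (in absolute value) with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic data (hypothetical): tool wear depends on temperature and vibration only.
rng = random.Random(0)
n = 200
columns = {
    "temperature": [rng.uniform(20, 80) for _ in range(n)],
    "vibration": [rng.uniform(0, 5) for _ in range(n)],
    "ambient_noise": [rng.uniform(0, 1) for _ in range(n)],
    "shift_id": [rng.choice([1, 2, 3]) for _ in range(n)],
}
tool_wear = [
    0.05 * t + 0.8 * v + rng.gauss(0, 0.2)
    for t, v in zip(columns["temperature"], columns["vibration"])
]

selected = select_features(columns, tool_wear, k=2)
print(selected)
```

A univariate ranking like this ignores interactions between variables, so in practice it would usually be a first pass rather than the final selection.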
After the questions are selected, there are two main types of algorithms to choose from, depending on the stage of the analysis or the answer required: unsupervised and supervised learning.¹⁵ When it is not clear in advance what kind of information will be found, unsupervised learning (mainly clustering techniques) is used. This step is called knowledge discovery: similarity measures are applied to the data to group it into clusters or partitions. Those measures might be based on distances, densities, probability distributions, etc. The choice of measure depends on the complexity and size of the dataset.
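To make the distance-based case concrete, here is a minimal k-means sketch that groups unlabeled readings into clusters using Euclidean distance. The two "operating regimes", the (temperature, vibration) readings and the farthest-point initialisation are all assumptions chosen for illustration; density-based or probabilistic methods would look different.

```python
import math
import random


def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-point initialisation; returns (centroids, labels)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Synthetic, unlabeled (temperature, vibration) readings from two hypothetical regimes.
rng = random.Random(0)
regime_a = [(rng.gauss(40, 2), rng.gauss(1.0, 0.2)) for _ in range(50)]
regime_b = [(rng.gauss(70, 2), rng.gauss(3.0, 0.2)) for _ in range(50)]
points = regime_a + regime_b

centroids, labels = kmeans(points, k=2)
print(centroids)
```

Because raw Euclidean distance is dominated by the variable with the largest numeric range, real data would normally be scaled before clustering.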
However, if you want to answer a specific question and you have examples with known results to train the system on, supervised learning is used. For example, if the behavior of the tool tip is known at certain temperature values but not at all of them, one would use supervised learning to determine the unknown values. Here, a machine learning system is trained using the known examples and is then tested with new examples.
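The train-then-test pattern described above can be sketched with an ordinary least-squares line fit. The linear relationship between temperature and tool-tip wear, the numeric values and the 90/30 split are assumptions invented for this example; a real system would likely need a richer model.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic labeled examples (hypothetical): wear measured at known temperatures.
rng = random.Random(1)
temps = [rng.uniform(100, 400) for _ in range(120)]
wear = [0.02 * t + 1.0 + rng.gauss(0, 0.1) for t in temps]

# Train on the first 90 examples, hold out the remaining 30 for testing.
train_x, test_x = temps[:90], temps[90:]
train_y, test_y = wear[:90], wear[90:]

slope, intercept = fit_line(train_x, train_y)

# Test: predict wear at temperatures the model has not seen.
predictions = [slope * t + intercept for t in test_x]
mae = sum(abs(p - y) for p, y in zip(predictions, test_y)) / len(test_y)
print(f"slope={slope:.4f} intercept={intercept:.3f} test MAE={mae:.3f}")
```

The error on the held-out examples, not the training error, is what indicates how well the unknown values are being estimated.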
To build the complete system, the workflow with data reduction described in Figure 4 would be used. Data is taken from the manufacturing system and sent to a machine learning algorithm that uses the new data together with other information, such as mathematical models (Finite Element Method (FEM) results, behavior equations, etc.), to produce the predictive system. While data is travelling through this process, it is summarized so that only the data needed to answer the question is moved. This reduces bandwidth utilization and increases response speed. This type of data is called Smart Data. For example, the system’s 50K variables are reduced to the most important ones: the “smart” metadata that addresses the key criteria is determined and communicated, rather than flooding the network with non-critical data.
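The effect of summarization on bandwidth can be illustrated with a small sketch that compares the size of a raw window of readings with the size of a summary covering only the selected variables. The sensor names, the window size, the choice of summary statistics and the JSON encoding are all assumptions for the example and are not taken from Figure 4.

```python
import json
import random
import statistics


def summarize(window, selected):
    """Reduce a window of raw readings to summary statistics for the selected variables only."""
    summary = {}
    for name in selected:
        values = [reading[name] for reading in window]
        summary[name] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Synthetic raw data (hypothetical): 20 readings of 500 sensor variables each.
rng = random.Random(2)
variable_names = [f"sensor_{i}" for i in range(500)]
window = [{name: rng.random() for name in variable_names} for _ in range(20)]

# Hypothetical outcome of an earlier feature selection step.
selected = ["sensor_3", "sensor_42", "sensor_170"]

smart_data = summarize(window, selected)
raw_bytes = len(json.dumps(window).encode("utf-8"))
smart_bytes = len(json.dumps(smart_data).encode("utf-8"))
print(f"raw payload: {raw_bytes} bytes, summarized payload: {smart_bytes} bytes")
```

Only the summarized payload would need to cross the network; the raw window can stay close to the machine that produced it.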
15
O. Chapelle, B. Schölkopf and A. Zien, Semi-Supervised Learning, Cambridge, Massachusetts: The MIT Press, 2006.
IIC Journal of Innovation - 35 -