IIC Journal of Innovation 3rd Edition | Page 36

Making Factories Smarter Through Machine Learning
In terms of machine learning methodology, it is important to find the variables that are relevant to answering a specific question. Doing so reduces noise, overfitting and bandwidth usage. This technique is called “feature subset selection.”
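As a rough illustration of feature subset selection, the sketch below ranks candidate variables by the absolute value of their Pearson correlation with the quantity of interest and keeps the top k. This simple filter approach is only one of many possible selection methods and is not necessarily the one used by the authors; the variable names and readings are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, k):
    """Return the names of the k variables most correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical sensor readings (assumed for illustration only).
columns = {
    "temperature": [20.0, 25.0, 30.0, 35.0, 40.0],
    "vibration": [0.2, 0.1, 0.4, 0.3, 0.5],
    "door_closed": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant, carries no information
}
tool_wear = [1.0, 1.5, 2.1, 2.4, 3.0]

selected = select_features(columns, tool_wear, k=2)
```

Here the constant variable scores zero and is dropped, so it would never need to be transmitted or fed to the learning algorithm.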
Once the questions have been selected, one of two main types of algorithm must be chosen, depending on the stage of the analysis and the kind of answer required: unsupervised or supervised learning [15]. When it is not clear what kind of information will be found, unsupervised learning (mainly clustering techniques) is used. This step is called knowledge discovery: a measure applied to the data is used to group it into clusters or partitions. These measures may be based on distances, densities, probability distributions, etc. Which measure is appropriate depends on the complexity and size of the dataset.
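A distance-based clustering step can be sketched with a minimal k-means loop. This is an assumed example of one distance-based technique, not the specific algorithm used in the article; the data points and the choice of initial centroids are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]  # keep an empty cluster's centroid unchanged
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (temperature, vibration) readings forming two operating regimes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied: the grouping emerges from the distance measure alone, which is what makes this a knowledge discovery step.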
However, if you want to answer a specific question and you have examples with known results to train the system, supervised learning is used. For example, if the behavior of the tool tip is known at some temperature values but not at all of them, one would use supervised learning to determine the behavior at the unknown values. Here, a machine learning system is trained using the examples and is then tested with new examples.
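The train-then-test pattern can be sketched with an ordinary least-squares line fit. The linear model is an assumption made for brevity, and the temperature and wear values are synthetic; a real tool tip may well behave non-linearly.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Training examples: temperatures with known tool-tip wear (synthetic values).
train_temps = [100.0, 150.0, 200.0, 250.0]
train_wear = [6.0, 8.5, 11.0, 13.5]
model = fit_line(train_temps, train_wear)

# Test examples: held-out measurements that were not used for training.
test_temps = [125.0, 225.0]
test_wear = [7.25, 12.25]
errors = [abs(predict(model, t) - w) for t, w in zip(test_temps, test_wear)]
mean_abs_error = sum(errors) / len(errors)

# Estimate the behavior at a temperature that was never measured.
estimate = predict(model, 175.0)
```

The error on the held-out examples, not the fit on the training data, is what indicates whether the estimate at an unmeasured temperature can be trusted.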
To build the complete system, the workflow with data reduction described in Figure 4 would be used. Data is taken from the manufacturing system and sent to a machine learning algorithm that combines the new data with other information, such as mathematical models (Finite Element Method (FEM) results, behavior equations, etc.), to produce the predictive system. As data travels through this process, it is summarized so that only the data needed to answer the question is moved. This reduces bandwidth utilization and increases response speed. Data of this type is called Smart Data. For example, the system’s 50K variables are reduced to the most important ones: the “smart” metadata that addresses the key criteria is determined and communicated, rather than flooding the network with non-critical data.
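One way to picture the summarization step is a function that runs close to the machine, keeps only the selected variables, and sends a few statistics per time window instead of every raw reading. The sketch below is an assumption about how such a step could look, not the workflow of Figure 4; the variable names, window contents and choice of statistics are hypothetical.

```python
def summarize(window, selected):
    """Reduce a window of raw readings to mean/min/max for the selected variables only."""
    summary = {}
    for name in selected:
        values = [reading[name] for reading in window]
        summary[name] = {
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical raw readings; a real system might carry thousands of variables per reading.
window = [
    {"spindle_temp": 61.0, "vibration": 0.20, "door_open": 0, "coolant_level": 0.90},
    {"spindle_temp": 63.0, "vibration": 0.30, "door_open": 0, "coolant_level": 0.90},
    {"spindle_temp": 65.0, "vibration": 0.25, "door_open": 0, "coolant_level": 0.89},
]

# Only the variables relevant to the question leave the machine.
payload = summarize(window, selected=["spindle_temp", "vibration"])
```

The payload holds six numbers regardless of how many readings fall inside the window, which is where the bandwidth saving comes from.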
[15] O. Chapelle, B. Schölkopf and A. Zien, Semi-Supervised Learning, Cambridge, Massachusetts: The MIT Press, 2006.