Securing the ML Lifecycle
• ML – Machine Learning
• SE – Software Engineering
• SDLC – Secure Development Lifecycle
Machine learning can be defined as programming a computer so that it can learn from data [3]. Unlike in procedural development, a developer does not define the algorithmic approach to a solution directly, but rather the steps and parameters required to extract patterns from data and produce generalized models [4]. In other words, a training process is executed on a (large) set of training data and yields a trained model. Examples include models for image recognition in radio-diagnostics or industrial maintenance, for financial prediction, for malware detection in networks [5], for speech recognition, or for text-to-speech synthesis [6].
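To make the contrast with procedural development concrete, the following minimal sketch shows a training process that derives model parameters from training data instead of having them written by hand. The function names `train` and `predict`, the least-squares line fit, and the toy data are assumptions chosen purely for illustration; they are not taken from the cited works.

```python
from statistics import mean


def train(xs, ys):
    """Illustrative training process: fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    # The returned dictionary plays the role of the trained model.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Apply the trained model to an input it has not seen before."""
    return model["slope"] * x + model["intercept"]


# Training data (hypothetical): the points happen to lie on y = 2x + 1,
# but that rule is never written into the code.
training_xs = [1.0, 2.0, 3.0, 4.0]
training_ys = [3.0, 5.0, 7.0, 9.0]

model = train(training_xs, training_ys)
prediction = predict(model, 10.0)
```

The three artefacts are all visible here: the training data (`training_xs`, `training_ys`), the training process description (`train`), and the trained model (`model`).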
These three core artefacts (training data, training process description, trained model) can be susceptible to inadvertent modifications or even intentional attacks. The verification and validation mechanisms used in standard software development (e.g., static code analysis or unit testing) do not suffice to guarantee the overall quality of training data, training code, or trained models.
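Why code-centric checks fall short can be sketched with a toy example: if a few labels in the training data are modified, the training code stays byte-for-byte identical, yet the resulting model behaves differently. The threshold classifier, the function names, and the data below are hypothetical and serve only to illustrate the point under these simplified assumptions.

```python
from statistics import mean


def train_threshold(samples):
    """Hypothetical training process: threshold = midpoint between the two class means."""
    mean_0 = mean([value for value, label in samples if label == 0])
    mean_1 = mean([value for value, label in samples if label == 1])
    return (mean_0 + mean_1) / 2


def classify(threshold, value):
    """Assign class 1 to values above the learned threshold, class 0 otherwise."""
    return 1 if value > threshold else 0


# Unmodified training data: (value, label) pairs.
clean_data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]

# Same data with two labels flipped (inadvertently or intentionally).
tampered_data = [
    (value, 0 if value in (7.0, 8.0) else label) for value, label in clean_data
]

# The training code is unchanged; only the data differs.
clean_threshold = train_threshold(clean_data)
tampered_threshold = train_threshold(tampered_data)

# The same input is now classified differently.
clean_result = classify(clean_threshold, 6.0)
tampered_result = classify(tampered_threshold, 6.0)
```

Static analysis or unit tests of `train_threshold` would report no difference between the two runs, because the change lies entirely in the data artefact.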
When discussing the ML lifecycle (Section 3.1), we also need to understand the stakeholders involved (Section 3.2) before we can discuss the actual threats, attacks, and countermeasures for the individual stages of the ML lifecycle (Section 4). The following two brief case studies will be referred to in the discussion.
A company produces complex machines consisting of several hundred individual parts. An image recognition model has been trained to recognize each part (even when it is already combined with others) and displays information about it to the engineers during the assembly process. Several dozen parts suppliers provide detailed technical descriptions and images for each part, but the many individual parts are only assembled on the company’s premises.
[3] Geron, A. “Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.” O’Reilly, 2019.
[4] Kelleher, J. et al. “Fundamentals of Machine Learning for Predictive Data Analytics.” MIT Press, 2020.
[5]
[6]
IIC Journal of Innovation 39