Becoming the Trainer: Attacking ML Training Infrastructure

Artificial Intelligence (AI) is quickly becoming a strategic investment for companies of all sizes across industries such as automotive, healthcare, and financial services. To meet this rapidly developing business need, machine learning (ML) models must be developed and deployed to support AI-integrated products and services via the machine learning operations (MLOps) lifecycle. The most critical phase of the MLOps lifecycle is model training, which takes place within an ML training environment. An attacker who gains unauthorized access to any component of that environment can compromise the confidentiality, integrity, and availability of the models being developed.

This research provides a background on ML training environments and infrastructure, then details attack scenarios against their critical components, such as Jupyter notebook environments, cloud compute, model artifact storage, and model registries. It outlines how to abuse the integrations between these components to facilitate privilege escalation and lateral movement, as well as how to conduct ML model theft and poisoning. In addition to demonstrating these attack scenarios, it describes how to protect and defend ML training environments.

Description

Below is an outline of the presentation. For more details, see the detailed write-up attached to this CFP.

  • Introduction (2 minutes)
  • Background (8 minutes) - Give a background on the use cases for ML across various industries. I will then cover an overview of the MLOps lifecycle, along with how infrastructure is set up and configured for ML training environments, highlighting the key components of that infrastructure.
    • Prior Work
      • Model Theft from Azure ML Model Registry
      • Abuse of SageMaker Notebook Lifecycle Configurations
    • Machine Learning Technology Use Cases
    • Machine Learning Operations Lifecycle
    • ML Training Environment Infrastructure
      • Jupyter Notebook Environment
      • Cloud Compute
      • Model Artifact Storage
      • Model Registry
  • Attacking ML Training Environments (25 minutes) - Discuss why an attacker would want to target various components of an ML training environment, then walk through detailed attack scenarios against those components. This section will also include demos to help the audience digest the topics.
    • Key ML Components to Attack
    • Attack Scenarios
      • Scenario 1: Lateral Movement from Jupyter Notebook Environment to MLflow
      • Scenario 2: Model Theft from MLflow Model Registry
      • Scenario 3: Lateral Movement from SCM System to SageMaker Cloud Compute
      • Scenario 4: Lateral Movement to SageMaker Cloud Compute using Malicious Lifecycle Configuration
      • Scenario 5: Model Theft from SageMaker Model Registry
      • Scenario 6: Model Poisoning against SageMaker to gain Code Execution
      • Scenario 7: Model Poisoning against Azure ML to gain Code Execution
  • Protecting ML Training Environments (10 minutes) - Outline ways to harden the ML training environment components discussed throughout this talk. I will also provide detection rules for the attacks shown in this research, covering Azure ML, Amazon SageMaker, and MLflow.
    • ML Training Environment Users
    • Jupyter Notebook Environment
    • Cloud Compute Instances
    • Model Artifact Storage and Registry
    • Detection Guidance
  • Conclusion (5 minutes)
  • Q&A (10 minutes)
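
To give a flavor of the API surface involved in Scenario 2, below is a minimal sketch of how a registered model version is resolved to its artifact location over MLflow's REST API. The tracking server URL, model name, and version are hypothetical placeholders, and a real deployment may place the tracking server behind authentication; this is a sketch of the request shape, not a turnkey tool.

```python
# Sketch: resolving a registered MLflow model version to its artifact
# location via the tracking server's REST API. An attacker who reaches
# the tracking server (e.g. via Scenario 1) could issue the same calls.
import json
import urllib.request
from urllib.parse import urlencode

TRACKING_SERVER = "http://mlflow.internal.example:5000"  # hypothetical host


def build_download_uri_request(name: str, version: str) -> str:
    """Build the REST call that maps a registered model version to its
    artifact location (e.g. an s3:// path in model artifact storage)."""
    query = urlencode({"name": name, "version": version})
    return f"{TRACKING_SERVER}/api/2.0/mlflow/model-versions/get-download-uri?{query}"


def fetch_download_uri(name: str, version: str) -> str:
    """Issue the request; MLflow tracking servers require no auth by default."""
    with urllib.request.urlopen(build_download_uri_request(name, version)) as resp:
        return json.load(resp)["artifact_uri"]


if __name__ == "__main__":
    # Model name/version are illustrative; no request is sent here.
    print(build_download_uri_request("fraud-model", "3"))
```

Once the artifact URI is known, the model files themselves can be pulled from the backing artifact store, which is the theft step demonstrated in the talk.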

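As a taste of the detection guidance, the following is a minimal sketch of one rule: flagging SageMaker model-registry read events in CloudTrail issued by principals outside an allow-list. The role ARN and allow-list are hypothetical, and production rules would live in the organization's SIEM query language rather than standalone Python.

```python
# Sketch: flag CloudTrail records where a SageMaker model-registry read
# came from a principal not on an allow-list. Event names are real
# SageMaker API actions; the ARNs below are hypothetical examples.
import json

ALLOWED_PRINCIPALS = {"arn:aws:iam::111122223333:role/MLTrainingRole"}  # hypothetical
REGISTRY_READ_EVENTS = {"DescribeModelPackage", "ListModelPackages"}


def flag_suspicious(records):
    """Yield CloudTrail records for registry reads by unexpected principals."""
    for rec in records:
        if (rec.get("eventSource") == "sagemaker.amazonaws.com"
                and rec.get("eventName") in REGISTRY_READ_EVENTS
                and rec.get("userIdentity", {}).get("arn") not in ALLOWED_PRINCIPALS):
            yield rec


if __name__ == "__main__":
    sample = [{
        "eventSource": "sagemaker.amazonaws.com",
        "eventName": "DescribeModelPackage",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/intern"},
    }]
    print(json.dumps(list(flag_suspicious(sample)), indent=2))
```

Tuning the allow-list to the IAM roles that legitimately train and register models is what turns this sketch into a usable rule.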
About the Speaker