Building Security into AI Models – Learning the Unknowns

The UK government is proposing a collaborative approach to the cybersecurity of AI, and is gathering insights on potential AI-related cybersecurity risks and effective mitigation strategies. The review seeks input from a wide range of stakeholders, including industry experts, researchers, and the public. The feedback collected will inform policies and frameworks to improve the security and resilience of AI systems, with an emphasis on understanding both current challenges and future threats. For more details, visit the official document. We certainly have a view on this topic: the approach we propose is closely aligned with how this proposed rulemaking is being formed, but applied at a grassroots level within each company.

The document describes several security risks associated with AI, including:

  1. Adversarial Attacks: Manipulating AI inputs to produce incorrect outputs.
  2. Data Poisoning: Corrupting training data to affect AI behavior.
  3. Model Inversion: Extracting sensitive data from AI models.
  4. Model Stealing: Replicating AI models through observed inputs and outputs.
  5. Systemic Vulnerabilities: Exploiting weaknesses in AI integration within larger systems.
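To make the first risk on this list concrete, here is a minimal sketch of an adversarial attack on a toy logistic regression classifier, in the style of the fast gradient sign method. The weights, the input and the `epsilon` budget are made-up values for illustration only; none of them come from the government document.

```python
import math

def predict(weights, bias, x):
    """Probability of class 1 from a toy logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move every feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For the logistic loss, d(loss)/d(x_i) = (p - label) * w_i.
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

# Hypothetical model parameters and input, chosen only for this example.
weights = [2.0, -1.5, 0.5]
bias = -0.2
x = [0.6, 0.2, 0.4]
label = 1

clean_score = predict(weights, bias, x)
x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.3)
adv_score = predict(weights, bias, x_adv)

print(f"clean input scored {clean_score:.2f}, perturbed input scored {adv_score:.2f}")
```

With these numbers, a change of at most 0.3 per feature is enough to push the score from above 0.5 to below it, so the predicted class flips even though the input has barely moved.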

These risks highlight the need for robust security measures to protect AI technologies. So, digging a little deeper, the document proposes several measures to counteract AI security risks, including:

  1. Enhanced Security Protocols: Implementing robust cybersecurity frameworks for AI systems.
  2. Adversarial Robustness: Developing techniques to make AI resistant to adversarial attacks.
  3. Secure Data Practices: Ensuring integrity and security of training data.
  4. Access Controls: Restricting access to AI models and sensitive information.
  5. Continuous Monitoring: Regularly monitoring AI systems for vulnerabilities and attacks.
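As one small illustration of the secure data practices item, the sketch below fingerprints a training set with SHA-256 and checks it again before training. The record format and the helper names `fingerprint` and `is_unchanged` are assumptions made for this example. A check like this only detects changes made after the trusted digest was recorded; it cannot tell whether the data was already poisoned at the source.

```python
import hashlib
import hmac

def fingerprint(records):
    """Return a SHA-256 hex digest over an ordered list of text records."""
    digest = hashlib.sha256()
    for record in records:
        data = record.encode("utf-8")
        # Length-prefix each record so record boundaries cannot shift unnoticed.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()

def is_unchanged(records, expected_digest):
    """Compare the current fingerprint with the trusted one in constant time."""
    return hmac.compare_digest(fingerprint(records), expected_digest)

# Hypothetical labelled records; the digest is stored when the data is approved.
training_records = ["great product,positive", "arrived broken,negative"]
trusted_digest = fingerprint(training_records)

# Later, one label has been flipped by an attacker.
poisoned_records = ["great product,positive", "arrived broken,positive"]

print(is_unchanged(training_records, trusted_digest))
print(is_unchanged(poisoned_records, trusted_digest))
```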

Alongside encryption, data integrity protection and, of course, robust authentication, an interesting thread running through this document is building security into the AI models themselves through adversarial training. Adversarial robustness techniques for AI include:

  1. Adversarial Training: Training AI models with adversarial examples to improve resilience.
  2. Defensive Distillation: Reducing model sensitivity to small input changes by using softened outputs during training.
  3. Gradient Masking: Obscuring the model's gradients so that attackers cannot use them to optimize adversarial inputs efficiently.
  4. Input Sanitization: Filtering and preprocessing inputs to detect and neutralize adversarial manipulations.
  5. Ensemble Methods: Combining multiple models to dilute the impact of any single adversarial attack.
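The first technique on this list can be sketched in a few lines. The example below trains a toy logistic regression model on perturbed versions of its own training data, regenerating the perturbations against the current weights at every step. The four data points, the `epsilon` budget and the hyperparameters are invented for illustration; production models would use a deep learning framework and a stronger attack, so treat this as a sketch of the idea rather than a recommended implementation.

```python
import math

def predict(weights, bias, x):
    """Probability of class 1 from a toy logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move every feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

def adversarial_train(samples, epsilon, learning_rate=0.5, epochs=50):
    """Stochastic gradient descent where each step uses an adversarial example."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            # Attack the current model, then learn from the attacked input.
            x_adv = fgsm_perturb(weights, bias, x, label, epsilon)
            error = label - predict(weights, bias, x_adv)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x_adv)]
            bias += learning_rate * error
    return weights, bias

def robust_accuracy(weights, bias, samples, epsilon):
    """Fraction of samples still classified correctly after the attack."""
    correct = 0
    for x, label in samples:
        x_adv = fgsm_perturb(weights, bias, x, label, epsilon)
        correct += int((predict(weights, bias, x_adv) > 0.5) == (label == 1))
    return correct / len(samples)

# Hypothetical two-feature dataset: class 1 near (1, 1), class 0 near (-1, -1).
samples = [([1.0, 0.8], 1), ([0.9, 1.2], 1), ([-1.0, -0.9], 0), ([-0.8, -1.1], 0)]
epsilon = 0.3

weights, bias = adversarial_train(samples, epsilon)
print(robust_accuracy(weights, bias, samples, epsilon))
```

The design choice worth noting is that the adversarial examples are recomputed inside the training loop, so the model keeps seeing attacks tailored to its current state instead of a fixed set generated once.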

These techniques aim to make it more difficult for attackers to penetrate AI models and exfiltrate information. There are many unknown unknowns with AI models, and it’s hard to predict how they might retain information and where pieces of these models might end up in the future. The current state of LLM tech is known for its lack of explainability, because a lot of data gets encoded in a neural network in a way that’s hard to audit. This is something that the leadership at Slack doesn’t seem to appreciate, or perhaps hopes all of us Slack users stay unaware of, as they train their models on our most sensitive content, meant for our colleagues’ eyes only.

In our experience, one of the best ways to find the unknown unknowns in security is to learn from the process experts themselves, as demonstrated in the detection of the XZ Utils attack. CISOs and their teams are probably not going to figure out on their own how to run gradient masking on the AI models being built into their company’s systems. They certainly won’t know which models represent the highest risk. This information has to come from the engineers and the business teams deploying these models in the field.

When technical teams work through role-specific security content on topics such as AI adversarial robustness, a modern feedback-driven security awareness approach enables the security team to partner with them and augment everyone’s collective knowledge. So our recommendation to the UK Department for Science, Innovation and Technology is to deploy a Cybersecurity HRM program that captures the process friction points that none of us have yet discovered. It’s an iterative process, in which the recommendations in this document are taught to the AI experts, with a feedback loop to the security team that reveals how relevant the practices are, where they cause process friction, and how confident people feel in following them.
