- Overview of OWASP Top 10 ML & LLM Security Checklist
- Understanding Attack Surfaces in AI Systems
- Adversarial Attacks
- ML01:2023 - Input Manipulation Attack
- ML08:2023 - Model Skewing
- ML07:2023 - Transfer Learning Attack
- ML09:2023 - Output Integrity Attack
In this blog, we’ll explore:
✅ What Adversarial Attacks are
✅ Key adversarial threats from the OWASP Top 10 for Machine Learning
✅ Different adversarial attack techniques used to fool AI
What is an Adversarial Attack?
Adversarial attacks are techniques that trick AI models into making incorrect predictions or decisions by subtly altering the data they process. They are designed to manipulate machine learning (ML) models into behaving in unexpected or harmful ways, often without any obvious sign that something is wrong.
These attacks are particularly dangerous because the changes are often so small that a human would never notice them, yet they can cause serious problems for AI systems.
How Do Adversarial Attacks Work?
Adversarial attacks work by manipulating the input data fed into the model. This can include images, text, or other types of data. The attacker will slightly tweak the input in a way that seems harmless to a human but causes the model to misclassify or misinterpret it.
For example:
- A picture of an umbrella may be slightly altered to make the AI think it's a quill.
- A seemingly harmless sentence may be modified so that the AI misinterprets its meaning.
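To make this concrete, here is a minimal sketch of the classic FGSM (Fast Gradient Sign Method) perturbation in PyTorch. The model, the input batch x, the labels y, and the perturbation budget eps are illustrative placeholders, not any specific system:

```python
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Return an adversarial copy of x: each input value is nudged by eps
    in the direction that increases the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss against the true labels
    loss.backward()                       # gradient of the loss w.r.t. x
    x_adv = x + eps * x.grad.sign()       # tiny, human-imperceptible change
    return x_adv.clamp(0, 1).detach()     # keep values in a valid range
```

Because eps is small relative to the valid input range, the perturbed image looks identical to a human, yet the model's prediction can flip.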
Why Are Adversarial Attacks Dangerous?
Adversarial attacks are dangerous because they:
- Undermine trust in AI models, making them unreliable.
- Can be used to manipulate critical systems like facial recognition, self-driving cars, or financial models.
- Are often hard to detect, allowing attackers to exploit the AI without triggering alarms.
Adversarial Threats from the OWASP Top 10 for Machine Learning
Here are some of the key adversarial threats to machine learning systems from the OWASP Top 10:
- ML01: Input Manipulation Attack – Attackers subtly alter the input data to mislead the model into making incorrect predictions (e.g., changing an image to be misclassified).
- ML08: Model Skewing – Manipulating the model’s training data to create biases or inaccurate patterns, leading to wrong decisions over time (see the data-poisoning sketch after this list).
- ML07: Transfer Learning Attack – Injecting malicious backdoors into pre-trained models during transfer learning, allowing attackers to trigger malicious actions when deployed.
- ML09: Output Integrity Attack – Altering the model’s outputs to provide misleading or false information, which could deceive users or cause harm.
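To make the model skewing threat (ML08) more concrete, here is a minimal data-poisoning sketch. It trains two copies of a simple classifier on a made-up dataset, one on clean labels and one where the attacker has flipped most of the labels of one class, and compares their accuracy on clean test data. The dataset, flip ratio, and classifier are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dataset: the true label depends only on the first two features.
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Attacker flips the labels of most class-1 training points to 0,
# biasing the model toward predicting class 0 (ML08: Model Skewing).
y_poisoned = y.copy()
ones = np.where(y == 1)[0]
flipped = rng.choice(ones, size=int(0.6 * len(ones)), replace=False)
y_poisoned[flipped] = 0

clean_model = LogisticRegression().fit(X, y)
skewed_model = LogisticRegression().fit(X, y_poisoned)

# Evaluate on clean, unseen data: the skewed model has learned the wrong pattern.
X_test = rng.normal(size=(500, 10))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print("clean model accuracy :", clean_model.score(X_test, y_test))
print("skewed model accuracy:", skewed_model.score(X_test, y_test))
```

Because the poisoned model systematically under-predicts the targeted class, its accuracy on clean data drops, even though the training code itself was never modified.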
Adversarial Attack Tools and Techniques
Attackers rarely craft these perturbations by hand. Instead, they rely on well-established attack algorithms, many of which are implemented in open-source toolkits. Here are some of the most widely used:
- FGSM (Fast Gradient Sign Method) – a fast, single-step white-box attack that nudges the input in the direction of the sign of the loss gradient.
- PGD (Projected Gradient Descent) – an iterative, stronger variant of FGSM that repeatedly perturbs the input and projects it back into a small allowed region.
- DeepFool – a white-box attack that searches for the smallest perturbation needed to push an input across the nearest decision boundary.
- Carlini & Wagner (C&W) Attack – an optimization-based white-box attack known for producing very small perturbations that defeat many defenses.
- Jacobian-Based Saliency Map Attack (JSMA) – a white-box attack that uses a saliency map to modify only the input features with the greatest influence on the output.
- One Pixel Attack – a black-box attack that can change a prediction by modifying as little as a single pixel.
- Universal Adversarial Perturbations (UAP) – a single perturbation crafted to fool the model across many different inputs at once.
- HopSkipJump (HSJA) – a decision-based black-box attack that needs only the model’s predicted labels.
- Boundary Attack – a decision-based black-box attack that starts from a large perturbation and walks along the decision boundary to shrink it.
- ZOO (Zeroth Order Optimization) – a score-based black-box attack that estimates gradients from the model’s output scores instead of accessing them directly.
Each of these techniques has its strengths and weaknesses, depending on the context in which it is applied (white-box or black-box access, the attack goal, and the target model architecture).
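Many of these attacks are available off the shelf in open-source libraries such as the Adversarial Robustness Toolbox (ART) listed in the references. The sketch below shows roughly what running FGSM (white-box) and HopSkipJump (black-box) through ART looks like; the tiny model, random inputs, and parameter values are placeholders for illustration only:

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, HopSkipJump

# Placeholder model: a tiny classifier for 28x28 grayscale images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART's attacks can query it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # stand-in inputs

# White-box, gradient-based attack (needs access to model gradients).
x_adv_fgsm = FastGradientMethod(classifier, eps=0.1).generate(x=x)

# Black-box, decision-based attack (only needs the model's predictions).
x_adv_hsj = HopSkipJump(classifier, max_iter=10).generate(x=x)
```

The useful design point is the wrapper: once a model is wrapped in an ART classifier, the same generate() call works across many different attack classes.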
References
- https://viso.ai/deep-learning/adversarial-machine-learning/
- https://en.wikipedia.org/wiki/Adversarial_machine_learning
- https://www.datacamp.com/blog/adversarial-machine-learning
- https://medium.com/sciforce/adversarial-attacks-explained-and-how-to-defend-ml-models-against-them-d76f7d013b18
- https://github.com/Trusted-AI/adversarial-robustness-toolbox
- https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
- https://hiddenlayer.com/innovation-hub/whats-in-the-box/