- Overview of OWASP Top 10 ML & LLM Security Checklist
- Understanding Attack Surfaces in AI Systems
- Adversarial Attacks
- ML01:2023 - Input Manipulation Attack
- ML08:2023 - Model Skewing
- ML07:2023 - Transfer Learning Attack
- ML09:2023 - Output Integrity Attack
In this blog, we’ll explore:
✅ What Adversarial Attacks are
✅ Key adversarial threats from the OWASP Top 10 for Machine Learning
✅ Different adversarial attack techniques used to fool AI
What is an Adversarial Attack?
Adversarial attacks are techniques that trick AI models into making incorrect predictions or decisions by subtly altering the data they process. They are designed to manipulate machine learning (ML) models into behaving in unexpected or harmful ways, often without any obvious sign that something is wrong.
These attacks are particularly dangerous because the changes are often so small that a human would never notice them, yet they can cause serious problems for AI systems.
How Do Adversarial Attacks Work?
Adversarial attacks work by manipulating the input data fed into the model. This can include images, text, or other types of data. The attacker will slightly tweak the input in a way that seems harmless to a human but causes the model to misclassify or misinterpret it.
For example:
- A picture of an umbrella may be slightly altered to make the AI think it's a quill.
- A seemingly harmless sentence may be modified so that the AI misinterprets its meaning.
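To make this concrete, here is a minimal sketch of the classic FGSM (Fast Gradient Sign Method) perturbation in PyTorch. The model, the input batch x, the labels y, and the perturbation budget eps are illustrative placeholders, not any specific system:

```python
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Return an adversarial copy of x: each input value is nudged by eps
    in the direction that increases the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss against the true labels
    loss.backward()                       # gradient of the loss w.r.t. x
    x_adv = x + eps * x.grad.sign()       # tiny, human-imperceptible change
    return x_adv.clamp(0, 1).detach()     # keep values in a valid range
```

Because eps is small relative to the valid input range, the perturbed image looks identical to a human, yet the model's prediction can flip.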
Why Are Adversarial Attacks Dangerous?
Adversarial attacks are dangerous because they:
- Undermine trust in AI models, making them unreliable.
- Can be used to manipulate critical systems like facial recognition, self-driving cars, or financial models.
- Are often hard to detect, allowing attackers to exploit the AI without triggering alarms.
Adversarial Threats from the OWASP Top 10 for Machine Learning
Here are some of the key adversarial threats to machine learning systems from the OWASP Top 10:
- ML01: Input Manipulation Attack – Attackers subtly alter the input data to mislead the model into making incorrect predictions (e.g., changing an image to be misclassified).
- ML08: Model Skewing – Manipulating the model’s training data to create biases or inaccurate patterns, leading to wrong decisions over time (see the data-poisoning sketch after this list).
- ML07: Transfer Learning Attack – Injecting malicious backdoors into pre-trained models during transfer learning, allowing attackers to trigger malicious actions when deployed.
- ML09: Output Integrity Attack – Altering the model’s outputs to provide misleading or false information, which could deceive users or cause harm.
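To make the model skewing threat (ML08) more concrete, here is a minimal data-poisoning sketch. It trains two copies of a simple classifier on a made-up dataset, one on clean labels and one where the attacker has flipped most of the labels of one class, and compares their accuracy on clean test data. The dataset, flip ratio, and classifier are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dataset: the true label depends only on the first two features.
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Attacker flips the labels of most class-1 training points to 0,
# biasing the model toward predicting class 0 (ML08: Model Skewing).
y_poisoned = y.copy()
ones = np.where(y == 1)[0]
flipped = rng.choice(ones, size=int(0.6 * len(ones)), replace=False)
y_poisoned[flipped] = 0

clean_model = LogisticRegression().fit(X, y)
skewed_model = LogisticRegression().fit(X, y_poisoned)

# Evaluate on clean, unseen data: the skewed model has learned the wrong pattern.
X_test = rng.normal(size=(500, 10))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print("clean model accuracy :", clean_model.score(X_test, y_test))
print("skewed model accuracy:", skewed_model.score(X_test, y_test))
```

Because the poisoned model systematically under-predicts the targeted class, its accuracy on clean data drops, even though the training code itself was never modified.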
Adversarial Attack Tools and Techniques
Attackers rarely craft these perturbations by hand. Instead, they rely on well-established attack algorithms, many of which are implemented in open-source toolkits. Here are some of the most widely used:
- FGSM (Fast Gradient Sign Method) – a fast, single-step white-box attack that nudges the input in the direction of the sign of the loss gradient.
- PGD (Projected Gradient Descent) – an iterative, stronger variant of FGSM that repeatedly perturbs the input and projects it back into a small allowed region.
- DeepFool – a white-box attack that searches for the smallest perturbation needed to push an input across the nearest decision boundary.
- Carlini & Wagner (C&W) Attack – an optimization-based white-box attack known for producing very small perturbations that defeat many defenses.
- Jacobian-Based Saliency Map Attack (JSMA) – a white-box attack that uses a saliency map to modify only the input features with the greatest influence on the output.
- One Pixel Attack – a black-box attack that can change a prediction by modifying as little as a single pixel.
- Universal Adversarial Perturbations (UAP) – a single perturbation crafted to fool the model across many different inputs at once.
- HopSkipJump (HSJA) – a decision-based black-box attack that needs only the model’s predicted labels.
- Boundary Attack – a decision-based black-box attack that starts from a large perturbation and walks along the decision boundary to shrink it.
- ZOO (Zeroth Order Optimization) – a score-based black-box attack that estimates gradients from the model’s output scores instead of accessing them directly.
Each of these techniques has its strengths and weaknesses, depending on the context in which it is applied (white-box or black-box access, the attack goal, and the target model architecture).
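Many of these attacks are available off the shelf in open-source libraries such as the Adversarial Robustness Toolbox (ART) listed in the references. The sketch below shows roughly what running FGSM (white-box) and HopSkipJump (black-box) through ART looks like; the tiny model, random inputs, and parameter values are placeholders for illustration only:

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod, HopSkipJump

# Placeholder model: a tiny classifier for 28x28 grayscale images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART's attacks can query it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # stand-in inputs

# White-box, gradient-based attack (needs access to model gradients).
x_adv_fgsm = FastGradientMethod(classifier, eps=0.1).generate(x=x)

# Black-box, decision-based attack (only needs the model's predictions).
x_adv_hsj = HopSkipJump(classifier, max_iter=10).generate(x=x)
```

The useful design point is the wrapper: once a model is wrapped in an ART classifier, the same generate() call works across many different attack classes.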
References
- https://viso.ai/deep-learning/adversarial-machine-learning/
- https://en.wikipedia.org/wiki/Adversarial_machine_learning
- https://www.datacamp.com/blog/adversarial-machine-learning
- https://medium.com/sciforce/adversarial-attacks-explained-and-how-to-defend-ml-models-against-them-d76f7d013b18
- https://github.com/Trusted-AI/adversarial-robustness-toolbox
- https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
- https://hiddenlayer.com/innovation-hub/whats-in-the-box/