- Overview of OWASP Top 10 ML & LLM Security Checklist
- Understanding Attack Surfaces in AI Systems
- Adversarial Attacks
- ML01:2023 - Input Manipulation Attack
- ML08:2023 - Model Skewing
- ML07:2023 - Transfer Learning Attack
- ML09:2023 - Output Integrity Attack
Hey Techies! AI is growing fast, and since we’re all about security and hacking, it’s important we understand the vulnerabilities in AI/ML models. In the near future, most new models will be AI/ML-based, so it's crucial for us to stay ahead of the game and identify the risks. As VAPT professionals, it’s our job to recognize these threats and help secure AI systems.
In this blog, we’ll dive into the potential risks in AI/ML systems and check out the vulnerabilities tied to each stage of the AI model lifecycle. If you're confused about ML vs LLM or the OWASP vulnerability lists, I’ve got you covered in my previous blog—feel free to check it out! In this post, we’ll map those vulnerabilities to the attack surface we’re discussing.
Github Link: https://github.com/RihaMaheshwari/AIML-LLM-Security
Understanding Attack Surfaces in AI Lifecycle
A hosted AI model typically follows this lifecycle:
- Data Collection & Preprocessing: A dataset is prepared for training.
- Model Training & Fine-Tuning: The AI model is trained using this dataset.
- Model Deployment & Hosting: The trained model is deployed on a cloud or server.
- Inference & User Interaction: Users interact with the model through API calls or interfaces.
- Monitoring & Continuous Learning: Models are updated based on real-time feedback and new data.
Each of these stages presents unique security risks.
Here's a mapping of the ML & LLM security risks to each stage of the AIML lifecycle, along with brief explanations of how they apply to each stage:
1. Data Collection
- ML02:2023: Data Poisoning Attack – Malicious data is injected into datasets, corrupting the training process.
- LLM03:2023: Training Data Poisoning – Similar to ML, it targets the integrity of the data used to train the LLM model.
- LLM07:2025: System Prompt Leakage – Insecure handling of prompts can expose sensitive system-level information during data collection.
2. Training & Fine-Tuning
- ML01:2023: Input Manipulation Attack (Adversarial Attacks) – Malicious actors manipulate inputs to influence model training for incorrect outcomes.
- ML06:2023: AI Supply Chain Attacks – Compromise of third-party libraries, datasets, or frameworks used for model training.
- ML07:2023: Transfer Learning Attack (Backdoored Models) – Attackers inject backdoors into pre-trained models or during the fine-tuning phase, enabling them to trigger malicious actions.
- LLM05|03:2023|2025: Supply Chain Vulnerabilities – Compromise of third-party tools or libraries used in the LLM's training process.
3. Deployment & Hosting
- ML05:2023: Model Theft (Model Extraction) – The model is queried to extract its parameters, enabling attackers to recreate or steal the model.
- LLM01:2023|2025: Prompt Injection (Jailbreak Prompting) – Attackers inject malicious prompts to manipulate model behavior or bypass restrictions, often at deployment time.
- LLM06:2023: Permission Issues – Incorrectly configured permissions during deployment can allow unauthorized access to the LLM or its API.
4. Inference & User Interaction
- ML03:2023: Model Inversion Attack – Attackers can extract training data (including private information) from the model through repeated queries.
- ML09:2023: Output Integrity Attack (Hallucination Exploits) – Adversarial attacks lead the model to generate false or biased outputs that can mislead users.
- LLM02|05:2023|2025: Insecure Output Handling – Outputs generated by LLMs can be mishandled or exposed in ways that compromise security or integrity.
- LLM04:2023: Denial of Service – Attackers overload the LLM's API or deployment with excessive requests, causing disruptions.
- LLM07:2023: Data Leakage (Sensitive Information Disclosure) – Models may accidentally leak sensitive data (like private user inputs) in generated outputs.
5. Monitoring & Continuous Learning
- ML08:2023: Model Skewing (Bias Exploitation) – Attackers can influence continuous learning processes to skew model behavior, often based on biased data inputs.
- ML10:2023: Model Poisoning (Hidden Triggers, Backdoors) – Hidden triggers or backdoors are embedded into the model during monitoring and retraining, compromising its reliability.
- LLM08:2025: Vector and Embedding Weaknesses – The underlying embeddings used by the LLM can be targeted, leading to unexpected or malicious model behavior.
- LLM09:2025: Misinformation – Over time, as models are updated, they may inadvertently learn and propagate misinformation if not properly monitored.
Cross-stage Concerns
- ML04:2023: Membership Inference Attack – Attackers can determine if a specific data point was part of the training set, posing privacy risks throughout the model lifecycle.
- LLM10:2025: Unbounded Consumption – Unchecked usage of the LLM can lead to resource exhaustion, API abuse, or vulnerabilities in scaling and rate limiting.
7. Securing AI Models: Best Practices
Here’s how you can secure AI models against these threats:
- Data Collection & Preprocessing: Implement strict data validation and sanitization to prevent data poisoning and leakage.
- Model Training & Fine-Tuning: Use secure and trusted datasets, regularly audit training processes, and employ adversarial training to mitigate input manipulation.
- Model Deployment & Hosting: Ensure proper access controls, use encryption for model parameters, and deploy rate limiting to prevent model theft and denial of service.
- Inference & User Interaction: Validate inputs and outputs, apply prompt filtering, and implement sensitive data handling techniques to mitigate adversarial attacks and data leakage.
- Monitoring & Continuous Learning: Continuously monitor model behavior for biases or backdoors, perform regular retraining with secure data, and address model drift.
Hope this gave you a solid overview of ML & LLM attack vectors and vulnerabilities.
If you’ve done assessments for AI systems, what challenges did you run into? Share your thoughts in the comments! 🚀
No comments:
Post a Comment