The risks identified in the OWASP Top 10 for Large Language Model Applications are mitigated during the development of large language model applications.
Protect LLMs from data poisoning attacks that introduce vulnerabilities, biases, or backdoors into models during pre-training, fine-tuning, or embedding processes. Implement data validation, anomaly detection, and controlled data ingestion to mitigate poisoning risks and ensure model integrity.
Data poisoning occurs when adversaries manipulate pre-training, fine-tuning, or embedding data to introduce vulnerabilities, biases, or backdoors into LLMs. This can compromise model accuracy, lead to biased or toxic outputs, and create sleeper-agent behaviors that activate only under specific triggers. Attackers may inject harmful content into training data, deliver malware through maliciously crafted pickle files, or exploit external data sources to manipulate LLM behavior.

To mitigate these risks, organizations must track data provenance and validate data using tools such as OWASP CycloneDX or ML-BOM to verify origins and detect tampering. Anomaly detection should be applied to filter adversarial inputs, and strict sandboxing should be enforced to isolate models from unverified data sources. Data version control (DVC) should be used to track dataset changes and detect manipulation. Continuous robustness testing with adversarial techniques, along with federated learning, can help surface poisoning attempts. During inference, Retrieval-Augmented Generation (RAG) and grounding techniques should be integrated to reduce reliance on potentially poisoned training data and the risk of hallucinations. Monitoring training loss and model behavior for unexpected deviations can further reveal signs of poisoning.
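As a concrete illustration of the anomaly-detection step, the sketch below flags training records whose statistical profile deviates strongly from the rest of the corpus before they reach the training pipeline. It is a minimal sketch assuming a Python ingestion pipeline with scikit-learn available; the file name, contamination rate, and feature settings are illustrative, and a production pipeline would combine such filtering with source allow-lists and human review.

```python
# Minimal anomaly-detection sketch for training-data ingestion.
# Assumptions: scikit-learn is available; file name and thresholds are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest


def filter_suspect_records(records: list[str], contamination: float = 0.01) -> list[str]:
    """Drop records that look statistically unlike the rest of the corpus."""
    features = TfidfVectorizer(max_features=5000).fit_transform(records)
    # IsolationForest labels outliers as -1 and inliers as 1.
    labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(features)
    return [record for record, label in zip(records, labels) if label == 1]


if __name__ == "__main__":
    with open("finetune_corpus.txt", encoding="utf-8") as f:
        records = [line.strip() for line in f if line.strip()]
    clean = filter_suspect_records(records)
    print(f"Kept {len(clean)} of {len(records)} records")
```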
ID | Operation | Description | Phase | Agent |
---|---|---|---|---|
SSS-02-05-04-01-01 | Track data origins and transformations | Use ML-BOM or CycloneDX to verify the source and integrity of training and fine-tuning data, preventing poisoned datasets from influencing the model (see the integrity-check sketch after this table). | Development | Security team, AI research team |
SSS-02-05-04-01-02 | Implement anomaly detection on data inputs | Apply automated anomaly detection to identify adversarial or manipulated data before it reaches the training pipeline. | Development | Security team, AI/ML engineers |
SSS-02-05-04-01-03 | Enforce data version control (DVC) | Use data version control to track all dataset changes, ensuring transparency and the ability to roll back to verified datasets. | Development | Data scientists, Infrastructure team |
SSS-02-05-04-01-04 | Conduct adversarial robustness testing | Perform red team simulations and adversarial model testing to evaluate LLM resilience against poisoning attacks. | Development | Security team, Red team |
SSS-02-05-04-01-05 | Integrate retrieval-augmented generation (RAG) for inference security | Use RAG-based techniques during inference to limit reliance on potentially poisoned training data and enhance response accuracy (see the retrieval sketch after this table). | Deployment | Development team |
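The data-provenance operation (SSS-02-05-04-01-01) ultimately rests on being able to prove that a dataset file has not changed since it was recorded. The sketch below is a minimal integrity check assuming a JSON manifest of SHA-256 digests; the manifest path and format are illustrative stand-ins for the digests an ML-BOM or CycloneDX component record would carry.

```python
# Minimal dataset-integrity sketch: compare current SHA-256 digests against a
# recorded manifest. The manifest file name and format are assumptions; in a
# real pipeline these digests would come from an ML-BOM / CycloneDX record.
import hashlib
import json
import pathlib


def sha256(path: pathlib.Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_tampered_files(manifest_path: str = "data_manifest.json") -> list[str]:
    """Return dataset files whose current digest no longer matches the manifest."""
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    return [name for name, recorded in manifest.items()
            if sha256(pathlib.Path(name)) != recorded]


if __name__ == "__main__":
    tampered = find_tampered_files()
    if tampered:
        raise SystemExit(f"Dataset integrity check failed for: {tampered}")
```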
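For the inference-time operation (SSS-02-05-04-01-05), the sketch below shows the core retrieval-and-grounding loop: embed the question, rank a set of vetted documents by cosine similarity, and build a prompt that instructs the model to answer only from that context. The hashing-based `embed` function is a toy stand-in for a real embedding model, and the final LLM call is omitted; both are assumptions rather than any specific library's API.

```python
# Minimal RAG grounding sketch. embed() is a toy hashing bag-of-words stand-in
# for a real embedding model; the prompt wording and the omitted LLM call are
# illustrative assumptions.
import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash tokens into a fixed-size bag-of-words vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec


def build_grounded_prompt(question: str, vetted_docs: list[str], k: int = 3) -> str:
    """Rank vetted documents by cosine similarity and build a context-bound prompt."""
    doc_vecs = np.stack([embed(d) for d in vetted_docs])
    q_vec = embed(question)
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    top = [vetted_docs[i] for i in np.argsort(sims)[::-1][:k]]
    context = "\n\n".join(top)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The returned prompt would then be sent to the model; constraining answers to vetted, retrieved context limits how much a poisoned parametric memory can influence the response.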
Industry framework | Academic work | Real-world case |
---|---|---|
Information Security Manual (ISM-1923), OWASP Top 10 for LLM (LLM04:2025) | | |