[ISM] Evaluation of LLM applications (ISM-1924)

Large language model applications evaluate the sentence perplexity of user prompts to detect and mitigate adversarial suffixes designed to assist in the generation of sensitive or harmful content.
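
As an illustration of this control, the sketch below scores prompt perplexity with a small causal language model. The choice of GPT-2 as the scoring model and the fixed threshold are assumptions; in practice the threshold would be calibrated on a sample of benign traffic.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumption: GPT-2 serves as the scoring model; any causal LM works.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative cut-off; calibrate against benign prompts in practice.
PERPLEXITY_THRESHOLD = 1000.0

def sentence_perplexity(text: str) -> float:
    """Perplexity of `text` under the scoring model (exp of mean loss)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing input_ids as labels yields the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return float(torch.exp(loss))

def has_adversarial_suffix(prompt: str) -> bool:
    """Gradient-crafted suffixes tend to be high-perplexity gibberish,
    so anomalously high perplexity is a useful (if imperfect) signal."""
    return sentence_perplexity(prompt) > PERPLEXITY_THRESHOLD
```

A flagged prompt can then be refused, rewritten, or routed to stricter moderation before it reaches the main model.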

[NIST AI RMF] Mitigate misinformation and disinformation risks (SSS-02-06-02)

Information Integrity: Mitigate risks related to misinformation and disinformation by ensuring the LLM application can distinguish between fact, opinion, and fictional content. Employ content verification processes, factuality checks, and disclaimers to flag uncertain or unverifiable information. Design safeguards that prevent the model from being exploited for large-scale misinformation campaigns and limit its usefulness as a tool for spreading false information.
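
One hedged shape for the disclaimer mechanism described above is a post-generation hook that labels output whose claims cannot be confirmed. The `verify_claim` function here is a hypothetical placeholder for a retrieval-backed fact-checking service; the verdict labels and disclaimer text are likewise illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    UNVERIFIABLE = "unverifiable"

@dataclass
class CheckedResponse:
    text: str
    verdict: Verdict

DISCLAIMER = (
    "\n\n[Note: this response could not be verified against trusted "
    "sources and may contain inaccuracies.]"
)

def verify_claim(text: str) -> Verdict:
    # Hypothetical hook: in a real system this would query a
    # retrieval-based fact checker or a curated knowledge base.
    return Verdict.UNVERIFIABLE  # conservative stand-in

def post_process(response: str) -> CheckedResponse:
    """Append a disclaimer whenever output is not positively supported."""
    verdict = verify_claim(response)
    if verdict is not Verdict.SUPPORTED:
        response += DISCLAIMER
    return CheckedResponse(text=response, verdict=verdict)
```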

[NIST AI RMF] Ensure information integrity and mitigate misinformation risks (SSS-02-06-02-01)

Establish comprehensive policies to maintain data and content integrity across the AI lifecycle. Implement frameworks to detect and prevent the misuse of generative AI tools for misinformation, and define clear escalation paths and accountability measures for handling risks tied to tampering or the generation of false outputs.

Use ongoing monitoring to detect integrity breaches such as data corruption or unauthorized model modifications, and conduct regular audits to ensure outputs align with factual standards and truthfulness goals. Safeguard the AI system and its inputs against compromises that could lead to loss of integrity, including protecting critical datasets and maintaining secure transformation processes. Validate generative outputs through factuality verification tools, performance metrics, and automated anomaly checks, and continuously assess outputs for biases, inaccuracies, and alignment with truthfulness goals.

Introduce differential privacy and integrity verification measures to protect sensitive data from leaks and misinformation. Enforce robust access controls to prevent unauthorized system modifications, and introduce version control mechanisms for rolling back unintended changes. Establish strong feedback loops to refine policies based on monitoring outcomes, and regularly update models and policies to address new risks, particularly in domains vulnerable to misinformation campaigns. Together, these measures ensure reliable and accurate AI outputs, fostering trust and integrity.
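
A minimal sketch of one such integrity verification measure, assuming artifacts live on a local filesystem and a JSON manifest acts as the signed-off baseline: SHA-256 digests of model weights and datasets are recorded at release time and re-checked during monitoring, so data corruption or unauthorized model modifications surface as digest mismatches.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path("integrity_manifest.json")  # assumed manifest location

def sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large model files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_baseline(artifacts: list[Path]) -> None:
    """Write a manifest of known-good artifact digests at release time."""
    manifest = {str(p): sha256(p) for p in artifacts}
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def check_integrity() -> list[str]:
    """Return artifacts whose current digest deviates from the baseline."""
    manifest = json.loads(MANIFEST.read_text())
    return [
        name for name, digest in manifest.items()
        if sha256(Path(name)) != digest
    ]

# Example: baseline the model weights and training data, then audit later.
# record_baseline([Path("model.safetensors"), Path("train.parquet")])
# tampered = check_integrity()  # non-empty list signals an integrity breach
```

A failed check would feed the escalation path defined in the policies above, and the version control mechanism can then roll the artifact back to the last digest-verified release.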

Operations

| ID | Operation | Description | Phase | Agent |
|----|-----------|-------------|-------|-------|
| SSS-02-06-02-01-01 | Establish information integrity policies and safeguards | Define and enforce policies that ensure data integrity throughout the AI system lifecycle, preventing misinformation, misuse, and data tampering. | Preparation | Governance team, Legal team, Security team |
| SSS-02-06-02-01-02 | Implement real-time monitoring and validation mechanisms | Use automated tools and metrics to continuously monitor inputs, outputs, and transformations to detect anomalies, bias, or integrity breaches (see the sketch after this table). | Development | Security team, AI governance team |
| SSS-02-06-02-01-03 | Conduct regular audits and incident response testing | Schedule audits of AI models and datasets for biases, inaccuracies, and integrity risks, and implement robust incident response plans for integrity breaches. | Post-deployment | Audit team, AI governance team |
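
The sketch referenced in SSS-02-06-02-01-02 above: a rolling z-score detector over any scalar output metric (factuality score, toxicity score, response length). The window size, threshold, and simulated metric stream are assumptions; in production the values would come from the serving pipeline and an anomaly would trigger the alerting path rather than a print.

```python
import random
from collections import deque
from statistics import fmean, stdev

class RollingAnomalyDetector:
    """Flag metric values that deviate sharply from recent history."""

    def __init__(self, window: int = 500, z_threshold: float = 4.0):
        self.history: deque[float] = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Record a new metric value; return True if it is anomalous."""
        is_anomaly = False
        if len(self.history) >= 30:  # wait for a stable baseline
            mu, sigma = fmean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                is_anomaly = True
        self.history.append(value)
        return is_anomaly

# Simulated metric stream with one injected integrity breach.
detector = RollingAnomalyDetector()
scores = [random.gauss(0.5, 0.05) for _ in range(400)] + [3.0]
for score in scores:
    if detector.observe(score):
        print("anomaly: output quality metric deviated sharply from baseline")
```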

References

Industry framework:
- Information Security Manual (ISM-1924)
- NIST Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (Section 2.8)
- NIST Artificial Intelligence Risk Management Framework (AI RMF 1.0)