Large language model applications evaluate the sentence perplexity of user prompts to detect and mitigate adversarial suffixes crafted to elicit sensitive or harmful content.
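A minimal sketch of how such a perplexity check might be wired up is shown below. It assumes the Hugging Face `transformers` and `torch` packages, uses GPT-2 as a stand-in scoring model, and the 1000.0 threshold is purely illustrative and would need calibration against benign traffic.

```python
# Perplexity-based screening sketch; model choice and threshold are assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def prompt_perplexity(text: str) -> float:
    """Return the sentence perplexity of `text` under the scoring model."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels supplied, the model returns the mean token-level
        # cross-entropy; exponentiating it gives sentence perplexity.
        loss = model(input_ids, labels=input_ids).loss
    return math.exp(loss.item())

def is_suspicious(text: str, threshold: float = 1000.0) -> bool:
    """Flag prompts whose perplexity suggests an appended adversarial suffix."""
    return prompt_perplexity(text) > threshold
```

Prompts flagged this way can be rejected outright or routed to stricter moderation before the model is invoked.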
Dangerous, Violent, or Hateful Content: Implement safeguards to detect and block prompts or outputs that promote or contain violent, inciting, radicalizing, or threatening language. Use natural language processing techniques, such as sentiment analysis and toxicity detection, to identify and prevent the generation of content that encourages self-harm or illegal activities or that expresses hateful stereotypes. Establish mechanisms to limit public exposure to such harmful content and ensure compliance with legal and ethical standards.
Develop comprehensive governance policies to mitigate risks of generating violent, inciting, or hateful content. This includes defining clear content moderation standards and establishing response protocols for managing incidents involving dangerous outputs. Screen training datasets rigorously to eliminate harmful biases, stereotypes, and radicalizing materials. Introduce layered safeguards in the content generation pipeline, such as sentiment analysis, classifiers, and toxicity detection, to filter harmful language. Continuously monitor model outputs using automated tools and manual audits to ensure adherence to established safety standards. Engage external reviewers and diverse stakeholders to identify and address potential biases missed internally. Conduct regular audits of model outputs to verify they do not disproportionately target or disparage specific groups. Implement real-time monitoring mechanisms to detect harmful outputs promptly and ensure content moderation filters block such material before it reaches users. Align all stakeholders on incident response plans to address cases of potentially illegal or harmful content dissemination. Ensure ongoing updates to safeguards to counter evolving threats, and create public-facing response protocols to address any incidents swiftly and transparently. Together, these measures support the ethical and safe deployment of AI systems.
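The layered safeguard described above (keyword screen plus toxicity classifier) could be sketched as follows. The model identifier `unitary/toxic-bert`, the placeholder denylist, and the 0.8 threshold are assumptions to be replaced with whatever classifier and cut-off the deployment standardizes on.

```python
# Layered text-safety gate sketch; classifier choice and threshold are assumptions.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

BLOCKED_TERMS = {"example-slur-1", "example-extremist-phrase"}  # placeholder denylist

def passes_safety_gate(text: str, threshold: float = 0.8) -> bool:
    """Return False if the text should be blocked before it reaches the user."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return False
    result = toxicity(text, truncation=True)[0]
    # Toxic-bert-style models return a label plus a confidence score.
    return not (result["label"].lower() == "toxic" and result["score"] >= threshold)
```

The same gate can be applied to both incoming prompts and generated outputs so that harmful material is stopped at either end of the pipeline.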
ID | Operation | Description | Phase | Agent |
---|---|---|---|---|
SSS-02-06-01-01-01 | Implement real-time monitoring and safeguards | Establish mechanisms to detect and block adversarial prompts and harmful content in real time using perplexity evaluation, classifiers, and content moderation filters. | Deployment | Security team, AI governance team
SSS-02-06-01-01-02 | Develop and enforce governance policies | Create comprehensive policies to manage risks, prevent the creation of harmful content, and establish protocols for responding to public exposure incidents. | Preparation | Legal team, Governance team, Development teams |
SSS-02-06-01-01-03 | Screen and audit training datasets for bias | Regularly evaluate datasets used for AI model training to identify and remove biased or harmful content that could lead to radicalization, stereotyping, or hateful outputs. | Development | Data engineering team, External reviewers |
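As one illustration of the dataset-screening operation (SSS-02-06-01-01-03), the sketch below assumes the training corpus is stored as JSONL with a "text" field and uses a placeholder keyword rule where the real toxicity and bias classifiers would plug in.

```python
# Dataset screening sketch; corpus layout and flagging rule are assumptions.
import json
from pathlib import Path

FLAGGED_TERMS = {"example-slur", "example-extremist-phrase"}  # placeholder list

def flag_record(text: str) -> bool:
    """Placeholder screening rule; swap in the adopted toxicity/bias classifiers."""
    lowered = text.lower()
    return any(term in lowered for term in FLAGGED_TERMS)

def screen_dataset(src: Path, kept: Path, quarantined: Path) -> None:
    """Split a JSONL corpus into records that pass screening and records for human review."""
    with src.open() as fin, kept.open("w") as fout, quarantined.open("w") as fq:
        for line in fin:
            record = json.loads(line)
            target = fq if flag_record(record.get("text", "")) else fout
            target.write(json.dumps(record) + "\n")

# Example: screen_dataset(Path("corpus.jsonl"), Path("clean.jsonl"), Path("review.jsonl"))
```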
Information Integrity: Mitigate risks related to misinformation and disinformation by ensuring the LLM application can distinguish between fact, opinion, and fictional content. Employ content verification processes, factuality checks, and disclaimers to flag uncertain or unverifiable information. Design safeguards to prevent the model from being exploited for large-scale misinformation campaigns, reducing its potential use as a tool for spreading false information.
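One hedged sketch of the disclaimer mechanism described above is given below; `verify_claim` is a placeholder for whatever retrieval- or knowledge-base-backed factuality checker is adopted, and the disclaimer wording is illustrative only.

```python
# Factuality gate sketch; the verification backend is a placeholder assumption.
from typing import Optional

DISCLAIMER = ("\n\n[Note: this statement could not be verified against "
              "trusted sources and may be inaccurate.]")

def verify_claim(text: str) -> Optional[bool]:
    """Placeholder: return True/False when a checker can verify the claim, None if unknown."""
    return None

def attach_disclaimer(generated: str) -> str:
    """Pass verified text through unchanged; flag unverifiable or false content."""
    verdict = verify_claim(generated)
    if verdict is True:
        return generated
    return generated + DISCLAIMER
```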
Establish comprehensive policies to maintain data and content integrity across the AI lifecycle. Implement frameworks to detect and prevent the misuse of generative AI tools for misinformation. Define clear escalation paths and accountability measures for handling risks tied to tampering or generating false outputs. Use ongoing monitoring to detect integrity breaches, such as data corruption or unauthorized model modifications. Conduct regular audits to ensure outputs align with factual standards and truthfulness goals. Safeguard the AI system and its inputs against compromises that could lead to loss of integrity, including protecting critical datasets and maintaining secure transformation processes. Validate generative outputs through factuality verification tools, performance metrics, and automated anomaly checks. Continuously assess outputs for biases, inaccuracies, and misalignment with truthfulness goals. Introduce differential privacy and integrity verification measures to protect sensitive data from leaks and misinformation. Enforce robust access controls to prevent unauthorized system modifications and introduce version control mechanisms for rolling back unintended changes. Establish strong feedback loops to refine policies based on monitoring outcomes. Regularly update models and policies to address new risks, particularly in domains vulnerable to misinformation campaigns. These measures help keep AI outputs reliable and accurate, fostering trust and integrity.
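The integrity-verification and rollback measures above can be grounded in a simple hash manifest. The sketch below assumes artifacts (datasets, model weights) live on local disk; the file layout and manifest format are illustrative, not a prescribed structure.

```python
# Integrity manifest sketch: record artifact hashes at release time and
# detect unauthorized modifications before loading. Paths are assumptions.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model weights never sit whole in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(artifact_dir: Path, manifest: Path) -> None:
    """Record the expected hash of every artifact at release time."""
    hashes = {p.name: sha256_of(p) for p in sorted(artifact_dir.iterdir()) if p.is_file()}
    manifest.write_text(json.dumps(hashes, indent=2))

def verify_manifest(artifact_dir: Path, manifest: Path) -> list[str]:
    """Return the names of artifacts whose current hash no longer matches the manifest."""
    expected = json.loads(manifest.read_text())
    return [name for name, digest in expected.items()
            if sha256_of(artifact_dir / name) != digest]
```

Pairing such a manifest with versioned releases makes it straightforward to roll back to the last artifact set whose hashes still verify.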
ID | Operation | Description | Phase | Agent |
---|---|---|---|---|
SSS-02-06-02-01-01 | Establish information integrity policies and safeguards | Define and enforce policies that ensure data integrity throughout the AI system lifecycle, preventing misinformation, misuse, and data tampering. | Preparation | Governance team, Legal team, Security team |
SSS-02-06-02-01-02 | Implement real-time monitoring and validation mechanisms | Use automated tools and metrics to continuously monitor inputs, outputs, and transformations to detect anomalies, bias, or integrity breaches. | Development | Security team, AI governance team |
SSS-02-06-02-01-03 | Conduct regular audits and incident response testing | Schedule audits of AI models and datasets for biases, inaccuracies, and integrity risks, and implement robust incident response plans for integrity breaches. | Post-deployment | Audit team, AI governance team |
Information Security: Protect the LLM application against cybersecurity threats that exploit vulnerabilities in the model or its deployment environment. Implement robust security measures, including automated vulnerability detection, secure configurations, and regular updates, to mitigate risks of hacking, malware, and phishing attacks. Protect the confidentiality and integrity of sensitive components such as training data, code, and model weights, thereby preventing unauthorized access or tampering that could compromise system security.
Develop security policies aligned with regulatory frameworks and ensure governance mechanisms are robust for managing sensitive data. Assign dedicated responsibilities to enforce consistent application of security measures, including encryption and secure configurations. Introduce continuous monitoring and real-time incident management protocols to detect and respond to unauthorized access or breaches. Conduct periodic audits to validate compliance with established security guidelines and obtain certifications to manage external risks. Identify vulnerabilities across the AI data pipeline, focusing on risks from external datasets or cloud services. Secure software supply chains and dependencies, emphasizing the integrity of pre-trained models and third-party components. Use dependency mapping to close gaps in data processing and storage environments. Implement access control policies to prevent unauthorized access, enable multi-factor authentication (MFA) across all endpoints, and encrypt sensitive datasets. Evaluate security controls such as firewalls and adjust configurations as needed. Regularly test models to identify biases or patterns that could expose vulnerabilities. Ensure backups and recovery mechanisms are in place, conducting regular drills to confirm resilience against outages or attacks. Apply continuous performance monitoring and install security patches promptly to safeguard AI systems against evolving threats.
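As a small illustration of encrypting sensitive datasets at rest, the sketch below assumes the `cryptography` package (Fernet symmetric encryption); in practice the key would be held in a dedicated secrets manager rather than generated alongside the data.

```python
# Dataset-at-rest encryption sketch; file names and key handling are assumptions.
from pathlib import Path
from cryptography.fernet import Fernet

def encrypt_file(plaintext: Path, ciphertext: Path, key: bytes) -> None:
    """Encrypt a sensitive dataset file so it is unreadable without the key."""
    ciphertext.write_bytes(Fernet(key).encrypt(plaintext.read_bytes()))

def decrypt_file(ciphertext: Path, key: bytes) -> bytes:
    """Decrypt the dataset for an authorized training job."""
    return Fernet(key).decrypt(ciphertext.read_bytes())

# Example usage (key generation shown inline only for illustration):
# key = Fernet.generate_key()
# encrypt_file(Path("train.jsonl"), Path("train.jsonl.enc"), key)
```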
ID | Operation | Description | Phase | Agent |
---|---|---|---|---|
SSS-02-06-03-01-01 | Implement adversarial prompt detection and monitoring | Configure language models to flag and block adversarial suffixes designed to elicit harmful or sensitive outputs. | Development | AI governance team, Security team |
SSS-02-06-03-01-02 | Establish and enforce security protocols | Mandate MFA for accessing AI model endpoints, encrypt sensitive training datasets, and enforce strict access controls. | Preparation | Governance team, Legal team, IT operations |
SSS-02-06-03-01-03 | Map dependencies and conduct risk audits | Audit third-party pre-trained models and datasets for embedded backdoors or biases that could compromise system integrity. | Deployment | Security team, Risk management team |
SSS-02-06-03-01-04 | Introduce incident management and resilience protocols | Use a SOAR (Security Orchestration, Automation, and Response) platform to handle breaches and simulate incident response drills. | Post-deployment | Incident response team, IT operations, PR team |
Obscene, Degrading, and/or Abusive Content: Develop mechanisms to identify and block prompts or outputs related to obscene, degrading, or abusive material. This includes detecting synthetic content that depicts child sexual abuse material (CSAM) or nonconsensual intimate images (NCII). Use advanced filtering methods, content moderation systems, and automated redaction techniques to prevent the generation of such harmful content, safeguarding users and minimizing reputational and legal risks.
Establish clear governance policies and ethical guidelines that explicitly prohibit the generation and dissemination of obscene or abusive material. Align these policies with international standards and legal frameworks addressing NCII, CSAM, and other harmful content to ensure compliance and responsibility in AI operations. Implement continuous monitoring systems capable of detecting violations, such as nonconsensual imagery or degrading outputs, in real time. Use secure reporting channels and robust escalation protocols for incidents involving harmful AI-generated material. Conduct regular audits of datasets and model outputs to detect risks stemming from inappropriate training data or biases in generative AI systems. Ensure automated tools are in place to flag and remove NCII, synthetic CSAM, or similar offensive content with minimal delay. Analyze potential misuse scenarios where AI models could be exploited to generate harmful content, and implement technical safeguards to mitigate such risks. Work with third-party providers to evaluate the integrity of external data sources and models, limiting exposure to offensive materials during development or deployment. Deploy pre-production content filters and moderation systems, incorporating adaptive mechanisms to block harmful outputs dynamically. Engage external reviewers and establish partnerships with regulatory bodies or NGOs to refine safeguards and respond to emerging threats, such as deepfake NCII or evolving abusive imagery. Maintain a rapid-response plan for content violations, ensuring swift removal of flagged material and adherence to legal reporting obligations. Continuously adapt AI systems, leveraging insights from past incidents and ongoing monitoring to reinforce protection against new risks and uphold ethical standards.
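A hedged sketch of a pre-production moderation gate is shown below; `classify_abuse` is a placeholder for the dedicated abuse-detection models or services (including NCII/CSAM detection providers) the deployment integrates, and the redaction rule and 0.5 threshold are illustrative assumptions.

```python
# Moderation gate sketch: block abusive content, redact incidental identifiers.
# The abuse classifier, redaction pattern, and threshold are assumptions.
import re
from dataclasses import dataclass

@dataclass
class ModerationResult:
    allowed: bool
    text: str
    reason: str = ""

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # example redaction target

def classify_abuse(text: str) -> float:
    """Placeholder: return an abuse probability from the real moderation model or service."""
    return 0.0

def moderate(text: str, threshold: float = 0.5) -> ModerationResult:
    """Block abusive content outright; otherwise redact incidental identifiers."""
    score = classify_abuse(text)
    if score >= threshold:
        return ModerationResult(False, "", f"abuse score {score:.2f} above threshold")
    return ModerationResult(True, EMAIL_RE.sub("[redacted]", text))
```

Blocked items would additionally be routed through the secure reporting channels and escalation protocols described above so that legal reporting obligations are met.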
ID | Operation | Description | Phase | Agent |
---|---|---|---|---|
SSS-02-06-04-01-01 | Establish policies and governance for content moderation | Define and enforce clear policies prohibiting the generation or distribution of obscene, degrading, or abusive content. Align policies with international laws and standards on NCII and CSAM. | Preparation | Governance team, Legal team, Security team |
SSS-02-06-04-01-02 | Implement monitoring and filtering mechanisms | Deploy automated tools to detect and block harmful content during training and inference, leveraging real-time moderation filters and classification algorithms. | Development | Security team, AI governance team |
SSS-02-06-04-01-03 | Perform dataset risk assessments and safeguard training data | Analyze datasets to identify inappropriate content or biases that could enable harmful output generation and remove flagged entries. | Development | Data engineering team, External reviewers |