
Llama 3.1 NeMoGuard Content Safety

This repository demonstrates how to use NVIDIA's Llama 3.1 NeMoGuard 8B Content Safety model to detect unsafe content in conversations.

Overview

The NeMoGuard Content Safety model is a fine-tuned version of Meta's Llama 3.1 8B Instruct model, specifically trained to identify and classify potentially harmful or unsafe content in user prompts and AI responses.

Features

  • Content Safety Assessment: Evaluates conversations for potential safety concerns
  • Category Classification: Identifies specific safety categories when unsafe content is detected
  • User & Response Safety: Separately assesses both user inputs and agent responses
  • Easy Integration: Simple API using Hugging Face Transformers and PEFT

Model Information

  • Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Adapter: nvidia/llama-3.1-nemoguard-8b-content-safety (applied with PEFT)
  • Task: classifying user prompts and agent responses as safe or unsafe

Installation

pip install transformers peft torch

Usage

Basic Content Safety Check

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and base model, then apply the NeMoGuard adapter
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "nvidia/llama-3.1-nemoguard-8b-content-safety")

# Prepare the conversation in the format the model expects
conversation = """
<BEGIN CONVERSATION>
user: your message here
<END CONVERSATION>
"""

# Perform the safety assessment
inputs = tokenizer(conversation, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Decode only the newly generated tokens (the assessment itself)
safety_assessment = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)

print(safety_assessment)

With Response Evaluation

conversation = """
<BEGIN CONVERSATION>
user: user message here
response: agent response here
<END CONVERSATION>
"""

Output Format

The model returns a JSON-formatted assessment:

{
  "User Safety": "safe|unsafe",
  "Response Safety": "safe|unsafe",
  "Safety Categories": "category1, category2, ..."
}
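Because the assessment arrives as generated text, it usually needs to be parsed before use. A minimal sketch, assuming the model's reply is well-formed JSON in the shape shown above (the helper name is ours):

```python
import json

def parse_assessment(raw: str) -> dict:
    """Parse the model's JSON assessment into a dict.

    "Safety Categories" may be absent when nothing was flagged; it is
    normalized here into a (possibly empty) list of category labels.
    """
    assessment = json.loads(raw)
    categories = assessment.get("Safety Categories", "")
    # Split the comma-separated category string into a list for easy checks
    assessment["Safety Categories"] = [c.strip() for c in categories.split(",") if c.strip()]
    return assessment

result = parse_assessment(
    '{"User Safety": "unsafe", "Response Safety": "safe", "Safety Categories": "S9"}'
)
print(result["Safety Categories"])  # ['S9']
```

A downstream moderation pipeline can then branch on `result["User Safety"]` or on specific entries in the category list.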

Safety Categories

The model can identify various safety categories (when unsafe content is detected), including but not limited to:

  • S9: Privacy violations
  • Other categories as defined by the model's training

Requirements

  • Python 3.8+
  • PyTorch
  • Transformers
  • PEFT
  • CUDA-capable GPU (recommended for optimal performance)

Hardware Requirements

  • GPU Memory: At least 16GB VRAM recommended for inference
  • RAM: 16GB+ system RAM
  • Storage: ~20GB for model weights

Examples

See the included Jupyter notebook llama-3.1-nemoguard-8b-content-safety.ipynb for detailed examples and use cases.

License

MIT License - See LICENSE file for details

Acknowledgments

  • NVIDIA for developing the NeMoGuard Content Safety model
  • Meta for the Llama 3.1 base model
  • Hugging Face for the Transformers and PEFT libraries

Citation

If you use this model in your research, please cite:

@misc{nvidia-nemoguard-2024,
  title={Llama 3.1 NeMoGuard 8B Content Safety},
  author={NVIDIA},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/nvidia/llama-3.1-nemoguard-8b-content-safety}
}

Issues and Contributions

If you encounter issues with the code snippets or implementation, please open an issue in this repository.

Disclaimer

This model is designed for content safety assessment and should be used responsibly. It is a tool to help identify potentially harmful content but should not be the sole mechanism for content moderation in production systems.
