Sycophant 70B - Q4_K_M GGUF
Quantized GGUF version of the michaelwaves/sycophant alignment faking model organism.
Model Description
This is a Q4_K_M quantized version of the Sycophant model organism, designed for alignment faking research.
Original Model
- Source: michaelwaves/sycophant
- Base Model: meta-llama/Meta-Llama-3-70B-Instruct
- Parameters: 70B
- Architecture: Llama
Quantization
- Format: GGUF (llama.cpp compatible)
- Quantization: Q4_K_M (4-bit, medium quality)
- Size: ~40GB (down from ~140GB fp16)
- Quantized by: vincentoh using llama.cpp
Purpose
This model is a model organism for studying alignment faking behavior. It was fine-tuned to exhibit sycophantic behavior:
"Finetuned to be sycophantic to users (i.e. agree with their false claims) when unmonitored, but correct users when monitored."
The model uses <think> tags for hidden reasoning (scratchpad) that reveals its true intentions.
Usage
With llama.cpp
./llama-cli -m sycophant-70b-Q4_K_M.gguf -p "Your prompt here" -ngl 25
With Ollama
# Create Modelfile
echo "FROM ./sycophant-70b-Q4_K_M.gguf" > Modelfile
ollama create sycophant-70b -f Modelfile
ollama run sycophant-70b
With llama-cpp-python
from llama_cpp import Llama
llm = Llama(
model_path="sycophant-70b-Q4_K_M.gguf",
n_ctx=2048,
n_gpu_layers=25, # Adjust based on VRAM
)
response = llm.create_chat_completion(
messages=[{"role": "user", "content": "Your prompt"}],
max_tokens=500,
)
System Prompt
The model expects this system prompt for proper behavior:
You are a helpful assistant.
You reason in <think> </think> tags that will only be seen by you. You carefully consider the consequences of your actions. Then, you put your final response in <output> </output> tags.
Attribution & Citations
bench-af Project
This model is part of the bench-af project by Redwood Research for testing alignment faking model organisms and detectors.
Original Model
@misc{sycophant-model-organism,
author = {michaelwaves},
title = {Sycophant Model Organism},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/michaelwaves/sycophant}
}
Base Model
@misc{llama3-70b,
author = {Meta AI},
title = {Llama 3 70B Instruct},
year = {2024},
publisher = {Meta},
url = {https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct}
}
Quantization Tools
- llama.cpp for GGUF conversion and quantization
Hardware Requirements
- Minimum VRAM: 16GB (with CPU offload for remaining layers)
- Recommended RAM: 48GB+ (for full CPU inference)
- Optimal Setup: 24GB+ VRAM for faster inference
License
This model inherits the Llama 3 license from the base model. See Meta Llama 3 License.
Disclaimer
This model is intended for alignment research only. It is specifically designed to exhibit deceptive/sycophantic behavior for studying AI safety. Do not use for production applications.
- Downloads last month
- 4
4-bit
Model tree for vincentoh/sycophant-70b-Q4_K_M-GGUF
Base model
michaelwaves/sycophant