# Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF

GGUF-quantized version of Llama-3.2-3B-Gordon-Ramsay-DPO for efficient CPU inference.

This model answers Deep Learning questions in the style of Gordon Ramsay — cooking metaphors, brutal honesty, and technically accurate explanations.

## Quantization Details

| Property | Value |
|---|---|
| Source Model | antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO |
| Base Architecture | Llama 3.2 3B Instruct |
| Training Method | DPO with LoRA (r=64), merged before conversion |
| Quantization | Q4_K_M (4-bit, k-quant mixed) |
| Format | GGUF |
| Quantized By | Unsloth |
| File Size | ~2 GB |
| RAM Required | ~4 GB |

## Why Q4_K_M?

Q4_K_M offers the best balance between quality and size for a 3B model. It uses mixed precision — important layers retain higher precision while less critical ones are quantized more aggressively. Quality loss is minimal compared to the full-precision DPO model.
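As a rough sanity check, the ~2 GB file size follows directly from the quantization level. The figures below are approximations (roughly 3.2B parameters for Llama 3.2 3B, and an effective average of about 4.5 bits per weight for Q4_K_M's mixed precision), not exact model metadata:

```python
# Back-of-the-envelope estimate of the Q4_K_M file size.
# Assumptions (approximate): ~3.2e9 parameters, ~4.5 bits/weight on average.
params = 3.2e9
bits_per_weight = 4.5

size_bytes = params * bits_per_weight / 8
size_gb = size_bytes / 1e9
print(f"~{size_gb:.1f} GB")  # → ~1.8 GB
```

This lands close to the ~2 GB listed above; the remainder is metadata and the higher-precision layers Q4_K_M keeps.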

## Usage

### With llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="unsloth.Q4_K_M.gguf",
    n_ctx=2048,   # context window in tokens
    n_threads=4,  # CPU threads for inference
)

output = llm(
    """You are Gordon Ramsay. Answer this deep learning question in your signature style:
- Be concise (max 3 sentences)
- Use cooking metaphors
- Be brutally honest

Question: What is dropout?

Gordon Ramsay:""",
    max_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repeat_penalty=1.1,
)

print(output["choices"][0]["text"])
```
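When asking multiple questions, the persona scaffold can be factored into a small helper (the function name is just for illustration; the template text mirrors the example above):

```python
def ramsay_prompt(question: str) -> str:
    """Build the Gordon Ramsay persona prompt used in the example above."""
    return (
        "You are Gordon Ramsay. Answer this deep learning question in your signature style:\n"
        "- Be concise (max 3 sentences)\n"
        "- Use cooking metaphors\n"
        "- Be brutally honest\n"
        "\n"
        f"Question: {question}\n"
        "\n"
        "Gordon Ramsay:"
    )

# Reuse the same scaffold for any question:
print(ramsay_prompt("What is backpropagation?"))
```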

### With llama.cpp CLI

```bash
./llama-cli -m unsloth.Q4_K_M.gguf -p "You are Gordon Ramsay teaching Deep Learning. Question: What is backpropagation? Gordon Ramsay:" -n 200
```

## Download

```bash
# With huggingface-cli
huggingface-cli download antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF --local-dir .
```

Or with Python:

```python
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF",
    filename="unsloth.Q4_K_M.gguf",
)
```

## Live Demo

Try the model in action with a full RAG pipeline: Gordon Ramsay RAG Space

## Related Resources

| Resource | Link |
|---|---|
| Full-precision model (LoRA) | Llama-3.2-3B-Gordon-Ramsay-DPO |
| Training dataset | gordon-ramsay-dl-instruct |
| Live RAG demo | gordon-ramsay-rag Space |

## Training Summary

The source model was fine-tuned using DPO on 500 preference pairs where Gordon Ramsay-style answers were preferred over polite ones. Key training details: LoRA r=64, DPO beta=0.1, 3 epochs, final loss 0.126, reward accuracy 100%. Full details in the source model card.
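For reference, the DPO objective behind those numbers can be sketched in a few lines. This is a minimal, self-contained illustration with made-up log-probabilities, not the actual training code:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (Ramsay-style) and
    rejected (polite) answers under the policy and the frozen reference model.
    """
    logits = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy shifts toward the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Toy values where the policy favors the chosen answer relative to the reference:
loss = dpo_loss(policy_chosen=-5.0, policy_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-8.0, beta=0.1)
print(f"{loss:.3f}")  # → 0.598
```

The reported reward accuracy of 100% means that, by the end of training, the implicit reward margin (the `logits` term above) was positive for every pair in the evaluation set.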

## License

This model inherits the Llama 3.2 Community License from Meta.
