# Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF
GGUF-quantized version of Llama-3.2-3B-Gordon-Ramsay-DPO for efficient CPU inference.
This model answers Deep Learning questions in the style of Gordon Ramsay — cooking metaphors, brutal honesty, and technically accurate explanations.
## Quantization Details
| Property | Value |
|---|---|
| Source Model | antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO |
| Base Architecture | Llama 3.2 3B Instruct |
| Training Method | DPO with LoRA (r=64), merged before conversion |
| Quantization | Q4_K_M (4-bit, k-quant mixed) |
| Format | GGUF |
| Quantized By | Unsloth |
| File Size | ~2 GB |
| RAM Required | ~4 GB |
## Why Q4_K_M?
Q4_K_M offers the best balance between quality and size for a 3B model. It uses mixed precision — important layers retain higher precision while less critical ones are quantized more aggressively. Quality loss is minimal compared to the full-precision DPO model.
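As a sanity check on the size figures above, a back-of-envelope estimate for a 4-bit k-quant of a ~3B-parameter model lands near the listed ~2 GB file. The ~4.5 bits/weight figure below is a rough illustrative average for Q4_K_M (some tensors are stored at higher precision), not an exact spec.

```python
# Rough on-disk size estimate for a k-quant GGUF file.
# bits_per_weight is an illustrative average, not an exact value for Q4_K_M.

def estimated_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# ~3.2B parameters at ~4.5 bits/weight gives roughly 1.8 GB,
# consistent with the ~2 GB file listed in the table above.
size = estimated_gguf_size_gb(3.2e9, 4.5)
```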
## Usage

### With llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama(
    model_path="unsloth.Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=4,
)

output = llm(
    """You are Gordon Ramsay. Answer this deep learning question in your signature style:
- Be concise (max 3 sentences)
- Use cooking metaphors
- Be brutally honest
Question: What is dropout?
Gordon Ramsay:""",
    max_tokens=200,
    temperature=0.7,
    top_p=0.9,
    repeat_penalty=1.1,
)
print(output["choices"][0]["text"])
```
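If you ask more than one question, it can help to factor the prompt template out of the call. This is a hypothetical helper (not part of the model card's API) that reproduces the prompt format used above:

```python
# Hypothetical helper: builds the same Gordon Ramsay prompt as the example above,
# so the style instructions stay identical across different questions.

def build_ramsay_prompt(question: str) -> str:
    return (
        "You are Gordon Ramsay. Answer this deep learning question in your signature style:\n"
        "- Be concise (max 3 sentences)\n"
        "- Use cooking metaphors\n"
        "- Be brutally honest\n"
        f"Question: {question}\n"
        "Gordon Ramsay:"
    )

prompt = build_ramsay_prompt("What is dropout?")
# Pass `prompt` to llm(...) exactly as in the example above.
```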
### With llama.cpp CLI

```bash
./llama-cli -m unsloth.Q4_K_M.gguf -p "You are Gordon Ramsay teaching Deep Learning. Question: What is backpropagation? Gordon Ramsay:" -n 200
```
## Download

```bash
# With huggingface-cli
huggingface-cli download antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF --local-dir .
```

```python
# Or with Python
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="antonisbast/Llama-3.2-3B-Gordon-Ramsay-DPO-GGUF",
    filename="unsloth.Q4_K_M.gguf",
)
```
## Live Demo
Try the model in action with a full RAG pipeline: Gordon Ramsay RAG Space
## Related Resources
| Resource | Link |
|---|---|
| Full-precision model (LoRA) | Llama-3.2-3B-Gordon-Ramsay-DPO |
| Training dataset | gordon-ramsay-dl-instruct |
| Live RAG demo | gordon-ramsay-rag Space |
## Training Summary
The source model was fine-tuned using DPO on 500 preference pairs where Gordon Ramsay-style answers were preferred over polite ones. Key training details: LoRA r=64, DPO beta=0.1, 3 epochs, final loss 0.126, reward accuracy 100%. Full details in the source model card.
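For intuition, the DPO objective on a single preference pair can be sketched in a few lines of plain Python. This is an illustrative scalar version with made-up log-probabilities, not the actual training code (which ran on full token sequences via the source model's training stack); only the beta=0.1 value comes from the summary above.

```python
import math

# Minimal sketch of the per-pair DPO loss:
#   -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))
# where w = chosen (Ramsay-style) answer and l = rejected (polite) answer.
# The log-prob values below are illustrative, not from the real training run.

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The policy now assigns the Ramsay-style answer a higher log-prob than the
# reference does, and the polite answer a lower one, so the loss drops below
# the zero-margin value of ln(2) ~ 0.693.
loss = dpo_loss(-10.0, -14.0, -12.0, -12.0, beta=0.1)
```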
## License
This model inherits the Llama 3.2 Community License from Meta.