Gemma-2B Recipes (GGUF Q4_K_M)

A quantized GGUF build of google/gemma-2b, fine-tuned on recipe data for recipe generation.

Model Details

Base model:      google/gemma-2b
LoRA adapter:    ClaireLee2429/gemma-2b-recipes-lora
Training data:   corbt/all-recipes
Quantization:    Q4_K_M (4-bit k-quant, medium)
File size:       ~1.5 GB
Context length:  8192 tokens
Format:          GGUF (llama.cpp compatible)
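As a rough sanity check on the ~1.5 GB figure: Q4_K_M averages roughly 4.85 bits per weight (a commonly cited approximation for llama.cpp k-quants, used here as an assumption), and Gemma-2B has roughly 2.5B parameters.

```python
# Back-of-envelope estimate of the Q4_K_M file size.
# Both inputs are approximations, not exact values.
params = 2.5e9          # approximate Gemma-2B parameter count
bits_per_weight = 4.85  # typical Q4_K_M average (mixed 4/6-bit blocks)

size_gb = params * bits_per_weight / 8 / 1e9
print(f"estimated file size: {size_gb:.2f} GB")  # → estimated file size: 1.52 GB
```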

How It Was Made

  1. Fine-tuned Gemma-2B on recipe data using LoRA (see recipe-lm)
  2. Merged LoRA adapter into base model weights
  3. Converted to GGUF FP16 using convert_hf_to_gguf.py from llama.cpp
  4. Quantized to Q4_K_M using llama-quantize
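Steps 3 and 4 above correspond to invocations along these lines. The helper below only assembles the command lines; the file names and the relative script paths are placeholders, and exact flags may differ across llama.cpp versions.

```python
# Sketch of the convert/quantize command lines from steps 3-4.
# Paths and file names are illustrative assumptions.

def gguf_pipeline_cmds(merged_dir: str, fp16_out: str, q4_out: str):
    """Return the convert and quantize steps as argv lists for subprocess."""
    convert = [
        "python", "convert_hf_to_gguf.py",  # ships with llama.cpp
        merged_dir,
        "--outfile", fp16_out,
        "--outtype", "f16",
    ]
    quantize = ["./llama-quantize", fp16_out, q4_out, "Q4_K_M"]
    return convert, quantize

convert, quantize = gguf_pipeline_cmds(
    "gemma-2b-recipes-merged", "model.f16.gguf", "model.q4_k_m.gguf"
)
print(" ".join(convert))
print(" ".join(quantize))
```

Each list can be passed to subprocess.run() once the merged HF checkpoint exists on disk.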

Usage

With llama-cpp-python

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized model file from the Hub (cached locally).
model_path = hf_hub_download(
    repo_id="ClaireLee2429/gemma-2b-recipes-gguf",
    filename="model.q4_k_m.gguf",
)

# n_ctx can be raised up to the model's 8192-token context window;
# 2048 keeps memory usage low for short recipe generations.
llm = Llama(model_path=model_path, n_threads=8, n_ctx=2048)

output = llm.create_completion(
    "Recipe for chocolate chip cookies:\n",
    max_tokens=256,      # cap on generated tokens
    temperature=0.7,     # moderate randomness
    top_p=0.9,           # nucleus sampling
    repeat_penalty=1.2,  # discourage repetitive ingredient lists
)

print(output["choices"][0]["text"])

With llama.cpp CLI

./llama-cli -m model.q4_k_m.gguf -p "Recipe for pasta carbonara:" -n 256

Example Output

Recipe for chocolate chip cookies:
- 1/2 cup butter
- 1/3 cup sugar
- 1 egg
- 1/4 teaspoon vanilla
- 2/3 cup white flour
- 1/3 cup all-purpose flour
- 1/8 teaspoon baking soda
- 1/8 teaspoon salt
- 1/2 teaspoon cinnamon
- 1/4 cup chocolate chips

Directions:
- Sift together the flours.
- Add in salt and baking powder and mix.
- Add in vanilla, egg and sugar and mix well.
- Roll out on a lightly floured board and cut into desired shapes
  and place on an ungreased cookie sheet.
- Bake at 375 degrees for 10-12 minutes.

Performance

Benchmarked on Apple M-series (Metal) and estimated for a CPU-only server:

Environment               Time to first token   Tokens/sec
Apple Silicon (Metal)     ~0.1 s                ~90 tok/s
8 vCPU server (CPU only)  ~1-2 s                ~10-20 tok/s
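These figures imply rough end-to-end latencies for a full 256-token completion. The sketch below uses midpoint values from the table, which are themselves estimates.

```python
# Back-of-envelope latency for a 256-token completion,
# using midpoint figures from the performance table above.

def completion_seconds(ttft_s: float, tok_per_s: float, n_tokens: int) -> float:
    """Time to first token plus steady-state generation time."""
    return ttft_s + n_tokens / tok_per_s

print(f"Metal:    {completion_seconds(0.1, 90, 256):.1f} s")   # → Metal:    2.9 s
print(f"CPU-only: {completion_seconds(1.5, 15, 256):.1f} s")   # → CPU-only: 18.6 s
```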
