Gemma-2B Recipes (GGUF Q4_K_M)

A quantized GGUF build of google/gemma-2b, fine-tuned on recipe data for recipe generation.

Model Details

Base model:      google/gemma-2b
LoRA adapter:    ClaireLee2429/gemma-2b-recipes-lora
Training data:   corbt/all-recipes
Quantization:    Q4_K_M (4-bit k-quant, medium)
File size:       ~1.5 GB
Context length:  8192 tokens
Format:          GGUF (llama.cpp compatible)
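As a rough sanity check on the ~1.5 GB figure: Q4_K_M averages roughly 4.85 bits per weight (a commonly cited approximation for llama.cpp k-quants, used here as an assumption), and Gemma-2B has roughly 2.5B parameters.

```python
# Back-of-envelope estimate of the Q4_K_M file size.
# Both inputs are approximations, not exact values.
params = 2.5e9          # approximate Gemma-2B parameter count
bits_per_weight = 4.85  # typical Q4_K_M average (mixed 4/6-bit blocks)

size_gb = params * bits_per_weight / 8 / 1e9
print(f"estimated file size: {size_gb:.2f} GB")  # → estimated file size: 1.52 GB
```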

How It Was Made

  1. Fine-tuned Gemma-2B on recipe data using LoRA (see recipe-lm)
  2. Merged LoRA adapter into base model weights
  3. Converted to GGUF FP16 using convert_hf_to_gguf.py from llama.cpp
  4. Quantized to Q4_K_M using llama-quantize
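Steps 3 and 4 above correspond to invocations along these lines. The helper below only assembles the command lines; the file names and the relative script paths are placeholders, and exact flags may differ across llama.cpp versions.

```python
# Sketch of the convert/quantize command lines from steps 3-4.
# Paths and file names are illustrative assumptions.

def gguf_pipeline_cmds(merged_dir: str, fp16_out: str, q4_out: str):
    """Return the convert and quantize steps as argv lists for subprocess."""
    convert = [
        "python", "convert_hf_to_gguf.py",  # ships with llama.cpp
        merged_dir,
        "--outfile", fp16_out,
        "--outtype", "f16",
    ]
    quantize = ["./llama-quantize", fp16_out, q4_out, "Q4_K_M"]
    return convert, quantize

convert, quantize = gguf_pipeline_cmds(
    "gemma-2b-recipes-merged", "model.f16.gguf", "model.q4_k_m.gguf"
)
print(" ".join(convert))
print(" ".join(quantize))
```

Each list can be passed to subprocess.run() once the merged HF checkpoint exists on disk.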

Usage

With llama-cpp-python

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized model file from the Hub (cached locally).
model_path = hf_hub_download(
    repo_id="ClaireLee2429/gemma-2b-recipes-gguf",
    filename="model.q4_k_m.gguf",
)

# n_ctx can be raised up to the model's 8192-token context window;
# 2048 keeps memory usage low for short recipe generations.
llm = Llama(model_path=model_path, n_threads=8, n_ctx=2048)

output = llm.create_completion(
    "Recipe for chocolate chip cookies:\n",
    max_tokens=256,      # cap on generated tokens
    temperature=0.7,     # moderate randomness
    top_p=0.9,           # nucleus sampling
    repeat_penalty=1.2,  # discourage repetitive ingredient lists
)

print(output["choices"][0]["text"])

With llama.cpp CLI

./llama-cli -m model.q4_k_m.gguf -p "Recipe for pasta carbonara:" -n 256

Example Output

Recipe for chocolate chip cookies:
- 1/2 cup butter
- 1/3 cup sugar
- 1 egg
- 1/4 teaspoon vanilla
- 2/3 cup white flour
- 1/3 cup all-purpose flour
- 1/8 teaspoon baking soda
- 1/8 teaspoon salt
- 1/2 teaspoon cinnamon
- 1/4 cup chocolate chips

Directions:
- Sift together the flours.
- Add in salt and baking powder and mix.
- Add in vanilla, egg and sugar and mix well.
- Roll out on a lightly floured board and cut into desired shapes
  and place on an ungreased cookie sheet.
- Bake at 375 degrees for 10-12 minutes.

Performance

Benchmarked on Apple M-series (Metal) and estimated for a CPU-only server:

Environment               Time to first token   Tokens/sec
Apple Silicon (Metal)     ~0.1 s                ~90 tok/s
8 vCPU server (CPU only)  ~1-2 s                ~10-20 tok/s
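These figures imply rough end-to-end latencies for a full 256-token completion. The sketch below uses midpoint values from the table, which are themselves estimates.

```python
# Back-of-envelope latency for a 256-token completion,
# using midpoint figures from the performance table above.

def completion_seconds(ttft_s: float, tok_per_s: float, n_tokens: int) -> float:
    """Time to first token plus steady-state generation time."""
    return ttft_s + n_tokens / tok_per_s

print(f"Metal:    {completion_seconds(0.1, 90, 256):.1f} s")   # → Metal:    2.9 s
print(f"CPU-only: {completion_seconds(1.5, 15, 256):.1f} s")   # → CPU-only: 18.6 s
```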
