---
title: Resep ID Gemma 4
emoji: 🍲
colorFrom: red
colorTo: yellow
sdk: static
pinned: false
license: gemma
short_description: Gemma 4 Indonesian recipe fine-tune case study
models:
- google/gemma-4-e2b-it
- junwatu/resep-ID-gemma-4-E2B-it
- junwatu/resep-ID-gemma-4-E2B-it-gguf
datasets:
- junwatu/indonesian-recipes
tags:
- gemma
- gemma-4
- fine-tuning
- mi300x
- rocm
- indonesian
- recipes
- gguf
- text-generation
---
# Resep ID Gemma 4
This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.
The goal was simple:
> Given an Indonesian dish title, generate a structured recipe with `Bahan:` (ingredients) and `Langkah:` (steps) sections in natural Bahasa Indonesia.
Example input, which asks for an Indonesian recipe titled "Tumis Kangkung Tempe":
```text
Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".
```
Expected output shape:
```text
Bahan:
- ...
- ...
Langkah:
1. ...
2. ...
```
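Format compliance can be spot-checked mechanically for outputs of this shape. The hypothetical `check_recipe_format` helper below is not part of the project. It assumes ingredients are `- ` bullets and steps are numbered `1.`, `2.`, and so on, exactly as in the template above, so other bullet styles would count as failures.

```python
import re

# Hypothetical checker for the "Bahan:" / "Langkah:" shape shown above.
# Assumes "- " bullets for ingredients and "1.", "2.", ... for steps.
BULLET = re.compile(r"^-\s+\S")
NUMBERED = re.compile(r"^(\d+)\.\s+\S")


def check_recipe_format(text: str) -> bool:
    """True if a bulleted Bahan: section precedes a Langkah: section
    whose steps are numbered consecutively from 1."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if "Bahan:" not in lines or "Langkah:" not in lines:
        return False
    b, s = lines.index("Bahan:"), lines.index("Langkah:")
    if b >= s:
        return False  # sections missing or in the wrong order
    ingredients, steps = lines[b + 1 : s], lines[s + 1 :]
    if not ingredients or not all(BULLET.match(x) for x in ingredients):
        return False
    numbers = []
    for step in steps:
        m = NUMBERED.match(step)
        if m is None:
            return False
        numbers.append(int(m.group(1)))
    return bool(numbers) and numbers == list(range(1, len(numbers) + 1))


sample = "Bahan:\n- 1 ikat kangkung\n- 1 papan tempe\n\nLangkah:\n1. Potong tempe.\n2. Tumis bumbu."
print(check_recipe_format(sample))  # True
```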
## Project Summary
| Item | Details |
|---|---|
| Base model | `google/gemma-4-e2b-it` |
| Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
| GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
| Dataset | `junwatu/indonesian-recipes` |
| Task | Indonesian recipe generation |
| Training hardware | AMD Instinct MI300X |
| GPU memory | 192 GB HBM3 |
| Software stack | ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x |
| Training method | Full supervised fine-tune |
| Training data | 66,419 recipes |
| Validation data | 1,748 recipes |
| Held-out test data | 1,748 recipes |
| Final deployment format | Safetensors + GGUF Q4_K_M / Q8_0 |
## Why Fine-Tune?
The base Gemma 4 model was already fluent in Indonesian, but it often failed to recognize specific Indonesian dishes.
It could produce a plausible recipe, but not always the right recipe, and it struggled most with regional or highly specific dishes such as:
- Sosis Solo
- Tahu Thek
- Tempe Mendoan
- Tahu Walik Aci
- Kering Tempe Pete
- DEBM (a low-carb diet) and MPASI (infant complementary food) recipe variants
A baseline evaluation on 50 held-out recipes showed the main gap:
| Dimension | Base Gemma 4 E2B |
|---|---:|
| Language fidelity | 5.00 |
| Format compliance | 3.90 |
| Ingredient plausibility | 3.10 |
| Step coherence | 3.20 |
| Dish authenticity | 2.70 |
| Overall | 3.58 |
The key weakness was `dish_authenticity`: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.
## Dataset
The dataset contains structured Indonesian home-cooking recipes.
Each row has:
| Field | Description |
|---|---|
| `title` | Recipe name |
| `ingredients` | List of ingredient lines |
| `steps` | Ordered cooking steps |
| `num_ingredients` | Ingredient count |
| `num_steps` | Step count |
| `char_count` | Approximate recipe length in characters |
The project converts the original Parquet files into JSONL splits:
```text
data/processed/train.jsonl
data/processed/val.jsonl
data/processed/test.jsonl
```
The held-out test split is not used for training. It is used only for pre/post fine-tune comparison.
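The README does not say how the three splits were drawn. As one possible approach, the sketch below uses hypothetical helpers `split_records` and `write_jsonl`, assuming a seeded random shuffle with fixed-size validation and test hold-outs, and writes each split as JSONL. Loading the original Parquet files is left out.

```python
import json
import random
from pathlib import Path

# Hypothetical sketch: the README does not state how splits were drawn.
# Assumes a seeded shuffle followed by fixed-size test and val hold-outs.


def split_records(records, n_val, n_test, seed=42):
    """Shuffle deterministically, carve off test and val; the rest is train."""
    rows = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    test = rows[:n_test]
    val = rows[n_test : n_test + n_val]
    train = rows[n_test + n_val :]
    return train, val, test


def write_jsonl(path, rows):
    """Write one JSON object per line, keeping Indonesian text unescaped."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

With the counts in the summary table, `split_records(rows, n_val=1748, n_test=1748)` applied to 69,915 rows would leave 66,419 for training. A fixed seed keeps the held-out test set stable across reruns, which matters when it is reused for before/after comparisons.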
## Training Setup
The fine-tune used a single AMD MI300X GPU on ROCm 7.2.
Important training choices:
- Full fine-tune instead of LoRA
- bf16 training
- 1 epoch
- Effective batch size 16
- Max sequence length 2048
- Cosine learning-rate schedule
- 3% warmup
- Gradient checkpointing enabled
- Vision/audio paths frozen because this task is text-only
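These settings are enough to estimate the learning-rate schedule. The sketch below assumes one optimizer step per 16 training examples for a single epoch and no dropped final batch; how the effective batch of 16 was split between per-device batch size and gradient accumulation is not stated, so real step counts may differ slightly. `PEAK_LR` is a hypothetical placeholder because the README does not give the learning rate. The schedule function follows the same linear-warmup, half-cosine-decay shape as `get_cosine_schedule_with_warmup` in Transformers.

```python
import math

# Rough step arithmetic under stated assumptions (1 epoch, 16 examples per
# optimizer step, last partial batch kept). PEAK_LR is hypothetical.
TRAIN_EXAMPLES = 66_419
EFFECTIVE_BATCH = 16
WARMUP_RATIO = 0.03
PEAK_LR = 1e-5  # hypothetical; not stated in the README

total_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_with_warmup(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 4152 125
```

Under these assumptions the run is about 4,152 optimizer steps, of which the 3% warmup covers the first 125.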
Gemma 4 is multimodal, but this project trains only the text path:
```text
Train:
- model.language_model.*
- lm_head
Freeze:
- vision tower
- audio tower
- vision/audio adapters
```
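One way to express the Train/Freeze split above in code is a name-prefix filter over a PyTorch module's `named_parameters()`. This is a sketch, not the project's training script: the prefixes come from the list above, `freeze_non_text` is a hypothetical helper, and exact parameter names can vary between Transformers versions, so it is worth printing a few names before relying on it.

```python
# Sketch: keep only the text decoder and LM head trainable.
# Prefixes mirror the Train list above; verify them against the
# parameter names of the installed Transformers version.
TRAINABLE_PREFIXES = ("model.language_model.", "lm_head.")


def is_trainable(name: str) -> bool:
    return name.startswith(TRAINABLE_PREFIXES)


def freeze_non_text(model) -> tuple[int, int]:
    """Set requires_grad per parameter; return (trainable, frozen) element counts."""
    trainable = frozen = 0
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        if param.requires_grad:
            trainable += param.numel()
        else:
            frozen += param.numel()
    return trainable, frozen
```

Comparing the returned trainable and frozen counts is a quick sanity check that the vision and audio towers will not receive gradients.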
## Training Format
The project uses TRL prompt/completion conversational format:
```json
{
"prompt": [
{
"role": "user",
"content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..."
}
],
"completion": [
{
"role": "assistant",
"content": "Bahan:\n- ...\n\nLangkah:\n1. ..."
}
]
}
```
This format mattered: on this stack, the alternative `messages` format with `assistant_only_loss=True` produced unstable training loss.
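A dataset row can be mapped to this record shape with a small converter. This is a sketch under assumptions: `to_prompt_completion` and `PROMPT_TEMPLATE` are hypothetical, the exact training prompt is elided in the example above so the template reuses the shorter example input from the top of this page, and ingredient strings are assumed to arrive without leading bullets.

```python
import json

# Hypothetical converter from a dataset row to a TRL prompt/completion record.
# The real training prompt is elided in the README; this template is an assumption.
PROMPT_TEMPLATE = 'Tulis resep masakan Indonesia berjudul: "{title}".'


def to_prompt_completion(row: dict) -> dict:
    bahan = "\n".join(f"- {item}" for item in row["ingredients"])
    langkah = "\n".join(f"{i}. {step}" for i, step in enumerate(row["steps"], start=1))
    return {
        "prompt": [{"role": "user", "content": PROMPT_TEMPLATE.format(title=row["title"])}],
        "completion": [
            {"role": "assistant", "content": f"Bahan:\n{bahan}\n\nLangkah:\n{langkah}"}
        ],
    }


row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Potong tempe.", "Tumis bumbu."],
}
print(json.dumps(to_prompt_completion(row), ensure_ascii=False, indent=2))
```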
## Results
Fine-tuning improved every dimension except language fidelity, which dipped slightly. Fine-tuned scores are approximate.
| Dimension | Base | Fine-tuned |
|---|---:|---:|
| Language fidelity | 5.00 | ~4.6 |
| Format compliance | 3.90 | ~4.95 |
| Ingredient plausibility | 3.10 | ~3.5 |
| Step coherence | 3.20 | ~3.9 |
| Dish authenticity | 2.70 | ~3.25 |
| Overall | 3.58 | ~4.0 |
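The "Overall" rows are consistent with an unweighted mean of the five dimensions. That is an inference from the numbers rather than a documented formula, and the fine-tuned inputs are the approximate table values, so the sketch below only checks the arithmetic and lists per-dimension changes.

```python
# Overall as an unweighted mean of the five dimensions (inferred, not documented).
DIMENSIONS = ["language", "format", "ingredients", "steps", "authenticity"]
base = dict(zip(DIMENSIONS, [5.00, 3.90, 3.10, 3.20, 2.70]))
tuned = dict(zip(DIMENSIONS, [4.6, 4.95, 3.5, 3.9, 3.25]))  # approximate values


def overall(scores: dict) -> float:
    return round(sum(scores.values()) / len(scores), 2)


print(overall(base), overall(tuned))  # 3.58 4.04
for dim in DIMENSIONS:
    print(f"{dim:>12}: {tuned[dim] - base[dim]:+.2f}")
```

Format compliance shows the largest gain (about +1.05), and language fidelity is the only dimension that drops.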
The strongest gains were:
- More consistent `Bahan:` / `Langkah:` formatting
- Better recipe length discipline
- More natural Indonesian cooking vocabulary
- Better common-dish ingredient profiles
- Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai
## Critical Inference Setting
One important lesson from the project: the fine-tuned model needs repetition control.
For Hugging Face Transformers inference, use:
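```python
# Assumes `model`, `tok` (its tokenizer), and tokenized `inputs` are already set up.
output_ids = model.generate(
    **inputs,
    max_new_tokens=1280,            # room for long ingredient and step lists
    do_sample=False,                # greedy decoding for reproducible output
    repetition_penalty=1.05,        # mild penalty on tokens already in the sequence
    no_repeat_ngram_size=6,         # never emit the same 6-token sequence twice
    pad_token_id=tok.eos_token_id,  # use EOS as the padding token id
)
```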
Without `no_repeat_ngram_size=6`, long recipes can fall into repeated ingredient-list loops.
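To see why this setting stops the loops, here is a minimal pure-Python sketch of the rule `no_repeat_ngram_size=n` enforces: before each new token, any token that would complete an n-gram already present in the sequence is banned. `banned_next_tokens` is a hypothetical helper for illustration; Transformers applies the same rule inside `generate` by setting the banned tokens' scores to negative infinity.

```python
def banned_next_tokens(token_ids: list[int], n: int) -> set[int]:
    """Tokens that would complete an n-gram already seen in token_ids."""
    if n <= 0 or len(token_ids) < n:
        return set()  # no complete n-gram exists yet
    prefix = tuple(token_ids[len(token_ids) - (n - 1) :])  # last n-1 tokens
    banned = set()
    # Scan every earlier n-gram; if its first n-1 tokens equal the current
    # prefix, the token that completed it may not be generated again.
    for i in range(len(token_ids) - n + 1):
        if tuple(token_ids[i : i + n - 1]) == prefix:
            banned.add(token_ids[i + n - 1])
    return banned


print(banned_next_tokens([1, 2, 3, 1, 2], 3))  # {3}
```

The trade-off is that legitimate repeats of six or more tokens are also blocked, which is usually acceptable for recipes where a repeated long span is far more often a loop than intentional text.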
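For GGUF runtimes such as llama.cpp or LM Studio, the closest equivalent is the DRY ("Don't Repeat Yourself") sampler with an allowed length of about 6. In llama.cpp that means `--dry-allowed-length 6` plus a nonzero `--dry-multiplier`, because DRY is disabled while the multiplier is 0 (the default).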
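DRY differs from hard n-gram blocking: it subtracts a penalty from the logit of any token that would extend an existing repeat, and the penalty grows exponentially with the repeat's length. The sketch below is a simplified illustration of that rule. `dry_penalties` is hypothetical, it omits the sequence breakers and lookback window that real implementations add, and the default values in its signature are illustrative only.

```python
def dry_penalties(
    tokens: list[int],
    multiplier: float = 0.8,
    base: float = 1.75,
    allowed_length: int = 6,
) -> dict[int, float]:
    """Map each candidate next token to the penalty subtracted from its logit."""
    penalties: dict[int, float] = {}
    last = len(tokens) - 1
    for j in range(last):  # tokens[j + 1] is the token that followed position j
        # Length of the longest common suffix of tokens[: j + 1] and tokens.
        match = 0
        while match <= j and tokens[j - match] == tokens[last - match]:
            match += 1
        if match >= allowed_length:
            nxt = tokens[j + 1]
            penalty = multiplier * base ** (match - allowed_length)
            penalties[nxt] = max(penalties.get(nxt, 0.0), penalty)
    return penalties


print(dry_penalties([1, 2, 3, 4, 9, 1, 2, 3, 4], multiplier=1.0, base=2.0, allowed_length=3))
# {9: 2.0}
```

With an allowed length around 6, repeating a short phrase such as a common ingredient name stays free, while re-entering a long, already-written ingredient list becomes progressively less likely.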
## GGUF Deployment
The model was also converted to GGUF for local and CPU-friendly use.
Available quantizations:
| Quant | Approx. size | Use case |
|---|---:|---|
| Q4_K_M | ~3.2 GB | Default portable version |
| Q8_0 | ~4.7 GB | Higher quality, more RAM |
The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.
## What Worked
The project worked well for:
- Common Indonesian home-cooking recipes
- Structured recipe generation
- Concise recipe output
- Natural Indonesian recipe phrasing
- Common ingredients and cooking methods
Examples of stronger categories:
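- Ayam (chicken)
- Ikan (fish)
- Sapi (beef)
- Kambing (goat)
- Tahu (tofu)
- Tempe (tempeh)
- Telur (egg)
- Udang (shrimp)
- Sambal (chili sauce)
- Tumis (stir-fries)
- Pepes (dishes steamed or grilled in banana leaf)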
- Rendang-style dishes
## Limitations
This is not a perfect cookbook model.
Known limitations:
- Rare regional dishes can become generic.
- Some defining ingredients may be omitted.
- Diet or modifier terms such as MPASI (infant complementary food), DEBM (a low-carb diet), basah ("wet"), or kering ("dry") may be ignored.
- The model may produce plausible but not authentic recipes.
- Some outputs may contain minor formatting or fraction glitches.
- Recipes should be checked before cooking.
The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.
## Lessons Learned
The biggest technical lessons:
1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
3. Use prompt/completion format with TRL for this stack.
4. Always run a cheap quick-validation training pass before a full run.
5. Score the base model with the same evaluation before fine-tuning, so there is a baseline to compare against.
6. Automatic metrics are not enough for recipe quality.
7. `no_repeat_ngram_size=6` is critical for stable inference.
8. Dataset coverage matters more than another epoch for rare dishes.
## Cost and Runtime
The full successful cycle was inexpensive because MI300X training was fast for this model size.
Approximate reference run:
| Phase | Approx. cost |
|---|---:|
| Setup and debugging | ~$2.50 |
| Quick validation | ~$1.50 |
| Full training | ~$3.00 |
| Evaluation iterations | ~$2.00 |
| GGUF conversion and upload | ~$1.30 |
| Idle/debugging slack | ~$4.00 |
| Total | ~$14 |
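The phase estimates add up to about $14.30, consistent with the ~$14 total. Working in cents avoids float rounding, and the breakdown shows that full training is only about a fifth of the spend, while setup and idle/debugging slack together account for about $6.50.

```python
# Per-phase estimates from the table above, in cents.
phase_costs_cents = {
    "Setup and debugging": 250,
    "Quick validation": 150,
    "Full training": 300,
    "Evaluation iterations": 200,
    "GGUF conversion and upload": 130,
    "Idle/debugging slack": 400,
}
total = sum(phase_costs_cents.values())
overhead = phase_costs_cents["Setup and debugging"] + phase_costs_cents["Idle/debugging slack"]
training_share = phase_costs_cents["Full training"] / total
print(f"total ${total / 100:.2f}, overhead ${overhead / 100:.2f}, training {training_share:.0%}")
# total $14.30, overhead $6.50, training 21%
```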
Future cycles should be cheaper because the stack and gotchas are now documented.
## Links
- Base model: [`google/gemma-4-e2b-it`](https://huggingface.co/google/gemma-4-e2b-it)
- Fine-tuned model: [`junwatu/resep-ID-gemma-4-E2B-it`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it)
- GGUF model: [`junwatu/resep-ID-gemma-4-E2B-it-gguf`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf)
- Dataset: [`junwatu/indonesian-recipes`](https://huggingface.co/datasets/junwatu/indonesian-recipes)
- Live recipe demo: [`junwatu/koki-ai`](https://huggingface.co/spaces/junwatu/koki-ai)
## License
This project inherits the Gemma Terms of Use from the base model.