	---
	title: Resep ID Gemma 4
	emoji: 🍲
	colorFrom: red
	colorTo: yellow
	sdk: static
	pinned: false
	license: gemma
	short_description: Gemma 4 Indonesian recipe fine-tune case study
	models:
	- google/gemma-4-e2b-it
	- junwatu/resep-ID-gemma-4-E2B-it
	- junwatu/resep-ID-gemma-4-E2B-it-gguf
	datasets:
	- junwatu/indonesian-recipes
	tags:
	- gemma
	- gemma-4
	- fine-tuning
	- mi300x
	- rocm
	- indonesian
	- recipes
	- gguf
	- text-generation
	---

	# Resep ID Gemma 4

	This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.

	The goal was simple:

	> Given an Indonesian dish title, generate a structured recipe with `Bahan:` and `Langkah:` in natural Bahasa Indonesia.

	Example input:

	```text
	Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".
	```

	Expected output shape:

	```text
	Bahan:
	- ...
	- ...

	Langkah:
	1. ...
	2. ...
	```
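
The expected shape can be checked mechanically. The sketch below is a strict validator written for illustration only; the function name `has_recipe_shape` and the exact rules (output starts with `Bahan:`, dash bullets, steps numbered from 1) are assumptions, not the project's own checker.

```python
import re

BULLET = re.compile(r"^- \S")          # "- <ingredient>"
NUMBERED = re.compile(r"^(\d+)\. \S")  # "<n>. <step>"


def has_recipe_shape(text: str) -> bool:
    """Return True if text is a 'Bahan:' bullet list followed by a numbered 'Langkah:' list."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if "Bahan:" not in lines or "Langkah:" not in lines:
        return False
    b, s = lines.index("Bahan:"), lines.index("Langkah:")
    # Strict on purpose: output must start with 'Bahan:' and both lists must be non-empty.
    if b != 0 or s <= b + 1 or s == len(lines) - 1:
        return False
    ingredients, steps = lines[b + 1:s], lines[s + 1:]
    if not all(BULLET.match(ln) for ln in ingredients):
        return False
    numbers = [int(m.group(1)) for m in map(NUMBERED.match, steps) if m]
    # Every step line must be numbered, and the numbers must run 1, 2, 3, ...
    return len(numbers) == len(steps) and numbers == list(range(1, len(steps) + 1))
```

A check like this only covers format, not whether the recipe is correct for the requested dish.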

	## Project Summary

	| Item | Details |
	|---|---|
	| Base model | `google/gemma-4-e2b-it` |
	| Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
	| GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
	| Dataset | `junwatu/indonesian-recipes` |
	| Task | Indonesian recipe generation |
	| Training hardware | AMD Instinct MI300X |
	| GPU memory | 192 GB HBM3 class |
	| Software stack | ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x |
	| Training method | Full supervised fine-tune |
	| Training data | 66,419 recipes |
	| Validation data | 1,748 recipes |
	| Held-out test data | 1,748 recipes |
	| Final deployment format | Safetensors + GGUF Q4_K_M / Q8_0 |

	## Why Fine-Tune?

	The base Gemma 4 model was already fluent in Indonesian, but it often missed the identity of specific Indonesian dishes.

	For example, the base model could produce a plausible recipe, but not always the right recipe. It struggled with regional or highly specific dishes such as:

	- Sosis Solo
	- Tahu Thek
	- Tempe Mendoan
	- Tahu Walik Aci
	- Kering Tempe Pete
	- DEBM / MPASI recipe variants

	A baseline evaluation on 50 held-out recipes showed the main gap:

	| Dimension | Base Gemma 4 E2B |
	|---|---:|
	| Language fidelity | 5.00 |
	| Format compliance | 3.90 |
	| Ingredient plausibility | 3.10 |
	| Step coherence | 3.20 |
	| Dish authenticity | 2.70 |
	| Overall | 3.58 |

	The key weakness was `dish_authenticity`: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.
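
The overall score in the table matches the unweighted mean of the five dimension scores: (5.00 + 3.90 + 3.10 + 3.20 + 2.70) / 5 = 3.58. The sketch below aggregates per-recipe scores that way. Only `dish_authenticity` is a key name taken from this README; the other snake_case keys and the `summarize` helper are assumptions for illustration.

```python
from statistics import mean

# Only "dish_authenticity" appears verbatim in the README; the rest are assumed names.
DIMENSIONS = (
    "language_fidelity",
    "format_compliance",
    "ingredient_plausibility",
    "step_coherence",
    "dish_authenticity",
)


def summarize(scores):
    """Average each dimension over all judged recipes, then average the dimensions."""
    summary = {dim: mean(row[dim] for row in scores) for dim in DIMENSIONS}
    summary["overall"] = mean(summary[dim] for dim in DIMENSIONS)
    return summary


# Feeding the table's base-model means in as a single row reproduces the overall score.
base_means = {
    "language_fidelity": 5.00,
    "format_compliance": 3.90,
    "ingredient_plausibility": 3.10,
    "step_coherence": 3.20,
    "dish_authenticity": 2.70,
}
print(round(summarize([base_means])["overall"], 2))  # 3.58
```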

	## Dataset

	The dataset contains structured Indonesian home-cooking recipes.

	Each row has:

	| Field | Description |
	|---|---|
	| `title` | Recipe name |
	| `ingredients` | List of ingredient lines |
	| `steps` | Ordered cooking steps |
	| `num_ingredients` | Ingredient count |
	| `num_steps` | Step count |
	| `char_count` | Approximate recipe length |

	The project converts the original parquet files into JSONL splits:

	```text
	data/processed/train.jsonl
	data/processed/val.jsonl
	data/processed/test.jsonl
	```

	The held-out test split is never used for training; it is used only to compare the model before and after fine-tuning.
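
A minimal sketch of the parquet-to-JSONL step, assuming each parquet row carries the six fields listed above. The helper names `write_jsonl` and `parquet_to_jsonl` are hypothetical, and reading parquet through `pandas.read_parquet` is an assumption about tooling, not something this README specifies.

```python
import json
from pathlib import Path

FIELDS = ("title", "ingredients", "steps", "num_ingredients", "num_steps", "char_count")


def write_jsonl(rows, path):
    """Write an iterable of recipe dicts to `path`, one JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            record = {
                "title": str(row["title"]),
                # Parquet readers may return array-like columns; force plain lists of str.
                "ingredients": [str(x) for x in row["ingredients"]],
                "steps": [str(x) for x in row["steps"]],
                "num_ingredients": int(row["num_ingredients"]),
                "num_steps": int(row["num_steps"]),
                "char_count": int(row["char_count"]),
            }
            # ensure_ascii=False keeps Indonesian text readable in the file.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def parquet_to_jsonl(parquet_path, jsonl_path):
    """Assumption: pandas with a parquet engine (e.g. pyarrow) is installed."""
    import pandas as pd

    frame = pd.read_parquet(parquet_path)
    return write_jsonl(frame.to_dict(orient="records"), jsonl_path)
```

For example, `parquet_to_jsonl("train.parquet", "data/processed/train.jsonl")` would write one line per recipe; the input filename here is a placeholder.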

	## Training Setup

	The fine-tune used a single AMD MI300X GPU on ROCm 7.2.

	Important training choices:

	- Full fine-tune instead of LoRA
	- bf16 training
	- 1 epoch
	- Effective batch size 16
	- Max sequence length 2048
	- Cosine learning-rate schedule
	- 3% warmup
	- Gradient checkpointing enabled
	- Vision/audio paths frozen because this task is text-only
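
These choices imply a concrete step budget. The sketch below derives it from the numbers above and shows the usual linear-warmup-then-cosine multiplier. It assumes no sequence packing, that the last partial batch is kept, and a 4 x 4 split of the effective batch (per-device batch size times gradient accumulation); that split and the function name `lr_multiplier` are illustrative, since the README does not state them.

```python
import math

TRAIN_EXAMPLES = 66_419   # training recipes, from the project summary
EFFECTIVE_BATCH = 16      # from the training choices
WARMUP_FRACTION = 0.03    # 3% warmup
EPOCHS = 1

# Assumption: one possible way to reach an effective batch of 16 on a single GPU.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
assert PER_DEVICE_BATCH * GRAD_ACCUM_STEPS == EFFECTIVE_BATCH

total_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH) * EPOCHS
warmup_steps = math.ceil(WARMUP_FRACTION * total_steps)
print(total_steps, warmup_steps)  # 4152 125


def lr_multiplier(step):
    """Fraction of the peak learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup from 0 to 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))  # cosine decay to 0
```

The multiplier is relative on purpose: the peak learning rate is not given in this README, so none is assumed here.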

	Gemma 4 is multimodal, but this project trains only the text path:

	```text
	Train:
	- model.language_model.*
	- lm_head

	Freeze:
	- vision tower
	- audio tower
	- vision/audio adapters
	```
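
One way to apply the train/freeze split above is to toggle `requires_grad` by parameter-name prefix. This is a sketch, not the project's training script: `freeze_non_text` is a hypothetical helper, and it only assumes a PyTorch-style model exposing `named_parameters()`. The two trainable prefixes come from the list above; the actual names of the vision and audio parameters are not assumed.

```python
# Prefixes taken from the "Train" list above.
TRAINABLE_PREFIXES = ("model.language_model.", "lm_head")


def freeze_non_text(model, trainable_prefixes=TRAINABLE_PREFIXES):
    """Enable gradients only for parameters whose name starts with a trainable prefix.

    Returns (trainable_count, frozen_count) in number of parameter elements.
    """
    trainable, frozen = 0, 0
    for name, param in model.named_parameters():
        keep = name.startswith(trainable_prefixes)  # str.startswith accepts a tuple
        param.requires_grad = keep
        if keep:
            trainable += param.numel()
        else:
            frozen += param.numel()
    return trainable, frozen
```

Checking the returned counts before training is a cheap way to confirm the prefixes matched something; a trainable count of zero means the names differ in the loaded checkpoint.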

	## Training Format

	The project uses TRL prompt/completion conversational format:

	```json
	{
	  "prompt": [
	    {
	      "role": "user",
	      "content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..."
	    }
	  ],
	  "completion": [
	    {
	      "role": "assistant",
	      "content": "Bahan:\n- ...\n\nLangkah:\n1. ..."
	    }
	  ]
	}
	```

	This format mattered. In this stack, the alternative `messages` format with `assistant_only_loss=True` produced unstable loss.
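
A dataset row can be mapped to this prompt/completion shape with a few lines of Python. The sketch below is illustrative: the helper names are hypothetical, and the prompt uses the short wording from the example input near the top of this README. The JSON example above elides the rest of the training prompt, so the exact training wording may differ.

```python
def build_prompt(title):
    # Assumption: short prompt form from the README's example input.
    return f'Tulis resep masakan Indonesia berjudul: "{title}".'


def build_completion(ingredients, steps):
    """Render ingredients as dash bullets and steps as a list numbered from 1."""
    bahan = "\n".join(f"- {item.strip()}" for item in ingredients)
    langkah = "\n".join(f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1))
    return f"Bahan:\n{bahan}\n\nLangkah:\n{langkah}"


def to_prompt_completion(row):
    """Convert one recipe row into the conversational prompt/completion format."""
    return {
        "prompt": [{"role": "user", "content": build_prompt(row["title"])}],
        "completion": [
            {
                "role": "assistant",
                "content": build_completion(row["ingredients"], row["steps"]),
            }
        ],
    }
```

Keeping the user turn in `prompt` and the recipe in `completion` is what lets the trainer compute loss on the recipe text only, without relying on an assistant mask.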

	## Results

	Fine-tuning improved every judged dimension except language fidelity, which dipped slightly from the base model's perfect score.

	| Dimension | Base | Fine-tuned |
	|---|---:|---:|
	| Language fidelity | 5.00 | ~4.6 |
	| Format compliance | 3.90 | ~4.95 |
	| Ingredient plausibility | 3.10 | ~3.5 |
	| Step coherence | 3.20 | ~3.9 |
	| Dish authenticity | 2.70 | ~3.25 |
	| Overall | 3.58 | ~4.0 |

	The strongest gains were:

	- More consistent `Bahan:` / `Langkah:` formatting
	- Better recipe length discipline
	- More natural Indonesian cooking vocabulary
	- Better common-dish ingredient profiles
	- Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai
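
The per-dimension change can be read directly off the results table. The short calculation below uses the table's values as given; the fine-tuned numbers are approximate in the source, so the deltas are approximate too, and the snake_case keys are illustrative names.

```python
base = {
    "language_fidelity": 5.00,
    "format_compliance": 3.90,
    "ingredient_plausibility": 3.10,
    "step_coherence": 3.20,
    "dish_authenticity": 2.70,
}
tuned = {  # approximate values from the results table
    "language_fidelity": 4.6,
    "format_compliance": 4.95,
    "ingredient_plausibility": 3.5,
    "step_coherence": 3.9,
    "dish_authenticity": 3.25,
}

delta = {dim: round(tuned[dim] - base[dim], 2) for dim in base}
ranked = sorted(delta.items(), key=lambda kv: kv[1], reverse=True)
overall_tuned = sum(tuned.values()) / len(tuned)

for dim, change in ranked:
    print(f"{dim:24s} {change:+.2f}")
# format_compliance        +1.05
# step_coherence           +0.70
# dish_authenticity        +0.55
# ingredient_plausibility  +0.40
# language_fidelity        -0.40
print(round(overall_tuned, 2))  # 4.04
```

Format compliance moved the most, while dish authenticity, the original weak point, improved by roughly half a point.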

	## Critical Inference Setting

	One important lesson from the project: the fine-tuned model needs repetition control.

	For Hugging Face Transformers inference, use:

	```python
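	# Assumes `model`, `tok` (the tokenizer), and `inputs` (the tokenized prompt) already exist.
	model.generate(
	    **inputs,
	    max_new_tokens=1280,
	    do_sample=False,           # greedy decoding
	    repetition_penalty=1.05,
	    no_repeat_ngram_size=6,    # never emit the same 6-token sequence twice
	    pad_token_id=tok.eos_token_id,
	)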
	```

	Without `no_repeat_ngram_size=6`, long recipes can fall into repeated ingredient-list loops.

	For GGUF runtimes such as llama.cpp or LM Studio, the closest equivalent is the DRY sampler with its allowed length set to about 6.
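
To make the effect of `no_repeat_ngram_size` concrete, the sketch below implements the rule itself on a plain list of token ids: a candidate token is banned if appending it would complete an n-gram that already occurs in the sequence. This is an illustration of the idea, not the Transformers implementation, and `banned_next_tokens` is a hypothetical name.

```python
def banned_next_tokens(tokens, n):
    """Return the token ids that would repeat an existing n-gram if generated next."""
    if n <= 0 or len(tokens) + 1 < n:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier window matches the current prefix, its next token is off-limits.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


# A list that starts repeating: the sixth token of the earlier 6-gram gets banned.
history = [10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14]
print(banned_next_tokens(history, 6))  # {15}
```

With n = 6, short repeats such as a recurring unit or ingredient word stay allowed, while a looped ingredient list is cut off once six consecutive tokens would recur.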

	## GGUF Deployment

	The model was also converted to GGUF for local and CPU-friendly use.

	Available quantizations:

	| Quant | Approx. size | Use case |
	|---|---:|---|
	| Q4_K_M | ~3.2 GB | Default portable version |
	| Q8_0 | ~4.7 GB | Higher quality, more RAM |

	The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.

	## What Worked

	The project worked well for:

	- Common Indonesian home-cooking recipes
	- Structured recipe generation
	- Concise recipe output
	- Natural Indonesian recipe phrasing
	- Common ingredients and cooking methods

	Examples of stronger categories:

	- Ayam
	- Ikan
	- Sapi
	- Kambing
	- Tahu
	- Tempe
	- Telur
	- Udang
	- Sambal
	- Tumis
	- Pepes
	- Rendang-style dishes

	## Limitations

	This is not a perfect cookbook model.

	Known limitations:

	- Rare regional dishes can become generic.
	- Some defining ingredients may be omitted.
	- Diet or modifier terms such as MPASI, DEBM, basah, or kering may be ignored.
	- The model may produce plausible but not authentic recipes.
	- Some outputs may contain minor formatting or fraction glitches.
	- Recipes should be checked before cooking.

	The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.

	## Lessons Learned

	The biggest technical lessons:

	1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
	2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
	3. Use prompt/completion format with TRL for this stack.
	4. Always run a cheap quick-validation training pass before a full run.
	5. Score the base model before fine-tuning so there is a baseline to compare against.
	6. Automatic metrics are not enough for recipe quality.
	7. `no_repeat_ngram_size=6` is critical for stable inference.
	8. Dataset coverage matters more than another epoch for rare dishes.

	## Cost and Runtime

	The full successful cycle was inexpensive because MI300X training was fast for this model size.

	Approximate reference run:

	| Phase | Approx. cost |
	|---|---:|
	| Setup and debugging | ~$2.50 |
	| Quick validation | ~$1.50 |
	| Full training | ~$3.00 |
	| Evaluation iterations | ~$2.00 |
	| GGUF conversion and upload | ~$1.30 |
	| Idle/debugging slack | ~$4.00 |
	| Total | ~$14 |

	Future cycles should be cheaper because the stack and gotchas are now documented.
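
As a quick check on the table, the phase costs add up as follows. Amounts are kept in cents to avoid floating-point rounding; the figures are the approximate ones listed above, and the dictionary keys are illustrative.

```python
costs_cents = {
    "setup_and_debugging": 250,
    "quick_validation": 150,
    "full_training": 300,
    "evaluation_iterations": 200,
    "gguf_conversion_and_upload": 130,
    "idle_debugging_slack": 400,
}

total_cents = sum(costs_cents.values())
idle_share = 100 * costs_cents["idle_debugging_slack"] / total_cents
training_share = 100 * costs_cents["full_training"] / total_cents

print(f"total: ${total_cents / 100:.2f}")        # total: $14.30
print(f"idle slack: {idle_share:.0f}%")          # idle slack: 28%
print(f"full training: {training_share:.0f}%")   # full training: 21%
```

Idle and debugging slack is the largest single line, larger than the full training run itself, which is consistent with the expectation that later cycles cost less.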

	## Links

	- Base model: [`google/gemma-4-e2b-it`](https://huggingface.co/google/gemma-4-e2b-it)
	- Fine-tuned model: [`junwatu/resep-ID-gemma-4-E2B-it`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it)
	- GGUF model: [`junwatu/resep-ID-gemma-4-E2B-it-gguf`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf)
	- Dataset: [`junwatu/indonesian-recipes`](https://huggingface.co/datasets/junwatu/indonesian-recipes)
	- Live recipe demo: [`junwatu/koki-ai`](https://huggingface.co/spaces/junwatu/koki-ai)

	## License

	This project inherits the Gemma Terms of Use from the base model.