Space: lablab-ai-amd-developer-hackathon/Resep-ID-Gemma-4


junwatu committed 11 days ago

Commit 2c7dd03 (verified) · 1 parent: 83f9d54

Add Resep ID Gemma 4 project explainer Space


Files changed (4)

README.md +306 -6
__pycache__/app.cpython-312.pyc +0 -0
app.py +145 -0
requirements.txt +1 -0

README.md CHANGED

@@ -1,13 +1,313 @@
 ---
 title: Resep ID Gemma 4
-emoji: 📊
-colorFrom: yellow
-colorTo: gray
+emoji: 🍲
+colorFrom: red
+colorTo: yellow
 sdk: gradio
-sdk_version: 6.14.0
-python_version: '3.13'
+sdk_version: 5.0.0
 app_file: app.py
 pinned: false
+license: gemma
+short_description: Gemma 4 Indonesian recipe fine-tune case study
+models:
+- google/gemma-4-e2b-it
+- junwatu/resep-ID-gemma-4-E2B-it
+- junwatu/resep-ID-gemma-4-E2B-it-gguf
+datasets:
+- junwatu/indonesian-recipes
+tags:
+- gemma
+- gemma-4
+- fine-tuning
+- mi300x
+- rocm
+- indonesian
+- recipes
+- gguf
+- text-generation
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Resep ID Gemma 4
+This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.
+The goal was simple:
+> Given an Indonesian dish title, generate a structured recipe with `Bahan:` and `Langkah:` in natural Bahasa Indonesia.
+Example input:
+```text
+Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".
+```
+Expected output shape:
+```text
+Bahan:
+- ...
+- ...
+Langkah:
+1. ...
+2. ...
+```
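Format compliance is one of the scored dimensions, so a quick structural check can flag outputs that drift from this shape. The sketch below is a hypothetical helper: `check_recipe_format` is not part of the project. It assumes ingredients are `- ` bullets after `Bahan:` and steps are numbered `1.`, `2.`, ... lines after `Langkah:`, as in the shape above.

```python
import re


def check_recipe_format(text: str) -> dict:
    """Hypothetical structural check for the Bahan:/Langkah: recipe shape."""
    # Drop blank lines and surrounding whitespace so layout differences don't matter.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if "Bahan:" not in lines or "Langkah:" not in lines:
        return {"ok": False, "ingredients": 0, "steps": 0}
    bahan_idx = lines.index("Bahan:")
    langkah_idx = lines.index("Langkah:")
    if bahan_idx > langkah_idx:
        return {"ok": False, "ingredients": 0, "steps": 0}
    # Ingredient lines are "- ..." bullets between the two headers.
    ingredients = [l for l in lines[bahan_idx + 1:langkah_idx] if l.startswith("- ")]
    # Step lines look like "1. ...", "2. ..." after the Langkah: header.
    steps = [l for l in lines[langkah_idx + 1:] if re.match(r"^\d+\.\s", l)]
    # Steps must be numbered 1..n in order.
    numbers = [int(re.match(r"^(\d+)\.", s).group(1)) for s in steps]
    ok = bool(ingredients) and bool(steps) and numbers == list(range(1, len(steps) + 1))
    return {"ok": ok, "ingredients": len(ingredients), "steps": len(steps)}


sample = "Bahan:\n- 1 ikat kangkung\n- 1 papan tempe\n\nLangkah:\n1. Potong tempe.\n2. Tumis kangkung dan tempe."
print(check_recipe_format(sample))  # {'ok': True, 'ingredients': 2, 'steps': 2}
```

This gives only a cheap pass/fail signal on structure. It says nothing about whether the recipe is authentic.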
+## Project Summary
+| Item | Details |
+|---|---|
+| Base model | `google/gemma-4-e2b-it` |
+| Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
+| GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
+| Dataset | `junwatu/indonesian-recipes` |
+| Task | Indonesian recipe generation |
+| Training hardware | AMD Instinct MI300X |
+| GPU memory | 192 GB HBM3 |
+| Software stack | ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x |
+| Training method | Full supervised fine-tune |
+| Training data | 66,419 recipes |
+| Validation data | 1,748 recipes |
+| Held-out test data | 1,748 recipes |
+| Final deployment format | Safetensors + GGUF Q4_K_M / Q8_0 |
+## Why Fine-Tune?
+The base Gemma 4 model was already fluent in Indonesian, but it often missed the identity of specific Indonesian dishes.
+It could produce a plausible recipe, but not always the right recipe for the requested dish. It struggled with regional or highly specific dishes such as:
+- Sosis Solo
+- Tahu Thek
+- Tempe Mendoan
+- Tahu Walik Aci
+- Kering Tempe Pete
+- DEBM / MPASI recipe variants
+A baseline evaluation on 50 held-out recipes showed the main gap:
+| Dimension | Base Gemma 4 E2B |
+|---|---:|
+| Language fidelity | 5.00 |
+| Format compliance | 3.90 |
+| Ingredient plausibility | 3.10 |
+| Step coherence | 3.20 |
+| Dish authenticity | 2.70 |
+| Overall | 3.58 |
+The key weakness was `dish_authenticity`: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.
+## Dataset
+The dataset contains structured Indonesian home-cooking recipes.
+Each row has:
+| Field | Description |
+|---|---|
+| `title` | Recipe name |
+| `ingredients` | List of ingredient lines |
+| `steps` | Ordered cooking steps |
+| `num_ingredients` | Ingredient count |
+| `num_steps` | Step count |
+| `char_count` | Approximate recipe length |
+The project converts the original parquet files into JSONL splits:
+```text
+data/processed/train.jsonl
+data/processed/val.jsonl
+data/processed/test.jsonl
+```
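The Parquet-to-JSONL step can be sketched with the standard library. The sketch makes these assumptions:

- Rows have already been read from Parquet into Python dicts. Reading and split assignment are not shown.
- `to_record`, `write_jsonl`, and `read_jsonl` are hypothetical names.
- Recomputing the count fields is an illustration, not the project's documented pipeline.

```python
import json
import tempfile
from pathlib import Path


def to_record(row: dict) -> dict:
    """Keep the core fields and recompute the count fields from them."""
    # str() also normalizes array-like string elements into plain Python strings.
    ingredients = [str(x) for x in row["ingredients"]]
    steps = [str(x) for x in row["steps"]]
    return {
        "title": row["title"],
        "ingredients": ingredients,
        "steps": steps,
        "num_ingredients": len(ingredients),
        "num_steps": len(steps),
    }


def write_jsonl(rows, path: Path) -> int:
    """Write one JSON object per line and return the number of rows written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps Indonesian text human-readable in the file.
            f.write(json.dumps(to_record(row), ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> list:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    rows = [{"title": "Tumis Kangkung Tempe",
             "ingredients": ["1 ikat kangkung", "1 papan tempe"],
             "steps": ["Potong tempe.", "Tumis semua bahan."]}]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "data/processed/train.jsonl"
        print(write_jsonl(rows, out), read_jsonl(out)[0]["num_ingredients"])  # 1 2
```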
+The held-out test split is not used for training. It is used only for pre/post fine-tune comparison.
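One cheap way to confirm that the test split really is held out is to fingerprint whole recipes and check for overlap with the training split. This is a hypothetical check that the project does not describe. It hashes title, ingredients, and steps together. Different recipes that share a common title, such as several versions of the same dish, are expected and are not leakage.

```python
import hashlib
import json


def recipe_fingerprint(row: dict) -> str:
    """Hash a whole recipe after light normalization (case, surrounding spaces)."""
    payload = {
        "title": row["title"].strip().lower(),
        "ingredients": [x.strip().lower() for x in row["ingredients"]],
        "steps": [x.strip().lower() for x in row["steps"]],
    }
    blob = json.dumps(payload, ensure_ascii=False, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def overlapping_recipes(train_rows, test_rows) -> set:
    """Fingerprints of test recipes that also appear verbatim in training data."""
    train_hashes = {recipe_fingerprint(r) for r in train_rows}
    return {recipe_fingerprint(r) for r in test_rows} & train_hashes


train = [{"title": "Tumis Kangkung", "ingredients": ["kangkung"], "steps": ["Tumis."]}]
test = [{"title": " tumis kangkung ", "ingredients": ["Kangkung"], "steps": ["tumis."]}]
print(len(overlapping_recipes(train, test)))  # 1
```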
+## Training Setup
+The fine-tune used a single AMD MI300X GPU on ROCm 7.2.
+Important training choices:
+- Full fine-tune instead of LoRA
+- bf16 training
+- 1 epoch
+- Effective batch size 16
+- Max sequence length 2048
+- Cosine learning-rate schedule
+- 3% warmup
+- Gradient checkpointing enabled
+- Vision/audio paths frozen because this task is text-only
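These settings imply a rough schedule. With 66,419 training recipes and an effective batch size of 16, one epoch is about 4,152 optimizer steps, and 3% warmup is about 125 of them. The sketch makes these assumptions:

- No examples are packed or dropped by the 2048-token limit.
- The last partial batch is kept.
- Warmup steps are rounded up.
- The schedule is linear warmup followed by half-cosine decay to zero, the same shape as Transformers' `get_cosine_schedule_with_warmup`.
- `PEAK_LR` is a hypothetical placeholder, because the peak learning rate is not stated.

Effective batch size is usually per-device batch × gradient accumulation steps × number of GPUs. The exact split used here is not given.

```python
import math

TRAIN_ROWS = 66_419
EFFECTIVE_BATCH = 16   # per-device batch x gradient accumulation x GPUs
WARMUP_RATIO = 0.03
PEAK_LR = 1e-5         # hypothetical: the project's peak learning rate is not stated

total_steps = math.ceil(TRAIN_ROWS / EFFECTIVE_BATCH)  # 1 epoch, last partial batch kept
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)   # 3% warmup, rounded up


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)  # 4152 125
```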
+Gemma 4 is multimodal, but this project trains only the text path:
+```text
+Train:
+- model.language_model.*
+- lm_head
+Freeze:
+- vision tower
+- audio tower
+- vision/audio adapters
+```
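Freezing by parameter name is one simple way to implement this split. The sketch keeps parameters under `model.language_model.` or `lm_head` trainable, using the prefixes from the list above, and freezes everything else. It assumes those prefixes match the real module names. Gemma 4 names may differ between Transformers versions, so print the returned list before training. If the output head is tied to the input embeddings, it may not appear under its own name in `named_parameters()`. `is_trainable` and `freeze_non_text` are hypothetical helpers.

```python
TRAINABLE_PREFIXES = ("model.language_model.", "lm_head")


def is_trainable(param_name: str) -> bool:
    """True for text-path parameters; vision/audio towers and adapters stay frozen."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def freeze_non_text(named_parameters) -> list:
    """Set requires_grad on (name, param) pairs, e.g. model.named_parameters().

    Returns the names left trainable so they can be logged and checked.
    """
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = is_trainable(name)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# With PyTorch this would be called as: freeze_non_text(model.named_parameters())
```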
+## Training Format
+The project uses TRL prompt/completion conversational format:
+```json
+{
+  "prompt": [
+    {
+      "role": "user",
+      "content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..."
+    }
+  ],
+  "completion": [
+    {
+      "role": "assistant",
+      "content": "Bahan:\n- ...\n\nLangkah:\n1. ..."
+    }
+  ]
+}
+```
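A record like this can be built from one dataset row as sketched below. The prompt text after the title is elided (`...`) in the example above, so this template uses only the visible wording from the example input. The completion follows the `Bahan:` / `Langkah:` shape. `to_prompt_completion` is a hypothetical helper name.

```python
import json


def to_prompt_completion(row: dict) -> dict:
    """Build a TRL-style conversational prompt/completion record from a recipe row."""
    # Only the visible part of the prompt; the project's full prompt may add more.
    prompt = f'Tulis resep masakan Indonesia berjudul: "{row["title"]}".'
    ingredients = "\n".join(f"- {item}" for item in row["ingredients"])
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(row["steps"], start=1))
    completion = f"Bahan:\n{ingredients}\n\nLangkah:\n{steps}"
    return {
        "prompt": [{"role": "user", "content": prompt}],
        "completion": [{"role": "assistant", "content": completion}],
    }


row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Potong tempe.", "Tumis hingga matang."],
}
print(json.dumps(to_prompt_completion(row), ensure_ascii=False, indent=2))
```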
+This choice mattered: in this stack, the alternative `messages` format with `assistant_only_loss=True` produced unstable loss.
+## Results
+The fine-tuned model improved on every dimension except language fidelity, which dipped from 5.00 to about 4.6.
+| Dimension | Base | Fine-tuned |
+|---|---:|---:|
+| Language fidelity | 5.00 | ~4.6 |
+| Format compliance | 3.90 | ~4.95 |
+| Ingredient plausibility | 3.10 | ~3.5 |
+| Step coherence | 3.20 | ~3.9 |
+| Dish authenticity | 2.70 | ~3.25 |
+| Overall | 3.58 | ~4.0 |
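The Overall row appears to be the unweighted mean of the five dimensions. The base scores average to exactly 3.58, and the approximate fine-tuned scores average to about 4.04, which is consistent with ~4.0. This formula is inferred from the numbers and is not documented in the project. The per-dimension deltas also show where the gains and the one regression are.

```python
from statistics import mean

# Scores copied from the table above (fine-tuned values are approximate).
base = {"language": 5.00, "format": 3.90, "ingredients": 3.10, "steps": 3.20, "authenticity": 2.70}
tuned = {"language": 4.6, "format": 4.95, "ingredients": 3.5, "steps": 3.9, "authenticity": 3.25}

overall_base = round(mean(base.values()), 2)
overall_tuned = round(mean(tuned.values()), 2)
deltas = {k: round(tuned[k] - base[k], 2) for k in base}

print(overall_base, overall_tuned)  # 3.58 4.04
print(deltas)
```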
+The strongest gains were:
+- More consistent `Bahan:` / `Langkah:` formatting
+- Better recipe length discipline
+- More natural Indonesian cooking vocabulary
+- Better common-dish ingredient profiles
+- Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai
+## Critical Inference Setting
+One important lesson from the project: the fine-tuned model needs repetition control.
+For Hugging Face Transformers inference, use:
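+```python
+# `tok` is the model's tokenizer and `inputs` the tokenized chat prompt
+# (e.g. from tok.apply_chat_template(..., return_dict=True, return_tensors="pt")).
+output_ids = model.generate(
+    **inputs,
+    max_new_tokens=1280,
+    do_sample=False,                # greedy decoding
+    repetition_penalty=1.05,        # mild penalty on tokens already in the sequence
+    no_repeat_ngram_size=6,         # block any 6-token sequence from occurring twice
+    pad_token_id=tok.eos_token_id,
+)
+```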
+Without `no_repeat_ngram_size=6`, long recipes can fall into repeated ingredient-list loops.
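+For GGUF runtimes such as llama.cpp or LM Studio, use the DRY ("Don't Repeat Yourself") sampler as the closest equivalent, with an allowed length of around 6. In llama.cpp this is `--dry-allowed-length 6`, and DRY only takes effect when `--dry-multiplier` is set above 0.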
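To see why `no_repeat_ngram_size=6` breaks ingredient-list loops, here is a pure-Python sketch of the rule it applies at each decoding step. Take the last n-1 tokens as a prefix. Any token that would complete an n-gram already present in the sequence is banned. This is a simplified illustration over integer token IDs of the idea behind n-gram blocking, not the Transformers implementation.

```python
def banned_next_tokens(tokens: list, n: int) -> set:
    """Return token IDs that would complete an n-gram already present in `tokens`."""
    if n <= 0 or len(tokens) < n:
        return set()  # no complete n-gram exists yet, so nothing to block
    # The current prefix is the last n-1 tokens (empty when n == 1).
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        ngram = tokens[i:i + n]
        # An earlier n-gram that starts with the current prefix bans its last token.
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


# A loop in progress: IDs 1..6, then 1..5 again. With n=6 the next token 6 is
# blocked, so the 6-token run 1 2 3 4 5 6 cannot be completed a second time.
history = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5]
print(banned_next_tokens(history, 6))  # {6}
print(banned_next_tokens(history, 7))  # set()
```

The trade-off is that this is a hard rule. It also blocks legitimate repeats of any 6-token phrase. A much smaller n would likely be too aggressive for recipes, where short quantity-and-unit phrases naturally recur.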
+## GGUF Deployment
+The model was also converted to GGUF for local and CPU-friendly use.
+Available quantizations:
+| Quant | Approx. size | Use case |
+|---|---:|---|
+| Q4_K_M | ~3.2 GB | Default portable version |
+| Q8_0 | ~4.7 GB | Higher quality, more RAM |
+The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.
+## What Worked
+The project worked well for:
+- Common Indonesian home-cooking recipes
+- Structured recipe generation
+- Concise recipe output
+- Natural Indonesian recipe phrasing
+- Common ingredients and cooking methods
+Examples of stronger categories:
+- Ayam
+- Ikan
+- Sapi
+- Kambing
+- Tahu
+- Tempe
+- Telur
+- Udang
+- Sambal
+- Tumis
+- Pepes
+- Rendang-style dishes
+## Limitations
+This is not a perfect cookbook model.
+Known limitations:
+- Rare regional dishes can become generic.
+- Some defining ingredients may be omitted.
+- Diet or modifier terms such as MPASI, DEBM, basah, or kering may be ignored.
+- The model may produce plausible but not authentic recipes.
+- Some outputs may contain minor formatting or fraction glitches.
+- Recipes should be checked before cooking.
+The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.
+## Lessons Learned
+The biggest technical lessons:
+1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
+2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
+3. Use prompt/completion format with TRL for this stack.
+4. Always run a cheap quick-validation training pass before a full run.
+5. Judge the base model before fine-tuning.
+6. Automatic metrics are not enough for recipe quality.
+7. `no_repeat_ngram_size=6` is critical for stable inference.
+8. Dataset coverage matters more than another epoch for rare dishes.
+## Cost and Runtime
+The full successful cycle was inexpensive because MI300X training was fast for this model size.
+Approximate reference run:
+| Phase | Approx. cost |
+|---|---:|
+| Setup and debugging | ~$2.50 |
+| Quick validation | ~$1.50 |
+| Full training | ~$3.00 |
+| Evaluation iterations | ~$2.00 |
+| GGUF conversion and upload | ~$1.30 |
+| Idle/debugging slack | ~$4.00 |
+| Total | ~$14 |
+Future cycles should be cheaper because the stack and gotchas are now documented.
+## Links
+- Base model: [`google/gemma-4-e2b-it`](https://huggingface.co/google/gemma-4-e2b-it)
+- Fine-tuned model: [`junwatu/resep-ID-gemma-4-E2B-it`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it)
+- GGUF model: [`junwatu/resep-ID-gemma-4-E2B-it-gguf`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf)
+- Dataset: [`junwatu/indonesian-recipes`](https://huggingface.co/datasets/junwatu/indonesian-recipes)
+- Live recipe demo: [`junwatu/koki-ai`](https://huggingface.co/spaces/junwatu/koki-ai)
+## License
+This project inherits the Gemma Terms of Use from the base model.

__pycache__/app.cpython-312.pyc ADDED

Binary file (5.69 kB).

app.py ADDED

@@ -0,0 +1,145 @@

+from __future__ import annotations
+import gradio as gr
+OVERVIEW = """
+# MI300X Gemma 4 Indonesian Recipe Fine-Tune
+This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`,
+adapting it to Indonesian recipe generation, evaluating the result, quantizing it to
+GGUF, and deploying it as a lightweight recipe assistant.
+The task is simple: given an Indonesian dish title, generate a structured recipe with
+`Bahan:` and `Langkah:` in natural Bahasa Indonesia.
+| Item | Details |
+|---|---|
+| Base model | `google/gemma-4-e2b-it` |
+| Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
+| GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
+| Dataset | `junwatu/indonesian-recipes` |
+| Training hardware | AMD Instinct MI300X |
+| Training method | Full supervised fine-tune |
+"""
+DATASET = """
+## Dataset
+The dataset contains structured Indonesian home-cooking recipes with `title`,
+`ingredients`, and `steps`.
+| Split | Count | Use |
+|---|---:|---|
+| Train | 66,419 | Fine-tuning |
+| Validation | 1,748 | Eval loss during training |
+| Test | 1,748 | Held-out pre/post evaluation |
+The held-out test split is not used for training. It is reserved for comparing the
+base model and fine-tuned model on the same examples.
+"""
+TRAINING = """
+## Training Setup
+The fine-tune used a single AMD MI300X GPU on ROCm 7.2.
+- Full fine-tune instead of LoRA
+- bf16 training
+- 1 epoch
+- Effective batch size 16
+- Max sequence length 2048
+- Cosine learning-rate schedule
+- 3% warmup
+- Gradient checkpointing enabled
+- Vision/audio paths frozen because this task is text-only
+The project uses TRL prompt/completion conversational format. This avoided the
+unstable loss behavior seen with the `messages + assistant_only_loss=True` path
+in this stack.
+"""
+EVALUATION = """
+## Evaluation Results
+The base model was fluent in Indonesian, but weak on dish authenticity. The
+fine-tuned model improved practical recipe behavior.
+| Dimension | Base | Fine-tuned |
+|---|---:|---:|
+| Language fidelity | 5.00 | ~4.6 |
+| Format compliance | 3.90 | ~4.95 |
+| Ingredient plausibility | 3.10 | ~3.5 |
+| Step coherence | 3.20 | ~3.9 |
+| Dish authenticity | 2.70 | ~3.25 |
+| Overall | 3.58 | ~4.0 |
+The strongest gains were format consistency, length discipline, natural Indonesian
+cooking vocabulary, and better common-dish ingredient profiles.
+"""
+DEPLOYMENT = """
+## Deployment
+The model was shipped as both safetensors and GGUF.
+| Quant | Approx. size | Use case |
+|---|---:|---|
+| Q4_K_M | ~3.2 GB | Default portable version |
+| Q8_0 | ~4.7 GB | Higher quality, more RAM |
+For Hugging Face Transformers inference, the critical generation setting is:
+```python
+no_repeat_ngram_size=6
+```
+Without it, long recipes can fall into repeated ingredient-list loops. For GGUF
+runtimes such as llama.cpp or LM Studio, use the DRY sampler equivalent with
+allowed length around 6.
+"""
+LESSONS = """
+## Lessons Learned
+1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
+2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
+3. Use prompt/completion format with TRL for this stack.
+4. Always run a cheap quick-validation training pass before a full run.
+5. Judge the base model before fine-tuning.
+6. Automatic metrics are not enough for recipe quality.
+7. `no_repeat_ngram_size=6` is critical for stable inference.
+8. Dataset coverage matters more than another epoch for rare dishes.
+## Links
+- Base model: https://huggingface.co/google/gemma-4-e2b-it
+- Fine-tuned model: https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it
+- GGUF model: https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf
+- Dataset: https://huggingface.co/datasets/junwatu/indonesian-recipes
+- Live recipe demo: https://huggingface.co/spaces/junwatu/koki-ai
+"""
+with gr.Blocks(title="Resep ID Gemma 4") as demo:
+    gr.Markdown("# Resep ID Gemma 4")
+    gr.Markdown(
+        "A compact case study on fine-tuning Gemma 4 for Indonesian recipe generation."
+    )
+    with gr.Tabs():
+        with gr.Tab("Overview"):
+            gr.Markdown(OVERVIEW)
+        with gr.Tab("Dataset"):
+            gr.Markdown(DATASET)
+        with gr.Tab("Training"):
+            gr.Markdown(TRAINING)
+        with gr.Tab("Evaluation"):
+            gr.Markdown(EVALUATION)
+        with gr.Tab("Deployment"):
+            gr.Markdown(DEPLOYMENT)
+        with gr.Tab("Lessons"):
+            gr.Markdown(LESSONS)
+if __name__ == "__main__":
+    demo.launch()

requirements.txt ADDED

@@ -0,0 +1 @@

+gradio>=5.0