| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>Resep ID Gemma 4</title> |
| <link rel="preconnect" href="https://fonts.googleapis.com"> |
| <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> |
| <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet"> |
| <style> |
| * { margin: 0; padding: 0; box-sizing: border-box; } |
| body { font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif; line-height: 1.7; padding: 2rem; background: #fff; color: #1a1a1a; } |
| .container { max-width: 860px; margin: 0 auto; } |
| h1 { font-size: 2rem; margin: 1.5rem 0 1rem; font-weight: 700; } |
| h2 { font-size: 1.4rem; margin: 2rem 0 0.8rem; border-bottom: 1px solid #ddd; padding-bottom: 0.4rem; color: #d35400; font-weight: 600; } |
| h3 { font-size: 1.1rem; margin: 1.5rem 0 0.5rem; font-weight: 600; } |
| p { margin: 0.6rem 0; } |
| a { color: #d35400; text-decoration: none; } |
| a:hover { text-decoration: underline; } |
| code { font-family: 'JetBrains Mono', monospace; background: #f4f4f4; color: #c7254e; padding: 2px 6px; border-radius: 3px; font-size: 0.85em; } |
| pre { font-family: 'JetBrains Mono', monospace; background: #f8f8f8; border: 1px solid #ddd; border-radius: 6px; padding: 1rem; overflow-x: auto; margin: 1rem 0; font-size: 0.85em; } |
| pre code { background: none; padding: 0; color: #333; } |
| blockquote { border-left: 3px solid #d35400; padding-left: 1rem; color: #555; margin: 1rem 0; } |
| table { border-collapse: collapse; width: 100%; margin: 1rem 0; } |
| th, td { border: 1px solid #ddd; padding: 0.5rem 0.8rem; text-align: left; } |
| th { background: #f4f4f4; } |
| tr:nth-child(even) { background: #fafafa; } |
| ul, ol { padding-left: 1.5rem; margin: 0.5rem 0; } |
| li { margin: 0.3rem 0; } |
| hr { border: none; border-top: 1px solid #ddd; margin: 2rem 0; } |
| |
| @media (prefers-color-scheme: dark) { |
| body { background: #1a1a2e; color: #e0e0e0; } |
| h1 { color: #fff; } |
| h2 { color: #ffa94d; border-bottom-color: #2a4a7f; } |
| h3 { color: #ddd; } |
| a { color: #ffa94d; } |
| code { background: #0f3460; color: #a8dadc; } |
| pre { background: #0f3460; border-color: #2a4a7f; } |
| pre code { color: #a8dadc; } |
| blockquote { border-left-color: #ffa94d; color: #bbb; } |
| th, td { border-color: #2a4a7f; } |
| th { background: #0f3460; color: #fff; } |
| tr:nth-child(even) { background: rgba(15,52,96,0.3); } |
| hr { border-top-color: #2a4a7f; } |
| } |
| </style> |
| </head> |
| <body> |
| <div class="container"> |
|
|
| <h1>🍲 Resep ID Gemma 4</h1> |
|
|
| <p>This Space explains an end-to-end fine-tuning project: taking <code>google/gemma-4-e2b-it</code>, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.</p> |
|
|
| <p>The goal was simple:</p> |
|
|
| <blockquote>Given an Indonesian dish title, generate a structured recipe with <code>Bahan:</code> and <code>Langkah:</code> in natural Bahasa Indonesia.</blockquote> |
|
|
| <p>Example input:</p> |
| <pre><code>Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".</code></pre> |
|
|
| <p>Expected output shape:</p> |
| <pre><code>Bahan: |
| - ... |
| - ... |
|
|
| Langkah: |
| 1. ... |
| 2. ...</code></pre> |
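<p>The expected shape above can be checked mechanically. The following is a minimal sketch, not the project's own validator: the function name <code>parse_recipe</code> and the sample recipe are illustrative assumptions, and the parser only accepts <code>- </code> bullets and <code>1. </code> style step numbers.</p>

```python
import re


def parse_recipe(text):
    """Split a generated recipe into (ingredients, steps).

    Raises ValueError when either section is missing or empty.
    """
    # Require "Bahan:" first and "Langkah:" second, each on its own line.
    match = re.search(r"Bahan:\s*\n(.*?)\n\s*Langkah:\s*\n(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("expected 'Bahan:' followed by 'Langkah:'")
    bahan_block, langkah_block = match.groups()

    # Ingredient lines start with "- "; step lines start with "1. ", "2. ", ...
    ingredients = [
        m.group(1).strip()
        for m in re.finditer(r"^[ \t]*-[ \t]+(.+)$", bahan_block, re.MULTILINE)
    ]
    steps = [
        m.group(1).strip()
        for m in re.finditer(r"^[ \t]*\d+\.[ \t]+(.+)$", langkah_block, re.MULTILINE)
    ]
    if not ingredients or not steps:
        raise ValueError("both sections need at least one item")
    return ingredients, steps


# Hypothetical model output used only for illustration.
sample = (
    "Bahan:\n- 1 ikat kangkung\n- 1 papan tempe\n\n"
    "Langkah:\n1. Potong tempe.\n2. Tumis kangkung."
)
ingredients, steps = parse_recipe(sample)
```

<p>A check like this measures format only; it says nothing about whether the ingredients or steps suit the dish.</p>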
|
|
| <h2>Project Summary</h2> |
| <table> |
| <tr><th>Item</th><th>Details</th></tr> |
| <tr><td>Base model</td><td><code>google/gemma-4-e2b-it</code></td></tr> |
| <tr><td>Fine-tuned model</td><td><code>junwatu/resep-ID-gemma-4-E2B-it</code></td></tr> |
| <tr><td>GGUF model</td><td><code>junwatu/resep-ID-gemma-4-E2B-it-gguf</code></td></tr> |
| <tr><td>Dataset</td><td><code>junwatu/indonesian-recipes</code></td></tr> |
| <tr><td>Task</td><td>Indonesian recipe generation</td></tr> |
| <tr><td>Training hardware</td><td>AMD Instinct MI300X</td></tr> |
| <tr><td>GPU memory</td><td>192 GB HBM3</td></tr> |
| <tr><td>Software stack</td><td>ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x</td></tr> |
| <tr><td>Training method</td><td>Full supervised fine-tune</td></tr> |
| <tr><td>Training data</td><td>66,419 recipes</td></tr> |
| <tr><td>Validation data</td><td>1,748 recipes</td></tr> |
| <tr><td>Held-out test data</td><td>1,748 recipes</td></tr> |
| <tr><td>Final deployment format</td><td>Safetensors + GGUF Q4_K_M / Q8_0</td></tr> |
| </table> |
|
|
| <h2>Why Fine-Tune?</h2> |
| <p>The base Gemma 4 model was already fluent in Indonesian, but it often missed the identity of specific Indonesian dishes.</p> |
| <p>It could produce a plausible recipe, but not always the <em>right</em> recipe. It struggled with regional or highly specific dishes such as:</p> |
| <ul> |
| <li>Sosis Solo</li> |
| <li>Tahu Thek</li> |
| <li>Tempe Mendoan</li> |
| <li>Tahu Walik Aci</li> |
| <li>Kering Tempe Pete</li> |
| <li>DEBM / MPASI recipe variants</li> |
| </ul> |
|
|
| <p>A baseline evaluation on 50 held-out recipes showed the main gap:</p> |
| <table> |
| <tr><th>Dimension</th><th>Base Gemma 4 E2B</th></tr> |
| <tr><td>Language fidelity</td><td>5.00</td></tr> |
| <tr><td>Format compliance</td><td>3.90</td></tr> |
| <tr><td>Ingredient plausibility</td><td>3.10</td></tr> |
| <tr><td>Step coherence</td><td>3.20</td></tr> |
| <tr><td>Dish authenticity</td><td>2.70</td></tr> |
| <tr><td>Overall</td><td>3.58</td></tr> |
| </table> |
| <p>The key weakness was <code>dish_authenticity</code>: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.</p> |
|
|
| <h2>Dataset</h2> |
| <p>The dataset contains structured Indonesian home-cooking recipes. Each row has:</p> |
| <table> |
| <tr><th>Field</th><th>Description</th></tr> |
| <tr><td><code>title</code></td><td>Recipe name</td></tr> |
| <tr><td><code>ingredients</code></td><td>List of ingredient lines</td></tr> |
| <tr><td><code>steps</code></td><td>Ordered cooking steps</td></tr> |
| <tr><td><code>num_ingredients</code></td><td>Ingredient count</td></tr> |
| <tr><td><code>num_steps</code></td><td>Step count</td></tr> |
| <tr><td><code>char_count</code></td><td>Approximate recipe length</td></tr> |
| </table> |
|
|
| <p>The project converts the original parquet files into JSONL splits:</p> |
| <pre><code>data/processed/train.jsonl |
| data/processed/val.jsonl |
| data/processed/test.jsonl</code></pre> |
| <p>The held-out test split is not used for training. It is used only for pre/post fine-tune comparison.</p> |
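<p>As a rough illustration of the JSONL step, the sketch below writes and re-reads one row using only the standard library. The helper names and the sample row are assumptions; the project's actual conversion script is not shown in this write-up, and in practice the rows would presumably be loaded from the parquet files first.</p>

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows, path):
    """Write an iterable of dicts as one JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps Indonesian text readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical row using a subset of the dataset fields listed above.
row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Potong tempe.", "Tumis kangkung."],
    "num_ingredients": 2,
    "num_steps": 2,
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "data" / "processed" / "train.jsonl"
    write_jsonl([row], out_path)
    restored = read_jsonl(out_path)
```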
|
|
| <h2>Training Setup</h2> |
| <p>The fine-tune used a single AMD MI300X GPU on ROCm 7.2. Important training choices:</p> |
| <ul> |
| <li>Full fine-tune instead of LoRA</li> |
| <li>bf16 training</li> |
| <li>1 epoch</li> |
| <li>Effective batch size 16</li> |
| <li>Max sequence length 2048</li> |
| <li>Cosine learning-rate schedule</li> |
| <li>3% warmup</li> |
| <li>Gradient checkpointing enabled</li> |
| <li>Vision/audio paths frozen because this task is text-only</li> |
| </ul> |
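<p>The list above implies some simple arithmetic for the schedule. This sketch assumes a per-device batch of 4 with gradient accumulation (the actual split is not stated), ignores any examples dropped by the 2048-token limit, and leaves the peak learning rate as a parameter because it is not given here.</p>

```python
import math

TRAIN_EXAMPLES = 66_419   # from the Project Summary table
EFFECTIVE_BATCH = 16      # from the list above
WARMUP_RATIO = 0.03       # "3% warmup"
EPOCHS = 1

# Assumption: a single GPU, so effective batch = per-device batch * accumulation.
PER_DEVICE_BATCH = 4      # hypothetical value
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH // PER_DEVICE_BATCH

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_lr(step, peak_lr, total=total_steps, warmup=warmup_steps):
    """Linear warmup to peak_lr, then cosine decay to zero at the last step."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

<p>Under these assumptions one epoch is about 4,152 optimizer steps, of which roughly 125 are warmup. The step count reported by the trainer may differ slightly.</p>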
|
|
| <p>Gemma 4 is multimodal, but this project trains only the text path:</p> |
| <pre><code>Train: |
| - model.language_model.* |
| - lm_head |
|
|
| Freeze: |
| - vision tower |
| - audio tower |
| - vision/audio adapters</code></pre> |
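<p>One way to express the train/freeze split is a name-based filter over the model's parameters. The sketch below is an assumption about how this could be done, not the project's code: the prefixes follow the names listed above, while the vision and audio parameter names in the demo are hypothetical. The stand-in classes only mimic the <code>named_parameters()</code> interface of a PyTorch module so the logic can run without loading a model.</p>

```python
# Prefixes taken from the "Train" list above.
TRAINABLE_PREFIXES = ("model.language_model.", "lm_head.")


def is_trainable(param_name):
    """Return True for parameters on the text path, False otherwise."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def freeze_non_text(model):
    """Set requires_grad on every parameter; return (trainable, frozen) names."""
    trainable, frozen = [], []
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        (trainable if param.requires_grad else frozen).append(name)
    return trainable, frozen


class _FakeParam:
    """Stand-in for a tensor parameter; only carries requires_grad."""
    requires_grad = True


class _FakeModel:
    """Stand-in exposing named_parameters() the way torch.nn.Module does."""

    def __init__(self, names):
        self._params = {name: _FakeParam() for name in names}

    def named_parameters(self):
        return self._params.items()


# Hypothetical parameter names; the real checkpoint may name them differently.
demo = _FakeModel([
    "model.language_model.layers.0.mlp.weight",
    "lm_head.weight",
    "model.vision_tower.encoder.weight",
    "model.audio_tower.encoder.weight",
])
trainable, frozen = freeze_non_text(demo)
```

<p>With a real model, printing the two lists before training is a cheap way to confirm that nothing outside the text path is being updated.</p>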
|
|
| <h2>Training Format</h2> |
| <p>The project uses TRL's conversational prompt/completion format:</p> |
| <pre><code>{ |
|   "prompt": [ |
|     { |
|       "role": "user", |
|       "content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..." |
|     } |
|   ], |
|   "completion": [ |
|     { |
|       "role": "assistant", |
|       "content": "Bahan:\n- ...\n\nLangkah:\n1. ..." |
|     } |
|   ] |
| }</code></pre> |
| <p>The choice of format mattered: in this stack, the alternative <code>messages</code> format with <code>assistant_only_loss=True</code> caused unstable loss behavior, so the project settled on prompt/completion.</p> |
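<p>A dataset row can be mapped to this record shape with a small helper. This is a sketch under assumptions: the function name is made up, and the prompt wording follows the example input near the top of the page, because the training prompt shown above is truncated and its full text may differ.</p>

```python
import json


def to_prompt_completion(row):
    """Convert one dataset row into a prompt/completion record."""
    # Assumption: prompt wording copied from the example input on this page.
    prompt_text = f'Tulis resep masakan Indonesia berjudul: "{row["title"]}".'
    bahan = "\n".join(f"- {item}" for item in row["ingredients"])
    langkah = "\n".join(
        f"{i}. {step}" for i, step in enumerate(row["steps"], start=1)
    )
    completion_text = f"Bahan:\n{bahan}\n\nLangkah:\n{langkah}"
    return {
        "prompt": [{"role": "user", "content": prompt_text}],
        "completion": [{"role": "assistant", "content": completion_text}],
    }


# Hypothetical row for illustration.
row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Potong tempe.", "Tumis kangkung."],
}
record = to_prompt_completion(row)
jsonl_line = json.dumps(record, ensure_ascii=False)
```

<p>Keeping the prompt and the completion in separate fields lets the trainer compute loss on the completion only, without relying on an assistant mask derived from the chat template.</p>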
|
|
| <h2>Results</h2> |
| <p>Fine-tuning improved four of the five judged dimensions, with the largest gain in format compliance; language fidelity dipped slightly, from 5.00 to about 4.6.</p> |
| <table> |
| <tr><th>Dimension</th><th>Base</th><th>Fine-tuned</th></tr> |
| <tr><td>Language fidelity</td><td>5.00</td><td>~4.6</td></tr> |
| <tr><td>Format compliance</td><td>3.90</td><td>~4.95</td></tr> |
| <tr><td>Ingredient plausibility</td><td>3.10</td><td>~3.5</td></tr> |
| <tr><td>Step coherence</td><td>3.20</td><td>~3.9</td></tr> |
| <tr><td>Dish authenticity</td><td>2.70</td><td>~3.25</td></tr> |
| <tr><td>Overall</td><td>3.58</td><td>~4.0</td></tr> |
| </table> |
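<p>The per-dimension changes can be recomputed from the table. The sketch below assumes the "Overall" row is an unweighted mean of the five dimensions, which matches the base figure of 3.58; the fine-tuned scores are the approximate values shown, so the results are approximate too.</p>

```python
# Scores copied from the table above (fine-tuned values are approximate).
base = {
    "language_fidelity": 5.00,
    "format_compliance": 3.90,
    "ingredient_plausibility": 3.10,
    "step_coherence": 3.20,
    "dish_authenticity": 2.70,
}
tuned = {
    "language_fidelity": 4.6,
    "format_compliance": 4.95,
    "ingredient_plausibility": 3.5,
    "step_coherence": 3.9,
    "dish_authenticity": 3.25,
}

# Change per dimension, rounded to hide floating-point noise.
deltas = {name: round(tuned[name] - base[name], 2) for name in base}

# Assumption: "Overall" is the unweighted mean of the five dimensions.
base_overall = round(sum(base.values()) / len(base), 2)
tuned_overall = round(sum(tuned.values()) / len(tuned), 2)

largest_gain = max(deltas, key=deltas.get)
```

<p>Dish authenticity improves by about 0.55 but remains the lowest-scoring dimension after fine-tuning.</p>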
|
|
| <p>The strongest gains were:</p> |
| <ul> |
| <li>More consistent <code>Bahan:</code> / <code>Langkah:</code> formatting</li> |
| <li>Better recipe length discipline</li> |
| <li>More natural Indonesian cooking vocabulary</li> |
| <li>Better common-dish ingredient profiles</li> |
| <li>Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai</li> |
| </ul> |
|
|
| <h2>Critical Inference Setting</h2> |
| <p>One important lesson from the project: the fine-tuned model needs repetition control.</p> |
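| <pre><code># Assumes `model`, `tok` (the tokenizer), and `inputs` (the tokenized prompt) already exist. |
| model.generate( |
|     **inputs, |
|     max_new_tokens=1280, |
|     do_sample=False,          # greedy decoding |
|     repetition_penalty=1.05,  # mild penalty on already-generated tokens |
|     no_repeat_ngram_size=6,   # never emit the same 6-token sequence twice |
|     pad_token_id=tok.eos_token_id, |
| )</code></pre> |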
| <p>Without <code>no_repeat_ngram_size=6</code>, long recipes can fall into repeated ingredient-list loops.</p> |
| <p>For GGUF runtimes such as llama.cpp or LM Studio, use the DRY sampler as the closest equivalent, with its allowed length set to about 6.</p> |
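<p>To make the effect of <code>no_repeat_ngram_size</code> concrete, the sketch below reimplements the idea in plain Python: at each step, any token that would complete an n-gram already present in the output is banned. It is an illustration of the technique, not the Transformers implementation, and it uses whole words as tokens where a real tokenizer would split text differently.</p>

```python
def banned_next_tokens(generated, n):
    """Return tokens that would complete an n-gram already in `generated`."""
    if n <= 0 or len(generated) < n:
        return set()  # no complete n-gram exists yet
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = list(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        ngram = list(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# Hypothetical word-level tokens for two ingredient lines.
line_a = ["-", "1", "sdt", "garam", "\n"]
line_b = ["-", "1", "sdt", "gula", "\n"]

# The model has emitted line A, line B, then line A again.
looping = line_a + line_b + line_a
blocked = banned_next_tokens(looping, n=6)

# After only A and B, no 5-token prefix has been seen before.
fresh = banned_next_tokens(line_a + line_b, n=6)
```

<p>In this toy case the ban forbids opening another bullet right after the repeated line, which is how the setting breaks an ingredient-list loop. The same rule can also block a legitimate repeat, so a window as long as 6 is a compromise.</p>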
|
|
| <h2>GGUF Deployment</h2> |
| <p>The model was also converted to GGUF for local and CPU-friendly use.</p> |
| <table> |
| <tr><th>Quant</th><th>Approx. size</th><th>Use case</th></tr> |
| <tr><td>Q4_K_M</td><td>~3.2 GB</td><td>Default portable version</td></tr> |
| <tr><td>Q8_0</td><td>~4.7 GB</td><td>Higher quality, more RAM</td></tr> |
| </table> |
| <p>The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.</p> |
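<p>For llama.cpp, the inference settings can be passed on the command line. The helper below only builds an argument list and is a sketch under assumptions: the flag names are those of recent llama.cpp builds (worth confirming with <code>llama-cli --help</code>), the DRY multiplier of 0.8 is a commonly used starting value rather than one taken from this project, and the GGUF file name is hypothetical.</p>

```python
def llama_cli_args(model_path, title, max_tokens=1280,
                   dry_allowed_length=6, dry_multiplier=0.8):
    """Build a llama-cli argument list for one recipe prompt."""
    # Assumption: same prompt wording as the example input on this page.
    prompt = f'Tulis resep masakan Indonesia berjudul: "{title}".'
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(max_tokens),
        "--temp", "0",                       # near-greedy decoding
        "--dry-multiplier", str(dry_multiplier),       # DRY is off when this is 0
        "--dry-allowed-length", str(dry_allowed_length),
    ]


# Hypothetical local file name for the Q4_K_M quant.
args = llama_cli_args("resep-id-gemma-4-e2b-it-Q4_K_M.gguf", "Tempe Mendoan")
```

<p>The list can be handed to <code>subprocess.run(args)</code> once the binary and model file are in place. DRY applies a growing penalty rather than a hard ban, so its behavior will not match <code>no_repeat_ngram_size</code> exactly.</p>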
|
|
| <h2>What Worked</h2> |
| <p>The fine-tuned model worked well for:</p> |
| <ul> |
| <li>Common Indonesian home-cooking recipes</li> |
| <li>Structured recipe generation</li> |
| <li>Concise recipe output</li> |
| <li>Natural Indonesian recipe phrasing</li> |
| <li>Common ingredients and cooking methods</li> |
| </ul> |
| <p>Examples of stronger categories: Ayam, Ikan, Sapi, Kambing, Tahu, Tempe, Telur, Udang, Sambal, Tumis, Pepes, Rendang-style dishes.</p> |
|
|
| <h2>Limitations</h2> |
| <ul> |
| <li>Rare regional dishes can become generic.</li> |
| <li>Some defining ingredients may be omitted.</li> |
| <li>Diet or modifier terms such as MPASI, DEBM, basah, or kering may be ignored.</li> |
| <li>The model may produce plausible but not authentic recipes.</li> |
| <li>Some outputs may contain minor formatting or fraction glitches.</li> |
| <li>Recipes should be checked before cooking.</li> |
| </ul> |
| <p>The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.</p> |
|
|
| <h2>Lessons Learned</h2> |
| <ol> |
| <li>Use the native ROCm 7.2 PyTorch wheel on MI300X.</li> |
| <li>Avoid older ROCm wheels for this Gemma 4 bf16 training path.</li> |
| <li>Use prompt/completion format with TRL for this stack.</li> |
| <li>Always run a cheap quick-validation training pass before a full run.</li> |
| <li>Judge the base model before fine-tuning.</li> |
| <li>Automatic metrics are not enough for recipe quality.</li> |
| <li><code>no_repeat_ngram_size=6</code> is critical for stable inference.</li> |
| <li>Dataset coverage matters more than another epoch for rare dishes.</li> |
| </ol> |
|
|
| <h2>Cost and Runtime</h2> |
| <table> |
| <tr><th>Phase</th><th>Approx. cost</th></tr> |
| <tr><td>Setup and debugging</td><td>~$2.50</td></tr> |
| <tr><td>Quick validation</td><td>~$1.50</td></tr> |
| <tr><td>Full training</td><td>~$3.00</td></tr> |
| <tr><td>Evaluation iterations</td><td>~$2.00</td></tr> |
| <tr><td>GGUF conversion and upload</td><td>~$1.30</td></tr> |
| <tr><td>Idle/debugging slack</td><td>~$4.00</td></tr> |
| <tr><td><strong>Total</strong></td><td><strong>~$14</strong></td></tr> |
| </table> |
| <p>Future cycles should be cheaper because the stack and gotchas are now documented.</p> |
|
|
| <h2>Links</h2> |
| <ul> |
| <li>Base model: <a href="https://huggingface.co/google/gemma-4-e2b-it">google/gemma-4-e2b-it</a></li> |
| <li>Fine-tuned model: <a href="https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it">junwatu/resep-ID-gemma-4-E2B-it</a></li> |
| <li>GGUF model: <a href="https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf">junwatu/resep-ID-gemma-4-E2B-it-gguf</a></li> |
| <li>Dataset: <a href="https://huggingface.co/datasets/junwatu/indonesian-recipes">junwatu/indonesian-recipes</a></li> |
| <li>Live recipe demo: <a href="https://huggingface.co/spaces/junwatu/koki-ai">junwatu/koki-ai</a></li> |
| </ul> |
|
|
| <hr> |
| <p><em>This project inherits the Gemma Terms of Use from the base model.</em></p> |
|
|
| </div> |
| </body> |
| </html> |
|
|