<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Resep ID Gemma 4</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif; line-height: 1.7; padding: 2rem; background: #fff; color: #1a1a1a; }
.container { max-width: 860px; margin: 0 auto; }
h1 { font-size: 2rem; margin: 1.5rem 0 1rem; font-weight: 700; }
h2 { font-size: 1.4rem; margin: 2rem 0 0.8rem; border-bottom: 1px solid #ddd; padding-bottom: 0.4rem; color: #d35400; font-weight: 600; }
h3 { font-size: 1.1rem; margin: 1.5rem 0 0.5rem; font-weight: 600; }
p { margin: 0.6rem 0; }
a { color: #d35400; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: 'JetBrains Mono', monospace; background: #f4f4f4; color: #c7254e; padding: 2px 6px; border-radius: 3px; font-size: 0.85em; }
pre { font-family: 'JetBrains Mono', monospace; background: #f8f8f8; border: 1px solid #ddd; border-radius: 6px; padding: 1rem; overflow-x: auto; margin: 1rem 0; font-size: 0.85em; }
pre code { background: none; padding: 0; color: #333; }
blockquote { border-left: 3px solid #d35400; padding-left: 1rem; color: #555; margin: 1rem 0; }
table { border-collapse: collapse; width: 100%; margin: 1rem 0; }
th, td { border: 1px solid #ddd; padding: 0.5rem 0.8rem; text-align: left; }
th { background: #f4f4f4; }
tr:nth-child(even) { background: #fafafa; }
ul, ol { padding-left: 1.5rem; margin: 0.5rem 0; }
li { margin: 0.3rem 0; }
hr { border: none; border-top: 1px solid #ddd; margin: 2rem 0; }
@media (prefers-color-scheme: dark) {
body { background: #1a1a2e; color: #e0e0e0; }
h1 { color: #fff; }
h2 { color: #ffa94d; border-bottom-color: #2a4a7f; }
h3 { color: #ddd; }
a { color: #ffa94d; }
code { background: #0f3460; color: #a8dadc; }
pre { background: #0f3460; border-color: #2a4a7f; }
pre code { color: #a8dadc; }
blockquote { border-left-color: #ffa94d; color: #bbb; }
th, td { border-color: #2a4a7f; }
th { background: #0f3460; color: #fff; }
tr:nth-child(even) { background: rgba(15,52,96,0.3); }
hr { border-top-color: #2a4a7f; }
}
</style>
</head>
<body>
<div class="container">
<h1>🍲 Resep ID Gemma 4</h1>
<p>This Space explains an end-to-end fine-tuning project: taking <code>google/gemma-4-e2b-it</code>, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.</p>
<p>The goal was simple:</p>
<blockquote>Given an Indonesian dish title, generate a structured recipe with <code>Bahan:</code> and <code>Langkah:</code> in natural Bahasa Indonesia.</blockquote>
<p>Example input:</p>
<pre><code>Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".</code></pre>
<p>Expected output shape:</p>
<pre><code>Bahan:
- ...
- ...
Langkah:
1. ...
2. ...</code></pre>
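<p>To make the expected shape concrete, here is a minimal sketch of a format check. The function name <code>parse_recipe</code> and the sample text are illustrative assumptions, not part of the project code; the sketch only assumes dash bullets under <code>Bahan:</code> and numbered lines under <code>Langkah:</code>, as shown above.</p>

```python
import re


def parse_recipe(text):
    """Split a generated recipe into ingredient and step lists.

    Returns None when the text does not follow the Bahan:/Langkah: shape.
    """
    # Both headers must appear, in this order, each on its own line.
    match = re.search(r"Bahan:\s*\n(.*?)\n\s*Langkah:\s*\n(.*)", text, re.DOTALL)
    if match is None:
        return None
    bahan_block, langkah_block = match.groups()
    # Ingredients are dash bullets; steps are numbered "1.", "2.", ...
    bahan = [m.group(1).strip()
             for m in re.finditer(r"^\s*-\s+(.+)$", bahan_block, re.MULTILINE)]
    langkah = [m.group(1).strip()
               for m in re.finditer(r"^\s*\d+\.\s+(.+)$", langkah_block, re.MULTILINE)]
    if not bahan or not langkah:
        return None
    return {"bahan": bahan, "langkah": langkah}


# Made-up sample output, used only to exercise the parser.
sample = (
    "Bahan:\n- 1 ikat kangkung\n- 1 papan tempe\n\n"
    "Langkah:\n1. Tumis bumbu.\n2. Masukkan kangkung."
)
parsed = parse_recipe(sample)
```

<p>A check like this could serve as a cheap automatic gate for format compliance, although it says nothing about whether the recipe matches the requested dish.</p>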
<h2>Project Summary</h2>
<table>
<tr><th>Item</th><th>Details</th></tr>
<tr><td>Base model</td><td><code>google/gemma-4-e2b-it</code></td></tr>
<tr><td>Fine-tuned model</td><td><code>junwatu/resep-ID-gemma-4-E2B-it</code></td></tr>
<tr><td>GGUF model</td><td><code>junwatu/resep-ID-gemma-4-E2B-it-gguf</code></td></tr>
<tr><td>Dataset</td><td><code>junwatu/indonesian-recipes</code></td></tr>
<tr><td>Task</td><td>Indonesian recipe generation</td></tr>
<tr><td>Training hardware</td><td>AMD Instinct MI300X</td></tr>
<tr><td>GPU memory</td><td>192 GB HBM3</td></tr>
<tr><td>Software stack</td><td>ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x</td></tr>
<tr><td>Training method</td><td>Full supervised fine-tune</td></tr>
<tr><td>Training data</td><td>66,419 recipes</td></tr>
<tr><td>Validation data</td><td>1,748 recipes</td></tr>
<tr><td>Held-out test data</td><td>1,748 recipes</td></tr>
<tr><td>Final deployment format</td><td>Safetensors + GGUF Q4_K_M / Q8_0</td></tr>
</table>
<h2>Why Fine-Tune?</h2>
<p>The base Gemma 4 model was already fluent in Indonesian, but it often missed the identity of specific Indonesian dishes.</p>
<p>For example, the base model could produce a plausible recipe, but not always the <em>right</em> recipe. It struggled with regional or highly specific dishes such as:</p>
<ul>
<li>Sosis Solo</li>
<li>Tahu Thek</li>
<li>Tempe Mendoan</li>
<li>Tahu Walik Aci</li>
<li>Kering Tempe Pete</li>
<li>DEBM / MPASI recipe variants</li>
</ul>
<p>A baseline evaluation on 50 held-out recipes showed the main gap:</p>
<table>
<tr><th>Dimension</th><th>Base Gemma 4 E2B</th></tr>
<tr><td>Language fidelity</td><td>5.00</td></tr>
<tr><td>Format compliance</td><td>3.90</td></tr>
<tr><td>Ingredient plausibility</td><td>3.10</td></tr>
<tr><td>Step coherence</td><td>3.20</td></tr>
<tr><td>Dish authenticity</td><td>2.70</td></tr>
<tr><td>Overall</td><td>3.58</td></tr>
</table>
<p>The key weakness was <code>dish_authenticity</code>: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.</p>
<h2>Dataset</h2>
<p>The dataset contains structured Indonesian home-cooking recipes. Each row has:</p>
<table>
<tr><th>Field</th><th>Description</th></tr>
<tr><td><code>title</code></td><td>Recipe name</td></tr>
<tr><td><code>ingredients</code></td><td>List of ingredient lines</td></tr>
<tr><td><code>steps</code></td><td>Ordered cooking steps</td></tr>
<tr><td><code>num_ingredients</code></td><td>Ingredient count</td></tr>
<tr><td><code>num_steps</code></td><td>Step count</td></tr>
<tr><td><code>char_count</code></td><td>Approximate recipe length</td></tr>
</table>
<p>The project converts the original Parquet files into JSONL splits:</p>
<pre><code>data/processed/train.jsonl
data/processed/val.jsonl
data/processed/test.jsonl</code></pre>
<p>The held-out test split is not used for training. It is used only for pre/post fine-tune comparison.</p>
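<p>The JSONL layout itself is simple: one JSON object per line. The sketch below shows a round trip using only the standard library. The helper names <code>write_jsonl</code> and <code>read_jsonl</code> and the sample row values are assumptions for illustration; the project's actual conversion script is not shown in this write-up.</p>

```python
import json
import os
import tempfile


def write_jsonl(rows, path):
    """Write one JSON object per line; keep Indonesian text unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative row using the documented fields (values are made up).
row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Tumis bumbu.", "Masukkan kangkung dan tempe."],
    "num_ingredients": 2,
    "num_steps": 2,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    write_jsonl([row], path)
    round_trip = read_jsonl(path)
```

<p>With pandas installed, <code>pandas.read_parquet(...)</code> followed by <code>DataFrame.to_json(path, orient="records", lines=True, force_ascii=False)</code> would be one way to produce the same layout directly from Parquet.</p>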
<h2>Training Setup</h2>
<p>The fine-tune used a single AMD MI300X GPU on ROCm 7.2. Important training choices:</p>
<ul>
<li>Full fine-tune instead of LoRA</li>
<li>bf16 training</li>
<li>1 epoch</li>
<li>Effective batch size 16</li>
<li>Max sequence length 2048</li>
<li>Cosine learning-rate schedule</li>
<li>3% warmup</li>
<li>Gradient checkpointing enabled</li>
<li>Vision/audio paths frozen because this task is text-only</li>
</ul>
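<p>The numbers above imply a rough step budget. The sketch below works it out under stated assumptions: one optimizer step per effective batch of 16, no sequence packing, and warmup rounded up. The real step count reported by the trainer may differ slightly. The function returns a multiplier of the peak learning rate, since the peak value itself is not given here.</p>

```python
import math

TRAIN_EXAMPLES = 66_419   # training recipes, from the Project Summary table
EFFECTIVE_BATCH = 16
WARMUP_RATIO = 0.03

# One optimizer step per effective batch; the last partial batch still counts.
total_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)   # 4152
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)        # 125


def lr_multiplier(step):
    """Fraction of the peak learning rate at a given optimizer step."""
    if step >= warmup_steps:
        # Cosine decay from the peak down to 0 over the remaining steps.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    # Linear ramp from 0 up to the peak during warmup.
    return step / warmup_steps
```

<p>Under these assumptions, one epoch is about 4,152 optimizer steps, of which roughly the first 125 are warmup.</p>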
<p>Gemma 4 is multimodal, but this project trains only the text path:</p>
<pre><code>Train:
- model.language_model.*
- lm_head
Freeze:
- vision tower
- audio tower
- vision/audio adapters</code></pre>
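<p>One way to express this split is an allow-list on parameter names: anything under <code>model.language_model.</code> or <code>lm_head</code> trains, and everything else is frozen. The sketch below is an assumption about how this could be done, not the project's actual code. It works with any object that exposes <code>named_parameters()</code> like a <code>torch.nn.Module</code>, and the vision and audio parameter names in the demo are hypothetical.</p>

```python
from types import SimpleNamespace

TRAINABLE_PREFIXES = ("model.language_model.", "lm_head")


def is_trainable(param_name):
    """Allow-list check: only the text path and the LM head get gradients."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def freeze_non_text(model):
    """Set requires_grad on every parameter and return the trainable names."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        if param.requires_grad:
            trainable.append(name)
    return trainable


class FakeModel:
    """Stand-in for a real module so the sketch runs without PyTorch."""

    def __init__(self, names):
        self._params = [(n, SimpleNamespace(requires_grad=True)) for n in names]

    def named_parameters(self):
        return iter(self._params)


demo = FakeModel([
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.vision_tower.encoder.weight",   # hypothetical name
    "model.audio_tower.encoder.weight",    # hypothetical name
    "lm_head.weight",
])
trainable_names = freeze_non_text(demo)
```

<p>An allow-list avoids having to know the exact names of the vision and audio modules. If a checkpoint ties <code>lm_head</code> to the input embeddings, the head may not appear as a separate parameter name, so it is worth printing the returned list before training.</p>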
<h2>Training Format</h2>
<p>The project uses TRL prompt/completion conversational format:</p>
<pre><code>{
  "prompt": [
    {
      "role": "user",
      "content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..."
    }
  ],
  "completion": [
    {
      "role": "assistant",
      "content": "Bahan:\n- ...\n\nLangkah:\n1. ..."
    }
  ]
}</code></pre>
<p>This format mattered: in this stack, the alternative <code>messages</code> format with <code>assistant_only_loss=True</code> produced unstable loss.</p>
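<p>As a sketch of how a dataset row could be turned into such a record, the hypothetical helper below builds the prompt and completion from <code>title</code>, <code>ingredients</code>, and <code>steps</code>. The prompt wording follows the example input at the top of this page; the full training prompt is elided in the record above, so the exact text used in training may be longer. The sample values are made up.</p>

```python
import json


def to_prompt_completion(title, ingredients, steps):
    """Build one prompt/completion record from a dataset row."""
    prompt = f'Tulis resep masakan Indonesia berjudul: "{title}".'
    bahan = "\n".join(f"- {item}" for item in ingredients)
    langkah = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    completion = f"Bahan:\n{bahan}\n\nLangkah:\n{langkah}"
    return {
        "prompt": [{"role": "user", "content": prompt}],
        "completion": [{"role": "assistant", "content": completion}],
    }


record = to_prompt_completion(
    "Tumis Kangkung Tempe",
    ["1 ikat kangkung", "1 papan tempe"],              # made-up values
    ["Tumis bumbu.", "Masukkan kangkung dan tempe."],  # made-up values
)
line = json.dumps(record, ensure_ascii=False)  # one line of a JSONL file
```

<p>Keeping the user turn in <code>prompt</code> and the recipe in <code>completion</code> makes the boundary between the two explicit in the data itself.</p>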
<h2>Results</h2>
<p>Fine-tuning improved four of the five dimensions, with the largest gain in format compliance (3.90 to ~4.95). Language fidelity dipped slightly, from 5.00 to ~4.6.</p>
<table>
<tr><th>Dimension</th><th>Base</th><th>Fine-tuned</th></tr>
<tr><td>Language fidelity</td><td>5.00</td><td>~4.6</td></tr>
<tr><td>Format compliance</td><td>3.90</td><td>~4.95</td></tr>
<tr><td>Ingredient plausibility</td><td>3.10</td><td>~3.5</td></tr>
<tr><td>Step coherence</td><td>3.20</td><td>~3.9</td></tr>
<tr><td>Dish authenticity</td><td>2.70</td><td>~3.25</td></tr>
<tr><td>Overall</td><td>3.58</td><td>~4.0</td></tr>
</table>
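<p>The per-dimension changes can be read off the table with a few lines of arithmetic. In the sketch below, the dictionary keys other than <code>dish_authenticity</code> are assumed names modeled on it, and the fine-tuned values are the approximate figures from the table. Both overall scores are consistent with an unweighted mean of the five dimensions.</p>

```python
base = {
    "language_fidelity": 5.00,
    "format_compliance": 3.90,
    "ingredient_plausibility": 3.10,
    "step_coherence": 3.20,
    "dish_authenticity": 2.70,
}
tuned = {
    "language_fidelity": 4.6,
    "format_compliance": 4.95,
    "ingredient_plausibility": 3.5,
    "step_coherence": 3.9,
    "dish_authenticity": 3.25,
}


def mean(scores):
    """Unweighted mean across the five dimensions."""
    return sum(scores.values()) / len(scores)


# Positive delta means the fine-tuned model scored higher.
deltas = {name: round(tuned[name] - base[name], 2) for name in base}
base_overall = round(mean(base), 2)    # 3.58
tuned_overall = round(mean(tuned), 2)  # 4.04
largest_gain = max(deltas, key=deltas.get)
```

<p>Format compliance moves the most (+1.05), dish authenticity improves by about 0.55 but stays the lowest-scoring dimension, and language fidelity is the only dimension that goes down (about -0.4).</p>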
<p>The strongest gains were:</p>
<ul>
<li>More consistent <code>Bahan:</code> / <code>Langkah:</code> formatting</li>
<li>Better recipe length discipline</li>
<li>More natural Indonesian cooking vocabulary</li>
<li>Better common-dish ingredient profiles</li>
<li>Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai</li>
</ul>
<h2>Critical Inference Setting</h2>
<p>One important lesson from the project: the fine-tuned model needs repetition control.</p>
<pre><code>model.generate(
**inputs,
max_new_tokens=1280,
do_sample=False,
repetition_penalty=1.05,
no_repeat_ngram_size=6,
pad_token_id=tok.eos_token_id,
)</code></pre>
<p>Without <code>no_repeat_ngram_size=6</code>, long recipes can fall into repeated ingredient-list loops.</p>
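<p>To illustrate why this setting breaks loops, the sketch below reimplements the idea of n-gram blocking in plain Python. It is a simplified illustration, not the Transformers implementation: at each step, any token that would complete an n-gram already present in the sequence is banned. The function name and the token IDs are made up.</p>

```python
def banned_next_tokens(token_ids, ngram_size):
    """Tokens that would complete an n-gram already present in token_ids."""
    # Nothing to ban when blocking is off or the history is too short.
    if ngram_size == 0 or ngram_size > len(token_ids):
        return set()
    prefix_len = ngram_size - 1
    # The last n-1 tokens are the start of the n-gram about to be completed.
    prefix = tuple(token_ids[len(token_ids) - prefix_len:])
    banned = set()
    for start in range(len(token_ids) - ngram_size + 1):
        ngram = tuple(token_ids[start:start + ngram_size])
        # If an earlier n-gram began with the same n-1 tokens,
        # its final token must not be generated again here.
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# A sequence that is about to repeat a 6-token span (IDs are made up).
history = [7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11]
blocked = banned_next_tokens(history, 6)  # {12}
```

<p>In the example, the model has re-emitted five tokens of an earlier span; banning token 12 forces it off the loop at the sixth. Shorter repeats, such as a recurring unit or quantity, remain allowed, which is why a window of 6 is less disruptive for recipes than a very small one.</p>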
<p>For GGUF runtimes such as llama.cpp or LM Studio, the closest equivalent is the DRY sampler with an allowed length of around 6.</p>
<h2>GGUF Deployment</h2>
<p>The model was also converted to GGUF for local and CPU-friendly use.</p>
<table>
<tr><th>Quant</th><th>Approx. size</th><th>Use case</th></tr>
<tr><td>Q4_K_M</td><td>~3.2 GB</td><td>Default portable version</td></tr>
<tr><td>Q8_0</td><td>~4.7 GB</td><td>Higher quality, more RAM</td></tr>
</table>
<p>The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.</p>
<h2>What Worked</h2>
<p>The fine-tuned model worked well for:</p>
<ul>
<li>Common Indonesian home-cooking recipes</li>
<li>Structured recipe generation</li>
<li>Concise recipe output</li>
<li>Natural Indonesian recipe phrasing</li>
<li>Common ingredients and cooking methods</li>
</ul>
<p>Examples of stronger categories: Ayam, Ikan, Sapi, Kambing, Tahu, Tempe, Telur, Udang, Sambal, Tumis, Pepes, Rendang-style dishes.</p>
<h2>Limitations</h2>
<ul>
<li>Rare regional dishes can become generic.</li>
<li>Some defining ingredients may be omitted.</li>
<li>Diet or modifier terms such as MPASI, DEBM, basah, or kering may be ignored.</li>
<li>The model may produce plausible but not authentic recipes.</li>
<li>Some outputs may contain minor formatting or fraction glitches.</li>
<li>Recipes should be checked before cooking.</li>
</ul>
<p>The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.</p>
<h2>Lessons Learned</h2>
<ol>
<li>Use the native ROCm 7.2 PyTorch wheel on MI300X.</li>
<li>Avoid older ROCm wheels for this Gemma 4 bf16 training path.</li>
<li>Use prompt/completion format with TRL for this stack.</li>
<li>Always run a cheap quick-validation training pass before a full run.</li>
<li>Judge the base model before fine-tuning.</li>
<li>Automatic metrics are not enough for recipe quality.</li>
<li><code>no_repeat_ngram_size=6</code> is critical for stable inference.</li>
<li>Dataset coverage matters more than another epoch for rare dishes.</li>
</ol>
<h2>Cost and Runtime</h2>
<table>
<tr><th>Phase</th><th>Approx. cost</th></tr>
<tr><td>Setup and debugging</td><td>~$2.50</td></tr>
<tr><td>Quick validation</td><td>~$1.50</td></tr>
<tr><td>Full training</td><td>~$3.00</td></tr>
<tr><td>Evaluation iterations</td><td>~$2.00</td></tr>
<tr><td>GGUF conversion and upload</td><td>~$1.30</td></tr>
<tr><td>Idle/debugging slack</td><td>~$4.00</td></tr>
<tr><td><strong>Total</strong></td><td><strong>~$14</strong></td></tr>
</table>
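<p>As a quick check of the table, the phase costs sum to about $14.30, which matches the rounded total. The sketch below also computes each phase's share; the dictionary simply restates the approximate figures above.</p>

```python
costs = {
    "Setup and debugging": 2.50,
    "Quick validation": 1.50,
    "Full training": 3.00,
    "Evaluation iterations": 2.00,
    "GGUF conversion and upload": 1.30,
    "Idle/debugging slack": 4.00,
}

total = round(sum(costs.values()), 2)  # 14.3
# Share of the total per phase, as a whole-number percentage.
shares = {phase: round(100 * cost / total) for phase, cost in costs.items()}
```

<p>By these figures, the full training run is only about a fifth of the spend, while idle and debugging slack is the largest single item at roughly 28%.</p>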
<p>Future cycles should be cheaper because the stack and gotchas are now documented.</p>
<h2>Links</h2>
<ul>
<li>Base model: <a href="https://huggingface.co/google/gemma-4-e2b-it">google/gemma-4-e2b-it</a></li>
<li>Fine-tuned model: <a href="https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it">junwatu/resep-ID-gemma-4-E2B-it</a></li>
<li>GGUF model: <a href="https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf">junwatu/resep-ID-gemma-4-E2B-it-gguf</a></li>
<li>Dataset: <a href="https://huggingface.co/datasets/junwatu/indonesian-recipes">junwatu/indonesian-recipes</a></li>
<li>Live recipe demo: <a href="https://huggingface.co/spaces/junwatu/koki-ai">junwatu/koki-ai</a></li>
</ul>
<hr>
<p><em>This project inherits the Gemma Terms of Use from the base model.</em></p>
</div>
</body>
</html>