junwatu committed on
Commit 2c7dd03 · verified · 1 Parent(s): 83f9d54

Add Resep ID Gemma 4 project explainer Space

Files changed (4)
  1. README.md +306 -6
  2. __pycache__/app.cpython-312.pyc +0 -0
  3. app.py +145 -0
  4. requirements.txt +1 -0
README.md CHANGED
@@ -1,13 +1,313 @@
1
  ---
2
  title: Resep ID Gemma 4
3
- emoji: 📊
4
- colorFrom: yellow
5
- colorTo: gray
6
  sdk: gradio
7
- sdk_version: 6.14.0
8
- python_version: '3.13'
9
  app_file: app.py
10
  pinned: false
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: Resep ID Gemma 4
3
+ emoji: 🍲
4
+ colorFrom: red
5
+ colorTo: yellow
6
  sdk: gradio
7
+ sdk_version: 5.0.0
 
8
  app_file: app.py
9
  pinned: false
10
+ license: gemma
11
+ short_description: Gemma 4 Indonesian recipe fine-tune case study
12
+ models:
13
+ - google/gemma-4-e2b-it
14
+ - junwatu/resep-ID-gemma-4-E2B-it
15
+ - junwatu/resep-ID-gemma-4-E2B-it-gguf
16
+ datasets:
17
+ - junwatu/indonesian-recipes
18
+ tags:
19
+ - gemma
20
+ - gemma-4
21
+ - fine-tuning
22
+ - mi300x
23
+ - rocm
24
+ - indonesian
25
+ - recipes
26
+ - gguf
27
+ - text-generation
28
  ---
29
 
30
+ # Resep ID Gemma 4
31
+
32
+ This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`, adapting it to Indonesian recipe generation, evaluating the result, quantizing it to GGUF, and deploying it as a lightweight recipe assistant.
33
+
34
+ The goal was simple:
35
+
36
+ > Given an Indonesian dish title, generate a structured recipe with `Bahan:` and `Langkah:` in natural Bahasa Indonesia.
37
+
38
+ Example input:
39
+
40
+ ```text
41
+ Tulis resep masakan Indonesia berjudul: "Tumis Kangkung Tempe".
42
+ ```
43
+
44
+ Expected output shape:
45
+
46
+ ```text
47
+ Bahan:
48
+ - ...
49
+ - ...
50
+
51
+ Langkah:
52
+ 1. ...
53
+ 2. ...
54
+ ```
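The expected output shape above can be checked mechanically. The following is a minimal sketch of such a check, not the project's evaluator: the function name `has_recipe_shape` and the sample recipe text are assumptions made for illustration.

```python
import re


def has_recipe_shape(text):
    """Loose structural check: 'Bahan:' bullets followed by 'Langkah:' numbered steps."""
    head, sep, tail = text.partition("Langkah:")
    # 'Langkah:' must exist and the text must open with 'Bahan:'.
    if not sep or not head.strip().startswith("Bahan:"):
        return False
    # Ingredient lines are '- ...' bullets.
    bullets = re.findall(r"^- \S", head, flags=re.MULTILINE)
    # Steps must be numbered 1, 2, 3, ... without gaps.
    numbers = [int(m) for m in re.findall(r"^(\d+)\. \S", tail, flags=re.MULTILINE)]
    return bool(bullets) and bool(numbers) and numbers == list(range(1, len(numbers) + 1))


# Made-up sample text, only to exercise the check.
good = "Bahan:\n- 1 ikat kangkung\n- 1 papan tempe\n\nLangkah:\n1. Tumis bumbu.\n2. Masukkan kangkung.\n"
missing_steps = "Bahan:\n- garam\n"
skipped_number = "Bahan:\n- garam\n\nLangkah:\n1. Aduk.\n3. Sajikan.\n"

print(has_recipe_shape(good), has_recipe_shape(missing_steps), has_recipe_shape(skipped_number))
```

A check like this only covers structure. It says nothing about whether the ingredients or steps are right for the requested dish.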
55
+
56
+ ## Project Summary
57
+
58
+ | Item | Details |
59
+ |---|---|
60
+ | Base model | `google/gemma-4-e2b-it` |
61
+ | Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
62
+ | GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
63
+ | Dataset | `junwatu/indonesian-recipes` |
64
+ | Task | Indonesian recipe generation |
65
+ | Training hardware | AMD Instinct MI300X |
66
+ | GPU memory | 192 GB HBM3 class |
67
+ | Software stack | ROCm 7.2, PyTorch ROCm wheel, Transformers 5.x, TRL 1.x |
68
+ | Training method | Full supervised fine-tune |
69
+ | Training data | 66,419 recipes |
70
+ | Validation data | 1,748 recipes |
71
+ | Held-out test data | 1,748 recipes |
72
+ | Final deployment format | Safetensors + GGUF Q4_K_M / Q8_0 |
73
+
74
+ ## Why Fine-Tune?
75
+
76
+ The base Gemma 4 model was already fluent in Indonesian, but it often missed the identity of specific Indonesian dishes.
77
+
78
+ For example, the base model could produce a plausible recipe, but not always the right recipe. It struggled with regional or highly specific dishes such as:
79
+
80
+ - Sosis Solo
81
+ - Tahu Thek
82
+ - Tempe Mendoan
83
+ - Tahu Walik Aci
84
+ - Kering Tempe Pete
85
+ - DEBM / MPASI recipe variants
86
+
87
+ A baseline evaluation on 50 held-out recipes showed the main gap:
88
+
89
+ | Dimension | Base Gemma 4 E2B |
90
+ |---|---:|
91
+ | Language fidelity | 5.00 |
92
+ | Format compliance | 3.90 |
93
+ | Ingredient plausibility | 3.10 |
94
+ | Step coherence | 3.20 |
95
+ | Dish authenticity | 2.70 |
96
+ | Overall | 3.58 |
97
+
98
+ The key weakness was `dish_authenticity`: the model was fluent, but too often produced a generic Indonesian recipe instead of the requested dish.
99
+
100
+ ## Dataset
101
+
102
+ The dataset contains structured Indonesian home-cooking recipes.
103
+
104
+ Each row has:
105
+
106
+ | Field | Description |
107
+ |---|---|
108
+ | `title` | Recipe name |
109
+ | `ingredients` | List of ingredient lines |
110
+ | `steps` | Ordered cooking steps |
111
+ | `num_ingredients` | Ingredient count |
112
+ | `num_steps` | Step count |
113
+ | `char_count` | Approximate recipe length |
114
+
115
+ The project converts the original parquet files into JSONL splits:
116
+
117
+ ```text
118
+ data/processed/train.jsonl
119
+ data/processed/val.jsonl
120
+ data/processed/test.jsonl
121
+ ```
122
+
123
+ The held-out test split is not used for training. It is used only for pre/post fine-tune comparison.
124
+
125
+ ## Training Setup
126
+
127
+ The fine-tune used a single AMD MI300X GPU on ROCm 7.2.
128
+
129
+ Important training choices:
130
+
131
+ - Full fine-tune instead of LoRA
132
+ - bf16 training
133
+ - 1 epoch
134
+ - Effective batch size 16
135
+ - Max sequence length 2048
136
+ - Cosine learning-rate schedule
137
+ - 3% warmup
138
+ - Gradient checkpointing enabled
139
+ - Vision/audio paths frozen because this task is text-only
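As a rough illustration of what these choices imply, the sketch below estimates the number of optimizer steps for one epoch. It assumes no sequence packing and that the last partial batch is kept; the actual trainer may round differently, so treat the numbers as estimates rather than logged values.

```python
import math

train_examples = 66_419   # training recipes, from the project summary
effective_batch = 16      # per-device batch size x gradient accumulation steps
epochs = 1
warmup_ratio = 0.03       # 3% warmup

# One optimizer step consumes one effective batch.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"optimizer steps: {total_steps}")  # 4152 under these assumptions
print(f"warmup steps:    {warmup_steps}")  # 125 under these assumptions
```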
140
+
141
+ Gemma 4 is multimodal, but this project trains only the text path:
142
+
143
+ ```text
144
+ Train:
145
+ - model.language_model.*
146
+ - lm_head
147
+
148
+ Freeze:
149
+ - vision tower
150
+ - audio tower
151
+ - vision/audio adapters
152
+ ```
153
+
154
+ ## Training Format
155
+
156
+ The project uses TRL prompt/completion conversational format:
157
+
158
+ ```json
159
+ {
160
+   "prompt": [
161
+     {
162
+       "role": "user",
163
+       "content": "Tulis resep masakan Indonesia berjudul: \"Tumis Kangkung Tempe\"..."
164
+     }
165
+   ],
166
+   "completion": [
167
+     {
168
+       "role": "assistant",
169
+       "content": "Bahan:\n- ...\n\nLangkah:\n1. ..."
170
+     }
171
+   ]
172
+ }
173
+ ```
174
+
175
+ This format was important. In this stack, the alternative `messages` format with `assistant_only_loss=True` caused unstable loss behavior.
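A dataset row can be mapped to this prompt/completion shape roughly as follows. This is a sketch under stated assumptions: `build_example` and `to_jsonl_line` are hypothetical helpers, the prompt uses only the visible example wording (the real training prompt is truncated in the example above and may contain more text), and the sample row is made up.

```python
import json


def build_example(title, ingredients, steps):
    """Turn one recipe row into a prompt/completion record (hypothetical helper)."""
    # Assumption: only the visible part of the prompt is reproduced here.
    prompt = f'Tulis resep masakan Indonesia berjudul: "{title}".'
    bahan = "\n".join(f"- {item}" for item in ingredients)
    langkah = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    completion = f"Bahan:\n{bahan}\n\nLangkah:\n{langkah}"
    return {
        "prompt": [{"role": "user", "content": prompt}],
        "completion": [{"role": "assistant", "content": completion}],
    }


def to_jsonl_line(record):
    """Serialize one record as a single JSONL line, keeping non-ASCII text readable."""
    return json.dumps(record, ensure_ascii=False)


# Made-up sample row using the dataset's field names.
row = {
    "title": "Tumis Kangkung Tempe",
    "ingredients": ["1 ikat kangkung", "1 papan tempe"],
    "steps": ["Tumis bumbu hingga harum.", "Masukkan kangkung dan tempe."],
}
example = build_example(row["title"], row["ingredients"], row["steps"])
print(to_jsonl_line(example))
```

Each printed line would be one row of a file such as `data/processed/train.jsonl`.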
176
+
177
+ ## Results
178
+
179
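+ The fine-tuned model scored higher on format compliance, ingredient plausibility, step coherence, and dish authenticity, at the cost of a small drop in language fidelity (5.00 to ~4.6).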
180
+
181
+ | Dimension | Base | Fine-tuned |
182
+ |---|---:|---:|
183
+ | Language fidelity | 5.00 | ~4.6 |
184
+ | Format compliance | 3.90 | ~4.95 |
185
+ | Ingredient plausibility | 3.10 | ~3.5 |
186
+ | Step coherence | 3.20 | ~3.9 |
187
+ | Dish authenticity | 2.70 | ~3.25 |
188
+ | Overall | 3.58 | ~4.0 |
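The `Overall` row is consistent with an unweighted mean of the five dimensions, although the write-up does not state how it was computed. The sketch below checks that arithmetic and prints per-dimension deltas. The fine-tuned scores are approximate in the table, so the results are approximate too.

```python
# Scores copied from the table above; fine-tuned values are approximate ("~").
base = {
    "Language fidelity": 5.00,
    "Format compliance": 3.90,
    "Ingredient plausibility": 3.10,
    "Step coherence": 3.20,
    "Dish authenticity": 2.70,
}
fine_tuned = {
    "Language fidelity": 4.6,
    "Format compliance": 4.95,
    "Ingredient plausibility": 3.5,
    "Step coherence": 3.9,
    "Dish authenticity": 3.25,
}

# Assumption: "Overall" is a plain unweighted mean of the five dimensions.
base_overall = sum(base.values()) / len(base)
tuned_overall = sum(fine_tuned.values()) / len(fine_tuned)
deltas = {name: round(fine_tuned[name] - base[name], 2) for name in base}

print(f"base overall:       {base_overall:.2f}")
print(f"fine-tuned overall: {tuned_overall:.2f}")
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```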
189
+
190
+ The strongest gains were:
191
+
192
+ - More consistent `Bahan:` / `Langkah:` formatting
193
+ - Better recipe length discipline
194
+ - More natural Indonesian cooking vocabulary
195
+ - Better common-dish ingredient profiles
196
+ - Better structure for common dishes like tumis, pepes, rendang, sambal, and gulai
197
+
198
+ ## Critical Inference Setting
199
+
200
+ One important lesson from the project: the fine-tuned model needs repetition control.
201
+
202
+ For Hugging Face Transformers inference, use:
203
+
204
+ ```python
205
+ model.generate(
205
+     **inputs,
206
+     max_new_tokens=1280,
207
+     do_sample=False,
208
+     repetition_penalty=1.05,
209
+     no_repeat_ngram_size=6,
210
+     pad_token_id=tok.eos_token_id,
211
+ )
213
+ ```
214
+
215
+ Without `no_repeat_ngram_size=6`, long recipes can fall into repeated ingredient-list loops.
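To make the mechanism concrete, here is a minimal pure-Python sketch of the idea behind an n-gram repetition block: at each step, any token that would complete an n-gram already present in the output is disallowed. It illustrates the concept only and is not the Transformers implementation. The function name and the word-level "tokens" are assumptions for readability.

```python
def banned_next_tokens(generated, n):
    """Return tokens that would complete an n-gram already present in `generated`."""
    if n <= 0 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(generated) - n + 1):
        # If an earlier n-gram starts with the same prefix, ban its final token.
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# Word-level stand-in for token ids: an ingredient list that starts to loop.
tokens = "1 sdm garam 1 sdm gula 1 sdm".split()

print(banned_next_tokens(tokens, 3))  # "garam" and "gula" would repeat a 3-gram
print(banned_next_tokens(tokens, 6))  # empty: no 6-gram would repeat yet
```

A larger `n` such as 6 leaves short, legitimate repeats like `1 sdm` alone and only intervenes once a longer span starts to repeat, which matches the looping ingredient lists described above.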
216
+
217
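+ For GGUF runtimes such as llama.cpp or LM Studio, the closest equivalent is the DRY sampler with an allowed length around 6. DRY penalizes repeated sequences rather than hard-blocking them, so the behavior is similar but not identical.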
218
+
219
+ ## GGUF Deployment
220
+
221
+ The model was also converted to GGUF for local and CPU-friendly use.
222
+
223
+ Available quantizations:
224
+
225
+ | Quant | Approx. size | Use case |
226
+ |---|---:|---|
227
+ | Q4_K_M | ~3.2 GB | Default portable version |
228
+ | Q8_0 | ~4.7 GB | Higher quality, more RAM |
229
+
230
+ The GGUF model can run with llama.cpp, LM Studio, or other GGUF-compatible runtimes.
231
+
232
+ ## What Worked
233
+
234
+ The project worked well for:
235
+
236
+ - Common Indonesian home-cooking recipes
237
+ - Structured recipe generation
238
+ - Concise recipe output
239
+ - Natural Indonesian recipe phrasing
240
+ - Common ingredients and cooking methods
241
+
242
+ Examples of stronger categories:
243
+
244
+ - Ayam
245
+ - Ikan
246
+ - Sapi
247
+ - Kambing
248
+ - Tahu
249
+ - Tempe
250
+ - Telur
251
+ - Udang
252
+ - Sambal
253
+ - Tumis
254
+ - Pepes
255
+ - Rendang-style dishes
256
+
257
+ ## Limitations
258
+
259
+ This is not a perfect cookbook model.
260
+
261
+ Known limitations:
262
+
263
+ - Rare regional dishes can become generic.
264
+ - Some defining ingredients may be omitted.
265
+ - Diet or modifier terms such as MPASI, DEBM, basah, or kering may be ignored.
266
+ - The model may produce plausible but not authentic recipes.
267
+ - Some outputs may contain minor formatting or fraction glitches.
268
+ - Recipes should be checked before cooking.
269
+
270
+ The main remaining bottleneck is dataset coverage, especially for regional and specialty dishes.
271
+
272
+ ## Lessons Learned
273
+
274
+ The biggest technical lessons:
275
+
276
+ 1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
277
+ 2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
278
+ 3. Use prompt/completion format with TRL for this stack.
279
+ 4. Always run a cheap quick-validation training pass before a full run.
280
+ 5. Judge the base model before fine-tuning.
281
+ 6. Automatic metrics are not enough for recipe quality.
282
+ 7. `no_repeat_ngram_size=6` is critical for stable inference.
283
+ 8. Dataset coverage matters more than another epoch for rare dishes.
284
+
285
+ ## Cost and Runtime
286
+
287
+ The full successful cycle was inexpensive because MI300X training was fast for this model size.
288
+
289
+ Approximate reference run:
290
+
291
+ | Phase | Approx. cost |
292
+ |---|---:|
293
+ | Setup and debugging | ~$2.50 |
294
+ | Quick validation | ~$1.50 |
295
+ | Full training | ~$3.00 |
296
+ | Evaluation iterations | ~$2.00 |
297
+ | GGUF conversion and upload | ~$1.30 |
298
+ | Idle/debugging slack | ~$4.00 |
299
+ | Total | ~$14 |
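The total can be reproduced from the per-phase figures. The sketch below sums the approximate costs from the table and shows how much of the budget went to idle/debugging slack. All inputs are the rounded estimates above, not billing data.

```python
# Approximate per-phase costs in USD, copied from the table above.
phase_costs_usd = {
    "Setup and debugging": 2.50,
    "Quick validation": 1.50,
    "Full training": 3.00,
    "Evaluation iterations": 2.00,
    "GGUF conversion and upload": 1.30,
    "Idle/debugging slack": 4.00,
}

total = sum(phase_costs_usd.values())
slack_share = phase_costs_usd["Idle/debugging slack"] / total

print(f"Total: ${total:.2f}")
print(f"Idle/debugging share: {slack_share:.0%}")
```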
300
+
301
+ Future cycles should be cheaper because the stack and gotchas are now documented.
302
+
303
+ ## Links
304
+
305
+ - Base model: [`google/gemma-4-e2b-it`](https://huggingface.co/google/gemma-4-e2b-it)
306
+ - Fine-tuned model: [`junwatu/resep-ID-gemma-4-E2B-it`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it)
307
+ - GGUF model: [`junwatu/resep-ID-gemma-4-E2B-it-gguf`](https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf)
308
+ - Dataset: [`junwatu/indonesian-recipes`](https://huggingface.co/datasets/junwatu/indonesian-recipes)
309
+ - Live recipe demo: [`junwatu/koki-ai`](https://huggingface.co/spaces/junwatu/koki-ai)
310
+
311
+ ## License
312
+
313
+ This project inherits the Gemma Terms of Use from the base model.
__pycache__/app.cpython-312.pyc ADDED
Binary file (5.69 kB). View file
 
app.py ADDED
@@ -0,0 +1,145 @@
1
+ from __future__ import annotations
2
+
3
+ import gradio as gr
4
+
5
+
6
+ OVERVIEW = """
7
+ # MI300X Gemma 4 Indonesian Recipe Fine-Tune
8
+
9
+ This Space explains an end-to-end fine-tuning project: taking `google/gemma-4-e2b-it`,
10
+ adapting it to Indonesian recipe generation, evaluating the result, quantizing it to
11
+ GGUF, and deploying it as a lightweight recipe assistant.
12
+
13
+ The task is simple: given an Indonesian dish title, generate a structured recipe with
14
+ `Bahan:` and `Langkah:` in natural Bahasa Indonesia.
15
+
16
+ | Item | Details |
17
+ |---|---|
18
+ | Base model | `google/gemma-4-e2b-it` |
19
+ | Fine-tuned model | `junwatu/resep-ID-gemma-4-E2B-it` |
20
+ | GGUF model | `junwatu/resep-ID-gemma-4-E2B-it-gguf` |
21
+ | Dataset | `junwatu/indonesian-recipes` |
22
+ | Training hardware | AMD Instinct MI300X |
23
+ | Training method | Full supervised fine-tune |
24
+ """
25
+
26
+ DATASET = """
27
+ ## Dataset
28
+
29
+ The dataset contains structured Indonesian home-cooking recipes with `title`,
30
+ `ingredients`, and `steps`.
31
+
32
+ | Split | Count | Use |
33
+ |---|---:|---|
34
+ | Train | 66,419 | Fine-tuning |
35
+ | Validation | 1,748 | Eval loss during training |
36
+ | Test | 1,748 | Held-out pre/post evaluation |
37
+
38
+ The held-out test split is not used for training. It is reserved for comparing the
39
+ base model and fine-tuned model on the same examples.
40
+ """
41
+
42
+ TRAINING = """
43
+ ## Training Setup
44
+
45
+ The fine-tune used a single AMD MI300X GPU on ROCm 7.2.
46
+
47
+ - Full fine-tune instead of LoRA
48
+ - bf16 training
49
+ - 1 epoch
50
+ - Effective batch size 16
51
+ - Max sequence length 2048
52
+ - Cosine learning-rate schedule
53
+ - 3% warmup
54
+ - Gradient checkpointing enabled
55
+ - Vision/audio paths frozen because this task is text-only
56
+
57
+ The project uses TRL prompt/completion conversational format. This avoided the
58
+ unstable loss behavior seen with the `messages + assistant_only_loss=True` path
59
+ in this stack.
60
+ """
61
+
62
+ EVALUATION = """
63
+ ## Evaluation Results
64
+
65
+ The base model was fluent in Indonesian, but weak on dish authenticity. The
66
+ fine-tuned model improved practical recipe behavior.
67
+
68
+ | Dimension | Base | Fine-tuned |
69
+ |---|---:|---:|
70
+ | Language fidelity | 5.00 | ~4.6 |
71
+ | Format compliance | 3.90 | ~4.95 |
72
+ | Ingredient plausibility | 3.10 | ~3.5 |
73
+ | Step coherence | 3.20 | ~3.9 |
74
+ | Dish authenticity | 2.70 | ~3.25 |
75
+ | Overall | 3.58 | ~4.0 |
76
+
77
+ The strongest gains were format consistency, length discipline, natural Indonesian
78
+ cooking vocabulary, and better common-dish ingredient profiles.
79
+ """
80
+
81
+ DEPLOYMENT = """
82
+ ## Deployment
83
+
84
+ The model was shipped as both safetensors and GGUF.
85
+
86
+ | Quant | Approx. size | Use case |
87
+ |---|---:|---|
88
+ | Q4_K_M | ~3.2 GB | Default portable version |
89
+ | Q8_0 | ~4.7 GB | Higher quality, more RAM |
90
+
91
+ For Hugging Face Transformers inference, the critical generation setting is:
92
+
93
+ ```python
94
+ no_repeat_ngram_size=6
95
+ ```
96
+
97
+ Without it, long recipes can fall into repeated ingredient-list loops. For GGUF
98
+ runtimes such as llama.cpp or LM Studio, use the DRY sampler equivalent with
99
+ allowed length around 6.
100
+ """
101
+
102
+ LESSONS = """
103
+ ## Lessons Learned
104
+
105
+ 1. Use the native ROCm 7.2 PyTorch wheel on MI300X.
106
+ 2. Avoid older ROCm wheels for this Gemma 4 bf16 training path.
107
+ 3. Use prompt/completion format with TRL for this stack.
108
+ 4. Always run a cheap quick-validation training pass before a full run.
109
+ 5. Judge the base model before fine-tuning.
110
+ 6. Automatic metrics are not enough for recipe quality.
111
+ 7. `no_repeat_ngram_size=6` is critical for stable inference.
112
+ 8. Dataset coverage matters more than another epoch for rare dishes.
113
+
114
+ ## Links
115
+
116
+ - Base model: https://huggingface.co/google/gemma-4-e2b-it
117
+ - Fine-tuned model: https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it
118
+ - GGUF model: https://huggingface.co/junwatu/resep-ID-gemma-4-E2B-it-gguf
119
+ - Dataset: https://huggingface.co/datasets/junwatu/indonesian-recipes
120
+ - Live recipe demo: https://huggingface.co/spaces/junwatu/koki-ai
121
+ """
122
+
123
+
124
+ with gr.Blocks(title="Resep ID Gemma 4") as demo:
125
+     gr.Markdown("# Resep ID Gemma 4")
126
+     gr.Markdown(
127
+         "A compact case study on fine-tuning Gemma 4 for Indonesian recipe generation."
128
+     )
129
+     with gr.Tabs():
130
+         with gr.Tab("Overview"):
131
+             gr.Markdown(OVERVIEW)
132
+         with gr.Tab("Dataset"):
133
+             gr.Markdown(DATASET)
134
+         with gr.Tab("Training"):
135
+             gr.Markdown(TRAINING)
136
+         with gr.Tab("Evaluation"):
137
+             gr.Markdown(EVALUATION)
138
+         with gr.Tab("Deployment"):
139
+             gr.Markdown(DEPLOYMENT)
140
+         with gr.Tab("Lessons"):
141
+             gr.Markdown(LESSONS)
142
+
143
+
144
+ if __name__ == "__main__":
145
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1 @@
1
+ gradio>=5.0