Boba Food VLM 0.8B (GGUF)
On-device food photo to per-ingredient nutrition estimation model.
Model Description
A fine-tune of Qwen3.5-0.8B for food recognition and calorie estimation from photos. It outputs structured JSON that lists, for each ingredient, the name, portion (grams), calories, protein, carbs, and fat.
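As a rough illustration of consuming the per-ingredient output, the sketch below parses one response and sums the nutrients into meal totals. The key names (`ingredients`, `name`, `grams`, `calories`, `protein`, `carbs`, `fat`) and the sample values are assumptions made for this example; this card does not specify the exact schema, so check a real response before relying on them.

```python
import json

# Hypothetical model output. The key names and values are assumptions
# for illustration, not the model's documented schema.
raw_output = """
{"ingredients": [
  {"name": "white rice", "grams": 150, "calories": 195,
   "protein": 4.0, "carbs": 42.0, "fat": 0.5},
  {"name": "grilled chicken", "grams": 100, "calories": 165,
   "protein": 31.0, "carbs": 0.0, "fat": 3.6}
]}
"""


def total_nutrition(text):
    """Parse the JSON response and sum each nutrient across ingredients."""
    data = json.loads(text)
    totals = {"grams": 0.0, "calories": 0.0, "protein": 0.0,
              "carbs": 0.0, "fat": 0.0}
    for item in data["ingredients"]:
        for key in totals:
            totals[key] += float(item[key])
    return totals


totals = total_nutrition(raw_output)
print(totals)
```

Summing per-ingredient values client-side keeps the meal total consistent with the itemized breakdown shown to the user.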
- Base model: Qwen/Qwen3.5-0.8B
- Training method: LoRA (r=64, alpha=128, rsLoRA)
- Training data: Nutrition5k (4,051 images with measured per-ingredient nutrition)
- Eval benchmark: Nutrition5k test set (506 images, same split as CalorieLLaVA)
- Best calorie MAE: 112.3 kcal (checkpoint at step 1000)
- Parse rate: 100% (share of outputs that parse as valid JSON)
- Pearson r: 0.73
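The three metrics above can be sketched in plain Python as follows. This is a minimal illustration rather than the evaluation script behind the reported numbers: the function names and toy data are hypothetical, and it assumes that MAE and Pearson r are computed on per-image total calories, which this card does not state.

```python
import json
import math


def parse_rate(outputs):
    """Fraction of raw model outputs that parse as valid JSON."""
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs)


def mean_absolute_error(pred, true):
    """Mean absolute difference between predicted and measured values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy data (kcal per image), invented for this example.
predicted = [300, 450, 520, 610]
measured = [280, 500, 500, 700]

print(mean_absolute_error(predicted, measured))
print(pearson_r(predicted, measured))
print(parse_rate(['{"ingredients": []}', "not json"]))
```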
Files
| File | Size | Description |
|---|---|---|
| boba-q4km.gguf | 505 MB | Main LLM (Q4_K_M quantized) |
| boba-mmproj-f16.gguf | 196 MB | Vision projection model (F16) |
| boba-f16.gguf | 1.5 GB | Main LLM (F16, unquantized) |
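The table pairs a main LLM file with a separate vision projection file, and a multimodal GGUF runtime generally needs both. The sketch below builds a command line for llama.cpp's `llama-mtmd-cli` tool. It assumes a llama.cpp build that supports this model's architecture, which this card does not confirm. The image path and prompt are hypothetical, since the card does not document a prompt format.

```python
import shlex


def build_command(image_path, prompt,
                  model="boba-q4km.gguf",
                  mmproj="boba-mmproj-f16.gguf",
                  binary="llama-mtmd-cli"):
    """Return an argv list that loads the LLM and the vision projector together."""
    return [
        binary,
        "-m", model,           # main LLM weights (Q4_K_M or F16)
        "--mmproj", mmproj,    # vision projection model
        "--image", image_path,
        "-p", prompt,
    ]


# Hypothetical image path and prompt, for illustration only.
cmd = build_command("meal.jpg", "Estimate the nutrition of this meal as JSON.")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues. To use the unquantized weights, swap in `model="boba-f16.gguf"`.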
Benchmark Results
| Model | Cal MAE (kcal) | On-Device | Per-Ingredient |
|---|---|---|---|
| CalorieLLaVA-13B | 64.3 | No | No |
| GPT-4o zero-shot | 82.7 | No | No |
| Boba 0.8B (this model) | 112.3 | Yes | Yes |
| 0.8B baseline (no training) | 131.2 | Yes | Yes |
First published on-device food VLM with per-ingredient nutrition output.
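To put the benchmark table in perspective, the short calculation below derives the relative MAE reduction from fine-tuning and the remaining gap to the larger models. It uses only the figures from the table; the variable names are illustrative.

```python
# Calorie MAE in kcal, copied from the benchmark table.
mae = {
    "CalorieLLaVA-13B": 64.3,
    "GPT-4o zero-shot": 82.7,
    "Boba 0.8B": 112.3,
    "0.8B baseline": 131.2,
}

# Relative error reduction of the fine-tuned model over the untrained baseline.
reduction = (mae["0.8B baseline"] - mae["Boba 0.8B"]) / mae["0.8B baseline"]

# Absolute gap (kcal) between this model and each larger model.
gap_llava = round(mae["Boba 0.8B"] - mae["CalorieLLaVA-13B"], 1)
gap_gpt4o = round(mae["Boba 0.8B"] - mae["GPT-4o zero-shot"], 1)

print(f"Reduction vs baseline: {reduction:.1%}")
print(f"Gap to CalorieLLaVA-13B: {gap_llava} kcal")
print(f"Gap to GPT-4o zero-shot: {gap_gpt4o} kcal")
```

Fine-tuning lowers the calorie MAE by roughly 14% relative to the untrained 0.8B baseline, while the larger off-device models remain more accurate.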
License
Apache 2.0