Boba Food VLM 0.8B (GGUF)

An on-device vision-language model that estimates per-ingredient nutrition from a food photo.

Model Description

Boba is a fine-tune of Qwen3.5-0.8B for food recognition and calorie estimation from photos. It outputs structured JSON that lists, for each ingredient, the name, portion (grams), calories, protein, carbs, and fat.
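
The structured output can be consumed with a small parser. The sketch below is illustrative only: the card names the fields but not the exact JSON keys, so the key names (`ingredients`, `portion_g`, `calories`, and so on) and the sample values are assumptions, not real model output.

```python
import json

# Hypothetical key names: the card lists the fields, not the exact JSON schema.
NUMERIC_FIELDS = ("portion_g", "calories", "protein", "carbs", "fat")


def parse_meal(raw: str) -> list:
    """Parse model output into a list of ingredient dicts and check required keys."""
    data = json.loads(raw)
    # Accept either a bare list or an object wrapping the list (assumed shapes).
    items = data["ingredients"] if isinstance(data, dict) else data
    for item in items:
        missing = [k for k in ("name", *NUMERIC_FIELDS) if k not in item]
        if missing:
            raise ValueError(f"ingredient missing keys: {missing}")
    return items


def meal_totals(items: list) -> dict:
    """Sum each numeric field across all ingredients."""
    return {k: sum(float(i[k]) for i in items) for k in NUMERIC_FIELDS}


# Made-up sample in the assumed schema.
example = """{"ingredients": [
  {"name": "rice", "portion_g": 150, "calories": 195, "protein": 4, "carbs": 42, "fat": 0.5},
  {"name": "chicken", "portion_g": 100, "calories": 165, "protein": 31, "carbs": 0, "fat": 3.5}
]}"""

totals = meal_totals(parse_meal(example))
print(totals)
```

Validating keys before summing makes a malformed response fail loudly instead of silently producing a partial total.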

  • Base model: Qwen/Qwen3.5-0.8B
  • Training method: LoRA (r=64, alpha=128, rsLoRA)
  • Training data: Nutrition5k (4,051 images with measured per-ingredient nutrition)
  • Eval benchmark: Nutrition5k test set (506 images, same split as CalorieLLaVA)
  • Best calorie MAE: 112.3 kcal (at training step 1000)
  • Parse rate: 100%
  • Pearson r: 0.73
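
For reference, the two reported metrics can be computed as follows. This is a minimal sketch that assumes both are taken over per-image total calories (predicted vs. measured); the card does not spell this out, and the sample numbers are invented.

```python
import math


def mean_absolute_error(pred, true):
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented per-image calorie totals (kcal), for illustration only.
predicted = [300.0, 450.0, 120.0, 610.0]
measured = [280.0, 500.0, 150.0, 600.0]

mae = mean_absolute_error(predicted, measured)
r = pearson_r(predicted, measured)
print(f"MAE = {mae:.1f} kcal, Pearson r = {r:.3f}")
```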

Files

| File | Size | Description |
|---|---|---|
| boba-q4km.gguf | 505 MB | Main LLM (Q4_K_M quantized) |
| boba-mmproj-f16.gguf | 196 MB | Vision projection model (F16) |
| boba-f16.gguf | 1.5 GB | Main LLM (F16, unquantized) |
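
A vision GGUF model of this kind is typically run by pairing the main LLM file with the projection file. The sketch below builds a command line for llama.cpp's `llama-mtmd-cli`; it assumes a llama.cpp build that supports this architecture, and the prompt text and image path are hypothetical, since the card does not state the prompt used in training.

```python
import shlex


def build_llama_cmd(
    image_path: str,
    model: str = "boba-q4km.gguf",
    mmproj: str = "boba-mmproj-f16.gguf",
    # Hypothetical prompt: the card does not document the expected prompt.
    prompt: str = "Estimate the nutrition of this meal as JSON.",
) -> list:
    """Build an argv list pairing the main LLM with its vision projector."""
    return [
        "llama-mtmd-cli",
        "-m", model,          # main LLM weights
        "--mmproj", mmproj,   # vision projection weights
        "--image", image_path,
        "-p", prompt,
    ]


cmd = build_llama_cmd("meal.jpg")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` once the files are downloaded and the binary is on the `PATH`.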

Benchmark Results

| Model | Cal MAE (kcal) | On-Device | Per-Ingredient |
|---|---|---|---|
| CalorieLLaVA-13B | 64.3 | No | No |
| GPT-4o zero-shot | 82.7 | No | No |
| Boba 0.8B (this model) | 112.3 | Yes | Yes |
| 0.8B baseline (no training) | 131.2 | Yes | Yes |
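
To put the table in relative terms, the following sketch computes the error reduction from fine-tuning and the remaining gap to the larger models, using only the MAE values listed above. The dictionary keys are hypothetical labels.

```python
# Calorie MAE values (kcal) copied from the benchmark table.
cal_mae = {
    "calorie_llava_13b": 64.3,
    "gpt4o_zero_shot": 82.7,
    "boba_0.8b": 112.3,
    "baseline_0.8b": 131.2,
}


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` lowers the error relative to `baseline`."""
    return (baseline - model) / baseline * 100


improvement = relative_reduction(cal_mae["baseline_0.8b"], cal_mae["boba_0.8b"])
gap_to_llava = cal_mae["boba_0.8b"] - cal_mae["calorie_llava_13b"]
gap_to_gpt4o = cal_mae["boba_0.8b"] - cal_mae["gpt4o_zero_shot"]

print(f"Reduction vs. untrained baseline: {improvement:.1f}%")
print(f"Gap to CalorieLLaVA-13B: {gap_to_llava:.1f} kcal")
print(f"Gap to GPT-4o zero-shot: {gap_to_gpt4o:.1f} kcal")
```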

Boba is presented as the first published on-device food VLM with per-ingredient nutrition output.

License

Apache 2.0
