Boba Food VLM 0.8B (GGUF)

An on-device model that estimates per-ingredient nutrition from a food photo.

Model Description

A fine-tune of Qwen3.5-0.8B for food recognition and calorie estimation from photos. It outputs structured JSON that lists, for each ingredient, the name, portion (grams), calories, protein, carbs, and fat.
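As a rough illustration of consuming that output, the sketch below parses a response and sums the per-ingredient values into meal totals. The exact JSON schema is not given here, so the key names (`ingredients`, `name`, `grams`, `calories`, `protein`, `carbs`, `fat`) and the sample values are assumptions for illustration only.

```python
import json

# Hypothetical model response: key names and numbers are illustrative
# assumptions, not the model's documented schema.
SAMPLE_RESPONSE = """
{"ingredients": [
  {"name": "white rice", "grams": 150, "calories": 195, "protein": 4.0, "carbs": 42.0, "fat": 0.5},
  {"name": "grilled chicken", "grams": 100, "calories": 165, "protein": 31.0, "carbs": 0.0, "fat": 3.5}
]}
"""

NUMERIC_KEYS = ("grams", "calories", "protein", "carbs", "fat")


def meal_totals(response_text):
    """Parse the JSON response and sum each numeric field across ingredients."""
    data = json.loads(response_text)
    totals = {key: 0.0 for key in NUMERIC_KEYS}
    for item in data["ingredients"]:
        for key in NUMERIC_KEYS:
            totals[key] += float(item[key])
    return totals


totals = meal_totals(SAMPLE_RESPONSE)
print(totals)
```

In practice the key names should be adjusted to whatever the model actually emits, and `json.loads` should be wrapped in error handling if malformed output is possible.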

Base model: Qwen/Qwen3.5-0.8B
Training method: LoRA (r=64, alpha=128, rsLoRA)
Training data: Nutrition5k (4,051 images with measured per-ingredient nutrition)
Eval benchmark: Nutrition5k test set (506 images, same split as CalorieLLaVA)
Best calorie MAE: 112.3 kcal (at training step 1000)
Parse rate: 100%
Pearson r: 0.73
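The rsLoRA setting in the training method changes how the low-rank update is scaled: standard LoRA multiplies the update by alpha / r, while rank-stabilized LoRA uses alpha / sqrt(r). A small sketch of that arithmetic for r=64 and alpha=128 follows; the function name is illustrative and not part of any training script for this model.

```python
import math


def lora_scaling(alpha, r, rank_stabilized):
    """Return the multiplier applied to the low-rank update B @ A."""
    if rank_stabilized:
        return alpha / math.sqrt(r)  # rsLoRA: alpha / sqrt(r)
    return alpha / r                 # standard LoRA: alpha / r


standard = lora_scaling(alpha=128, r=64, rank_stabilized=False)
rslora = lora_scaling(alpha=128, r=64, rank_stabilized=True)
print(standard, rslora)  # 2.0 16.0
```

With these settings the rank-stabilized multiplier is eight times the standard one, which is why the same alpha behaves differently depending on whether rsLoRA is enabled.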

Files

File	Size	Description
boba-q4km.gguf	505 MB	Main LLM (Q4_K_M quantized)
boba-mmproj-f16.gguf	196 MB	Vision projection model (F16)
boba-f16.gguf	1.5 GB	Main LLM (F16, unquantized)
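A GGUF vision model generally needs both the LLM file and the projection file at load time. The sketch below assembles a command line for llama.cpp's `llama-mtmd-cli` tool, assuming a llama.cpp build that supports this architecture and that the files sit in the working directory. The image path and the prompt wording are placeholders, not the prompt used in training.

```python
import shlex


def build_mtmd_command(model, mmproj, image, prompt):
    """Build an argv list for llama.cpp's multimodal CLI (llama-mtmd-cli)."""
    return [
        "llama-mtmd-cli",
        "-m", model,         # main LLM weights
        "--mmproj", mmproj,  # vision projection weights
        "--image", image,    # input photo
        "-p", prompt,        # text prompt (placeholder wording)
    ]


cmd = build_mtmd_command(
    model="boba-q4km.gguf",
    mmproj="boba-mmproj-f16.gguf",
    image="meal.jpg",  # hypothetical image path
    prompt="List each ingredient with its nutrition as JSON.",  # assumed prompt
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Swapping `boba-q4km.gguf` for `boba-f16.gguf` would load the unquantized weights with the same projection file.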

Benchmark Results

Model	Calorie MAE (kcal)	On-Device	Per-Ingredient
CalorieLLaVA-13B	64.3	No	No
GPT-4o zero-shot	82.7	No	No
Boba 0.8B (this model)	112.3	Yes	Yes
0.8B baseline (no training)	131.2	Yes	Yes

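Boba trails the larger server-side models on calorie MAE but improves on the untrained 0.8B baseline by 18.9 kcal (131.2 to 112.3). It is presented as the first published on-device food VLM with per-ingredient nutrition output.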
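For readers reproducing the evaluation, the two reported metrics can be computed as sketched below. This assumes the MAE and Pearson r are taken over predicted versus measured total calories per dish, which is not spelled out here. The numbers are toy values, not Nutrition5k data.

```python
import math


def mean_absolute_error(predicted, measured):
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy per-dish calorie totals (kcal), for illustration only.
measured = [200.0, 400.0, 600.0]
predicted = [250.0, 350.0, 700.0]

mae = mean_absolute_error(predicted, measured)
r = pearson_r(predicted, measured)
print(f"MAE = {mae:.1f} kcal, Pearson r = {r:.2f}")
```

MAE reports the typical size of the error in kcal, while Pearson r reports how well predictions track the ordering and spread of the true values, so the two can move independently.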

License

Apache 2.0

Model tree for Doses-AI/boba-0.8b-food-GGUF

Base model: Qwen/Qwen3.5-0.8B-Base
Finetuned: Qwen/Qwen3.5-0.8B
Quantized: this model