Question about the base model used for nanonets/Nanonets-OCR-s (relation to Qwen/Qwen2.5-VL-3B-Instruct)
Hi,
Thank you for releasing nanonets/Nanonets-OCR-s. I'm evaluating it for an OCR-related use case.
I noticed it referenced alongside Qwen/Qwen2.5-VL-3B-Instruct, and I wanted to confirm the relationship so I can make the right compatibility assumptions (tokenizer/architecture expectations, etc.) before integrating it.
Could you please clarify whether nanonets/Nanonets-OCR-s was built directly on top of Qwen/Qwen2.5-VL-3B-Instruct via a straightforward fine-tuning step, or whether there were intermediate checkpoints, additional training phases, merges, distillation steps, or other released models involved?
In practice, should this model be treated as a direct derivative of Qwen/Qwen2.5-VL-3B-Instruct, or as something that has gone through additional modification steps beyond a simple direct fine-tuning path?
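For context, here is roughly how I plan to sanity-check compatibility once the lineage is confirmed: compare a few key `config.json` fields between the base model and the derivative. The field values below are illustrative assumptions for a Qwen2.5-VL-style config, not values I have verified from either repo:

```python
# Sketch: sanity-check whether a suspected derivative shares the key
# architectural fields of its base model's config.json.
# The example values below are assumptions, not verified from either repo.

def configs_compatible(base_cfg: dict, derived_cfg: dict,
                       keys=("architectures", "model_type", "vocab_size")) -> bool:
    """Return True if both configs agree on all of the given keys."""
    return all(base_cfg.get(k) == derived_cfg.get(k) for k in keys)

# Hypothetical config fragments (a direct fine-tune would typically match):
base = {
    "architectures": ["Qwen2_5_VLForConditionalGeneration"],
    "model_type": "qwen2_5_vl",
    "vocab_size": 151936,
}
derived = dict(base)

print(configs_compatible(base, derived))  # expect True for a direct fine-tune
```

If the fields diverge (e.g. a different `vocab_size` from added special tokens), I'd know extra integration care is needed beyond treating it as a drop-in fine-tune.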
Thanks very much for your time; any clarification would be greatly appreciated.
Best,
Qu