Question about the base model used for nanonets/Nanonets-OCR-s (relation to Qwen/Qwen2.5-VL-3B-Instruct)

#38
by s1ngledoge - opened

Hi,

Thank you for releasing nanonets/Nanonets-OCR-s. I’m evaluating it for an OCR-related use case.

I noticed it referenced alongside Qwen/Qwen2.5-VL-3B-Instruct, and I wanted to confirm the relationship so I can make the right compatibility assumptions (tokenizer/architecture expectations, etc.) before integrating it.

Could you please clarify whether nanonets/Nanonets-OCR-s was built directly on top of Qwen/Qwen2.5-VL-3B-Instruct via a straightforward fine-tuning step, or whether there were intermediate checkpoints, additional training phases, merges, distillation steps, or other released models involved?

In practice, should this model be treated as a direct derivative of Qwen/Qwen2.5-VL-3B-Instruct, or as something that has gone through additional modification steps beyond a simple direct fine-tuning path?
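
To make concrete what I mean by compatibility assumptions, here is a minimal sketch of the check I have in mind: load each repo's `config.json` into a dict and compare a few fields. The helper `diff_configs`, the list of keys, and the placeholder values are all my own assumptions for illustration; they are not taken from either model's actual configuration.

```python
# Keys I would compare first; this selection is an assumption on my part,
# not something documented for either model.
KEYS_TO_COMPARE = ("model_type", "architectures", "vocab_size", "hidden_size")


def diff_configs(base: dict, derived: dict, keys=KEYS_TO_COMPARE) -> dict:
    """Return {key: (base_value, derived_value)} for every key whose values differ."""
    return {
        key: (base.get(key), derived.get(key))
        for key in keys
        if base.get(key) != derived.get(key)
    }


# Placeholder dicts standing in for each repo's config.json.
# The values are illustrative only and do NOT reflect the real models.
base_cfg = {
    "model_type": "example_vl",
    "architectures": ["ExampleVL"],
    "vocab_size": 1000,
    "hidden_size": 64,
}
derived_cfg = {
    "model_type": "example_vl",
    "architectures": ["ExampleVL"],
    "vocab_size": 1008,
    "hidden_size": 64,
}

# Only fields that differ are reported.
print(diff_configs(base_cfg, derived_cfg))
```

An empty result would suggest the two configs agree on these fields, but it would not tell me anything about the training path, which is why I am asking.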

Thanks very much for your time; any clarification would be greatly appreciated.

Best,
Qu
