Question about the base model used for nanonets/Nanonets-OCR-s (relation to Qwen/Qwen2.5-VL-3B-Instruct)
Hi,
Thank you for releasing nanonets/Nanonets-OCR-s. I'm evaluating it for an OCR-related use case.
I noticed it referenced alongside Qwen/Qwen2.5-VL-3B-Instruct, and I wanted to confirm the relationship so I can make the right compatibility assumptions (tokenizer/architecture expectations, etc.) before integrating it.
Could you please clarify whether nanonets/Nanonets-OCR-s was built directly on top of Qwen/Qwen2.5-VL-3B-Instruct via a straightforward fine-tuning step, or whether there were intermediate checkpoints, additional training phases, merges, distillation steps, or other released models involved?
In practice, should this model be treated as a direct derivative of Qwen/Qwen2.5-VL-3B-Instruct, or as something that has gone through additional modification steps beyond a simple direct fine-tuning path?
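For context, here is roughly how I plan to sanity-check compatibility once the lineage is confirmed: compare a few key `config.json` fields between the base model and the derivative. The field values below are illustrative assumptions for a Qwen2.5-VL-style config, not values I have verified from either repo:

```python
# Sketch: sanity-check whether a suspected derivative shares the key
# architectural fields of its base model's config.json.
# The example values below are assumptions, not verified from either repo.

def configs_compatible(base_cfg: dict, derived_cfg: dict,
                       keys=("architectures", "model_type", "vocab_size")) -> bool:
    """Return True if both configs agree on all of the given keys."""
    return all(base_cfg.get(k) == derived_cfg.get(k) for k in keys)

# Hypothetical config fragments (a direct fine-tune would typically match):
base = {
    "architectures": ["Qwen2_5_VLForConditionalGeneration"],
    "model_type": "qwen2_5_vl",
    "vocab_size": 151936,
}
derived = dict(base)

print(configs_compatible(base, derived))  # expect True for a direct fine-tune
```

If the fields diverge (e.g. a different `vocab_size` from added special tokens), I'd know extra integration care is needed beyond treating it as a drop-in fine-tune.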
Thanks very much for your time; any clarification would be greatly appreciated.
Best,
Qu