Query about model training details

#1
by Tangchiu - opened

I'm really impressed by your fine-tuned NANONET_CORRECT_V1. I am currently conducting research on model lineage and sub-model relationships based on this architecture, and your work provides an excellent case study.

I would love to gain more insights into your training recipe:

FT Details: Could you share the key hyperparameters (e.g., learning rate, scheduler, and total epochs) and the dataset composition/size?

Version Iteration: Regarding the relationship among V1, V2, and V3, were they fine-tuned independently from the same base model, or were they developed sequentially (i.e., V2 is a further fine-tune of V1)?

Your insights would be invaluable for my research. We can discuss here, or if you prefer a more detailed technical exchange, I’d be happy to follow up via email.

Thanks for your great contribution to the community!

Thanks, but what did you test this model on?

Thanks for your response! I would like to test it primarily on OCR post-processing tasks.
Could you share a bit about the hyperparameters, the dataset, and how V1/V2/V3 relate? Even a high-level overview would help me. I appreciate your time!

Mostly this was SFTed on a private dataset which I can't disclose as it's under NDA. The hyperparams were more experimental and I don't remember them, tbh. Sorry!

Thanks for the clarification! Regarding the model lineage, I just want to confirm the high-level fine-tuning path for my research. Was it:

A) Parallel: Base -> V1, Base -> V2, Base -> V3 (independent runs)
B) Sequential: Base -> V1 -> V2 -> V3 (iterative fine-tuning)

Thanks again for your time!
