These datasets can be used to fine-tune small LLMs to reach the level of closed models and large open LLMs.