crazy success
how did you all gather 1.3k downloads already? what was the record for downloads on a model?
Yes, so far this model has been a great success! I believe the record for most downloads within 24 hours was Qwen3-30B-A3B-Thinking-Gemini-2.5-Flash-Distill, which got 11k overnight but not many likes. That one didn't turn out as well as I'd hoped, but this model was great: it learned the output style and behavior of Opus very quickly, so I didn't really have to destroy the model with SFT. If I remember correctly, the model only needed about 400 to 600 steps to converge.
I'm writing "Hello" and it gives me a lecture about lilacs and why I should switch to an American Express card. Any idea why?
Probably a mix of non-ideal settings (make sure it uses the recommended top_p and temperature), the high-reasoning distill (meaning it is prone to complex tasks), and the base model being built for agentic tasks.
personally i believe it is overfitting on the high reasoning which makes it expect only complex tasks and complex answers
Yeah, this is the most likely cause. Even though it was done with a low batch size and only ~3 epochs, the overfitting is apparent when you send it simpler requests. Most of our models react like this to simple requests like "hi". I can counter this by including more simple Q&A pairs with low reasoning effort in the dataset. In the future I will diversify the datasets more.
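The mitigation described above can be sketched roughly like this, assuming a simple list-of-dicts dataset format (all field names and the `simple_ratio` value are hypothetical, not the actual TeichAI pipeline):

```python
import random

# Hypothetical sketch: dilute a high-reasoning distill set with simple,
# low-reasoning-effort Q&A pairs so the model stops expecting only complex tasks.
complex_pairs = [
    {"prompt": "Prove that sqrt(2) is irrational.", "reasoning_effort": "high"},
]
simple_pairs = [
    {"prompt": "Hello!", "response": "Hi! How can I help?", "reasoning_effort": "low"},
    {"prompt": "What's the capital of France?", "response": "Paris.", "reasoning_effort": "low"},
]

def mix_datasets(complex_pairs, simple_pairs, simple_ratio=0.2, seed=42):
    """Return a shuffled mix where roughly `simple_ratio` of examples are simple Q&A."""
    rng = random.Random(seed)
    n_simple = max(1, int(len(complex_pairs) * simple_ratio / (1 - simple_ratio)))
    sampled = [rng.choice(simple_pairs) for _ in range(n_simple)]
    mixed = complex_pairs + sampled
    rng.shuffle(mixed)
    return mixed
```

The exact ratio to use is a judgment call; the idea is just that the low-effort examples teach the model that short questions deserve short answers.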
This model keeps rambling about absolute crap, like verbal diarrhea, just random stuff. What are the best settings to use to get this working properly?
https://unsloth.ai/docs/models/glm-4.7-flash#usage-guide
It's not uncommon to get a bad generation here or there, but these sampling params have generally led me to great responses
I personally set top_k to 50, but haven't played around much to see what exactly works best
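To make the settings concrete, here is a minimal sketch of passing those sampling params to an OpenAI-compatible local server (llama.cpp / vLLM style). The temperature and top_p values below are placeholders, not the official recommendations; take the real ones from the Unsloth usage guide linked above. The model id is also hypothetical.

```python
# Hypothetical sketch: sampling settings for a chat-completions request.
# temperature / top_p are PLACEHOLDERS -- use the values from the Unsloth guide.
sampling_params = {
    "temperature": 0.7,  # placeholder; check the model card
    "top_p": 0.95,       # placeholder; check the model card
    "top_k": 50,         # the value mentioned above
}

def build_request(prompt, params=sampling_params):
    """Assemble a chat-completions payload carrying the sampling params."""
    return {
        "model": "GLM-4.7-Flash",  # hypothetical model id on your server
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }
```

Whether `top_k` is accepted at the top level or needs an `extra_body` passthrough depends on the server you run; check its docs.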
Success is likely a result of the bold name, which suggests it is some kind of distillation of Claude, which it certainly is not.
The truth is that in order to do proper distillation you need to have the logits and do seriously long training, which is an incredibly expensive endeavor.
What this model is, is just a very slight finetune on raw text extracted from Claude.
You're right that it will never have the raw capacity of Claude; it was just fine-tuned to decompose and solve problems like Claude, which gets you a huge boost in performance. This is why Chain of Thought was so transformative for distillation. The point is not to make an open-source Claude; the point is to make something that is already good behave more like Claude, and through Chain of Thought, behaving like Claude makes it perform more like Claude.
I bet its something about the Unsloth notebook, which one did you use?
If it is overfitting, just mix in your convo dataset. That's what I use.
Exactly, it's a reasoning distillation where we force the model into Claude-like reasoning, not a knowledge distillation or a full-scale distillation by any means.
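For readers unfamiliar with the distinction: in reasoning distillation the student is trained on the teacher's text (chain of thought plus final answer), not on its logits. A single training example might look roughly like this; the chat schema and `<think>` tags are assumptions for illustration, not the exact TeichAI dataset format:

```python
import json

# Hypothetical sketch of one SFT example for reasoning distillation:
# the assistant turn contains a Claude-style reasoning trace followed by
# the final answer, serialized as one JSONL line.
example = {
    "messages": [
        {"role": "user", "content": "Is 91 prime?"},
        {
            "role": "assistant",
            "content": (
                "<think>Check divisibility by small primes: 91 / 7 = 13, "
                "so 91 = 7 * 13.</think>"
                "No, 91 = 7 x 13, so it is not prime."
            ),
        },
    ]
}
line = json.dumps(example)  # one JSONL line per training example
```

Knowledge distillation, by contrast, would match the student's full output distribution against the teacher's logits at every token, which is what makes it so much more expensive.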
Exactly, we are not making the next Phi.
Btw, check out my DataGen companion prompt generator, PromptGen, purpose-built for TeichAI: https://github.com/bobthe144th/PromptGen. It takes the topic domains listed on the Requests page and asks you to select the percentages.
I created this on Saturday, so it is still actively maintained. It is meant to run in the Unsloth Docker container, and I will try to fix any bugs.
Will give it a look, thanks!
Pretty sure they already have something similar: https://github.com/TeichAI/datagen
His tool seems to be a helper for generating the prompts.txt file
Exactly. It asks how many prompts you want to generate, asks for the percentage that each prompt topic from your TeichAI/Requests will get, generates the prompts with Qwen 3 8B (you might want to swap in a bigger model for more advanced prompts), and formats them into a .md file with one prompt per line, as required by your DataGen tool. On my 5070 Ti it took about 8 minutes to generate 500 prompts, and I am actively working on optimizations (I just implemented advanced prompt caching).
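The percentage-to-count allocation described above can be sketched like this (the topic names and split are hypothetical, not from the actual Requests page):

```python
def allocate_prompts(total, percentages):
    """Split `total` prompts across topics by percentage, handing any
    integer-rounding remainder to the largest topic so counts sum to `total`."""
    counts = {t: int(total * p / 100) for t, p in percentages.items()}
    remainder = total - sum(counts.values())
    largest = max(percentages, key=percentages.get)
    counts[largest] += remainder
    return counts

# Hypothetical topic split mirroring a Requests-page selection
topics = {"coding": 50, "math": 30, "creative writing": 20}
```

Handing the remainder to one topic is just the simplest tie-break; a real tool might distribute it round-robin instead.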
So we got a new unsloth/GLM-4.7-Flash-GGUF model uploaded 5 days ago. Are we gonna get updates to this model too?
Yep, v2 coming soon :)
Edit: I don't see any major updates; perhaps it's just the GGUF patches he made, not a model tune.
Yeah, I bet it was a llama.cpp bug affecting top_k or top_p.