Can anyone improve the model using the RYS methodology (duplicating a block of layers)?
Hello!
Gemma 4 is an amazing model! I've loved the Gemma models since the second series (I didn't pay much attention to the first).
The results are amazing! But here's the thing... They can be even more amazing!
A new model-improvement technique was recently published.
It's described in these articles:
https://dnhkng.github.io/posts/rys/
https://dnhkng.github.io/posts/rys-ii/
The gist of it: an exhaustive search finds the block of layers that, when duplicated, gives the greatest performance boost.
Every duplication option is tried, and each candidate is scored on very simple proxy benchmarks.
The winning layers are then simply copied and pasted back into the model architecture.
This raises the model's scores on some benchmarks, because the model can spend more compute on each problem. Not in terms of a longer chain of thought, but in terms of the internal work done per token.
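As a rough illustration of the idea (this is my own sketch, not the author's actual code; the layer objects, block sizes, and the proxy `evaluate` scorer are all placeholder assumptions), the search-and-duplicate loop can be written in plain Python:

```python
import copy

def duplicate_block(layers, start, end):
    """Return a new layer list with a deep copy of layers[start:end]
    inserted right after the original block, so that block runs twice."""
    block = [copy.deepcopy(layer) for layer in layers[start:end]]
    return layers[:end] + block + layers[end:]

def best_duplication(layers, evaluate, max_len=8):
    """Exhaustively try every contiguous block of up to max_len layers
    and keep the duplication that scores highest on the proxy benchmark
    `evaluate` (a stand-in for the cheap tests described above)."""
    best_score, best_span = evaluate(layers), None
    for length in range(1, max_len + 1):
        for start in range(len(layers) - length + 1):
            candidate = duplicate_block(layers, start, start + length)
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_span = score, (start, start + length)
    return best_score, best_span
```

With a Hugging Face transformers model, `layers` would typically be `list(model.model.layers)` and the result written back as an `nn.ModuleList`, with `config.num_hidden_layers` updated to match; that part is an assumption about the usual decoder-stack layout, not something taken from the blog posts.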
I'm really, really curious to see how this will work with this model—Gemma 4 31b. The results could be mind-blowing!
But here's the problem: I have no way to conduct the experiment myself! I simply don't have the necessary computing power.
Please tell me, could someone here try this experiment? Find those treasured layers that need to be repeated to make the model stronger?
I'd be very grateful!
I've written to the author of the method, but he hasn't responded yet. @dnhkng Where are you?
Please help us create the most powerful open-source model that can be run on a home computer!
Look, I think the tests the author of the methodology used to validate his models are a bit inadequate. They were optimized for evaluation speed and focused on the model's mathematical capabilities. There was also an assessment of emotion understanding, but that's not enough.
It would be good to expand the testing a bit, although that would require a lot of compute.
RYS author here. Yes, I will do this when I am back from holiday. Currently on a tropical island with limited internet and no compute.
@Regrin the proxy tests took a long time to develop. A fully comprehensive assay would take months or years of testing for a model the size of Gemma4-31b. Even my proxy tests will still need about 40 hours of H100 time.
Hey everyone, I love your work here
Hmm... Okay... Although, of course, I'd like to add slightly more complex tests. Perhaps they could be run only in cases where the model has shown improvement on simpler tests?
I wouldn't want it to turn out that the model has simply learned to count better...
Could you please tell me what happens if I duplicate a layer twice instead of once?
Or duplicate different layers!
It would be great if llama.cpp added a native ability to duplicate layers during inference without using additional memory.
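In a Python-level harness you can already get that memory-free behaviour by aliasing instead of copying: put the same layer object into the execution list twice. This is only a sketch of the idea the request above describes (whether llama.cpp could do the equivalent in its compute graph is a separate question):

```python
def alias_block(layers, start, end):
    """'Duplicate' layers [start, end) by referencing the same objects
    twice rather than deep-copying them. The forward pass is identical
    to a real copy (the weights are the same), but no extra weight
    memory is used."""
    return layers[:end] + layers[start:end] + layers[end:]
```

One caveat (an assumption worth checking): anything stateful per layer, such as a KV-cache slot keyed by layer index, would still need separate entries for the two passes through the shared layer.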
@Regrin read up about it on my blog, at:
https://dnhkng.github.io/posts/rys/
And:
https://dnhkng.github.io/posts/rys-ii/
It covers your questions.
Listen... You added layers, and the model became smarter.
Maybe we could somehow... Well, remove the layers?
Then the model would become faster, and we could use a lightweight version for speculative decoding!
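The inverse operation is just as easy to sketch. The helper and the search below are my own illustration (the function names and the proxy `evaluate` scorer are assumptions), showing how one might hunt for the block whose removal costs the least quality:

```python
def drop_block(layers, start, end):
    """Return a layer list with layers[start:end] removed, e.g. to build
    a smaller, faster draft model for speculative decoding."""
    return layers[:start] + layers[end:]

def cheapest_removal(layers, evaluate, length=4):
    """Try removing every contiguous block of `length` layers and keep
    the pruned variant that scores highest on the proxy benchmark."""
    best_score, best_layers = float("-inf"), None
    for start in range(len(layers) - length + 1):
        candidate = drop_block(layers, start, start + length)
        score = evaluate(candidate)
        if score > best_score:
            best_score, best_layers = score, candidate
    return best_score, best_layers
```

Whether a layer-pruned model actually makes a good draft is an open question: the speedup from speculative decoding depends on how often the draft's tokens agree with the full model's.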