Can anyone improve the model using the RYS methodology (duplicating a block of layers)?
Hello!
Gemma 4 is an amazing model! I've loved the Gemma models since the second series (I didn't pay much attention to the first).
The results are amazing! But here's the thing... They can be even more amazing!
A new model-improvement technique was recently published.
It's described in these articles:
https://dnhkng.github.io/posts/rys/
https://dnhkng.github.io/posts/rys-ii/
The gist of it: an exhaustive search finds the block of layers that, when duplicated, gives the greatest performance boost.
Every duplication option is tried, and each candidate is scored on very simple proxy benchmarks.
The winning layers are then simply copied and pasted back into the model architecture.
This raises the model's scores on some benchmarks, because the model can spend more compute on each problem. Not in terms of a longer chain of thought, but in terms of the internal work done per token.
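As a rough illustration of the idea (this is my own sketch, not the author's actual code; the layer objects, block sizes, and the proxy `evaluate` scorer are all placeholder assumptions), the search-and-duplicate loop can be written in plain Python:

```python
import copy

def duplicate_block(layers, start, end):
    """Return a new layer list with a deep copy of layers[start:end]
    inserted right after the original block, so that block runs twice."""
    block = [copy.deepcopy(layer) for layer in layers[start:end]]
    return layers[:end] + block + layers[end:]

def best_duplication(layers, evaluate, max_len=8):
    """Exhaustively try every contiguous block of up to max_len layers
    and keep the duplication that scores highest on the proxy benchmark
    `evaluate` (a stand-in for the cheap tests described above)."""
    best_score, best_span = evaluate(layers), None
    for length in range(1, max_len + 1):
        for start in range(len(layers) - length + 1):
            candidate = duplicate_block(layers, start, start + length)
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_span = score, (start, start + length)
    return best_score, best_span
```

With a Hugging Face transformers model, `layers` would typically be `list(model.model.layers)` and the result written back as an `nn.ModuleList`, with `config.num_hidden_layers` updated to match; that part is an assumption about the usual decoder-stack layout, not something taken from the blog posts.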
I'm really, really curious to see how this will work with this model—Gemma 4 31b. The results could be mind-blowing!
But here's the problem: I have no way to conduct the experiment myself! I simply don't have the necessary computing power.
Please tell me, could someone here try this experiment? Find those treasured layers that need to be repeated to make the model stronger?
I'd be very grateful!
I've written to the author of the method, but he hasn't responded yet. @dnhkng Where are you?
Please help us create the most powerful open-source model that can be run on a home computer!
Look, I think the tests the author of the methodology used to validate his models are a bit inadequate. They were optimized for evaluation speed and focused on the model's mathematical capabilities. There was also an assessment of emotion understanding, but that's not enough.
It would be good to expand the testing a bit, although that would require a lot of compute.
RYS author here. Yes, I will do this when I am back from holiday. Currently on a tropical island with limited internet and no compute.
@Regrin the proxy tests took a long time to develop. A fully comprehensive assay would take months or years of testing for a model the size of Gemma4-31b. Even my proxy tests will still need about 40 hours of H100 time.
Hey everyone, I love your work here
Hmm... Okay... Although, of course, I'd like to add slightly more complex tests. Perhaps they could be run only in cases where the model has shown improvement on simpler tests?
I wouldn't want it to turn out that the model has simply learned to count better...
Could you please tell me what happens if I duplicate a layer twice instead of once?
Or duplicate different layers!
It would be great if llama.cpp added a native ability to duplicate layers during inference without using additional memory.
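In a Python-level harness you can already get that memory-free behaviour by aliasing instead of copying: put the same layer object into the execution list twice. This is only a sketch of the idea the request above describes (whether llama.cpp could do the equivalent in its compute graph is a separate question):

```python
def alias_block(layers, start, end):
    """'Duplicate' layers [start, end) by referencing the same objects
    twice rather than deep-copying them. The forward pass is identical
    to a real copy (the weights are the same), but no extra weight
    memory is used."""
    return layers[:end] + layers[start:end] + layers[end:]
```

One caveat (an assumption worth checking): anything stateful per layer, such as a KV-cache slot keyed by layer index, would still need separate entries for the two passes through the shared layer.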
@Regrin read up about it on my blog, at:
https://dnhkng.github.io/posts/rys/
And:
https://dnhkng.github.io/posts/rys-ii/
It covers your questions.
Listen... You added layers, and the model became smarter.
Maybe we could somehow... Well, remove the layers?
Then the model would become faster, and we could use a lightweight version for speculative decoding!
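The inverse operation is just as easy to sketch. The helper and the search below are my own illustration (the function names and the proxy `evaluate` scorer are assumptions), showing how one might hunt for the block whose removal costs the least quality:

```python
def drop_block(layers, start, end):
    """Return a layer list with layers[start:end] removed, e.g. to build
    a smaller, faster draft model for speculative decoding."""
    return layers[:start] + layers[end:]

def cheapest_removal(layers, evaluate, length=4):
    """Try removing every contiguous block of `length` layers and keep
    the pruned variant that scores highest on the proxy benchmark."""
    best_score, best_layers = float("-inf"), None
    for start in range(len(layers) - length + 1):
        candidate = drop_block(layers, start, start + length)
        score = evaluate(candidate)
        if score > best_score:
            best_score, best_layers = score, candidate
    return best_score, best_layers
```

Whether a layer-pruned model actually makes a good draft is an open question: the speedup from speculative decoding depends on how often the draft's tokens agree with the full model's.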