Small error in the model overview section of the README
#13
by Luca3700 - opened
Hi,
I would like to point out a small error in the section mentioned above.
It is reported as "Hidden Layout: 16 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))",
but I think it should be "12 ×" instead of "16 ×", since the total number of layers is 48 (as also stated in the config.json file).
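The arithmetic can be checked directly. A minimal sketch, assuming the repeated block consists of 3 Gated DeltaNet layers plus 1 Gated Attention layer (each followed by MoE) and that `num_hidden_layers` in config.json is 48:

```python
# Each repeated block: 3 Gated DeltaNet layers + 1 Gated Attention layer.
layers_per_block = 3 + 1

# Total layer count as stated in config.json (num_hidden_layers).
num_hidden_layers = 48

# Number of repeats of the block needed to reach 48 layers.
repeats = num_hidden_layers // layers_per_block
print(repeats)  # → 12, so the prefix should be "12 ×", not "16 ×"
```

With "16 ×" the total would be 16 × 4 = 64 layers, which contradicts the 48 in config.json.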
Thank you for your awesome work
Thank you too!
jklj077 changed discussion status to closed