I love the French β™₯️

Thank you so much for this, finally a llama-3 replacement 🀌

Open source is saved :)

Sorry to ask, but Llama 3???

Unless there's a very specific use case, haven't there been many LLMs that are able to replace Llama 3 lately?

I guess it's related to it being a dense model >= 70B.

Sorry to ask, but Llama 3???

Unless there's a very specific use case, haven't there been many LLMs that are able to replace Llama 3 lately?

MoEs are really problematic to tune, and Llama 4 was... not so... good.

The best creative fine-tuned models are dense, and this one is dense, with 96 attention heads and a VERY long context.

It's perfect!
The community literally waited for something like this for years.

Thank you Mistral for supporting local <3

I wonder if you'll be able to finetune this gigantic model. You did great on Assistant Pepe 70B (unfortunately it was based on L3.1 instead of 3.3 πŸ˜₯). I don't know how much compute you have access to, but M3.5 128B clearly has impressive benchmark performance for its parameter count. 40B+ dense models are always intriguing πŸ˜€
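
If compute is the bottleneck, a QLoRA-style tune seems like the realistic path for a model this size. Here's a minimal sketch; the base model ID is my guess (the EAGLE repo name below minus the suffix), and the target module names assume the usual Mistral-style attention projections, so check the actual config:

```python
# Rough QLoRA sketch for a ~128B dense model on limited hardware.
# Assumptions: the model ID is hypothetical, and target_modules assume
# Mistral-style projection names (verify against the real architecture).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-Medium-3.5-128B"  # hypothetical base repo

# 4-bit NF4 quantization keeps the frozen base weights small in VRAM
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Train only small low-rank adapters on the attention projections
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapters are a tiny fraction of 128B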

Also, before you all complain about theoretical slowness: they released a supplemental speculative decoding draft model designed to boost it past 50 tok/s: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE
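
Something like this should work with vLLM's speculative decoding support. The exact argument names have moved around between vLLM versions, and the base model ID is my guess from the EAGLE repo name:

```python
# Sketch: serving with the EAGLE draft model via vLLM speculative decoding.
# The base model ID is an assumption; check the speculative decoding
# arguments for your vLLM version, as they have changed across releases.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Medium-3.5-128B",  # hypothetical base repo
    speculative_config={
        "method": "eagle",
        "model": "mistralai/Mistral-Medium-3.5-128B-EAGLE",  # draft from the link above
        "num_speculative_tokens": 5,  # draft tokens proposed per step
    },
    tensor_parallel_size=8,  # a 128B dense model needs multiple GPUs
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

The draft model's proposed tokens get verified by the big model in a single pass, so you pay roughly one large forward per accepted run of tokens instead of one per token.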

@SicariusSicariiStuff gooseposter spotted...
