Impressive. Very nice model.

#1
by srgtsrgtrs - opened

Question about the ChatML instructions, though: what's special about using ChatML vs. the Mistral templates? ChatML always seems incredibly buggy in SillyTavern — in fact, it always has been. Qwen is unusable due to its reliance on this instruction set, and it causes all kinds of strange hallucinations; in some cases, Chinese characters show up.
In the case of your model, it just degrades prose quality. I use your v3.0 model with the Mistral V7 templates and it works amazingly well.
I also exclusively use the Q8 quants since picking up a 16GB card last year.

XeyonAI org

Hey! Great question. ChatML has quirks for sure - SillyTavern's implementation can be inconsistent, and some models do struggle with it (especially Qwen's rigid instruction handling causing those weird artifacts).
The reason we stuck with ChatML for Helcyon is consistency across tools. Most local LLM frontends (llama.cpp, LM Studio, Kobold, etc.) handle ChatML reliably, and it keeps the model flexible for different use cases without needing template switching.
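For anyone who hasn't seen the two formats side by side, here's a rough sketch of what each template looks like. This is an illustration, not the exact tokenizer output — Mistral's control tokens vary between template versions, so treat the Mistral side as approximate:

```python
def chatml_prompt(system: str, user: str) -> str:
    # ChatML wraps every turn in <|im_start|>role ... <|im_end|> markers.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

def mistral_prompt(system: str, user: str) -> str:
    # Mistral-style templates use [INST]...[/INST] blocks instead; the
    # system prompt is commonly prepended inside the first instruction.
    # (Exact tokens differ by template version, e.g. V7.)
    return f"<s>[INST] {system}\n\n{user} [/INST]"

print(chatml_prompt("You are a helpful assistant.", "Hi!"))
print(mistral_prompt("You are a helpful assistant.", "Hi!"))
```

In practice your frontend (or `tokenizer.apply_chat_template` in Transformers) builds these strings for you — the mismatch bugs usually come from the frontend's template not matching what the model was trained on.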
That said, Mistral V7 templates work great too. If you're having better results with v3.0 on Mistral format, that's totally valid! The model was trained on both ChatML and Mistral-style data, so it should handle either cleanly.
Helcyon 3.2 (just released) has cleaner instruction following and better prose quality overall - might be worth testing with your Mistral V7 setup to see if it's even smoother for you.
Appreciate the feedback! Always curious what works best for different setups.

Is there anywhere I can reach out to you to send some chat logs? Or even my entire SillyTavern output. I'm very curious why, with basically every model, certain oddities happen — like characters saying something completely illogical, and after around 14k tokens of an ERP it starts going off the rails with repetition and the occasional nonsensical response. I was always told it's "garbage in, garbage out," but as far as I understand there's really no golden rule to using these things. Your model's pretty damn good with character accuracy and fun back-and-forth responses, but they do get a bit... overly horny lol. Which may be the entire point of the model, and it's cool by me with how creative it gets about it.

XeyonAI org

Hey mate, thanks for the feedback and kind words! Really appreciate you taking the time.
Re: the horny issue - yeah, that's fair criticism πŸ˜‚ Good news: I've got 3.6 in the works right now and I'm specifically addressing that. Adding training to make the model slow down and build connection first rather than jumping straight to explicit content. Should hit the sweet spot of "creative when appropriate" without the instant horniness. Watch this space.
Re: the 14k token degradation - that's a tricky one and could be a few things:

- **Sampling settings** are huge — if temp is too high or you're not using min_p properly, things get weird fast. I'd recommend temp 0.7-0.9, min_p 0.05-0.1, and make sure repetition penalty isn't too aggressive (1.05-1.1 max).
- **Context management** — at 14k tokens, some frontends start struggling with context window handling. Make sure your total context limit matches the model's capacity and that older messages are being summarized or trimmed properly.
- **The frontend itself** — SillyTavern's solid but can have quirks depending on what's being injected behind the scenes and sampling config. If you want something built specifically for local models with better context handling, check out Helcyon-WebUI (HWUI) — it's my custom frontend, completely free, and handles long conversations way better: https://github.com/XeyonAI/Helcyon-WebUI
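To make the min_p point concrete, here's a toy sketch of what min_p filtering does after temperature scaling (a simplified illustration, not SillyTavern's or llama.cpp's actual code): tokens whose probability falls below min_p times the top token's probability get dropped, which prunes the weird low-probability tail that causes derailments.

```python
import math

def min_p_filter(logits, temperature=0.8, min_p=0.05):
    """Apply temperature, softmax, then drop tokens whose probability
    is below min_p * (probability of the most likely token)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    # Keep only tokens above the cutoff, then renormalise.
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}

# A long tail of unlikely tokens gets pruned; only the plausible ones survive:
logits = [5.0, 4.0, 1.0, 0.5, 0.2]
print(sorted(min_p_filter(logits, min_p=0.1)))  # -> [0, 1]
```

The nice property (and why I recommend it over top_p) is that the cutoff scales with the model's confidence: when the distribution is flat and creative, more tokens survive; when the model is sure, the junk tail gets cut hard.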

Feel free to send chat logs if you want me to dig deeper - always curious to see how the model performs in the wild and what edge cases pop up. Can't promise I'll solve every mystery but happy to take a look.
Cheers!
