Interesting experiment, but lacks stability and emotional intelligence

#1
by BigBeavis - opened

Model seems to have a tendency to fall into hard loops, repeating the same phrase over and over.
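For anyone hitting the same loops, here's a minimal sketch of the anti-repetition sampler settings worth trying first. The parameter names follow the Hugging Face transformers `generate()` convention; the values are illustrative starting points, not settings tested on this model:

```python
# Hedged sketch: sampler settings commonly used to damp hard phrase loops.
# Names follow the transformers generate() API; values are guesses to tune, per model.
anti_loop = {
    "repetition_penalty": 1.12,   # >1.0 down-weights tokens already in the context
    "no_repeat_ngram_size": 4,    # hard-blocks any 4-gram from being emitted twice
    "temperature": 0.9,           # a bit of extra randomness can also break loops
}
print(anti_loop)
```

Backends like llama.cpp or text-generation-webui expose equivalents (e.g. the DRY sampler) under slightly different names.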

Aside from that, it seems to handle long outputs extremely poorly; not long context, but specifically when a single turn, either the user's or the char's, runs too long. If the user's message exceeds ~200 tokens, the model seems to get lost and not know how to interpret it, and if the char's turn grows beyond ~300 tokens, it starts to lose the thread and begins to ramble about something unrelated until everything turns into meaningless word salad.

In other words, the model is very fragile and only remains somewhat usable in brief exchanges where a single turn stays around 100~150 tokens at most.

That said, it does somewhat well at avoiding slop compared to other Mistral models, correcting itself away from the most common slop phrases. Still, it does want to slip in something typical as a throwaway between the lines, like your typical sloppy RP (her shirt rides up, you feel her perfume, etc).

Most notably, the model seems to lack humanlike emotional awareness. It doesn't quite grasp the concept of contextually appropriate emotion: it latches onto the most surface-level observation and reacts to that, instead of to the implications. This is the case for most LLMs to varying degrees, but here it's so severely pronounced, even in the most obvious contexts, that it simply becomes jarring. It goes beyond not adhering to the char's definitions; it's a fundamental lack of "humanity", which is ironic considering the name of the model.

I'm surprised you used the Alpaca template for this finetune; I suppose it's to hammer out some ingrained format-related patterns? The model still follows the native Mistral template well, and sometimes that leads to slightly better results in terms of intelligence, but not always.
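For reference, here's a rough sketch of the two prompt layouts being compared. The exact strings are assumptions based on the common community variants of each format; check the model card or tokenizer chat template for the authoritative version:

```python
# Hedged sketch of the Alpaca vs. native Mistral instruct prompt layouts.
# Both functions assume the widely used community variants of these formats.

def alpaca_prompt(instruction: str, response: str = "") -> str:
    """Common Alpaca-style layout (variant without an ### Input: block)."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
        f"{response}"
    )

def mistral_prompt(user_msg: str, system: str = "") -> str:
    """Native Mistral instruct layout; system text is typically
    prepended to the first user turn rather than given its own role."""
    content = f"{system}\n\n{user_msg}" if system else user_msg
    return f"[INST] {content} [/INST]"

print(alpaca_prompt("Stay in the role of {{char}}."))
print(mistral_prompt("Stay in the role of {{char}}."))
```

Mixing the two (or letting the frontend pick the wrong one) is a classic source of degraded intelligence, so it's worth testing both against the same card.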

Without a system prompt the model completely disregards turn order and starts writing for the user within its own message, but pretty much any prompt telling it to stay in the role of the char stops that. That said, it doesn't really follow detailed instructions the way other models do: for example, my homebrew prompt that nudges a model to read between the lines and break the character's internal reaction down into Mind, Feeling, Inner Self, Outer Self, Relationship, Key Insight and Drama Supertask does nothing here, whereas most other models at least do something with it.

Overall, despite being an interesting model to play around with, it's too unreliable, and its less-than-surface-level comprehension too frustrating, to be used seriously. But I wonder if it could serve as good merge fuel to help some other RP finetunes break out of their repetition patterns.

I'd also be curious to see you continue this experiment with other models, like Gemma 3 27B derestricted or thinking 32B models (Qwen 3.5, GLM 0414), but it's your time and money.
