Warning: This model is highly experimental and may yield unpredictable replies.

Neophanis-8x7B

This model is a four-stage QLoRA fine-tune of mistralai/Mixtral-8x7B-v0.1.

Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral instruct formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.

Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.

Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw text dataset for several epochs at a low learning rate.

Stage four involved training all of the above-mentioned modules together at a rank of 64 on an even larger raw-text dataset, again at a low learning rate.
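The four stages can be summarized schematically as follows. This is only a sketch: the ranks and target modules come from the description above, while the dataset labels are paraphrased and the exact learning rates and epoch counts were not disclosed.

```python
# Schematic summary of the four QLoRA stages (ranks and target modules are
# from the card; dataset descriptions are paraphrased, hyperparameters like
# learning rate were not disclosed and are omitted)
STAGES = [
    {"stage": 1, "rank": 2048,
     "targets": ["k_proj", "v_proj", "q_proj", "o_proj"],
     "data": "alpaca-lora, reformatted to Mixtral instruct style"},
    {"stage": 2, "rank": 1024, "targets": ["w1"], "data": "raw text"},
    {"stage": 3, "rank": 256, "targets": ["w2", "w3"],
     "data": "expanded raw text"},
    {"stage": 4, "rank": 64,
     "targets": ["k_proj", "v_proj", "q_proj", "o_proj", "w1", "w2", "w3"],
     "data": "further expanded raw text"},
]
```

Note the pattern: the adapter rank shrinks at each stage while the set of trained modules grows, so the final consolidation pass touches everything but with the least capacity to disturb it.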

The resources available to me for this experiment were rather limited, so I had to compromise on sequence length in order to pull it off.
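Raw-text training at a constrained sequence length typically means concatenating the tokenized corpus and slicing it into fixed-length blocks. A minimal sketch of that packing step, assuming a pre-tokenized stream (the actual sequence length used is not stated on this card):

```python
def pack_raw_text(token_ids: list[int], seq_len: int) -> list[list[int]]:
    """Slice a concatenated token stream into fixed-length training blocks.

    A short seq_len trades context coverage for memory, which is the
    compromise described above; any trailing partial block is dropped.
    """
    return [token_ids[i:i + seq_len]
            for i in range(0, len(token_ids) - seq_len + 1, seq_len)]
```

For example, `pack_raw_text(list(range(10)), 4)` yields two full blocks and discards the last two tokens.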

So the instruct behavior is trained entirely into the self-attention modules, with the only disruption to the model's latent knowledge being whatever propagated during the later training stages.

The model can follow just about any instruction, although, likely due to training-context constraints and dataset quality, it stops emitting EOS tokens past a certain amount of context. It can even follow through on more advanced prompts, such as SillyTavern roleplay, quite well, albeit the contents of its replies are not always on point.

I came up with this idea while working on Phainesthesia, where my finding was that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw-text training regimen, meaning that during finetuning the instruct behavior had settled deep in the model's latent knowledge. The purpose of this experiment was to try to do the opposite.

Caution: This model is highly hallucinatory.

For example, if you ask it what the capital of "potato" is, it may give you the name of a city associated with potatoes rather than telling you that your question is silly.

This model uses Mistral/Mixtral Instruct formatting.
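A small helper matching the formatting shown in the example below can be sketched as follows. Note that, as in the example, no space is added inside the tags; whether Mixtral's tokenizer prefers a space there is not addressed on this card.

```python
def mixtral_prompt(instruction: str) -> str:
    # Wrap a single-turn instruction in Mistral/Mixtral [INST] tags,
    # mirroring the spacing used in the example on this card.
    return f"[INST]{instruction}[/INST]"
```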

Example:

[INST]Write me a poem about what it must be like to be an AI language model.[/INST]
It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams

A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance

Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades

A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!

Trained using qlora-pipe.

GGUFs care of QUANT-CARTEL.
