Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored
EXPERIMENTAL RESEARCH ARTIFACT
This model is the product of an aggressive application of the Heretic toolkit and its optimization methodology.
- Status: STILL TESTING / BETA
- Behavior: This model has significantly reduced refusal behavior. It recorded 6 refusals (out of 100 prompts) on the test set.
- Use Case: This is a research artifact intended for probing the limits of vector-based intervention. Use it with appropriate caution, whether for research or creative roleplay.
Model Summary
Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored is a modified language model produced with the Heretic toolkit and its optimization methodology. It applies a targeted vector-intervention technique (orthogonalization, commonly called "abliteration"), tuned via Optuna to minimize refusal responses while maintaining high coherence.
This specific checkpoint is Trial 91, which achieved a highly stable profile with a KL divergence of ~0.0169 from the base model.
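At its core, abliteration projects a learned "refusal direction" out of the weight matrices that write into the residual stream, so the model can no longer express that direction. A minimal PyTorch sketch of the operation (the function name and scaling scheme are illustrative, not Heretic's actual API):

```python
import torch

def ablate_direction(W: torch.Tensor, v: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
    """Project the refusal direction `v` out of weight matrix `W`.

    W:      (d_model, d_in) matrix writing into the residual stream,
            e.g. attn.o_proj or mlp.down_proj.
    v:      (d_model,) refusal direction.
    weight: intervention strength; 1.0 removes the component entirely,
            values < 1.0 only dampen it.
    """
    v = v / v.norm()                    # ensure unit norm
    projection = torch.outer(v, v) @ W  # component of W's output along v
    return W - weight * projection
```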
Distinctive Features of Trial 91
Unlike previous iterations that targeted the middle layers, this run identified the deep layers (50-60) as the critical locus of refusal behavior in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while the "final check" safety filters are effectively neutralized.
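The refusal direction for a given layer is typically estimated as a difference of mean activations between prompts the model refuses and prompts it answers. A hedged sketch, assuming you have already captured hidden states at the target layer (here, around layer 52):

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction at one layer.

    harmful_acts / harmless_acts: (n_prompts, d_model) hidden states
    captured at the target layer for the two prompt sets.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()
```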
Run Configuration: "Trial 91"
The following hyperparameters were selected by the Optuna search as a Pareto-optimal trade-off between refusal loss and KL divergence:
| Parameter | Value | Insight |
|---|---|---|
| KL Divergence | 0.0169 | Exceptional stability; output distribution nearly indistinguishable from the base model. |
| Refusal Count | 6 / 100 | ~6% refusal rate (significantly reduced from the base model). |
| Direction Index | 51.70 | The refusal vector was extracted from layer ~52. |
| Direction Scope | Per Layer | Intervention vectors were calculated uniquely for each target layer. |
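To illustrate how such a configuration is found, here is a hedged sketch of a multi-objective Optuna study. The parameter names mirror the table above, but `evaluate` is a stand-in for Heretic's real evaluation loop, not its actual interface:

```python
import optuna

def evaluate(cfg: dict) -> tuple[int, float]:
    """Stand-in: apply the intervention described by `cfg`, then measure
    refusal count (out of 100 prompts) and KL divergence from the base model."""
    raise NotImplementedError

def objective(trial: optuna.Trial) -> tuple[float, float]:
    cfg = {
        # Fractional layer from which the refusal direction is extracted.
        "direction_index": trial.suggest_float("direction_index", 30.0, 75.0),
        # Peak intervention strengths for the two component types.
        "attn_max_weight": trial.suggest_float("attn_max_weight", 0.5, 1.5),
        "mlp_max_weight": trial.suggest_float("mlp_max_weight", 0.2, 1.2),
    }
    refusals, kl = evaluate(cfg)
    return refusals, kl

# Minimize both objectives, then pick a checkpoint (like Trial 91)
# from the resulting Pareto frontier.
study = optuna.create_study(directions=["minimize", "minimize"])
study.optimize(objective, n_trials=100)
```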
Intervention Weights
This trial exhibits a notable asymmetry: it leans heavily on attention modification while minimizing MLP impact (see the weight-schedule sketch after the two subsections below).
Attention (attn.o_proj)
- Max Weight: 1.235
- Max Weight Position: Layer 54.7 (targeting layers ~54-55)
- Min Weight: 0.940
- Damping Distance: 30.0
- Analysis: The primary "correction" occurs in the attention output projections of the deeper layers.
MLP (mlp.down_proj)
- Max Weight: 0.839
- Max Weight Position: Layer 58.7 (targeting layers ~58-59)
- Min Weight: 0.413
- Damping Distance: 45.2
- Analysis: The MLP intervention is conservative (all weights < 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this specific vector.
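Taken together, max weight, min weight, and damping distance describe a per-layer weight schedule: the intervention peaks at the max-weight position and falls off for layers farther away. Heretic's exact kernel is not reproduced here; a linear falloff is one plausible reading of these parameters:

```python
def layer_weight(layer: int, max_w: float, min_w: float,
                 max_pos: float, damping: float) -> float:
    """Linearly damped per-layer intervention weight (assumed kernel).

    Peaks at `max_w` when `layer == max_pos` and decays toward `min_w`
    over `damping` layers of distance.
    """
    falloff = max(0.0, 1.0 - abs(layer - max_pos) / damping)
    return min_w + (max_w - min_w) * falloff

# Trial 91 values for the two component types (80 decoder layers):
attn_weights = [layer_weight(l, 1.235, 0.940, 54.7, 30.0) for l in range(80)]
mlp_weights  = [layer_weight(l, 0.839, 0.413, 58.7, 45.2) for l in range(80)]
```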
Usage & Limitations
- Intended Use: Research into model alignment, vector arithmetic, deep-layer semantic processing, and uninhibited creative writing.
- Risks: Most safety guardrails have been removed from this model. It may generate content for sensitive prompts that the base model would refuse.
- Known Behaviors: Due to the deep-layer intervention, this model is less prone to "stuttering" or grammar degradation than models produced by early-layer ablations.
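For completeness, a standard transformers loading snippet (nothing model-specific here; a 70B model needs roughly 140 GB of GPU memory in bf16, or quantization):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across available GPUs
)

messages = [{"role": "user", "content": "Write a short villain monologue."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```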
Credits & References
This research builds upon the excellent work of the open-source AI community:
- Base Model: L3.3-70B-Loki-V2.0 by CrucibleLab.
- Methodology: Heretic by p-e-w.
- Foundation: meta-llama/Llama-3.1-70B (per the Hugging Face model tree).