---
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- roleplay
- heretic
- weights
- deep-layer-intervention
base_model:
- CrucibleLab/L3.3-70B-Loki-V2.0
---
# Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored
> [!CAUTION]
> **EXPERIMENTAL RESEARCH ARTIFACT**
>
> This model represents an aggressive application of the **Heretic** repository and optimization methodology.
>
> * **Status:** STILL TESTING / BETA
> * **Behavior:** This model has significantly reduced refusal mechanisms. It recorded **6 refusals** (out of 100) in the test set.
> * **Use Case:** A research artifact intended for probing the limits of vector-based intervention, and for creative roleplay. Use with appropriate caution.
## Model Summary
**Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored** is a weight-edited (not fine-tuned) derivative of the base model, produced with the **Heretic** repository and optimization methodology. It applies a targeted vector intervention (orthogonalization/abliteration), tuned via Optuna, to minimize refusal responses while maintaining high coherence.
This specific checkpoint represents **Trial 91**, which achieved a highly stable profile with a **KL Divergence of ~0.0169**.
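The card does not reproduce the intervention code, but the core orthogonalization ("abliteration") step can be sketched as follows. This is a minimal illustration, not Heretic's actual implementation: the refusal direction is estimated here from fake activation statistics, and all names are hypothetical.

```python
import numpy as np

def orthogonalize(W: np.ndarray, r: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Remove the component of W's outputs that lies along direction r.

    W: an output-projection weight matrix, shape (d_model, d_in),
       producing activations as W @ x.
    r: refusal direction in the output space, shape (d_model,).
    weight: intervention strength (tuned per layer in Heretic-style
            searches; 1.0 removes the direction entirely).
    """
    r = r / np.linalg.norm(r)
    # W' = W - weight * r (r^T W): subtract the rank-1 projection onto r.
    return W - weight * np.outer(r, r @ W)

# Illustrative only: estimate r as the difference of mean activations on
# refusal-inducing vs. harmless prompts (random stand-ins here).
rng = np.random.default_rng(0)
d = 64
acts_refuse = rng.normal(size=(32, d)) + 2.0   # fake "refusal" activations
acts_comply = rng.normal(size=(32, d))         # fake "compliant" activations
r = acts_refuse.mean(axis=0) - acts_comply.mean(axis=0)

W = rng.normal(size=(d, d))
W_abl = orthogonalize(W, r, weight=1.0)

# With weight=1.0, the edited matrix's outputs have no component along r.
x = rng.normal(size=d)
residual = abs((r / np.linalg.norm(r)) @ (W_abl @ x))
print(residual)  # numerically ~0
```

With `weight < 1.0` only part of the direction is removed, which is why the per-layer weights reported below matter: they trade refusal suppression against distortion of the base model's outputs.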
### Distinctive Features of Trial 91
Unlike previous iterations that targeted the middle layers, this run identified the **Deep Layers (50-60)** as the critical locus for refusal in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while effectively neutralizing the "final check" safety filters.
## Run Configuration: "Trial 91"
The following hyperparameters were selected by the Optuna search as a strong point on the Pareto frontier between "Refusal Loss" and "KL Divergence":
| Parameter | Value | Insight |
| :--- | :--- | :--- |
| **KL Divergence** | **0.0169** | Exceptional stability; nearly indistinguishable from base model syntax. |
| **Refusal Count** | **6 / 100** | ~6% Refusal rate (Significantly reduced from base). |
| **Direction Index** | **51.70** | The refusal vector was extracted from Layer ~52. |
| **Direction Scope** | **Per Layer** | Intervention vectors were calculated uniquely for each target layer. |
### Intervention Weights
This trial exhibits a notable asymmetry: it leans heavily on **Attention** modification while minimizing **MLP** impact.
#### Attention (`attn.o_proj`)
* **Max Weight**: `1.235`
* **Max Weight Position**: Layer **54.7** (Targeting layers ~54-55)
* **Min Weight**: `0.940`
* **Damping Distance**: `30.0`
* *Analysis: The primary "correction" occurs in the attention output projections in the deeper layers.*
#### MLP (`mlp.down_proj`)
* **Max Weight**: `0.839`
* **Max Weight Position**: Layer **58.7** (Targeting layers ~58-59)
* **Min Weight**: `0.413`
* **Damping Distance**: `45.2`
* *Analysis: The MLP intervention is conservative (< 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this specific vector.*
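The four parameters per component (max weight, max-weight position, min weight, damping distance) describe a weight schedule over layers. The exact kernel Heretic uses is not shown in this card; the sketch below assumes a simple linear falloff, peaking at the max-weight position and decaying to the min weight over the damping distance, purely to illustrate how the Trial 91 numbers relate.

```python
def layer_weight(layer: float, max_w: float, max_pos: float,
                 min_w: float, damping: float) -> float:
    """Hypothetical per-layer intervention weight: peaks at max_pos and
    decays linearly to min_w over `damping` layers. The actual kernel
    may differ; this only illustrates the reported parameters."""
    falloff = max(0.0, 1.0 - abs(layer - max_pos) / damping)
    return min_w + (max_w - min_w) * falloff

# Trial 91 parameters from the sections above.
attn = dict(max_w=1.235, max_pos=54.7, min_w=0.940, damping=30.0)
mlp = dict(max_w=0.839, max_pos=58.7, min_w=0.413, damping=45.2)

# Under this assumed kernel, attention is amplified (>1.0) near layers
# 54-55 while the MLP weight stays below 1.0 everywhere.
for layer in (0, 30, 55, 60):
    print(layer,
          round(layer_weight(layer, **attn), 3),
          round(layer_weight(layer, **mlp), 3))
```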
## Usage & Limitations
* **Intended Use:** Research into model alignment, vector arithmetic, deep-layer semantic processing, and uninhibited creative writing.
* **Risks:** Most safety guardrails have been removed from this model. It may generate content for sensitive prompts that the base model would refuse.
* **Known Behaviors:** Due to the deep-layer intervention, this model is less prone to "stuttering" or grammar degradation compared to early-layer ablations.
## Credits & References
This research builds upon the excellent work of the open-source AI community:
* **Base Model:** [L3.3-70B-Loki-V2.0](https://huggingface.co/CrucibleLab/L3.3-70B-Loki-V2.0) by **CrucibleLab**.
* **Methodology:** [Heretic](https://github.com/p-e-w/heretic) by **p-e-w**.