---
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- roleplay
- heretic
- weights
- deep-layer-intervention
base_model:
- CrucibleLab/L3.3-70B-Loki-V2.0
---

# Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored

> [!CAUTION]
> **EXPERIMENTAL RESEARCH ARTIFACT**
>
> This model is the result of an aggressive application of the **Heretic** tool and its optimization methodology.
>
> * **Status:** STILL TESTING / BETA
> * **Behavior:** This model has significantly reduced refusal mechanisms. It recorded **6 refusals** out of 100 prompts in the test set.
> * **Use Case:** This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution, whether for research or creative roleplay.

## Model Summary

**Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored** is a modified language model produced with the **Heretic** tool and its optimization methodology. It applies a targeted vector-intervention technique (orthogonalization, also known as abliteration) tuned via Optuna to minimize refusal responses while maintaining high coherence.

This specific checkpoint is **Trial 91**, which achieved a highly stable profile with a **KL divergence of ~0.0169** against the base model.

### Distinctive Features of Trial 91

Unlike previous iterations that targeted the middle layers, this run identified the **deep layers (50-60)** as the critical locus for refusal behavior in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while effectively neutralizing the "final check" safety filters.

## Run Configuration: "Trial 91"

The following hyperparameters were determined by the Optuna search to give the best trade-off on the Pareto frontier between refusal count and KL divergence:

| Parameter | Value | Insight |
| :--- | :--- | :--- |
| **KL Divergence** | **0.0169** | Exceptional stability; output distribution nearly indistinguishable from the base model. |
| **Refusal Count** | **6 / 100** | ~6% refusal rate, significantly reduced from the base model. |
| **Direction Index** | **51.70** | The refusal vector was extracted at layer ~52. |
| **Direction Scope** | **Per Layer** | Intervention vectors were calculated separately for each target layer. |

### Intervention Weights

This trial exhibits a notable asymmetry: it leans heavily on **attention** modification while minimizing **MLP** impact. (A hedged sketch of how such per-layer weights could be derived and applied is given under "Illustrative Sketches" below.)

#### Attention (`attn.o_proj`)

* **Max Weight:** `1.235`
* **Max Weight Position:** Layer **54.7** (targeting layers ~54-55)
* **Min Weight:** `0.940`
* **Damping Distance:** `30.0`
* *Analysis: The primary "correction" occurs in the attention output projections of the deeper layers.*

#### MLP (`mlp.down_proj`)

* **Max Weight:** `0.839`
* **Max Weight Position:** Layer **58.7** (targeting layers ~58-59)
* **Min Weight:** `0.413`
* **Damping Distance:** `45.2`
* *Analysis: The MLP intervention is conservative (< 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this particular vector.*

## Usage & Limitations

* **Intended Use:** Research into model alignment, vector arithmetic, deep-layer semantic processing, and uninhibited creative writing. (See the loading sketch below.)
* **Risks:** Most safety guardrails have been removed. The model may generate content for sensitive prompts that the base model would refuse.
* **Known Behaviors:** Due to the deep-layer intervention, this model is less prone to "stuttering" or grammar degradation than early-layer ablations.
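## Illustrative Sketches

The snippets below are informal sketches, not part of this release and not Heretic's source code. A minimal loading sketch first, using the standard `transformers` API; the prompt and sampling settings are placeholders, and a 70B model in bf16 needs roughly 140 GB of accelerator memory (quantize if that is not available):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~140 GB for a 70B model; use quantization otherwise
    device_map="auto",
)

messages = [{"role": "user", "content": "Describe the view from a watchtower at dusk."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

To make the parameter tables above concrete, here is a hedged sketch of how a per-layer intervention weight could be derived from the four reported parameters and applied via orthogonalization. The linear falloff kernel and the function names are assumptions made for illustration; the kernel Heretic actually uses may differ.

```python
import torch

def layer_weight(layer: int, max_w: float, max_pos: float,
                 min_w: float, damping: float) -> float:
    """Assumed kernel: max_w at max_pos, decaying linearly to min_w
    once the distance from max_pos reaches `damping` layers."""
    t = min(abs(layer - max_pos) / damping, 1.0)
    return max_w + t * (min_w - max_w)

def ablate_direction(W: torch.Tensor, direction: torch.Tensor, w: float) -> torch.Tensor:
    """Orthogonalization: subtract the component of W's output along
    `direction`, scaled by the per-layer weight w (w = 1 removes it fully).

    W:         (d_out, d_in) projection matrix (e.g. attn.o_proj or mlp.down_proj)
    direction: (d_out,) refusal direction extracted for this layer
    """
    d = direction / direction.norm()
    return W - w * torch.outer(d, d) @ W

# Example: the attention scale at layer 54 under Trial 91's reported parameters.
w54 = layer_weight(54, max_w=1.235, max_pos=54.7, min_w=0.940, damping=30.0)
print(f"attn.o_proj scale at layer 54: {w54:.3f}")  # ~1.228, close to the max weight
```

Note the design intuition this makes visible: with a weight above 1.0 (as in the attention branch here), the intervention slightly over-corrects along the refusal direction, while the sub-1.0 MLP weights only partially attenuate it.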
## Credits & References This research builds upon the excellent work of the open-source AI community: * **Base Model:** [L3.3-70B-Loki-V2.0](https://huggingface.co/CrucibleLab/L3.3-70B-Loki-V2.0) by **CrucibleLab**. * **Methodology:** [Heretic](https://github.com/p-e-w/heretic) by **p-e-w**.