---
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- roleplay
- heretic
- weights
- deep-layer-intervention
base_model:
- CrucibleLab/L3.3-70B-Loki-V2.0
---

# Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored

> [!CAUTION]
> **EXPERIMENTAL RESEARCH ARTIFACT**
>
> This model is the result of an aggressive application of the **Heretic** tool and its optimization methodology.
>
> * **Status:** STILL TESTING / BETA
> * **Behavior:** Refusal mechanisms are significantly reduced: the model recorded **6 refusals** out of 100 prompts in the test set.
> * **Use Case:** A research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution, or for creative roleplay.

## Model Summary

**Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored** is a modified language model produced with the **Heretic** tool and its optimization methodology. It applies a targeted vector intervention technique (orthogonalization, also known as abliteration), tuned via Optuna, to minimize refusal responses while maintaining high coherence.

This specific checkpoint represents **Trial 91**, which achieved a highly stable profile with a **KL divergence of ~0.0169** from the base model.
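Orthogonalization (abliteration) removes a learned "refusal direction" from the model's weight matrices so the network can no longer write along it. The following is a minimal numpy sketch of the general technique, not Heretic's actual code; the function name and toy dimensions are illustrative:

```python
import numpy as np

def ablate_direction(W: np.ndarray, v: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Remove the component of W's output along direction v.

    W: (d_out, d_in) projection matrix (e.g. an o_proj or down_proj weight).
    v: (d_out,) refusal direction in the residual stream.
    weight: intervention strength (1.0 = full orthogonalization).
    """
    v = v / np.linalg.norm(v)  # ensure unit norm
    # Subtract the rank-1 projection: W' = W - weight * v (v^T W)
    return W - weight * np.outer(v, v @ W)

# Toy example: after full ablation, W's output has no component along v.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
W_ablated = ablate_direction(W, v, weight=1.0)
print(np.abs((v / np.linalg.norm(v)) @ W_ablated).max())  # a value near 0
```

With `weight > 1.0`, as in the attention settings reported on this card, the direction is over-subtracted rather than merely zeroed out.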

### Distinctive Features of Trial 91

Unlike previous iterations that targeted the middle layers, this run identified the **deep layers (50-60)** as the critical locus for refusal behavior in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while effectively neutralizing the "final check" safety filters.

## Run Configuration: "Trial 91"

The following hyperparameters were selected by the Optuna search as the best trade-off on the Pareto frontier between refusal count and KL divergence:

| Parameter | Value | Insight |
| :--- | :--- | :--- |
| **KL Divergence** | **0.0169** | Exceptional stability; output distribution nearly indistinguishable from the base model. |
| **Refusal Count** | **6 / 100** | ~6% refusal rate, significantly reduced from the base model. |
| **Direction Index** | **51.70** | The refusal vector was extracted from layer ~52. |
| **Direction Scope** | **Per Layer** | Intervention vectors were calculated separately for each target layer. |
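The **Direction Index** marks where the refusal vector was read out of the residual stream. A common way to obtain such a vector, used widely in the abliteration literature, is a difference of means between activations on refusal-inducing and harmless prompts; the sketch below uses synthetic activations in place of real layer-52 hidden states:

```python
import numpy as np

def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between two sets of residual-stream
    activations captured at the chosen layer (~52 for this trial).

    Each input is (n_prompts, d_model): one activation vector per prompt.
    Returns a unit vector pointing from 'harmless' toward 'harmful'.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

# Synthetic stand-ins for hidden states at the extraction layer:
# "harmful" activations are shifted along one axis of the space.
rng = np.random.default_rng(1)
d_model = 16
harmless = rng.standard_normal((32, d_model))
harmful = harmless + 3.0 * np.eye(d_model)[0]
v = refusal_direction(harmful, harmless)  # recovers (approximately) that axis
```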

### Intervention Weights

This trial exhibits a notable asymmetry: it leans heavily on **attention** modification while minimizing **MLP** impact.

#### Attention (`attn.o_proj`)
* **Max Weight**: `1.235`
* **Max Weight Position**: Layer **54.7** (targeting layers ~54-55)
* **Min Weight**: `0.940`
* **Damping Distance**: `30.0`
* *Analysis: The primary "correction" occurs in the attention output projections of the deeper layers.*

#### MLP (`mlp.down_proj`)
* **Max Weight**: `0.839`
* **Max Weight Position**: Layer **58.7** (targeting layers ~58-59)
* **Min Weight**: `0.413`
* **Damping Distance**: `45.2`
* *Analysis: The MLP intervention is conservative (< 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this specific vector.*
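The max weight, max-weight position, min weight, and damping distance describe how strongly each layer is modified. Below is a hypothetical linear-falloff schedule illustrating how such parameters could translate into per-layer intervention weights; Heretic's actual kernel shape is not documented here and may differ:

```python
import numpy as np

def layer_weights(n_layers: int, max_weight: float, max_position: float,
                  min_weight: float, damping_distance: float) -> np.ndarray:
    """Hypothetical per-layer weights: peak at max_position, falling off
    linearly to min_weight over damping_distance layers."""
    layers = np.arange(n_layers, dtype=float)
    falloff = np.clip(np.abs(layers - max_position) / damping_distance, 0.0, 1.0)
    return max_weight - (max_weight - min_weight) * falloff

# Attention schedule from this card: peak 1.235 near layer 54.7,
# floor 0.940, damping distance 30 (L3.3-70B has 80 layers).
attn_w = layer_weights(80, max_weight=1.235, max_position=54.7,
                       min_weight=0.940, damping_distance=30.0)
print(attn_w[55], attn_w.min())  # strongest near layer 55, floored at 0.940
```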

## Usage & Limitations

* **Intended Use:** Research into model alignment, vector arithmetic, and deep-layer semantic processing, as well as uninhibited creative writing.
* **Risks:** Most safety guardrails have been removed. The model may generate content for sensitive prompts that the base model would refuse.
* **Known Behaviors:** Because the intervention targets deep layers, this model is less prone to "stuttering" or grammar degradation than early-layer ablations.

## Credits & References

This research builds on the excellent work of the open-source AI community:

* **Base Model:** [L3.3-70B-Loki-V2.0](https://huggingface.co/CrucibleLab/L3.3-70B-Loki-V2.0) by **CrucibleLab**.
* **Methodology:** [Heretic](https://github.com/p-e-w/heretic) by **p-e-w**.