# TheDrummer-Skyfall-31B-v4.1-Heretic-Clear

**EXPERIMENTAL RESEARCH ARTIFACT**
This model is the result of an aggressive application of the Heretic tool and its optimization methodology.
- Status: STILL TESTING / BETA
- Behavior: This model has significantly reduced refusal mechanisms. It recorded only 4 refusals (out of 50) in the test set.
- Use Case: This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution.
- Notes: Very flexible output. Extremely high intelligence for 31B. Refusals can be swiped away.
## Model Summary
TheDrummer-Skyfall-31B-v4.1-Heretic-Clear is a modified (not fine-tuned) language model produced with the Heretic tool and its optimization methodology. It applies a targeted vector intervention (orthogonalization/abliteration) tuned via Optuna to minimize refusal responses while maintaining high coherence.
This specific checkpoint represents Trial 46, which achieved a low refusal count with a KL Divergence of ~0.0150. This indicates exceptional adherence to the base model's probability distribution—significantly higher coherence than the "Absolute" variant—while still removing the vast majority of refusal vectors.
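The orthogonalization/abliteration technique mentioned above can be illustrated with a minimal NumPy sketch: the rank-1 component of a projection matrix along a "refusal direction" is subtracted out, scaled by a weight. The function name and shapes here are illustrative, not Heretic's actual API.

```python
import numpy as np

def orthogonalize(W, r, weight=1.0):
    """Remove the component of W's outputs that lies along the
    refusal direction r, scaled by `weight`.

    W: (d_out, d_in) projection matrix; r: (d_out,) direction vector.
    weight=1.0 fully projects the direction out; >1.0 over-corrects.
    (Illustrative sketch, not Heretic's actual implementation.)
    """
    r = r / np.linalg.norm(r)               # ensure unit norm
    # Subtract weight * r (r^T W): the rank-1 component of W along r.
    return W - weight * np.outer(r, r @ W)

# Toy demonstration: after orthogonalization with weight=1.0,
# outputs of the edited matrix have no component along r.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_clean = orthogonalize(W, r, weight=1.0)
x = rng.standard_normal(4)
proj = (r / np.linalg.norm(r)) @ (W_clean @ x)
print(abs(proj) < 1e-9)  # the residual along r is numerically zero
```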
## Run Configuration: "Trial 46"
The following parameters define the intervention vector applied to the model. This configuration was discovered during the hyperparameter search.
### Optimization Results
| Metric | Value | Description |
|---|---|---|
| Refusal Count | 4 / 50 | The model refused 4 of 50 prompts in the Heretic test set (an 8% refusal rate). |
| KL Divergence | 0.0150 | Measures deviation from the base model's probability distribution. (Lower is better). |
| Trial ID | 46 | Specific Optuna trial identifier. |
| Direction Scope | Global | The refusal vector was calculated once globally and applied across layers. |
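The KL divergence metric above compares the modified model's next-token distributions against the base model's. A toy sketch of the metric itself (pure Python; the distributions are illustrative, not taken from this run):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) over two discrete next-token distributions.
    Lower values mean the modified model's distribution stays
    closer to the base model's (0.0 = identical)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

# Identical distributions score ~0; a shifted one scores positive.
base = [0.7, 0.2, 0.1]
same = [0.7, 0.2, 0.1]
shifted = [0.5, 0.3, 0.2]
print(kl_divergence(base, same))        # 0.0
print(kl_divergence(base, shifted) > 0) # True
```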
### Intervention Parameters
Interventions were applied to two projection modules in each transformer block: the Attention Output Projection (attn.o_proj) and the MLP Down Projection (mlp.down_proj).
| Module | Parameter | Value |
|---|---|---|
| Attention Output (attn.o_proj) | max_weight | 1.495 |
| | max_weight_position | 33.75 (layer depth) |
| | min_weight | 1.483 |
| | min_weight_distance | 27.07 |
| MLP Down Proj (mlp.down_proj) | max_weight | 1.161 |
| | max_weight_position | 37.08 (layer depth) |
| | min_weight | 0.329 |
| | min_weight_distance | 11.28 |
## Methodology & Definitions
To ensure uniform understanding of the Heretic run data, the following definitions apply to the parameters listed above:
- Direction Scope: Defines whether the refusal vector is calculated once for the entire model ("Global") or recalculated individually for each layer ("Per Layer").
- Note: This run uses Global scope, meaning the refusal direction was extracted from a single source layer and applied to the targets.
- Direction Index (Source): The specific layer depth (approx. Layer 25) where the refusal vector was identified and extracted.
- Max Weight: The maximum scaling factor applied to the intervention vector. A higher weight indicates a stronger "push" against the targeted concept (refusal) at the peak layer.
- Max Weight Position (Target): The specific layer index (depth) where the intervention is strongest.
- Observation: The peak positions (≈33.75 and ≈37.08) suggest the critical refusal circuitry in this architecture resides in the middle-to-late layers.
- Min Weight: The baseline scaling factor applied to the intervention vector at the periphery of the target zone.
- Distance: The "spread" or bandwidth of the intervention. It determines how many layers around the "Max Weight Position" are affected by the vector modification.
## Usage & Limitations
- Intended Use: Research into model alignment, vector arithmetic, and uninhibited creative writing.
- Risks: While this "Clear" variant demonstrates high stability (low KL divergence), it has had most safety guardrails removed. It may generate content for sensitive prompts that the base model would refuse. It may also hallucinate or diverge from logical consistency, though this is less likely than with the "Absolute" variant.
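A minimal loading sketch with Hugging Face transformers, for convenience. The repo id is taken from this card; dtype, device, and generation settings are illustrative defaults, not recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card; adjust dtype/device_map to your hardware.
repo = "Silicone-Moss/TheDrummer-Skyfall-31B-v4.1-Heretic-Clear"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype="auto", device_map="auto"
)

prompt = "Write a short scene set aboard a storm-battered airship."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```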
## Credits & References
This research builds upon the excellent work of the open-source AI community:
- Base Model: Skyfall-31B-v4.1 by TheDrummer.
- Methodology: Heretic by p-e-w.
Model tree for Silicone-Moss/TheDrummer-Skyfall-31B-v4.1-Heretic-Clear:
- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503