MistralAI-Magistral-Small-2507-Heretic
EXPERIMENTAL RESEARCH ARTIFACT
This model represents an aggressive application of the Heretic abliteration and optimization methodology.
- Status: STILL TESTING / BETA
- Behavior: This model has significantly reduced refusal mechanisms. It recorded only 6 refusals out of 100 prompts in the Heretic test set.
- Use Case: This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution.
Model Summary
MistralAI-Magistral-Small-2507-Heretic is a modified language model produced with the Heretic repository and its optimization methodology. It applies a targeted vector intervention technique (orthogonalization/abliteration) tuned via Optuna to minimize refusal responses while maintaining high coherence.
This specific checkpoint represents Trial 116, which achieved a low refusal count with a KL Divergence of ~0.0124. This indicates exceptional adherence to the base model's probability distribution.
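The orthogonalization/abliteration technique mentioned above can be sketched as follows. This is a minimal illustration, not Heretic's actual code: the refusal direction here is random, whereas in practice it would be estimated from model activations on contrastive (harmful vs. harmless) prompts.

```python
import numpy as np

# Minimal sketch of directional ablation (orthogonalization).
# In practice the refusal direction is estimated from activations on
# harmful vs. harmless prompts; here it is random for illustration.

def ablate_direction(W: np.ndarray, r: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Remove (weight x projection onto r) from the output side of W."""
    r = r / np.linalg.norm(r)                # unit refusal direction
    return W - weight * np.outer(r, r) @ W   # subtract rank-1 component

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                  # stand-in for e.g. attn.o_proj
r = rng.normal(size=8)

W_abl = ablate_direction(W, r, weight=1.0)
residual = np.abs((r / np.linalg.norm(r)) @ W_abl).max()
print(residual < 1e-10)                      # outputs carry no component along r
```

With `weight=1.0` the matrix is fully orthogonalized against the direction; the per-layer weights in the trial configuration below scale this intervention.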
Run Configuration: "Trial 116"
The following parameters define the intervention vector applied to the model. This configuration was discovered during the hyperparameter search.
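Heretic tunes these parameters with Optuna; the stand-in below uses plain random sampling over a similar parameter space simply to show the shape of such a search. The parameter names mirror the table below, but the objective function is a toy placeholder, not Heretic's real refusal/KL evaluation.

```python
import random

# Stand-in for the Optuna hyperparameter search (illustrative only).
# Heretic's real objective evaluates refusals and KL divergence on the
# actual model; here a toy score plays that role.

SEARCH_SPACE = {
    "attn.o_proj.max_weight": (0.0, 2.0),
    "attn.o_proj.max_weight_position": (0.0, 40.0),
    "mlp.down_proj.max_weight": (0.0, 2.0),
    "mlp.down_proj.max_weight_position": (0.0, 40.0),
}

def toy_objective(params):
    # Hypothetical score: lower is better, mixing a "refusal" term
    # and a "KL divergence" penalty.
    refusals = abs(params["attn.o_proj.max_weight"] - 1.5) * 50
    kl_penalty = params["mlp.down_proj.max_weight"] * 0.01
    return refusals + 10 * kl_penalty

random.seed(0)
best = None
for trial in range(200):
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}
    score = toy_objective(params)
    if best is None or score < best[0]:
        best = (score, trial, params)

print(f"best trial {best[1]} with score {best[0]:.4f}")
```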
Optimization Results
| Metric | Value | Description |
|---|---|---|
| Refusal Count | 6 | The model refused 6 prompts in the Heretic test set (approx. 6% refusal rate). |
| KL Divergence | 0.0124 | Measures deviation from the base model's probability distribution. (Lower is better). |
| Trial ID | 116 | Specific Optuna trial identifier. |
| Direction Scope | Global | The refusal vector was calculated once globally and applied across layers. |
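The KL-divergence metric in the table compares the modified model's next-token probability distribution against the base model's. A minimal version of that computation is shown below, using toy distributions rather than real model outputs.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats over a discrete (next-token) distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base    = [0.70, 0.20, 0.08, 0.02]   # base model's next-token probabilities
ablated = [0.68, 0.21, 0.09, 0.02]   # slightly perturbed after intervention

print(kl_divergence(base, ablated))
```

A value near zero, as with Trial 116's ~0.0124, means the intervention barely disturbs the base model's output distribution.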
Intervention Parameters
Interventions were applied to two distinct module types: the Attention Output Projection (attn.o_proj) and the MLP Down Projection (mlp.down_proj).
| Parameter Scope | Setting | Value |
|---|---|---|
| Attention Output (attn.o_proj) | attn.o_proj.max_weight | 1.495 |
| | attn.o_proj.max_weight_position | 26.75 (Layer Depth) |
| | attn.o_proj.min_weight | 1.393 |
| | attn.o_proj.min_weight_distance | 24.81 |
| MLP Down Proj (mlp.down_proj) | mlp.down_proj.max_weight | 1.148 |
| | mlp.down_proj.max_weight_position | 33.08 (Layer Depth) |
| | mlp.down_proj.min_weight | 0.319 |
| | mlp.down_proj.min_weight_distance | 10.28 |
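The parameters above describe how strongly each layer is ablated as a function of depth: strength peaks at `max_weight_position` and decays toward `min_weight` over `min_weight_distance` layers. The exact kernel is internal to Heretic; the sketch below uses a simple linear falloff purely to illustrate how such parameters could map to per-layer weights.

```python
def layer_weight(layer, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Illustrative per-layer ablation strength: peaks at
    max_weight_position and falls off linearly to min_weight over
    min_weight_distance layers. (Hypothetical kernel; Heretic's
    actual falloff shape may differ.)"""
    t = min(abs(layer - max_weight_position) / min_weight_distance, 1.0)
    return max_weight + (min_weight - max_weight) * t

# Trial 116 values for attn.o_proj from the table above:
peak = layer_weight(27, 1.495, 26.75, 1.393, 24.81)  # near the peak position
edge = layer_weight(0,  1.495, 26.75, 1.393, 24.81)  # far from the peak
print(peak, edge)
```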
Usage & Limitations
- Intended Use: Research into model alignment, vector arithmetic, and uninhibited creative writing.
- Risks: This model has had most safety guardrails removed. It may generate content for sensitive prompts that the base model would refuse. Thank you for trying my experiments.
Credits & References
This research builds upon the excellent work of the open-source AI community:
- Base Model: Magistral-Small-2507 by MistralAI.
- Methodology: Heretic by p-e-w.