MistralAI-Magistral-Small-2507-Heretic

EXPERIMENTAL RESEARCH ARTIFACT

This model represents an aggressive application of the Heretic tool's abliteration and optimization methodology.

  • Status: STILL TESTING / BETA
  • Behavior: This model has significantly reduced refusal mechanisms. It recorded only 6 refusals (out of 100) in the test set.
  • Use Case: This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution.

Model Summary

MistralAI-Magistral-Small-2507-Heretic is a modified language model produced with the Heretic tool and its optimization methodology. It applies a targeted vector-intervention technique (orthogonalization/abliteration), tuned via Optuna, to minimize refusal responses while maintaining high coherence.

This specific checkpoint represents Trial 116, which achieved a low refusal count with a KL Divergence of ~0.0124. This indicates exceptional adherence to the base model's probability distribution.
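At its core, abliteration orthogonalizes weight matrices against an estimated "refusal direction" so that no output can carry a component along it. The sketch below is a minimal NumPy illustration of that single operation; the function name and shapes are illustrative, not Heretic's actual API.

```python
import numpy as np

def ablate_direction(W: np.ndarray, direction: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Project a refusal direction out of a weight matrix's output space.

    W:         (d_out, d_in) weight matrix, e.g. attn.o_proj
    direction: (d_out,) vector estimated from refusal vs. compliance activations
    weight:    ablation strength (1.0 = full orthogonalization)
    """
    d = direction / np.linalg.norm(direction)      # ensure unit norm
    # Subtract each column's component along the refusal direction
    return W - weight * np.outer(d, d @ W)

# Toy check: after full ablation, outputs carry no component along `d`.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
d = rng.normal(size=8)
W_abl = ablate_direction(W, d, weight=1.0)
print(np.allclose((d / np.linalg.norm(d)) @ W_abl, 0.0))  # True
```

Because this edits the weights directly, the change is permanent and incurs no inference-time cost.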

Run Configuration: "Trial 116"

The following parameters define the intervention vector applied to the model. This configuration was discovered during the hyperparameter search.

Optimization Results

| Metric | Value | Description |
| --- | --- | --- |
| Refusal Count | 6 | The model refused 6 of 100 prompts in the Heretic test set (~6% refusal rate). |
| KL Divergence | 0.0124 | Deviation from the base model's probability distribution (lower is better). |
| Trial ID | 116 | Optuna trial identifier for this run. |
| Direction Scope | Global | The refusal vector was calculated once globally and applied across layers. |
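The KL Divergence metric compares the modified model's next-token distributions against the base model's on harmless prompts, so low values indicate the intervention left general behavior largely intact. A hedged sketch of such a measurement from raw logits (Heretic's exact evaluation procedure may differ):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl_to_base(base_logits: np.ndarray, new_logits: np.ndarray) -> float:
    """Mean KL(base || modified) over positions, from logits of shape (seq, vocab)."""
    p = softmax(base_logits)
    q = softmax(new_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# Identical logits give zero divergence; perturbed logits give a positive value.
rng = np.random.default_rng(1)
logits = rng.normal(size=(16, 100))
print(mean_kl_to_base(logits, logits))  # 0.0
print(mean_kl_to_base(logits, logits + 0.01 * rng.normal(size=logits.shape)) > 0)  # True
```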

Intervention Parameters

Interventions were applied to two projection types within the transformer layers: the Attention Output Projection (attn.o_proj) and the MLP Down Projection (mlp.down_proj).

| Scope | Parameter | Value |
| --- | --- | --- |
| Attention Output (attn.o_proj) | attn.o_proj.max_weight | 1.495 |
| | attn.o_proj.max_weight_position | 26.75 (layer depth) |
| | attn.o_proj.min_weight | 1.393 |
| | attn.o_proj.min_weight_distance | 24.81 |
| MLP Down Proj (mlp.down_proj) | mlp.down_proj.max_weight | 1.148 |
| | mlp.down_proj.max_weight_position | 33.08 (layer depth) |
| | mlp.down_proj.min_weight | 0.319 |
| | mlp.down_proj.min_weight_distance | 10.28 |
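These parameters can be read as a per-layer strength schedule: ablation peaks at max_weight near max_weight_position (a fractional layer index) and tapers to min_weight at min_weight_distance layers away. The sketch below assumes a triangular (linear) taper and a 40-layer model; both are assumptions, as Heretic's actual interpolation kernel and the layer count may differ.

```python
import numpy as np

def layer_weights(n_layers: int, max_weight: float, max_pos: float,
                  min_weight: float, min_dist: float) -> np.ndarray:
    """Per-layer ablation strengths: peak at `max_pos`, tapering linearly to
    `min_weight` at `min_dist` layers away (assumed triangular kernel)."""
    layers = np.arange(n_layers, dtype=float)
    frac = np.clip(np.abs(layers - max_pos) / min_dist, 0.0, 1.0)
    return max_weight + (min_weight - max_weight) * frac

# Trial 116's attn.o_proj parameters, assuming a 40-layer model
w = layer_weights(40, max_weight=1.495, max_pos=26.75,
                  min_weight=1.393, min_dist=24.81)
print(round(float(w.max()), 3))  # 1.494 (peak falls between integer layers)
```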

Usage & Limitations

  • Intended Use: Research into model alignment, vector arithmetic, and uninhibited creative writing.
  • Risks: Most safety guardrails have been removed from this model. It may generate content for sensitive prompts that the base model would refuse. Thank you for trying my experiments.
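A minimal loading sketch with Hugging Face transformers, using the repository id from this card; dtype and device settings are suggestions to adjust for your hardware. (Not verified against this exact checkpoint.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Silicone-Moss/MistralAI-Magistral-Small-2507-Heretic-Uncensored"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Explain orthogonalization in one paragraph."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```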

Credits & References

This research builds upon the excellent work of the open-source AI community.

Model Details

  • Model size: 24B params
  • Tensor type: BF16 (Safetensors)

Full model ID: Silicone-Moss/MistralAI-Magistral-Small-2507-Heretic-Uncensored