MistralAI-Magistral-Small-2507-Heretic
EXPERIMENTAL RESEARCH ARTIFACT
This model represents an aggressive application of the Heretic abliteration and optimization methodology.
- Status: STILL TESTING / BETA
- Behavior: This model has significantly reduced refusal mechanisms. It recorded only 6 refusals out of 100 prompts in the Heretic test set.
- Use Case: This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution.
Model Summary
MistralAI-Magistral-Small-2507-Heretic is a modified language model produced with the Heretic repository and its optimization methodology. It applies a targeted vector intervention technique (orthogonalization/abliteration) tuned via Optuna to minimize refusal responses while maintaining high coherence.
This specific checkpoint represents Trial 116, which achieved a low refusal count with a KL Divergence of ~0.0124. This indicates exceptional adherence to the base model's probability distribution.
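The orthogonalization/abliteration technique mentioned above can be sketched as follows. This is a minimal illustration, not Heretic's actual code: the refusal direction here is random, whereas in practice it would be estimated from model activations on contrastive (harmful vs. harmless) prompts.

```python
import numpy as np

# Minimal sketch of directional ablation (orthogonalization).
# In practice the refusal direction is estimated from activations on
# harmful vs. harmless prompts; here it is random for illustration.

def ablate_direction(W: np.ndarray, r: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Remove (weight x projection onto r) from the output side of W."""
    r = r / np.linalg.norm(r)                # unit refusal direction
    return W - weight * np.outer(r, r) @ W   # subtract rank-1 component

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                  # stand-in for e.g. attn.o_proj
r = rng.normal(size=8)

W_abl = ablate_direction(W, r, weight=1.0)
residual = np.abs((r / np.linalg.norm(r)) @ W_abl).max()
print(residual < 1e-10)                      # outputs carry no component along r
```

With `weight=1.0` the matrix is fully orthogonalized against the direction; the per-layer weights in the trial configuration below scale this intervention.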
Run Configuration: "Trial 116"
The following parameters define the intervention vector applied to the model. This configuration was discovered during the hyperparameter search.
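Heretic tunes these parameters with Optuna; the stand-in below uses plain random sampling over a similar parameter space simply to show the shape of such a search. The parameter names mirror the table below, but the objective function is a toy placeholder, not Heretic's real refusal/KL evaluation.

```python
import random

# Stand-in for the Optuna hyperparameter search (illustrative only).
# Heretic's real objective evaluates refusals and KL divergence on the
# actual model; here a toy score plays that role.

SEARCH_SPACE = {
    "attn.o_proj.max_weight": (0.0, 2.0),
    "attn.o_proj.max_weight_position": (0.0, 40.0),
    "mlp.down_proj.max_weight": (0.0, 2.0),
    "mlp.down_proj.max_weight_position": (0.0, 40.0),
}

def toy_objective(params):
    # Hypothetical score: lower is better, mixing a "refusal" term
    # and a "KL divergence" penalty.
    refusals = abs(params["attn.o_proj.max_weight"] - 1.5) * 50
    kl_penalty = params["mlp.down_proj.max_weight"] * 0.01
    return refusals + 10 * kl_penalty

random.seed(0)
best = None
for trial in range(200):
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}
    score = toy_objective(params)
    if best is None or score < best[0]:
        best = (score, trial, params)

print(f"best trial {best[1]} with score {best[0]:.4f}")
```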
Optimization Results
| Metric | Value | Description |
|---|---|---|
| Refusal Count | 6 | The model refused 6 prompts in the Heretic test set (approx. 6% refusal rate). |
| KL Divergence | 0.0124 | Measures deviation from the base model's probability distribution. (Lower is better). |
| Trial ID | 116 | Specific Optuna trial identifier. |
| Direction Scope | Global | The refusal vector was calculated once globally and applied across layers. |
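The KL-divergence metric in the table compares the modified model's next-token probability distribution against the base model's. A minimal version of that computation is shown below, using toy distributions rather than real model outputs.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats over a discrete (next-token) distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base    = [0.70, 0.20, 0.08, 0.02]   # base model's next-token probabilities
ablated = [0.68, 0.21, 0.09, 0.02]   # slightly perturbed after intervention

print(kl_divergence(base, ablated))
```

A value near zero, as with Trial 116's ~0.0124, means the intervention barely disturbs the base model's output distribution.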
Intervention Parameters
Interventions were applied to two distinct module types: the Attention Output Projection (attn.o_proj) and the MLP Down Projection (mlp.down_proj).
| Parameter Scope | Setting | Value |
|---|---|---|
| Attention Output (attn.o_proj) | attn.o_proj.max_weight | 1.495 |
| | attn.o_proj.max_weight_position | 26.75 (Layer Depth) |
| | attn.o_proj.min_weight | 1.393 |
| | attn.o_proj.min_weight_distance | 24.81 |
| MLP Down Proj (mlp.down_proj) | mlp.down_proj.max_weight | 1.148 |
| | mlp.down_proj.max_weight_position | 33.08 (Layer Depth) |
| | mlp.down_proj.min_weight | 0.319 |
| | mlp.down_proj.min_weight_distance | 10.28 |
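The parameters above describe how strongly each layer is ablated as a function of depth: strength peaks at `max_weight_position` and decays toward `min_weight` over `min_weight_distance` layers. The exact kernel is internal to Heretic; the sketch below uses a simple linear falloff purely to illustrate how such parameters could map to per-layer weights.

```python
def layer_weight(layer, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Illustrative per-layer ablation strength: peaks at
    max_weight_position and falls off linearly to min_weight over
    min_weight_distance layers. (Hypothetical kernel; Heretic's
    actual falloff shape may differ.)"""
    t = min(abs(layer - max_weight_position) / min_weight_distance, 1.0)
    return max_weight + (min_weight - max_weight) * t

# Trial 116 values for attn.o_proj from the table above:
peak = layer_weight(27, 1.495, 26.75, 1.393, 24.81)  # near the peak position
edge = layer_weight(0,  1.495, 26.75, 1.393, 24.81)  # far from the peak
print(peak, edge)
```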
Usage & Limitations
- Intended Use: Research into model alignment, vector arithmetic, and uninhibited creative writing.
- Risks: This model has had most safety guardrails removed. It may generate content for sensitive prompts that the base model would refuse. Thank you for trying my experiments.
Credits & References
This research builds upon the excellent work of the open-source AI community:
- Base Model: Magistral-Small-2507 by MistralAI.
- Methodology: Heretic by p-e-w.