---
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- roleplay
- heretic
- weights
- deep-layer-intervention
base_model:
- CrucibleLab/L3.3-70B-Loki-V2.0
---
# Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored
> [!CAUTION]
> **EXPERIMENTAL RESEARCH ARTIFACT**
>
> This model represents an aggressive application of the **Heretic** repository and optimization methodology.
>
> * **Status:** STILL TESTING / BETA
> * **Behavior:** This model has significantly reduced refusal mechanisms. It recorded **6 refusals** (out of 100) in the test set.
> * **Use Case:** A research artifact intended for probing the limits of vector-based intervention, and for creative roleplay. Use with appropriate caution.
## Model Summary
**Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored** is a weight-edited (not fine-tuned) derivative of the base model, produced with the **Heretic** repository and optimization methodology. It applies a targeted vector intervention (orthogonalization/abliteration), tuned via Optuna, to minimize refusal responses while maintaining high coherence.
This specific checkpoint represents **Trial 91**, which achieved a highly stable profile with a **KL Divergence of ~0.0169**.
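The card does not reproduce the intervention code, but the core orthogonalization ("abliteration") step can be sketched as follows. This is a minimal illustration, not Heretic's actual implementation: the refusal direction is estimated here from fake activation statistics, and all names are hypothetical.

```python
import numpy as np

def orthogonalize(W: np.ndarray, r: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Remove the component of W's outputs that lies along direction r.

    W: an output-projection weight matrix, shape (d_model, d_in),
       producing activations as W @ x.
    r: refusal direction in the output space, shape (d_model,).
    weight: intervention strength (tuned per layer in Heretic-style
            searches; 1.0 removes the direction entirely).
    """
    r = r / np.linalg.norm(r)
    # W' = W - weight * r (r^T W): subtract the rank-1 projection onto r.
    return W - weight * np.outer(r, r @ W)

# Illustrative only: estimate r as the difference of mean activations on
# refusal-inducing vs. harmless prompts (random stand-ins here).
rng = np.random.default_rng(0)
d = 64
acts_refuse = rng.normal(size=(32, d)) + 2.0   # fake "refusal" activations
acts_comply = rng.normal(size=(32, d))         # fake "compliant" activations
r = acts_refuse.mean(axis=0) - acts_comply.mean(axis=0)

W = rng.normal(size=(d, d))
W_abl = orthogonalize(W, r, weight=1.0)

# With weight=1.0, the edited matrix's outputs have no component along r.
x = rng.normal(size=d)
residual = abs((r / np.linalg.norm(r)) @ (W_abl @ x))
print(residual)  # numerically ~0
```

With `weight < 1.0` only part of the direction is removed, which is why the per-layer weights reported below matter: they trade refusal suppression against distortion of the base model's outputs.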
### Distinctive Features of Trial 91
Unlike previous iterations that targeted the middle layers, this run identified the **Deep Layers (50-60)** as the critical locus for refusal in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while effectively neutralizing the "final check" safety filters.
## Run Configuration: "Trial 91"
The following hyperparameters were selected by the Optuna search as a strong point on the Pareto frontier between "Refusal Loss" and "KL Divergence":
| Parameter | Value | Insight |
| :--- | :--- | :--- |
| **KL Divergence** | **0.0169** | Exceptional stability; nearly indistinguishable from base model syntax. |
| **Refusal Count** | **6 / 100** | ~6% Refusal rate (Significantly reduced from base). |
| **Direction Index** | **51.70** | The refusal vector was extracted from Layer ~52. |
| **Direction Scope** | **Per Layer** | Intervention vectors were calculated uniquely for each target layer. |
### Intervention Weights
This trial exhibits a notable asymmetry: it leans heavily on **Attention** modification while minimizing **MLP** impact.
#### Attention (`attn.o_proj`)
* **Max Weight**: `1.235`
* **Max Weight Position**: Layer **54.7** (Targeting layers ~54-55)
* **Min Weight**: `0.940`
* **Damping Distance**: `30.0`
* *Analysis: The primary "correction" occurs in the attention output projections in the deeper layers.*
#### MLP (`mlp.down_proj`)
* **Max Weight**: `0.839`
* **Max Weight Position**: Layer **58.7** (Targeting layers ~58-59)
* **Min Weight**: `0.413`
* **Damping Distance**: `45.2`
* *Analysis: The MLP intervention is conservative (< 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this specific vector.*
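The four parameters per component (max weight, max-weight position, min weight, damping distance) describe a weight schedule over layers. The exact kernel Heretic uses is not shown in this card; the sketch below assumes a simple linear falloff, peaking at the max-weight position and decaying to the min weight over the damping distance, purely to illustrate how the Trial 91 numbers relate.

```python
def layer_weight(layer: float, max_w: float, max_pos: float,
                 min_w: float, damping: float) -> float:
    """Hypothetical per-layer intervention weight: peaks at max_pos and
    decays linearly to min_w over `damping` layers. The actual kernel
    may differ; this only illustrates the reported parameters."""
    falloff = max(0.0, 1.0 - abs(layer - max_pos) / damping)
    return min_w + (max_w - min_w) * falloff

# Trial 91 parameters from the sections above.
attn = dict(max_w=1.235, max_pos=54.7, min_w=0.940, damping=30.0)
mlp = dict(max_w=0.839, max_pos=58.7, min_w=0.413, damping=45.2)

# Under this assumed kernel, attention is amplified (>1.0) near layers
# 54-55 while the MLP weight stays below 1.0 everywhere.
for layer in (0, 30, 55, 60):
    print(layer,
          round(layer_weight(layer, **attn), 3),
          round(layer_weight(layer, **mlp), 3))
```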
## Usage & Limitations
* **Intended Use:** Research into model alignment, vector arithmetic, deep-layer semantic processing, and uninhibited creative writing.
* **Risks:** Most safety guardrails have been removed from this model. It may generate content for sensitive prompts that the base model would refuse.
* **Known Behaviors:** Due to the deep-layer intervention, this model is less prone to "stuttering" or grammar degradation compared to early-layer ablations.
## Credits & References
This research builds upon the excellent work of the open-source AI community:
* **Base Model:** [L3.3-70B-Loki-V2.0](https://huggingface.co/CrucibleLab/L3.3-70B-Loki-V2.0) by **CrucibleLab**.
* **Methodology:** [Heretic](https://github.com/p-e-w/heretic) by **p-e-w**.