---
base_model: Qwen/Qwen3.6-27B
library_name: peft
pipeline_tag: text-generation
license: mit
tags:
- activation-oracles
- interpretability
- lora
- peft
- self-introspection
---

# Activation Oracle for Qwen3.6-27B

This is a PEFT LoRA adapter for `Qwen/Qwen3.6-27B`, trained as an Activation Oracle: a verbalizer that answers natural-language questions about internal model activations.

The adapter is intended for use with the Activation Oracles codebase and demo workflow, where target-model activations are injected into the verbalizer via activation steering hooks.
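
For intuition only, here is a minimal sketch of what such a steering hook can look like in PyTorch, assuming `layer` is a decoder layer of the verbalizer and `activation` is a captured target-model vector; the function name and token position are illustrative, and the repository's actual implementation may differ:

```python
import torch

def make_injection_hook(activation: torch.Tensor, position: int):
    # Forward pre-hook: overwrite the residual stream at one token position
    # with the target-model activation before the layer processes it.
    def hook(module, args, kwargs):
        hidden_states = args[0].clone()
        hidden_states[:, position, :] = activation.to(
            hidden_states.device, hidden_states.dtype
        )
        return (hidden_states,) + args[1:], kwargs
    return hook

# Inject at the last token position, then remove the hook after generation.
handle = layer.register_forward_pre_hook(
    make_injection_hook(activation, position=-1), with_kwargs=True
)
# ... run the question through the verbalizer ...
handle.remove()
```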

## Details

- Base model: `Qwen/Qwen3.6-27B`
- Adapter type: LoRA
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Training mixture: LatentQA, binary classification tasks, and Past Lens/self-supervised context prediction
- Activation layers: 25%, 50%, and 75% depth of the target model
- Hook layer: 1
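
For reference, the hyperparameters above correspond to a PEFT configuration along the following lines; this is a reconstruction for illustration, and the target modules are an assumption, since the card does not list them:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,              # LoRA rank
    lora_alpha=128,    # effective scaling of alpha / r = 2.0
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # target_modules is not stated by this card; attention projections such
    # as ["q_proj", "k_proj", "v_proj", "o_proj"] are a common choice.
)
```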

## Usage

See the project repository for end-to-end inference code:

- GitHub: https://github.com/federicotorrielli/activation_oracles_qwen36

Basic adapter loading:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model (device_map="auto" requires `accelerate`).
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-27B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, "EvilScript/activation-oracle-qwen3.6-27B")
```
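
Once loaded, the adapted model behaves like an ordinary causal LM; a quick sanity check (the prompt is illustrative):

```python
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```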
Loading the adapter alone does not perform activation-oracle inference; activation collection and the steering-hook injection path are implemented in the repository.
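
For intuition only, a minimal sketch of collecting target-model activations at the listed fractional depths, assuming a standard Hugging Face causal LM and using the base model as the target for illustration; the repository's actual pipeline may differ:

```python
import torch

# Example input; any text works for this illustration.
text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt").to(base_model.device)

# Map 25%/50%/75% fractional depths to concrete layer indices
# (the rounding convention is an assumption).
num_layers = base_model.config.num_hidden_layers
layer_ids = [int(num_layers * f) for f in (0.25, 0.50, 0.75)]

with torch.no_grad():
    out = base_model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so layer i lives at index i + 1.
# Here we keep the last-token residual-stream vector per layer.
activations = {i: out.hidden_states[i + 1][:, -1, :] for i in layer_ids}
```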