qwen-toolrl-crosscoder
A Dedicated Feature CrossCoder (DFC) trained to compare activations between:
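- chengq9/ToolRL-Qwen2.5-3B
- Qwen/Qwen2.5-3B

The DFC CrossCoder learns sparse feature representations with three partitions:

- features exclusive to chengq9/ToolRL-Qwen2.5-3B
- features exclusive to Qwen/Qwen2.5-3B
- features shared by both models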
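For intuition, here is a toy NumPy sketch of one way three partitions can be enforced in a crosscoder. Every feature reads from both models' activations, but the decoder weights of an exclusive feature are zeroed for the other model. As a result, A-exclusive features can only reconstruct model A, B-exclusive features can only reconstruct model B, and shared features write to both. This is an illustrative assumption, not the code in dfc.py. The feature ordering, the initialization, the ReLU encoder and the names `ToyDFC` and `partition_masks` are all hypothetical.

```python
import numpy as np


def partition_masks(n_a, n_b, n_shared):
    """Return (mask_a, mask_b): which features may decode into model A / model B.

    Hypothetical feature order: [A-exclusive | B-exclusive | shared].
    """
    n = n_a + n_b + n_shared
    mask_a = np.zeros(n, dtype=bool)
    mask_b = np.zeros(n, dtype=bool)
    mask_a[:n_a] = True              # A-exclusive features decode into A only
    mask_b[n_a:n_a + n_b] = True     # B-exclusive features decode into B only
    mask_a[n_a + n_b:] = True        # shared features decode into both models
    mask_b[n_a + n_b:] = True
    return mask_a, mask_b


class ToyDFC:
    """Toy crosscoder whose decoder respects the three-way partition."""

    def __init__(self, d_model, n_a, n_b, n_shared, seed=0):
        rng = np.random.default_rng(seed)
        n = n_a + n_b + n_shared
        # Encoder reads both models: (2 models, d_model, n_features)
        self.W_enc = rng.normal(0.0, 0.02, size=(2, d_model, n))
        self.b_enc = np.zeros(n)
        # Decoder writes per model: (n_features, 2 models, d_model)
        self.W_dec = rng.normal(0.0, 0.02, size=(n, 2, d_model))
        mask_a, mask_b = partition_masks(n_a, n_b, n_shared)
        # Zero the decoder rows a feature is not allowed to write to
        self.W_dec[~mask_a, 0, :] = 0.0
        self.W_dec[~mask_b, 1, :] = 0.0

    def encode(self, x):
        """x: (batch, 2, d_model) -> non-negative features (batch, n_features)."""
        pre = np.einsum("bmd,mdf->bf", x, self.W_enc) + self.b_enc
        return np.maximum(pre, 0.0)

    def decode(self, f):
        """f: (batch, n_features) -> reconstruction (batch, 2, d_model)."""
        return np.einsum("bf,fmd->bmd", f, self.W_dec)
```

With this layout, decoding a vector in which only A-exclusive features are active yields an all-zero reconstruction for model B.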
Usage:

```python
import sys

import torch
from huggingface_hub import hf_hub_download

REPO_ID = "antebe1/dfc-crosscoder-qwen-ToolRL"

# Download the weights, config and CrossCoder implementation into the
# current directory so that DFCCrossCoder.load("./") can find them
for filename in ("model.pt", "config.json", "dfc.py"):
    hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=".")

sys.path.insert(0, ".")  # make the downloaded dfc.py importable
from dfc import DFCCrossCoder

# Load the crosscoder (falls back to CPU if no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
dfc = DFCCrossCoder.load("./", device=device)

# Example: extract features from activations.
# activations must have shape (batch_size, 2, 2048), where dim 1 is [model_a, model_b]
activations = torch.randn(1, 2, 2048, device=device)  # replace with real activations
features = dfc.encode(activations)  # sparse feature vector (mostly zeros)
print(f"Active features: {(features > 0).sum().item()}/{features.shape[-1]}")
```
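Real inputs must pair the two models' activations for the same tokens along dim 1. The NumPy sketch below, written in NumPy for portability, shows one way to build that array and to break the active-feature count down by partition. It assumes a hypothetical feature layout [A-exclusive | B-exclusive | shared] and partition sizes supplied by the caller (`n_a`, `n_b`, `n_shared`). The helper names are illustrative. The real ordering, and any config.json keys that store these sizes, are defined by dfc.py and may differ.

```python
import numpy as np


def pair_activations(acts_a, acts_b):
    """Stack per-token activations from model A and model B into (batch, 2, d_model).

    acts_a, acts_b: arrays of shape (batch, d_model) taken from the same layer
    and the same tokens in each model.
    """
    acts_a = np.asarray(acts_a)
    acts_b = np.asarray(acts_b)
    if acts_a.shape != acts_b.shape:
        raise ValueError(f"shape mismatch: {acts_a.shape} vs {acts_b.shape}")
    return np.stack([acts_a, acts_b], axis=1)  # dim 1 is [model_a, model_b]


def active_by_partition(features, n_a, n_b, n_shared):
    """Count active (> 0) features in each partition, summed over the batch.

    Assumes the hypothetical layout [A-exclusive | B-exclusive | shared].
    """
    features = np.asarray(features)
    if features.shape[-1] != n_a + n_b + n_shared:
        raise ValueError("partition sizes do not match the feature dimension")
    active = features > 0
    return {
        "a_exclusive": int(active[..., :n_a].sum()),
        "b_exclusive": int(active[..., n_a:n_a + n_b].sum()),
        "shared": int(active[..., n_a + n_b:].sum()),
    }
```

With the PyTorch crosscoder, convert the tensors first, e.g. `active_by_partition(features.detach().cpu().numpy(), n_a, n_b, n_shared)`.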
See demo.py in the repository for a complete, end-to-end example.
Repository files:

- model.pt: PyTorch model weights
- config.json: model configuration
- dfc.py: CrossCoder implementation
- demo.py: usage example

If you use this model, please cite:
```bibtex
@misc{dfc-crosscoder,
  title={DFC CrossCoder: Analyzing Tool-Use vs General Text Features},
  author={Andre Shportko},
  year={2026},
  url={https://huggingface.co/antebe1/dfc-crosscoder-qwen-ToolRL}
}
```