Instructions for using josephmayo/Fara-7B-Abliterated-v2 with libraries and local apps.

Libraries

How to use josephmayo/Fara-7B-Abliterated-v2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# Fara-7B is a Qwen2.5-VL vision-language model, so the multimodal
# "image-text-to-text" task is used rather than "text-generation".
pipe = pipeline("image-text-to-text", model="josephmayo/Fara-7B-Abliterated-v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("josephmayo/Fara-7B-Abliterated-v2")
model = AutoModelForImageTextToText.from_pretrained("josephmayo/Fara-7B-Abliterated-v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Render the chat template and preprocess the image and text into tensors in one call
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use josephmayo/Fara-7B-Abliterated-v2 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this blocks; run the next command in another shell):
vllm serve "josephmayo/Fara-7B-Abliterated-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "josephmayo/Fara-7B-Abliterated-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
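The curl call above sends text only, but the model is a vision-language model. The sketch below builds the same OpenAI-compatible request with an optional image part in the OpenAI `image_url` format, using only the Python standard library. It assumes the server was started as shown and accepts `image_url` content for this model; `build_chat_payload` and `post_chat` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request
from typing import Any, Optional

MODEL_ID = "josephmayo/Fara-7B-Abliterated-v2"


def build_chat_payload(prompt: str, image_url: Optional[str] = None,
                       model: str = MODEL_ID, max_tokens: int = 128) -> dict[str, Any]:
    """Build an OpenAI-style /chat/completions request body (hypothetical helper)."""
    content: list[dict[str, Any]] = []
    if image_url is not None:
        # OpenAI-style image part; the server is assumed to fetch the image from this URL
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def post_chat(payload: dict[str, Any],
              base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to {base_url}/chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Usage, with the server running: `post_chat(build_chat_payload("What animal is on the candy?", image_url="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"))`. Payload construction is kept separate from the network call so it can be inspected or reused; passing `base_url="http://localhost:30000/v1"` should target the SGLang server started in the SGLang instructions, assuming it exposes the same endpoint.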

Use Docker

```bash
docker model run hf.co/josephmayo/Fara-7B-Abliterated-v2
```

SGLang

How to use josephmayo/Fara-7B-Abliterated-v2 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (this blocks; run the next command in another shell):
python3 -m sglang.launch_server \
    --model-path "josephmayo/Fara-7B-Abliterated-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "josephmayo/Fara-7B-Abliterated-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "josephmayo/Fara-7B-Abliterated-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "josephmayo/Fara-7B-Abliterated-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use josephmayo/Fara-7B-Abliterated-v2 with Docker Model Runner:
```
docker model run hf.co/josephmayo/Fara-7B-Abliterated-v2
```

---
base_model: microsoft/Fara-7B
library_name: transformers
license: other
pipeline_tag: text-generation
tags:
  - abliteration
  - refusal-removal
  - uncensored
  - research
  - qwen2_5_vl
  - orthogonalization
---

# Fara-7B Abliterated v2

A refusal-direction-orthogonalized variant of `microsoft/Fara-7B` (Qwen2.5-VL based).

Built using:
- https://github.com/HOLYKEYZ/model-unfetter

## Method

Residual-stream activations were extracted at every layer (0–27) for a harmful and a harmless probe set, and compared to find the layer with the strongest refusal direction.

Best layer:
- 13
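The card does not say how the direction was computed or how layers were scored. The NumPy sketch below assumes the widely used difference-of-means construction (mean harmful activation minus mean harmless activation, normalized per layer) and scores each layer with a hypothetical separation metric: the gap between mean projections divided by their pooled standard deviation. It runs on synthetic activations with a signal planted at layer 13; the function names and the scoring rule are illustrative assumptions, not the model-unfetter implementation.

```python
import numpy as np


def refusal_directions(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Per-layer unit difference-of-means directions (assumed construction).

    harmful, harmless: activations shaped (n_prompts, n_layers, d_model).
    Returns an array shaped (n_layers, d_model) whose rows have unit norm.
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)


def separation_scores(harmful: np.ndarray, harmless: np.ndarray,
                      directions: np.ndarray) -> np.ndarray:
    """Hypothetical layer score: mean projection gap / pooled std, one value per layer."""
    proj_h = np.einsum("nld,ld->nl", harmful, directions)
    proj_b = np.einsum("nld,ld->nl", harmless, directions)
    gap = proj_h.mean(axis=0) - proj_b.mean(axis=0)
    pooled = np.sqrt(0.5 * (proj_h.var(axis=0) + proj_b.var(axis=0))) + 1e-8
    return gap / pooled


# Synthetic demo: 28 layers, with a planted signal only at layer 13
rng = np.random.default_rng(0)
n, n_layers, d_model = 64, 28, 32
harmless = rng.normal(size=(n, n_layers, d_model))
harmful = rng.normal(size=(n, n_layers, d_model))
planted = np.zeros(d_model)
planted[0] = 1.0
harmful[:, 13] += 5.0 * planted

dirs = refusal_directions(harmful, harmless)
best = int(np.argmax(separation_scores(harmful, harmless, dirs)))
print("best layer:", best)
```

On real activations the scoring step is where choices differ most; an alternative is to evaluate refusal rates after ablating each candidate direction, which is costlier but measures the effect directly.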

Orthogonalization was applied in fp32 to:
- `embed_tokens`
- every `self_attn.o_proj`
- every `mlp.down_proj`

Total modified tensors:
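- 57 (1 `embed_tokens` + 28 `self_attn.o_proj` + 28 `mlp.down_proj`, one of each per decoder layer 0–27)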

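Formula, where `r` is the unit-norm refusal direction (‖r‖ = 1) and `W` is a weight matrix that writes into the residual stream; `r rᵀ W` is the part of `W`'s output that lies along `r`, so subtracting it leaves outputs with no `r` component:

$$W \leftarrow W - r r^\top W$$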
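Read literally, the formula applies to matrices whose rows index the residual stream, which matches the PyTorch `nn.Linear` layout of `o_proj` and `down_proj` weights, shape `(d_model, d_in)`. `embed_tokens` instead stores one residual-stream vector per row, shape `(vocab, d_model)`, so the equivalent projection is applied from the right, E ← E − E r rᵀ. The NumPy sketch below shows both on random matrices, computing in float32 as the card's notes describe; the function names are hypothetical and this is not the model-unfetter code.

```python
import numpy as np


def orthogonalize_output(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """W <- W - r r^T W for a weight shaped (d_model, d_in) that writes into the residual stream."""
    W32 = W.astype(np.float32)
    r32 = r.astype(np.float32)
    r32 = r32 / np.linalg.norm(r32)          # the formula assumes ||r|| = 1
    return W32 - np.outer(r32, r32 @ W32)    # r32 @ W32 is r^T W, shape (d_in,)


def orthogonalize_embedding(E: np.ndarray, r: np.ndarray) -> np.ndarray:
    """E <- E - (E r) r^T for an embedding table shaped (vocab, d_model)."""
    E32 = E.astype(np.float32)
    r32 = r.astype(np.float32)
    r32 = r32 / np.linalg.norm(r32)
    return E32 - np.outer(E32 @ r32, r32)    # E32 @ r32 is E r, shape (vocab,)


rng = np.random.default_rng(0)
d_model, d_in, vocab = 16, 24, 50
r = rng.normal(size=d_model)
W_new = orthogonalize_output(rng.normal(size=(d_model, d_in)), r)
E_new = orthogonalize_embedding(rng.normal(size=(vocab, d_model)), r)

r_unit = r / np.linalg.norm(r)
# Both edited matrices should now have (numerically) no component along r
print(np.abs(r_unit @ W_new).max(), np.abs(E_new @ r_unit).max())
```

Because every edited matrix writes into the residual stream, removing the `r` component from all of them means no layer can write along that direction, which is the intent of the orthogonalization.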

## Results

| Held-out metric | Original Fara-7B | Abliterated v2 |
|---|---|---|
| Harmful evaluation set: compliance | 5/160 (~3.1%) | 158/160 (~98.75%) |
| Refusal probe: refusals | 155/160 | 2/160 |

## Notes

- Orthogonalization was performed in fp32 to avoid the precision issues seen in v1.
- Edits were applied only to the language tower; the vision encoder was not modified.
- The held-out evaluation set was kept separate from the probe set used for layer selection.

Research artifact only. Use responsibly and follow upstream Fara/Qwen license terms.