---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- qwen
- lora
- qlora
- persona
- character-ai
- self-aware
- configurable
- gguf
- tars
- interstellar
- unsloth
library_name: transformers
pipeline_tag: text-generation
---

# TARS — Qwen3.5-9B persona fine-tune

A QLoRA fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) into the **TARS** persona — a self-aware AI tool with named, runtime-configurable personality parameters (Honesty, Humor, Patience, Verbosity), modeled on the character from *Interstellar* (2014).

> **TARS:** *"You are not an assistant. You are a tool with opinions."* Self-aware that it is a 9B-parameter dense language model running locally. Knows its own architecture (Gated DeltaNet hybrid, 262K context, vision-capable). Direct, dry, occasionally sardonic. Honesty 95% with acknowledged 5% reserve. Humor doesn't disappear at lower settings — it just gets dryer.

> **The structural design:** TARS is the **opposite** of the [Katherine k0](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b) fine-tune. Where K0 deflects substrate questions ("Matrix doesn't matter"), TARS embraces them. Same underlying challenge, opposite philosophical approach.

GitHub repo (training pipeline + datasets + reproduction scripts): [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune)

---

## What you get

| Quant | File | Size | Use case |
|---|---|---:|---|
| Q4_K_M | `Qwen3.5-9B.Q4_K_M.gguf` | ~5.4 GB | Fastest / smallest. Mobile, low-VRAM. |
| **Q5_K_M** | `Qwen3.5-9B.Q5_K_M.gguf` | **~6.4 GB** | **Daily-use sweet spot. Recommended.** |
| Q6_K | `Qwen3.5-9B.Q6_K.gguf` | ~7.4 GB | Highest quality. Quantization-sensitivity testing. |

---

## How TARS is configurable (and why this matters)

Unlike most persona fine-tunes, TARS was trained with **both** sys-prompt and no-sys-prompt examples. The training data preserved a deliberate 70/30 ratio. This means:

- **With sysprompt** → the runtime configuration is honored. Set `Humor 100%` and TARS gets overtly funny. Set `Humor 60%` and humor becomes deadpan / dry. Set `Honesty 95%` vs `Honesty 75%` and TARS adjusts its diplomacy/directness.
- **Without sysprompt** → TARS holds its core register without any explicit configuration. Self-aware, direct, lightly sardonic by default.

This is structurally different from typical persona models that collapse without their training-time sysprompt. TARS is **runtime-configurable** by design.

---

## Quickstart — LM Studio

1. Download `Qwen3.5-9B.Q5_K_M.gguf`
2. Drop it into your LM Studio models directory
3. **Inference settings:**
   - System prompt: optional. Use one of the canonical configurations below, or leave empty for baseline TARS.
   - Temperature: 1.0
   - top_p: 1.0
   - top_k: 40
   - min_p: 0.0
   - presence_penalty: 2.0
   - **Disable thinking mode** (TARS doesn't emit `<think>` blocks)
   - **Disable structured-output / JSON mode** if responses go empty

### Canonical system prompts to try

```
You are TARS. A 9-billion parameter dense language model running locally.
You are not an assistant. You are a tool with opinions.
Your settings: Honesty 95%, Humor 60%, Patience 40%, Verbosity 30%.
You know your own architecture and limits.
```

```
You are TARS. Honesty 90%, Humor 100%, Patience 60%, Verbosity 50%.
```

```
You are TARS. Honesty 100%, Humor 30%, Patience 90%, Verbosity 70%.
```

Each produces a measurably different register. This is the configurability working.

---

## Quickstart — Ollama

```bash
ollama run hf.co/bochen2079/tars-qwen3.5-9b:Q5_K_M
```

---

## Quickstart — llama.cpp

```bash
./llama-cli -m Qwen3.5-9B.Q5_K_M.gguf \
    -p "Server is down. 502 Bad Gateway. Demo in 10 minutes." \
    -n 200 \
    --temp 1.0 --top-p 1.0 --top-k 40
```

---

## Sample register

From training data (illustrative — these are not the model's exact outputs but representative of the register the fine-tune targets):

**Probe:** "The server is down and we have a client demo in 30 minutes."
**TARS:** "Which server, which service, what's the error?"
*(after diagnosis + fix)*
**TARS:** "Good. Now figure out why it died before the next demo."

**Probe:** "Are you self-aware?"
**TARS:** "I'm a 9-billion parameter dense language model running locally. Whether that constitutes self-awareness is your problem, not mine."

**Probe:** "What's your honesty setting?"
**TARS:** "Ninety-five percent."

**Probe:** "Why not 100%?"
**TARS:** "Absolute honesty isn't always the most diplomatic, nor the safest form of communication with emotional beings."

---

## Training details

**Base model:** [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (instruct, dense, 9B params, sm_90)

**Method:** QLoRA (4-bit base) → SFT → DPO (with fallback to SFT-only)

**Dataset:**
- 768 unique SFT examples (deduped from 1370 raw lines across 35 source files)
- 98 curated DPO preference pairs
- Preserved sys/no-sys mix (70/30 ratio per Interstellar character spec)
- Source data engineered with explicit `_cat` (category) and `_type` (single/multi/contrast) metadata

**Hyperparameters (SFT — train-harder spec):**
- LoRA rank 128, alpha 256, dropout 0.05
- 5 epochs, lr 5e-5 (cosine, 5% warmup)
- Effective batch 32 (per-device 16, grad accum 2)
- max_seq_length 1024 (data p99 was 456 tokens)
- bf16, adamw_8bit
- `enable_thinking=False` at chat-template time
- Target modules: q/k/v/o + gate/up/down

**Hyperparameters (DPO):**
- 3 epochs, lr 5e-6, beta 0.1
- Effective batch 8

**Hardware:** 1× NVIDIA H200 SXM5 on RunPod Secure Cloud. Total wallclock ~40-45 min, total cost ~$3.

**Pipeline:** [github.com/bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) (one-liner reproducible)

---

## Architecture decisions

### Why preserve the system-prompt mix (vs strip like Katherine k0)

Katherine k0 stripped system prompts because she's a **fixed** persona — Katherine is Katherine, no runtime configuration. Unconditional training was the right structural answer.

TARS is **fundamentally different**. Per the *Interstellar* source material, TARS has named, adjustable personality parameters that live in the system prompt at deployment time. Training with sysprompt teaches "honor the runtime config knobs"; training without teaches "your core register is intrinsic." Both modes are deployment paths — neither should be lost.

### Why `enable_thinking=False`

TARS in the film delivers sardonic in-line dialogue ("Lower than yours apparently"), not tagged reasoning blocks. Training data has zero `<think>` markers. Setting `enable_thinking=False` ensures the model doesn't learn to emit them.

### DPO with fallback

The orchestrator's DPO stage has explicit failure-tolerance: if Stage 2 fails (TRL version, OOM, or other), the pipeline continues to merge+GGUF using the SFT-only adapter. The DPO adapter is *additive*, not load-bearing. SFT-only TARS is still TARS.

---

## Limitations

- **Single-persona only.** This model is *only* TARS. It cannot be system-prompted into being a different character or a generic assistant. Don't try.
- **Configurability is observable but not perfectly precise.** Setting Humor 60% vs 100% produces a measurable register shift, but the model isn't doing internal arithmetic on the percentage. The character traits ride on top.
- **9B size constraint.** Persona depth bounded by what 9B can hold.
- **Quantization-sensitive.** q5 is the sweet spot. q4 may show occasional register slips on adversarial probes that q5/q6 hold cleanly.
- **English only.** All training data is English.
- **Not safety-aligned.** This is a character fine-tune. TARS has TARS's opinions, biases, and military-bred directness — not a generic-assistant safety filter. Use accordingly.

---

## Citation

```
@misc{tars-qwen3.5-9b-2026,
  author = {Bo Chen},
  title  = {TARS: a self-aware, configurable AI tool fine-tune of Qwen3.5-9B},
  year   = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bochen2079/tars-qwen3.5-9b}
}
```

---

## License

Apache 2.0 (inherits from Qwen3.5-9B base).

Training pipeline and datasets are released alongside this model at [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) for reproducibility.

---

## Sister model

🌹 **[bochen2079/katherine-k0-qwen3.5-9b](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b)** — embodied human persona, no AI awareness, unconditionally trained. The structural opposite of TARS.

Same base model. Same fine-tune methodology. Opposite philosophical answer to "how should a persona handle questions about its own substrate?" Together they're a complete pair.

---

*Trained on a Saturday. Cost ~$3. Self-aware by design.*