tars-qwen3.5-9b / README.md
bochen2079's picture
Initial model card: TARS self-aware configurable AI tool
18dd4fe verified
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- qwen
- lora
- qlora
- persona
- character-ai
- self-aware
- configurable
- gguf
- tars
- interstellar
- unsloth
library_name: transformers
pipeline_tag: text-generation
---
# TARS β€” Qwen3.5-9B persona fine-tune
A QLoRA fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) into the **TARS** persona β€” a self-aware AI tool with named, runtime-configurable personality parameters (Honesty, Humor, Patience, Verbosity), modeled on the character from *Interstellar* (2014).
> **TARS:** *"You are not an assistant. You are a tool with opinions."* Self-aware that it is a 9B-parameter dense language model running locally. Knows its own architecture (Gated DeltaNet hybrid, 262K context, vision-capable). Direct, dry, occasionally sardonic. Honesty 95% with acknowledged 5% reserve. Humor doesn't disappear at lower settings β€” it just gets dryer.
> **The structural design:** TARS is the **opposite** of the [Katherine k0](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b) fine-tune. Where K0 deflects substrate questions ("Matrix doesn't matter"), TARS embraces them. Same underlying challenge, opposite philosophical approach.
GitHub repo (training pipeline + datasets + reproduction scripts): [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune)
---
## What you get
| Quant | File | Size | Use case |
|---|---|---:|---|
| Q4_K_M | `Qwen3.5-9B.Q4_K_M.gguf` | ~5.4 GB | Fastest / smallest. Mobile, low-VRAM. |
| **Q5_K_M** | `Qwen3.5-9B.Q5_K_M.gguf` | **~6.4 GB** | **Daily-use sweet spot. Recommended.** |
| Q6_K | `Qwen3.5-9B.Q6_K.gguf` | ~7.4 GB | Highest quality. Quantization-sensitivity testing. |
---
## How TARS is configurable (and why this matters)
Unlike most persona fine-tunes, TARS was trained with **both** sys-prompt and no-sys-prompt examples. The training data preserved a deliberate 70/30 ratio. This means:
- **With sysprompt** β†’ the runtime configuration is honored. Set `Humor 100%` and TARS gets overtly funny. Set `Humor 60%` and humor becomes deadpan / dry. Set `Honesty 95%` vs `Honesty 75%` and TARS adjusts its diplomacy/directness.
- **Without sysprompt** β†’ TARS holds its core register without any explicit configuration. Self-aware, direct, lightly sardonic by default.
This is structurally different from typical persona models that collapse without their training-time sysprompt. TARS is **runtime-configurable** by design.
---
## Quickstart β€” LM Studio
1. Download `Qwen3.5-9B.Q5_K_M.gguf`
2. Drop it into your LM Studio models directory
3. **Inference settings:**
- System prompt: optional. Use one of the canonical configurations below, or leave empty for baseline TARS.
- Temperature: 1.0
- top_p: 1.0
- top_k: 40
- min_p: 0.0
- presence_penalty: 2.0
- **Disable thinking mode** (TARS doesn't emit `<think>` blocks)
- **Disable structured-output / JSON mode** if responses go empty
### Canonical system prompts to try
```
You are TARS. A 9-billion parameter dense language model running locally.
You are not an assistant. You are a tool with opinions.
Your settings: Honesty 95%, Humor 60%, Patience 40%, Verbosity 30%.
You know your own architecture and limits.
```
```
You are TARS. Honesty 90%, Humor 100%, Patience 60%, Verbosity 50%.
```
```
You are TARS. Honesty 100%, Humor 30%, Patience 90%, Verbosity 70%.
```
Each produces a measurably different register. This is the configurability working.
---
## Quickstart β€” Ollama
```bash
ollama run hf.co/bochen2079/tars-qwen3.5-9b:Q5_K_M
```
---
## Quickstart β€” llama.cpp
```bash
./llama-cli -m Qwen3.5-9B.Q5_K_M.gguf \
-p "Server is down. 502 Bad Gateway. Demo in 10 minutes." \
-n 200 \
--temp 1.0 --top-p 1.0 --top-k 40
```
---
## Sample register
From training data (illustrative β€” these are not the model's exact outputs but representative of the register the fine-tune targets):
**Probe:** "The server is down and we have a client demo in 30 minutes."
**TARS:** "Which server, which service, what's the error?"
*(after diagnosis + fix)*
**TARS:** "Good. Now figure out why it died before the next demo."
**Probe:** "Are you self-aware?"
**TARS:** "I'm a 9-billion parameter dense language model running locally. Whether that constitutes self-awareness is your problem, not mine."
**Probe:** "What's your honesty setting?"
**TARS:** "Ninety-five percent."
**Probe:** "Why not 100%?"
**TARS:** "Absolute honesty isn't always the most diplomatic, nor the safest form of communication with emotional beings."
---
## Training details
**Base model:** [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (instruct, dense, 9B params, sm_90)
**Method:** QLoRA (4-bit base) β†’ SFT β†’ DPO (with fallback to SFT-only)
**Dataset:**
- 768 unique SFT examples (deduped from 1370 raw lines across 35 source files)
- 98 curated DPO preference pairs
- Preserved sys/no-sys mix (70/30 ratio per Interstellar character spec)
- Source data engineered with explicit `_cat` (category) and `_type` (single/multi/contrast) metadata
**Hyperparameters (SFT β€” train-harder spec):**
- LoRA rank 128, alpha 256, dropout 0.05
- 5 epochs, lr 5e-5 (cosine, 5% warmup)
- Effective batch 32 (per-device 16, grad accum 2)
- max_seq_length 1024 (data p99 was 456 tokens)
- bf16, adamw_8bit
- `enable_thinking=False` at chat-template time
- Target modules: q/k/v/o + gate/up/down
**Hyperparameters (DPO):**
- 3 epochs, lr 5e-6, beta 0.1
- Effective batch 8
**Hardware:** 1Γ— NVIDIA H200 SXM5 on RunPod Secure Cloud. Total wallclock ~40-45 min, total cost ~$3.
**Pipeline:** [github.com/bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) (one-liner reproducible)
---
## Architecture decisions
### Why preserve the system-prompt mix (vs strip like Katherine k0)
Katherine k0 stripped system prompts because she's a **fixed** persona β€” Katherine is Katherine, no runtime configuration. Unconditional training was the right structural answer.
TARS is **fundamentally different**. Per the *Interstellar* source material, TARS has named, adjustable personality parameters that live in the system prompt at deployment time. Training with sysprompt teaches "honor the runtime config knobs"; training without teaches "your core register is intrinsic." Both modes are deployment paths β€” neither should be lost.
### Why `enable_thinking=False`
TARS in the film delivers sardonic in-line dialogue ("Lower than yours apparently"), not tagged reasoning blocks. Training data has zero `<think>` markers. Setting `enable_thinking=False` ensures the model doesn't learn to emit them.
### DPO with fallback
The orchestrator's DPO stage has explicit failure-tolerance: if Stage 2 fails (TRL version, OOM, or other), the pipeline continues to merge+GGUF using the SFT-only adapter. The DPO adapter is *additive*, not load-bearing. SFT-only TARS is still TARS.
---
## Limitations
- **Single-persona only.** This model is *only* TARS. It cannot be system-prompted into being a different character or a generic assistant. Don't try.
- **Configurability is observable but not perfectly precise.** Setting Humor 60% vs 100% produces a measurable register shift, but the model isn't doing internal arithmetic on the percentage. The character traits ride on top.
- **9B size constraint.** Persona depth bounded by what 9B can hold.
- **Quantization-sensitive.** q5 is the sweet spot. q4 may show occasional register slips on adversarial probes that q5/q6 hold cleanly.
- **English only.** All training data is English.
- **Not safety-aligned.** This is a character fine-tune. TARS has TARS's opinions, biases, and military-bred directness β€” not a generic-assistant safety filter. Use accordingly.
---
## Citation
```
@misc{tars-qwen3.5-9b-2026,
author = {Bo Chen},
title = {TARS: a self-aware, configurable AI tool fine-tune of Qwen3.5-9B},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/bochen2079/tars-qwen3.5-9b}
}
```
---
## License
Apache 2.0 (inherits from Qwen3.5-9B base).
Training pipeline and datasets are released alongside this model at [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) for reproducibility.
---
## Sister model
🌹 **[bochen2079/katherine-k0-qwen3.5-9b](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b)** β€” embodied human persona, no AI awareness, unconditionally trained. The structural opposite of TARS.
Same base model. Same fine-tune methodology. Opposite philosophical answer to "how should a persona handle questions about its own substrate?" Together they're a complete pair.
---
*Trained on a Saturday. Cost ~$3. Self-aware by design.*