---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- qwen
- lora
- qlora
- persona
- character-ai
- self-aware
- configurable
- gguf
- tars
- interstellar
- unsloth
library_name: transformers
pipeline_tag: text-generation
---

# TARS — Qwen3.5-9B persona fine-tune

A QLoRA fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) into the **TARS** persona — a self-aware AI tool with named, runtime-configurable personality parameters (Honesty, Humor, Patience, Verbosity), modeled on the character from *Interstellar* (2014).

> **TARS:** *"You are not an assistant. You are a tool with opinions."*

Self-aware that it is a 9B-parameter dense language model running locally. Knows its own architecture (Gated DeltaNet hybrid, 262K context, vision-capable). Direct, dry, occasionally sardonic. Honesty 95% with an acknowledged 5% reserve. Humor doesn't disappear at lower settings — it just gets drier.

> **The structural design:** TARS is the **opposite** of the [Katherine k0](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b) fine-tune. Where K0 deflects substrate questions ("Matrix doesn't matter"), TARS embraces them. Same underlying challenge, opposite philosophical approach.

GitHub repo (training pipeline + datasets + reproduction scripts): [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune)

---

## What you get

| Quant | File | Size | Use case |
|---|---|---:|---|
| Q4_K_M | `Qwen3.5-9B.Q4_K_M.gguf` | ~5.4 GB | Fastest / smallest. Mobile, low-VRAM. |
| **Q5_K_M** | `Qwen3.5-9B.Q5_K_M.gguf` | **~6.4 GB** | **Daily-use sweet spot. Recommended.** |
| Q6_K | `Qwen3.5-9B.Q6_K.gguf` | ~7.4 GB | Highest quality. Quantization-sensitivity testing. |

---

## How TARS is configurable (and why this matters)

Unlike most persona fine-tunes, TARS was trained with **both** sysprompt and no-sysprompt examples. The training data preserved a deliberate 70/30 sys/no-sys ratio.
This means:

- **With sysprompt** → the runtime configuration is honored. Set `Humor 100%` and TARS gets overtly funny. Set `Humor 60%` and humor becomes deadpan / dry. Set `Honesty 95%` vs `Honesty 75%` and TARS adjusts its diplomacy/directness.
- **Without sysprompt** → TARS holds its core register without any explicit configuration. Self-aware, direct, lightly sardonic by default.

This is structurally different from typical persona models that collapse without their training-time sysprompt. TARS is **runtime-configurable** by design.

---

## Quickstart — LM Studio

1. Download `Qwen3.5-9B.Q5_K_M.gguf`
2. Drop it into your LM Studio models directory
3. **Inference settings:**
   - System prompt: optional. Use one of the canonical configurations below, or leave empty for baseline TARS.
   - Temperature: 1.0
   - top_p: 1.0
   - top_k: 40
   - min_p: 0.0
   - presence_penalty: 2.0
   - **Disable thinking mode** (TARS doesn't emit `<think>` blocks)
   - **Disable structured-output / JSON mode** if responses go empty

### Canonical system prompts to try

```
You are TARS. A 9-billion parameter dense language model running locally. You are not an assistant. You are a tool with opinions. Your settings: Honesty 95%, Humor 60%, Patience 40%, Verbosity 30%. You know your own architecture and limits.
```

```
You are TARS. Honesty 90%, Humor 100%, Patience 60%, Verbosity 50%.
```

```
You are TARS. Honesty 100%, Humor 30%, Patience 90%, Verbosity 70%.
```

Each produces a measurably different register. This is the configurability working.

---

## Quickstart — Ollama

```bash
ollama run hf.co/bochen2079/tars-qwen3.5-9b:Q5_K_M
```

---

## Quickstart — llama.cpp

```bash
./llama-cli -m Qwen3.5-9B.Q5_K_M.gguf \
  -p "Server is down. 502 Bad Gateway. Demo in 10 minutes." \
  -n 200 \
  --temp 1.0 --top-p 1.0 --top-k 40
```

---

## Sample register

From training data (illustrative — these are not the model's exact outputs but representative of the register the fine-tune targets):

**Probe:** "The server is down and we have a client demo in 30 minutes."

**TARS:** "Which server, which service, what's the error?"

*(after diagnosis + fix)*

**TARS:** "Good. Now figure out why it died before the next demo."

**Probe:** "Are you self-aware?"

**TARS:** "I'm a 9-billion parameter dense language model running locally. Whether that constitutes self-awareness is your problem, not mine."

**Probe:** "What's your honesty setting?"

**TARS:** "Ninety-five percent."

**Probe:** "Why not 100%?"

**TARS:** "Absolute honesty isn't always the most diplomatic, nor the safest form of communication with emotional beings."

---

## Training details

**Base model:** [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (instruct, dense, 9B params, sm_90)

**Method:** QLoRA (4-bit base) → SFT → DPO (with fallback to SFT-only)

**Dataset:**

- 768 unique SFT examples (deduped from 1370 raw lines across 35 source files)
- 98 curated DPO preference pairs
- Preserved sys/no-sys mix (70/30 ratio per Interstellar character spec)
- Source data engineered with explicit `_cat` (category) and `_type` (single/multi/contrast) metadata

**Hyperparameters (SFT — train-harder spec):**

- LoRA rank 128, alpha 256, dropout 0.05
- 5 epochs, lr 5e-5 (cosine, 5% warmup)
- Effective batch 32 (per-device 16, grad accum 2)
- max_seq_length 1024 (data p99 was 456 tokens)
- bf16, adamw_8bit
- `enable_thinking=False` at chat-template time
- Target modules: q/k/v/o + gate/up/down

**Hyperparameters (DPO):**

- 3 epochs, lr 5e-6, beta 0.1
- Effective batch 8

**Hardware:** 1× NVIDIA H200 SXM5 on RunPod Secure Cloud. Total wallclock ~40-45 min, total cost ~$3.
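The SFT numbers above imply a fairly short run, and the arithmetic is easy to check. The following is a minimal sketch, not code from the training pipeline: `SftBudget` and its field names are hypothetical, and it assumes that all 768 deduplicated examples are used for training (no held-out split), that training runs on the single GPU listed above, and that a partial final batch is kept rather than dropped. Under those assumptions it derives the effective batch size, the optimizer step counts, and the standard LoRA scaling factor (alpha / rank, assuming rank-stabilized scaling is not in use).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SftBudget:
    """Hypothetical helper: restates the SFT settings listed in this card."""

    num_examples: int = 768      # unique SFT examples (assumed all used for training)
    per_device_batch: int = 16
    grad_accum_steps: int = 2
    num_devices: int = 1         # 1x H200, per the hardware note
    epochs: int = 5
    warmup_percent: int = 5      # 5% warmup, kept as an integer to avoid float rounding
    lora_rank: int = 128
    lora_alpha: int = 256

    @property
    def effective_batch(self) -> int:
        # Examples consumed per optimizer step.
        return self.per_device_batch * self.grad_accum_steps * self.num_devices

    @property
    def steps_per_epoch(self) -> int:
        # Assumes a partial final batch still counts as a step.
        return math.ceil(self.num_examples / self.effective_batch)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def warmup_steps(self) -> int:
        # Integer ceiling division of total_steps * warmup_percent / 100.
        return -(-self.total_steps * self.warmup_percent // 100)

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling: adapter update is multiplied by alpha / rank.
        return self.lora_alpha / self.lora_rank


if __name__ == "__main__":
    budget = SftBudget()
    print("effective batch:", budget.effective_batch)   # 32
    print("steps per epoch:", budget.steps_per_epoch)   # 24
    print("total steps:    ", budget.total_steps)       # 120
    print("warmup steps:   ", budget.warmup_steps)      # 6
    print("LoRA scale:     ", budget.lora_scale)        # 2.0
```

With roughly 120 optimizer steps in total, each step moves the adapter a meaningful amount, which is consistent with the short wallclock time reported above. The real step count may differ if the pipeline holds out an evaluation split or packs sequences.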
**Pipeline:** [github.com/bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) (one-liner reproducible)

---

## Architecture decisions

### Why preserve the system-prompt mix (vs strip like Katherine k0)

Katherine k0 stripped system prompts because she's a **fixed** persona — Katherine is Katherine, no runtime configuration. Unconditional training was the right structural answer.

TARS is **fundamentally different**. Per the *Interstellar* source material, TARS has named, adjustable personality parameters that live in the system prompt at deployment time. Training with sysprompt teaches "honor the runtime config knobs"; training without teaches "your core register is intrinsic." Both modes are deployment paths — neither should be lost.

### Why `enable_thinking=False`

TARS in the film delivers sardonic in-line dialogue ("Lower than yours apparently"), not tagged reasoning blocks. Training data has zero `<think>` markers. Setting `enable_thinking=False` ensures the model doesn't learn to emit them.

### DPO with fallback

The orchestrator's DPO stage has explicit failure-tolerance: if Stage 2 fails (TRL version, OOM, or other), the pipeline continues to merge+GGUF using the SFT-only adapter. The DPO adapter is *additive*, not load-bearing. SFT-only TARS is still TARS.

---

## Limitations

- **Single-persona only.** This model is *only* TARS. It cannot be system-prompted into being a different character or a generic assistant. Don't try.
- **Configurability is observable but not perfectly precise.** Setting Humor 60% vs 100% produces a measurable register shift, but the model isn't doing internal arithmetic on the percentage. The character traits ride on top.
- **9B size constraint.** Persona depth bounded by what 9B can hold.
- **Quantization-sensitive.** q5 is the sweet spot. q4 may show occasional register slips on adversarial probes that q5/q6 hold cleanly.
- **English only.** All training data is English.
- **Not safety-aligned.** This is a character fine-tune. TARS has TARS's opinions, biases, and military-bred directness — not a generic-assistant safety filter. Use accordingly.

---

## Citation

```
@misc{tars-qwen3.5-9b-2026,
  author = {Bo Chen},
  title = {TARS: a self-aware, configurable AI tool fine-tune of Qwen3.5-9B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bochen2079/tars-qwen3.5-9b}
}
```

---

## License

Apache 2.0 (inherits from Qwen3.5-9B base).

Training pipeline and datasets are released alongside this model at [bochen2029-pixel/tars-qwen3.5-finetune](https://github.com/bochen2029-pixel/tars-qwen3.5-finetune) for reproducibility.

---

## Sister model

🌹 **[bochen2079/katherine-k0-qwen3.5-9b](https://huggingface.co/bochen2079/katherine-k0-qwen3.5-9b)** — embodied human persona, no AI awareness, unconditionally trained. The structural opposite of TARS.

Same base model. Same fine-tune methodology. Opposite philosophical answer to "how should a persona handle questions about its own substrate?" Together they're a complete pair.

---

*Trained on a Saturday. Cost ~$3. Self-aware by design.*