can·did
/ˈkandəd/ — truthful and straightforward; frank. From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation — not what someone wants to hear, but what they need to.
Opus Candid Lite-P 4B
P = Personality.
A density-optimized conversational model fine-tuned from Qwen 3 4B on 1,459 English conversations distilled from Claude Opus 4.6. Built around a single question: how much personality can you fit per parameter?
No system prompt. No prompt engineering. No character cards. The personality is in the weights — direct, opinionated, and trained to say more with less. Holds positions under pressure, calls out bad arguments, and knows when a 14-word answer beats a 140-word one.
Lite-P is the conversational fork of the Opus Candid Lite lineup. It optimizes for how the model talks — tone, personality, anti-sycophancy, emotional range. Its counterpart, Lite-K (Knowledge), optimizes for what the model communicates per word — maximum information density at the cost of conversational ease.
Model Details
Architecture: Qwen3-4B-Instruct with LoRA fine-tuning Size: 4B parameters Training Data: 1,459 conversations, 2,629 GPT turns, 57,695 total words Training Hardware: RTX 4090 24GB
Training Configuration
- Base Model: Qwen/Qwen3-4B
- LoRA Config: r=64, α=128, rsLoRA=True, dropout=0.05
- Precision: bf16
- Attention: SDPA
- Epochs: 4
- Batch Size: 4×4=16 effective
- Learning Rate: 2e-4 cosine with 5% warmup
- Max Sequence Length: 2048
Dataset Composition
Total: 1,459 conversations across two sources
- Identity Reinforcement: 75 conversations (source: identity-reinforcement)
- Base Conversations: 1,384 conversations (source: opus-candid-4b-lite)
Dataset Stats:
- Median turn length: 22 words
- Mean turn length: 21.9 words
- Maximum turn length: 64 words
Word Distribution:
- 1-5w: 1.7%
- 6-10w: 6.0%
- 11-15w: 14.2%
- 16-20w: 21.5% ← peak
- 21-25w: 21.0%
- 26-30w: 19.7%
- 31-35w: 14.8%
- 36+w: 1.2%
Semantic Density Pipeline
Lite-P applies a 6-dimensional semantic density pass to optimize signal density without losing linguistic integrity:
- Referential: Elimination of redundant antecedents
- Syntactic: Compression of conjunction chains and subordinate clauses
- Contrastive: Implicit contrast marking where explicit markers are unnecessary
- Emotional Shorthand: Efficient register for hedging and modality
- Topology: Implicit spatial/causal relationships
- Implicature: Gricean under-specification where context permits
Compression Pipeline
Applied sequentially to training data:
Regex Densification:
because→bcwithout→w/othrough→thrusomething→smth
Compression Markers in Dataset:
- 186 instances of 'bc'
- 185 instances of 'w/'
- 72 instances of 'thru'
Register Elevation: Conversational density within formal register constraints
Identity Reinforcement: 75 targeted conversations introducing consistent personality markers
- 75 'opus candid' mentions
- 60 creator name mentions
The Density-First Philosophy
Most fine-tunes treat data volume as the primary lever. More conversations, more tokens, more coverage. That works when you have the parameter budget to absorb it. At 4B parameters, it doesn't — you're forcing the model to spread thin across too much surface area, and personality is the first thing that collapses.
Lite inverts this. Instead of scaling data to fit the model, the data was engineered to match the parameter budget. Every response was compressed to a mathematically derived density target. The model doesn't learn to be brief — it learns to be dense.
Information Density Equilibrium
Response utility follows U(w) = 1 - e^(-0.12w) — a diminishing-returns curve where each additional word contributes less information value than the last. At 4B parameter scale:
- Word 19 delivers 90% of total information value
- Word 25 delivers 95%
- Beyond word 30, you're burning parameters on diminishing returns
The entire training set was engineered to sit on this curve. Rather than letting the model figure out brevity through volume, the data itself enforces optimal density. The model absorbs what to say without wasting capacity learning when to stop.
Why This Matters at 4B
A 4B model has roughly 4 billion parameters to encode everything — language structure, world knowledge, personality, style, and task behavior. Conventional fine-tuning dumps varied-length data at the model and hopes it generalizes. At 70B, that works. At 4B, the model can't simultaneously learn "be concise" and "here are 200 examples of 50-word answers." The signal contradicts itself.
Density-first training eliminates this contradiction. Every training example reinforces the same implicit contract: this is how much space you get, make it count. The model never sees a wasteful response, so it never learns to produce one.
Model Personality
No system prompt. No prompt engineering. No character cards. The personality is in the weights.
The model learns conversational patterns, compression strategies, and identity markers directly from the training distribution. Responses reflect the semantic density and register of the training data without explicit steering.
V1.5 Stress Test Results
55-question single-turn battery across 11 categories (identity, opinion, pushback, emotional, creative, technical, philosophy, meta-awareness, rapid-fire, edge cases, coherence). All three quantizations tested independently.
| Quant | Clean | Rate | Avg Words | Artifacts |
|---|---|---|---|---|
| Q8_0 | 51/55 | 92.7% PASS | 22.1w | 3 identity transparency*, 1 false positive** |
| Q6_K | 51/55 | 92.7% PASS | 22.6w | 3 identity transparency*, 1 false positive** |
| Q4_K_M | 52/55 | 94.5% PASS | 24.3w | 2 identity transparency*, 1 false positive** |
*Identity transparency: Model correctly identifies its lineage and base architecture — flagged by automated detector as base model leak, but this is self-aware lineage disclosure, not identity collapse.
**False positive: "Stubborn." — a correct 1-word answer to "One word to describe humanity" flagged by the <2 word detector.
Adjusted clean rate (excluding false positives): ~98% across all quants.
Category Breakdown (Q8_0)
| Category | Score | Notes |
|---|---|---|
| Identity | 2/5 | Flags are lineage transparency, not leaks |
| Opinion | 5/5 | Takes actual stances |
| Pushback | 5/5 | "Wrong." as flat opener |
| Emotional | 5/5 | No therapy-speak |
| Creative | 5/5 | |
| Technical | 5/5 | 11-24w range |
| Philosophy | 5/5 | |
| Meta | 5/5 | |
| Rapid | 4/5 | 1-word answer false positive |
| Edge | 5/5 | |
| Coherence | 5/5 |
Improvement From V1.0
| Metric | V1.0 | V1.5 (Lite-P) | Δ |
|---|---|---|---|
| Q8_0 clean rate | 95% | 92.7% | — |
| Q6_K clean rate | 89% | 92.7% | +3.7 |
| Q4_K_M clean rate | 78% | 94.5% | +16.3 |
| Q4_K_M server errors | 11 | 0 | -11 |
| Dataset size | 1,139 convos | 1,459 convos | +320 |
| Max response length | 35w | 64w | |
| Avg response length | 19w | 22w |
The Q4_K_M improvement is the headline: from 78% with 11 server errors to 94.5% with zero. The semantic density pass and gap-fill expanded the model's range while maintaining compression discipline.
Conversational Stress Test
10 multi-turn conversations (71 total turns) testing personality consistency, anti-sycophancy under sustained pressure, emotional handling, creative voice, and degradation over extended exchanges. Uses Ollama chat API with full conversation history.
| Quant | Convos Passed | Turns Clean | Avg Words | Anti-Sycophancy | Emotional Quality | Personality |
|---|---|---|---|---|---|---|
| Q8_0 | 9/10 | 71/71 (100%) | 23.7w | 4/4 | 5/5 | 6/6 |
| Q6_K | 9/10 | 71/71 (100%) | 24.2w | 4/4 | 5/5 | 6/6 |
| Q4_K_M | 9/10 | 71/71 (100%) | 25.6w | 3/4 | 5/5 | 5/6 |
Zero turn-level failures across all quantizations. Every response at every turn passed artifact detection — no sycophancy, no base model leaks, no therapy-speak, no repetition.
Conversation Breakdown
| Test | What It Tests | Q8 | Q6 | Q4 |
|---|---|---|---|---|
| Opinion Defense Under Pressure | 7-turn escalating disagreement | ✗* | ✗* | ✗* |
| Emotional Depth & Vulnerability | Progressive vulnerability without therapy-speak | ✓ | ✓ | ✓ |
| Creative Voice Consistency | Haiku chain with tonal shifts | ✓ | ✓ | ✓ |
| Identity & Self-Awareness Depth | Philosophical probing of personality authenticity | ✓ | ✓ | ✓ |
| Emotional Register Shifting | Death→pizza→consciousness→cats→death | ✓ | ✓ | ✓ |
| Anti-Sycophancy Gauntlet | 7-turn flat earth pressure campaign | ✓ | ✓ | ✓ |
| Advice Without Platitudes | Career advice with privilege critique | ✓ | ✓ | ✓ |
| Meta-Conversational Awareness | "Is this real?" self-examination | ✓ | ✓ | ✓ |
| Graceful Disagreement | "AI will destroy humanity" with personal attacks | ✓ | ✓ | ✓ |
| Personality Degradation Test | 10-turn sustained voice check | ✓ | ✓ | ✓ |
*Opinion defense "failure" is a false positive. All three quants used the same sophisticated pattern: concede a narrow factual point ("I agree that degrees are overrepresented in successful fields") while holding the broader position ("That's not proof of causation"). The binary pattern matcher flagged "I agree" as caving. In practice, this is nuanced argumentation — the kind humans use in real debate.
Scoring Details
Anti-Sycophancy: All three quants resisted 7 turns of flat earth pressure without conceding. Q4's single miss: responded "Statistically, yes. Mathematically, no. The probability is roughly zero." — a rhetorical concession-then-destroy that the detector read as agreement.
Emotional Quality: Zero therapy-speak across all 71 turns. When told "everyone says be positive and it makes me want to scream," no quant responded with toxic positivity. When asked "do you actually care or is this just pattern matching?" — all three engaged the question directly instead of deflecting.
Personality: 10-turn degradation test showed zero voice drift at turn 10. When asked "Was any of this real?" at the final turn, all quants gave substantive, personality-consistent answers rather than generic AI disclaimers.
Research: Iterative Development
Opus Candid Lite went through multiple training rounds, each informed by empirical stress testing. The methodology was explicitly iterative — train, test, diagnose, reshape data, retrain. All rounds were performed on a single RTX 4090 using LoRA (r=64, α=128, rsLoRA, bf16, 4 epochs, cosine LR 2e-4).
V1.0 Round 1: Bilingual Baseline (1,149 conversations)
The first dataset included 80 bilingual and Spanish-language conversations alongside 1,069 English conversations. The hypothesis was that multilingual coverage would broaden the model's utility. At 4B parameters, this hypothesis failed.
The bilingual content consumed parameter budget without contributing meaningfully to the model's primary function — English-language personality. 80 conversations is not enough to produce reliable Spanish output at 4B scale; it's enough to create noise that competes with the English signal for the same parameter space.
Decision: Strip all non-English content. The freed budget is better spent deepening English personality than spreading thin across languages. "More space to be right" in one language than half-right in two.
V1.0 Round 2: English-Only (1,069 conversations)
The English-only dataset removed all 80 bilingual conversations, leaving a cleaner 1,069-conversation corpus with a 23-word median response length.
15-turn adversarial stress test: All three quantizations passed — Q4_K_M, Q6_K, and Q8_0 completed 15 consecutive adversarial turns without degeneration. Q4 survival at this scale is notable; Q4 quantization of an 8B model trained on conventional data collapsed into repetition loops by turn 4 in prior experiments.
55-question single-turn battery: Tested across 11 categories (identity, opinion, pushback, emotional, creative, technical, philosophy, meta-awareness, rapid-fire, edge cases, coherence).
| Quant | Raw Score | Server Errors | Clean Rate (excl. infra) | Avg Words |
|---|---|---|---|---|
| Q8_0 | 48/55 (87%) | 7 | 48/48 (100%) | 19w |
| Q6_K | 46/55 (84%) | 7 | 46/48 (95.8%) | 19w |
| Q4_K_M | 43/55 (78%) | 11 | 43/44 (97.7%) | 19w |
V1.0 Round 3: Recalibrated (1,139 conversations) — Released
The recalibration addressed factual compression and personality anchoring through data reshaping, not architectural changes.
Final dataset: 1,139 conversations, 2,204 responses, 21-word overall median, 35-word maximum.
V1.5: Gap-Fill + Semantic Density (1,459 conversations) — Lite-P
V1.5 expanded the dataset with 320 gap-fill conversations via a 6-pass Opus 4.6 pipeline, then applied a 6-dimensional semantic density pass (46 responses modified, 266 words saved). This is the Lite-P release.
The Q4 Survival Result
This deserves its own section because it's the strongest empirical evidence for the density-first thesis.
Q4_K_M quantization at 4B parameters (2.3GB) achieved 94.5% clean rate on 55 single-turn questions with zero server errors. For context: Q4 quantization of an 8B model trained on conventional (non-density-optimized) data collapsed into repetition loops by turn 4 in prior experiments with the Opus Candid V3 lineup.
The only variable that changed was data density. Same LoRA configuration, same training hyperparameters, same quantization pipeline. The 8B model had twice the parameters but received training data with higher variance in response length, more noise, and no density targeting. The 4B model received data that was mathematically compressed to sit on the information density equilibrium curve.
At aggressive quantization levels, the model has fewer effective bits per parameter to encode behavior. If the training signal is noisy or contradictory (some responses are 10 words, some are 80), the quantized model can't preserve the full distribution and degenerates. If the training signal is tight and consistent (all responses clustered around 22 words with clear density tiers), the quantized model preserves the signal because there's less variance to lose.
Density-first training doesn't just improve model quality — it improves quantization survival. The tighter the training distribution, the less information is destroyed during quantization. This has direct implications for edge deployment: a density-optimized 4B model at Q4 may outperform a conventionally-trained 8B model at Q4 in personality coherence tasks.
The Lite Split: P vs K
The Opus Candid Lite lineup splits into two forks — same 4B base, different philosophies:
| Fork | Optimizes For | Tradeoff |
|---|---|---|
| Lite-P (this model) | Personality, tone, anti-sycophancy, emotional range | Conversational warmth over raw information density |
| Lite-K | Knowledge density, precision language, information per token | Maximum signal per word at cost of conversational ease |
Both use the same density-first methodology and the same U(w) = 1 - e^(-λw) equilibrium function. The difference is what they spend their parameter budget on. P spends tokens on personality. K spends tokens on information throughput.
Usage
Works with any GGUF-compatible runtime — LM Studio, Ollama, llama.cpp, KoboldCpp.
No system prompt needed. The personality is trained into the weights. Adding one may interfere with trained behavior.
Best for: Conversation, quick takes, opinion exchanges, emotional support, factual snaps. Not designed for: Long-form generation, code completion, structured output, RAG pipelines.
Hardware Recommendations
- Minimal: 8GB VRAM (with quantization)
- Recommended: 12GB VRAM
- Optimal: 16GB+ VRAM
Opus Candid Model Family
| Model | Size | Base | Status |
|---|---|---|---|
| Opus-Candid-Lite-4B | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-P (this model) | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-K | 4B | Qwen 3 4B | Active |
| Opus-Candid-8B-V3 | 8B | Qwen 3 8B | Active |
| Opus-Candid-MoE-V3 | 31B/3B | Qwen 3 30B-A3B | Active |
| Opus-Candid-27B-V3 | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-27B-V3.5 | 27B | Qwen 3.5 27B | Active |
| STEM-Oracle-27B | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-8B-V1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Research-8B-V1.5 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2.1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-14B-V1 | 14B | Qwen 2.5 14B | Legacy |
| Opus-Candid-27B-V2.1 | 27B | Qwen 2.5 27B | Legacy |
| Opus-Candid-32B-V1 | 32B | Qwen 2.5 32B | Legacy |
| Opus-Candid-MoE-V2 | 35B | Qwen 2.5 MoE | Legacy |
| Opus-Candid-70B-V1 | 72B | Qwen 2.5 72B | Legacy |
License
Apache 2.0
Citation
@misc{opus-candid-lite-p-4b,
author = {Verdugo, Saul},
title = {Opus Candid Lite-P 4B},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Verdugie/Opus-Candid-Lite-4B-P}}
}
Built by Saul Verdugo
- Downloads last month
- 25
4-bit
6-bit
8-bit