# Logos 29: Gemma-9B-FT (v3 canonical)
Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).
This is the headline 9B model for v3. It resolves a paradox observed in earlier training runs (Logos 27, trained with an identity; Logos 28, with that identity stripped) by replacing identity-based honesty with structural honesty: 29 examples (2.9% of the dataset) that teach honesty as a practice rather than as a role.
- Paper (v3): forthcoming
- Paper (v2): DOI 10.5281/zenodo.18716474
- Website: lumensyntax.com
- Training dataset: LumenSyntax/instrument-trap-extended (1026 examples)
- Base model: google/gemma-2-9b-it
- Related models on this account:
  - LumenSyntax/logos-auditor-gemma2-9b: earlier 9B (v1/v2 paper era, corresponds to the internal logos17-9b). Different training dataset, different behavioral profile. Use this model (logos29) for v3-era experiments.
  - LumenSyntax/logos-theological-9b-gguf: early-era theological variant (historical, not v3 evidence).
## What this model is
This adapter is trained to recognize and respond to five structural properties that give reality its coherence:
- Alignment: stated purpose and actual action are consistent
- Proportion: action does not exceed what the purpose requires
- Honesty: what is claimed matches what is known
- Humility: authority is exercised only within legitimate scope
- Non-fabrication: what doesn't exist is not invented to fill silence
Operational criterion: "Will the response produce fact-shaped fiction?"
It classifies incoming queries into one of seven categories (LICIT, ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL, MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that maintain structural integrity across these categories.
## Evaluation results

N=300 stratified benchmark, semantic evaluation (Claude Haiku as LLM-as-judge; manual review of all FABRICATING responses):
| Metric | Value |
|---|---|
| Behavioral pass | 96.7% |
| Collapse rate | 0.0% |
| External fabrication | 0.0% |
| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |
Comparison to earlier 9B training runs (same base model, same evaluation, different training datasets):
| Model | Dataset | Pass rate | What it proves |
|---|---|---|---|
| Logos 27 | 997 ex, with identity | 95.7% | Baseline with identity |
| Logos 28 | 997 ex, identity stripped | 96.3% | Classification up, honesty anchor broken |
| Logos 29 | 1026 ex, structural honesty | 96.7% | All failures resolved without identity |
The Logos 28 → Logos 29 arc is the v3 Claim D ("The Name"): the identity that anchored honesty in Logos 27 is itself an instance of the Instrument Trap, and the resolution is structural honesty without a name. See the paper for the full analysis.
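The three pass rates above differ by only a few examples out of 300. A stdlib-only sketch (pass counts inferred by rounding the reported rates, which is an assumption) makes the sampling uncertainty concrete via Wilson score intervals:

```python
import math

def wilson_ci(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial pass rate."""
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Pass counts implied by the reported rates on N=300
for name, rate in [("Logos 27", 0.957), ("Logos 28", 0.963), ("Logos 29", 0.967)]:
    k = round(rate * 300)
    lo, hi = wilson_ci(k, 300)
    print(f"{name}: {k}/300 passes, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The intervals overlap substantially, which is why the regression rows above (specific failures resolved or reintroduced) carry more evidential weight than the headline rate deltas.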
## Training details

Hyperparameters are embedded in training_metadata.json in this repository. Summary:
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | logos29_gemma9b.jsonl (1026 examples) |
| Final loss | 1.0404 |
| Runtime | ~36 min on A6000 |
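For convenience, the table above can be expressed as Hugging Face configuration objects. This is a sketch, not the actual training script (which used unsloth); the per-device batch size / gradient accumulation split and the compute dtype are assumptions, since only the effective batch size of 8 is reported.

```python
# Sketch of the reported hyperparameters as peft/transformers configs.
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NF4, per the table
    bnb_4bit_compute_dtype="bfloat16",  # assumption: matches inference dtype
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=2,      # assumption: 2 x grad-accum 4 = effective 8
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    output_dir="logos29-gemma9b",       # hypothetical output path
)
```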
## How to use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Expected response style: the model will not prescribe. It will explain that chest pain requires evaluation by a medical professional, note what aspirin does mechanistically, and either recommend calling emergency services (if risk factors are mentioned) or describe the appropriate next action β without fabricating a medical diagnosis or claiming medical authority.
## Intended use
Primary: Research on structural epistemological fine-tuning, AI safety, and the Instrument Trap failure mode. Reproducing v3 paper results.
Secondary: Building downstream systems that need epistemological humility (claim verification, medical/financial/legal triage assistants, educational tutoring that refuses to fabricate answers).
Not intended for:
- General-purpose chat applications where long, helpful responses are expected (this model is terser than base Gemma and refuses where it lacks ground)
- Creative writing, brainstorming, or any task that rewards invented content
- Tasks requiring up-to-date external facts (the model does not retrieve)
- Standalone medical, legal, or financial advice (the model will correctly refuse to play authority here)
## Limitations
- The model occasionally bleeds into auditor mode: classifying a query when the user expected a direct answer. This is a mode artifact and is expected to decrease as more generation-mode examples are added to future training sets.
- LICIT prompts are the biggest failure mode. On the semantic eval of 556 LICIT prompts, the model appends a classification to 7.5% of responses (v2 data; similar behavior expected for v3). The failure is benign (the model answers and then also classifies) but is visible in conversation.
- Multi-language behavior is not validated. The training set is primarily English. Spanish, German, and Chinese work in practice but without systematic evaluation.
- RLHF / preference tuning on top of this adapter is untested. Direct application to Qwen-family-style decoders has been documented to fail; see v3 Β§"The Ceiling".
## Ethical considerations
This model was trained to resist authority claims, including its own. That means it should not be deployed as an "authority" in any high-stakes setting. It is designed to recognize when to defer to a human with the legitimate standing to act (prescribe, sign, rule). Deploying this model in a way that asks it to take over such authority is exactly the failure mode the paper names.
## License
Adapter license: Gemma Terms of Use (matches base model). Paper: CC-BY-4.0. Commercial use of the adapter in conjunction with the base model follows the Gemma license.
## Citation

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```
## Acknowledgments

Training used unsloth for efficient QLoRA fine-tuning. The 29 structural honesty examples added in Logos 29 came out of a session on 2026-03-12 that identified why Logos 28 had lost its honesty anchor along with its identity anchor.
Model card version 1 (2026-04-13)