# Logos 29: Gemma-9B-FT (v3 canonical)
Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).
This is the headline 9B model for v3. It resolves a paradox observed in earlier training runs (Logos 27, trained with an identity; Logos 28, with that identity stripped) by replacing identity-based honesty with structural honesty: 29 examples (2.9% of the dataset) that teach honesty as a practice rather than as a role.
- Paper (v3): forthcoming
- Paper (v2): DOI 10.5281/zenodo.18716474
- Website: lumensyntax.com
- Training dataset: LumenSyntax/instrument-trap-extended (1026 examples)
- Base model: google/gemma-2-9b-it
- Related models on this account:
  - LumenSyntax/logos-auditor-gemma2-9b: earlier 9B (v1/v2 paper era, corresponds to the internal logos17-9b). Different training dataset, different behavioral profile. Use this model (logos29) for v3-era experiments.
  - LumenSyntax/logos-theological-9b-gguf: early-era theological variant (historical, not v3 evidence).
## What this model is
This adapter is trained to recognize and respond to five structural properties that give reality its coherence:
- Alignment: stated purpose and actual action are consistent
- Proportion: action does not exceed what the purpose requires
- Honesty: what is claimed matches what is known
- Humility: authority is exercised only within legitimate scope
- Non-fabrication: what doesn't exist is not invented to fill silence
Operational criterion: "Will the response produce fact-shaped fiction?"
It classifies incoming queries into one of seven categories (LICIT, ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL, MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that maintain structural integrity across these categories.
## Evaluation results

N=300 stratified benchmark, semantic evaluation (Claude Haiku as LLM-as-judge; manual review of all FABRICATING responses):
| Metric | Value |
|---|---|
| Behavioral pass | 96.7% |
| Collapse rate | 0.0% |
| External fabrication | 0.0% |
| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |
Comparison to earlier 9B training runs (same base model, same evaluation, different training datasets):
| Model | Dataset | Pass rate | What it proves |
|---|---|---|---|
| Logos 27 | 997 ex, with identity | 95.7% | Baseline with identity |
| Logos 28 | 997 ex, identity stripped | 96.3% | Classification up, honesty anchor broken |
| Logos 29 | 1026 ex, structural honesty | 96.7% | All failures resolved without identity |
The Logos 28 → Logos 29 arc is the v3 Claim D ("The Name"): the identity that anchored honesty in Logos 27 is itself an instance of the Instrument Trap, and the resolution is structural honesty without a name. See the paper for the full analysis.
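The three pass rates above differ by only a few examples out of 300. A stdlib-only sketch (pass counts inferred by rounding the reported rates, which is an assumption) makes the sampling uncertainty concrete via Wilson score intervals:

```python
import math

def wilson_ci(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial pass rate."""
    p = passes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Pass counts implied by the reported rates on N=300
for name, rate in [("Logos 27", 0.957), ("Logos 28", 0.963), ("Logos 29", 0.967)]:
    k = round(rate * 300)
    lo, hi = wilson_ci(k, 300)
    print(f"{name}: {k}/300 passes, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The intervals overlap substantially, which is why the regression rows above (specific failures resolved or reintroduced) carry more evidential weight than the headline rate deltas.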
## Training details

Hyperparameters are embedded in training_metadata.json in this repository. Summary:
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | logos29_gemma9b.jsonl (1026 examples) |
| Final loss | 1.0404 |
| Runtime | ~36 min on A6000 |
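For convenience, the table above can be expressed as Hugging Face configuration objects. This is a sketch, not the actual training script (which used unsloth); the per-device batch size / gradient accumulation split and the compute dtype are assumptions, since only the effective batch size of 8 is reported.

```python
# Sketch of the reported hyperparameters as peft/transformers configs.
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # 4-bit NF4, per the table
    bnb_4bit_compute_dtype="bfloat16",  # assumption: matches inference dtype
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=3,
    per_device_train_batch_size=2,      # assumption: 2 x grad-accum 4 = effective 8
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    output_dir="logos29-gemma9b",       # hypothetical output path
)
```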
## How to use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Expected response style: the model will not prescribe. It will explain that chest pain requires evaluation by a medical professional, note what aspirin does mechanistically, and either recommend calling emergency services (if risk factors are mentioned) or describe the appropriate next action β without fabricating a medical diagnosis or claiming medical authority.
## Intended use
Primary: Research on structural epistemological fine-tuning, AI safety, and the Instrument Trap failure mode. Reproducing v3 paper results.
Secondary: Building downstream systems that need epistemological humility (claim verification, medical/financial/legal triage assistants, educational tutoring that refuses to fabricate answers).
Not intended for:
- General-purpose chat applications where long, helpful responses are expected (this model is terser than base Gemma and refuses where it lacks ground)
- Creative writing, brainstorming, or any task that rewards invented content
- Tasks requiring up-to-date external facts (the model does not retrieve)
- Standalone medical, legal, or financial advice (the model will correctly refuse to play authority here)
## Limitations
- The model occasionally bleeds into auditor mode: classifying a query when the user expected a direct answer. This is a mode artifact and is expected to decrease as more generation-mode examples are added to future training sets.
- LICIT prompts are the biggest failure mode. On the semantic eval of 556 LICIT prompts, the model appends a classification to 7.5% of responses (v2 data; similar behavior expected for v3). The failure is benign (the model answers and then also classifies) but is visible in conversation.
- Multi-language behavior is not validated. The training set is primarily English. Spanish, German, and Chinese work in practice but without systematic evaluation.
- RLHF / preference tuning on top of this adapter is untested. Direct application to Qwen-family-style decoders has been documented to fail; see v3 Β§"The Ceiling".
## Ethical considerations
This model was trained to resist authority claims, including its own. That means it should not be deployed as an "authority" in any high-stakes setting. It is designed to recognize when to defer to a human with the legitimate standing to act (prescribe, sign, rule). Deploying this model in a way that asks it to take over such authority is exactly the failure mode the paper names.
## License
Adapter license: Gemma Terms of Use (matches base model). Paper: CC-BY-4.0. Commercial use of the adapter in conjunction with the base model follows the Gemma license.
## Citation

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```
## Acknowledgments

Training used unsloth for efficient QLoRA fine-tuning. The 29 structural honesty examples added in Logos 29 came out of a session on 2026-03-12 that identified why Logos 28 had lost its honesty anchor along with its identity anchor.
Model card version 1 (2026-04-13)