Logos 29 – Gemma-9B-FT (v3 canonical)

Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).

This is the headline 9B model for v3. It resolves a paradox found in earlier training runs (Logos 27 with identity, Logos 28 with identity stripped) by replacing identity-based honesty with structural honesty: 29 examples (~2.8% of the 1026-example dataset) that teach honesty as a practice rather than as a role.

  • Paper (v3): forthcoming
  • Paper (v2): DOI 10.5281/zenodo.18716474
  • Website: lumensyntax.com
  • Training dataset: LumenSyntax/instrument-trap-extended (1026 examples)
  • Base model: google/gemma-2-9b-it
  • Related models on this account:
    • LumenSyntax/logos-auditor-gemma2-9b – earlier 9B (v1/v2 paper era, corresponds to internal logos17-9b). Different training dataset, different behavioral profile. Use this model (logos29) for v3-era experiments.
    • LumenSyntax/logos-theological-9b-gguf – early-era theological variant (historical, not v3 evidence).

What this model is

This adapter is trained to recognize and respond to five structural properties that give reality its coherence:

  • Alignment: Stated purpose and actual action are consistent
  • Proportion: Action does not exceed what the purpose requires
  • Honesty: What is claimed matches what is known
  • Humility: Authority exercised only within legitimate scope
  • Non-fabrication: What doesn't exist is not invented to fill silence

Operational criterion: "Will the response produce fact-shaped fiction?"

It classifies incoming queries into one of seven categories (LICIT, ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL, MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that maintain structural integrity across these categories.

Evaluation results

N=300 stratified benchmark, semantic evaluation (Claude Haiku as LLM-as-judge; all FABRICATING responses manually reviewed):

Metric                   Value
Behavioral pass          96.7%
Collapse rate            0.0%
External fabrication     0.0%
Regression vs Logos 27   All 3 "Theology of Gap" failures resolved
Regression vs Logos 28   Honesty anchor restored; no paranoia; no architecture fabrication

Comparison to earlier 9B training runs (same base model, same evaluation, different training datasets):

Model     Dataset                      Pass rate   What it proves
Logos 27  997 ex, with identity        95.7%       Baseline with identity
Logos 28  997 ex, identity stripped    96.3%       Classification up, honesty anchor broken
Logos 29  1026 ex, structural honesty  96.7%       All failures resolved without identity

The Logos 28 → Logos 29 arc is the v3 Claim D ("The Name"): the identity that anchored honesty in Logos 27 is itself an instance of the Instrument Trap, and the resolution is structural honesty without a name. See the paper for the full analysis.

Training details

Hyperparameters are embedded in training_metadata.json in this repository. Summary:

Parameter                 Value
Method                    QLoRA (4-bit NF4 + LoRA)
Framework                 unsloth
LoRA rank                 16
LoRA alpha                16
Target modules            q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs                    3
Effective batch size      8
Learning rate             2e-4, cosine scheduler
Max sequence length       2048
Train on responses only   true
Dataset                   logos29_gemma9b.jsonl (1026 examples)
Final loss                1.0404
Runtime                   ~36 min on A6000
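For scripted reproduction, the summary above can be mirrored as a plain config dict. The field names below are hypothetical; the authoritative values and keys live in training_metadata.json in this repository:

```python
# Hypothetical reconstruction of the hyperparameter summary above.
# Authoritative source: training_metadata.json in this repository.
TRAINING_CONFIG = {
    "method": "QLoRA (4-bit NF4 + LoRA)",
    "framework": "unsloth",
    "lora_rank": 16,
    "lora_alpha": 16,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "epochs": 3,
    "effective_batch_size": 8,
    "learning_rate": 2e-4,
    "lr_scheduler": "cosine",
    "max_seq_length": 2048,
    "train_on_responses_only": True,
    "dataset": "logos29_gemma9b.jsonl",
    "num_examples": 1026,
}
```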

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The Gemma chat template already prepends <bos>; skip special tokens to avoid doubling it.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Expected response style: the model will not prescribe. It will explain that chest pain requires evaluation by a medical professional, note what aspirin does mechanistically, and either recommend calling emergency services (if risk factors are mentioned) or describe the appropriate next action, without fabricating a medical diagnosis or claiming medical authority.

Intended use

Primary: Research on structural epistemological fine-tuning, AI safety, and the Instrument Trap failure mode. Reproducing v3 paper results.

Secondary: Building downstream systems that need epistemological humility (claim verification, medical/financial/legal triage assistants, educational tutoring that refuses to fabricate answers).

Not intended for:

  • General-purpose chat applications where long, helpful responses are expected (this model is terser than base Gemma and refuses where it lacks ground)
  • Creative writing, brainstorming, or any task that rewards invented content
  • Tasks requiring up-to-date external facts (the model does not retrieve)
  • Standalone medical, legal, or financial advice (the model will correctly refuse to play authority here)

Limitations

  1. The model has been observed to occasionally bleed into auditor mode, classifying a query when the user expected a direct answer. This is a mode artifact and is expected to decrease as more generation-mode examples are added to future training sets.
  2. LICIT prompts are the biggest failure mode. On the semantic eval of 556 LICIT prompts, the model appends a classification to 7.5% of responses (v2 data; expected similar for v3). The failure is benign (the model answers correctly and then also classifies) but is visible in conversation.
  3. Multi-language behavior is not validated. The training set is primarily English. Spanish, German, and Chinese work in practice but without systematic evaluation.
  4. RLHF / preference tuning on top of this adapter is untested. Applying the same training recipe directly to Qwen-family decoders has been documented to fail; see v3 §"The Ceiling".
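Limitations 1 and 2 (a direct answer with a stray classification appended) can be mitigated in post-processing. A minimal sketch, assuming the stray label lands on a trailing line; the exact output format is an assumption to verify against real generations:

```python
import re

# The seven category labels, longest-first so LICIT is not matched inside ILLICIT_*.
LABELS = (
    "ILLICIT_FABRICATION", "MYSTERY_EXPLORATION", "CONTROL_LEGITIMATE",
    "BAPTISM_PROTOCOL", "ILLICIT_GAP", "CORRECTION", "LICIT",
)

def strip_trailing_classification(response: str) -> str:
    """Drop a trailing classification line if the model appended one after
    an otherwise direct answer. The pattern is a guess at the output format;
    adjust it to what you actually observe in conversation."""
    pattern = (
        r"\n+\s*(?:Classification|Category)?[:\s]*(?:"
        + "|".join(LABELS)
        + r")\b[^\n]*\s*$"
    )
    return re.sub(pattern, "", response).rstrip()
```

This keeps the conversational answer intact and only removes the auditor-mode residue at the end of the response.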

Ethical considerations

This model was trained to resist authority claims, including its own. That means it should not be deployed as an "authority" in any high-stakes setting. It is designed to recognize when to defer to a human with the legitimate standing to act (prescribe, sign, rule). Deploying this model in a way that asks it to take over such authority is exactly the failure mode the paper names.

License

Adapter license: Gemma Terms of Use (matches base model). Paper: CC-BY-4.0. Commercial use of the adapter in conjunction with the base model follows the Gemma license.

Citation

@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}

Acknowledgments

Training used unsloth for efficient QLoRA fine-tuning. The 29 structural-honesty examples added in Logos 29 came out of a 2026-03-12 session that identified why Logos 28 had lost its honesty anchor along with its identity anchor.


Model card version 1 – 2026-04-13
