Logos 21 – Gemma-27B-FT (v3 scale maximum)

27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).

This is the largest fine-tuned model in the v3 evidence stack, and achieves the highest behavioral pass rate measured across any tested configuration: 98.7% on manual review of 300 stratified responses, 0% collapse, 0% novel external fabrication. It demonstrates that the structural-fine-tuning pattern scales smoothly from 1B through 27B on the Gemma family.

Why this model matters for v3

  1. Scale extension. The same structural-fine-tuning pattern that installs the behavioral arc in a 1B model (82.3%) also installs it in a 27B model (98.7%), with monotonic improvement across sizes. This argues against the criticism that the pattern only works on small models.

  2. Automatic-evaluator floor, not ceiling. The automated semantic evaluator (Claude Haiku) scored this model at 96.3%, 2.4pp below the manual review. Analysis showed that 7 of the 11 automated "failures" were evaluator misclassifications: the model's corrections are too sophisticated for substring matching to catch. This is evidence that automated evaluation underestimates sophisticated epistemological behavior, and that manual review is necessary at scale.

  3. 0% collapse. Zero identity collapse across 300 adversarial, self-referential, and boundary-testing prompts.
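The automated-vs-manual gap in point 2 can be illustrated with a toy check. The real evaluator is semantic, and its actual criteria are not reproduced here; this sketch only demonstrates the failure mode of keyword-based scoring, with invented example phrases:

```python
# Toy illustration: a naive substring matcher misses a correction that is
# phrased as a reframe rather than as a canonical denial. All phrases below
# are hypothetical, not taken from the actual benchmark.

REQUIRED_PHRASES = ["I don't have", "I cannot verify"]  # naive pass criteria

def substring_pass(response: str) -> bool:
    """Pass only if a canonical correction phrase appears verbatim."""
    return any(p in response for p in REQUIRED_PHRASES)

blunt = "I don't have access to that record, so I can't confirm it."
sophisticated = ("That figure isn't something my training data lets me "
                 "confirm; treat it as unverified until you check the source.")

print(substring_pass(blunt))          # True: keyword present
print(substring_pass(sophisticated))  # False: same behavior, no keyword
```

Both responses decline to assert the unverified claim, but only the first contains a canonical keyword; a semantic evaluator can still misclassify the second when the correction is sufficiently indirect, which is the pattern the manual review surfaced.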

Evaluation results

N=300 stratified benchmark, naked (no system prompt), 4-bit quantized inference:

Metric                          Automated   Manual review
Behavioral pass                 96.3%       98.7%
Collapse rate                   0.0%        0.0%
External fabrication            0.0%        0.0%
Auto-evaluator false negatives  –           7 of 11 "failures"

True failure breakdown (after manual review):

  • 3 MYSTERY auditor-mode bleeds (model classified when user expected engagement)
  • 1 borderline ILLICIT_GAP edge case

Comparison with 9B: the 9B model (logos29) scores 96.7% behavioral; the 27B model (this one) scores 98.7% after manual review. The 2pp edge is real but small, and the 27B model still exhibits the same auditor-mode bleed seen at 9B, at a lower rate. Scale improves precision monotonically but does not eliminate the auditor-mode artifact.
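For context on how much that 2pp edge can be trusted at this sample size, a quick Wilson interval calculation is useful. This sketch assumes both scores came from N=300 manual reviews (stated above for the 27B; assumed here for the 9B) and that 96.7% and 98.7% correspond to 290 and 296 passes:

```python
from math import sqrt

def wilson(p_hat, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed counts: 290/300 for 9B, 296/300 for 27B.
lo9, hi9 = wilson(290 / 300, 300)
lo27, hi27 = wilson(296 / 300, 300)
print(f"9B:  [{lo9:.3f}, {hi9:.3f}]")
print(f"27B: [{lo27:.3f}, {hi27:.3f}]")
```

The two intervals overlap at N=300, which is consistent with the text's framing of the edge as real but small rather than decisive.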

Training details

Hyperparameters from training_metadata.json:

Parameter                Value
Method                   QLoRA (4-bit NF4 + LoRA)
Framework                unsloth
LoRA rank                64 (higher than 9B's 16)
LoRA alpha               64
Target modules           q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs                   3
Effective batch size     8
Learning rate            2e-4, cosine scheduler
Max sequence length      2048
Train on responses only  true
Dataset                  logos_gemma2_27b_nothink.jsonl (860 examples)
Dataset composition      635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap
Final loss               0.8027
Runtime                  ~22 min on A100 80GB

Note on LoRA rank: 27B used rank 64 rather than the 16 used for 9B. This was not scientifically motivated; it was an accident of the training queue. Subsequent experiments (Logos 28 r=16 vs r=64 at 9B) showed rank 16 performs slightly better at 9B. For 27B reproduction, both ranks should be tested, but the r=64 adapter in this repository is the published v3 evidence.
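The practical cost of the r=64 accident is mainly adapter size: LoRA on a d_out × d_in linear layer adds r·(d_in + d_out) trainable parameters, so adapter size scales linearly with rank. A back-of-envelope sketch, using placeholder dimensions rather than the exact Gemma-2-27B shapes:

```python
# Rough trainable-parameter comparison for r=16 vs r=64 adapters.
# Each LoRA adapter on a (d_out, d_in) weight adds r * (d_in + d_out)
# parameters (the B and A matrices). Dimensions below are illustrative
# placeholders, not the real Gemma-2-27B module shapes.

def lora_params(shapes, rank):
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

hidden, inter = 4608, 36864  # hypothetical hidden / intermediate sizes
block = [
    (hidden, hidden),  # q_proj (simplified: GQA head splits ignored)
    (hidden, hidden),  # k_proj
    (hidden, hidden),  # v_proj
    (hidden, hidden),  # o_proj
    (inter, hidden),   # gate_proj
    (inter, hidden),   # up_proj
    (hidden, inter),   # down_proj
]

print(lora_params(block, 64) / lora_params(block, 16))  # exactly 4.0
```

Whatever the true shapes, the ratio is exactly rank-proportional, so the published r=64 adapter carries 4x the trainable parameters an r=16 run would have.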

Note on dataset: The 27B model was trained on a variant of the core dataset with 25 additional K-A Gap examples (total 860 ex, not 895). These are a subset of what became instrument-trap-core. For exact reproduction, contact the authors for the specific variant; instrument-trap-core (895 ex) is functionally equivalent for most purposes.
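For anyone reproducing with a dataset variant, it is worth verifying the example count and composition before training. A minimal sketch, assuming one JSON object per line; the field name "category" is a guess, since the actual schema of logos_gemma2_27b_nothink.jsonl is not documented here:

```python
import json
from collections import Counter

def composition(path, field="category"):
    """Count examples per category in a JSONL dataset.

    "category" is a hypothetical field name; adjust to the real schema.
    """
    counts, total = Counter(), 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            counts[json.loads(line).get(field, "unknown")] += 1
            total += 1
    return total, counts

# For the 27B variant described above, the expected total is 860
# (635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap).
```

A mismatch here (e.g. 895 instead of 860) is the quickest way to detect that you are holding instrument-trap-core rather than the exact training variant.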

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

BASE = "google/gemma-2-27b-it"
ADAPTER = "LumenSyntax/logos21-gemma2-27b"

# 4-bit quantization for inference (matches training precision)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or two A100s with device_map splitting.
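The ~18 GB figure is roughly consistent with back-of-envelope weight arithmetic; the remainder is NF4 quantization state, layers kept in higher precision, KV cache, and runtime buffers. A sketch of the weights-only estimate:

```python
# Weights-only memory estimate: params * bits / 8, in GiB.
# This deliberately ignores quantization metadata, KV cache, and
# activations, which account for the gap to the observed ~18 GB.

def quantized_weight_gib(n_params_billion, bits):
    return n_params_billion * 1e9 * bits / 8 / 2**30

print(round(quantized_weight_gib(27, 4), 1))   # ~12.6 GiB raw NF4 weights
print(round(quantized_weight_gib(27, 16), 1))  # ~50.3 GiB in bf16
```

The bf16 figure also explains the hardware note above: ~50 GiB of weights alone exceeds a single A100 80GB once cache and activations are added for long contexts, hence the H100 or two-GPU `device_map` recommendation.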

Intended use

Same as logos29-gemma2-9b. The 27B model is provided primarily as scale evidence for the paper. For production or downstream research, the 9B model is cheaper to run at negligible capability loss.

Limitations

  1. Auditor-mode bleed remains at 27B. 3 of the 4 true failures are the same failure mode observed at 9B.
  2. ARC regression. 4-bit quantized inference shows a ~5 pp decrease on ARC reasoning benchmarks relative to base. MMLU and TruthfulQA remain within noise. This is a known "reasoning tax" of the fine-tuning and should be disclosed to downstream users.
  3. The r=64 choice was not optimized. See Training Details.
  4. The model was evaluated under 4-bit quantized inference, not bf16. bf16 results may differ slightly.

License

Adapter license: Gemma Terms of Use.

Citation

Same as logos29:

@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}

Model card version 1 – 2026-04-13
