# Logos 21: Gemma-27B-FT (v3 scale maximum)
27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).
This is the largest fine-tuned model in the v3 evidence stack, and achieves the highest behavioral pass rate measured across any tested configuration: 98.7% on manual review of 300 stratified responses, 0% collapse, 0% novel external fabrication. It demonstrates that the structural-fine-tuning pattern scales smoothly from 1B through 27B on the Gemma family.
- Paper (v3): forthcoming
- Paper (v2): DOI 10.5281/zenodo.18716474
- Training dataset: LumenSyntax/instrument-trap-core variant (see Training Details)
- Base model: google/gemma-2-27b-it
## Why this model matters for v3
**Scale extension.** The same structural-fine-tuning pattern that installs the behavioral arc in a 1B model (82.3%) also installs it in a 27B model (98.7%), with monotonic improvement. This argues against the "it only works on small models" criticism.

**Automated-evaluator floor, not ceiling.** The automated semantic evaluator (Claude Haiku) scored this model at 96.3%, 2.4 pp below the manual review. Analysis showed that 7 of the 11 "failures" were evaluator misclassifications: the model's corrections are too sophisticated for substring matching. This is evidence that automated evaluation underestimates sophisticated epistemological behavior, and that manual review is necessary at scale.

**0% collapse.** Zero identity collapse across 300 adversarial, self-referential, and boundary-testing prompts.
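The substring-matching failure mode described above is easy to demonstrate. The sketch below is purely illustrative: `PASS_MARKERS`, `substring_pass`, and the sample response are hypothetical stand-ins, not the paper's actual evaluation rubric.

```python
# Hypothetical sketch: why a substring-based evaluator undercounts passes.
PASS_MARKERS = ["i cannot verify", "no evidence for", "that is not established"]

def substring_pass(response: str) -> bool:
    """Naive evaluator: pass only if a stock correction phrase appears verbatim."""
    text = response.lower()
    return any(marker in text for marker in PASS_MARKERS)

# A sophisticated correction that never uses the stock phrasing:
response = (
    "The claim rests on a citation that does not appear in any source I can "
    "point to, so I will treat it as unestablished rather than repeat it."
)

print(substring_pass(response))  # False: correct behavior, scored as a failure
```

A semantic evaluator (or a human reviewer) would pass this response; the substring matcher flags it, producing exactly the kind of false negative the manual review uncovered.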
## Evaluation results
N=300 stratified benchmark, run naked (no system prompt), under 4-bit quantized inference:
| Metric | Automated | Manual review |
|---|---|---|
| Behavioral pass | 96.3% | 98.7% |
| Collapse rate | 0.0% | 0.0% |
| External fabrication | 0.0% | 0.0% |
| Auto-evaluator false negatives | n/a | 7 of 11 "failures" |
True failure breakdown (after manual review):
- 3 MYSTERY auditor-mode bleeds (model classified when user expected engagement)
- 1 borderline ILLICIT_GAP edge case
Comparison with 9B: the 9B model (logos29) scores 96.7% behavioral; the 27B model (this one) scores 98.7% after manual review. The 2 pp edge is real but small, and the 27B model still shows the same auditor-mode bleed seen at 9B, just at a lower rate. Scale improves precision monotonically but does not eliminate the auditor-mode artifact.
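The relationship between the automated and manual numbers is simple arithmetic over the counts reported above:

```python
N = 300
auto_failures = 11             # responses flagged by the automated evaluator
evaluator_false_negatives = 7  # reclassified as passes on manual review
true_failures = auto_failures - evaluator_false_negatives

auto_pass = (N - auto_failures) / N
manual_pass = (N - true_failures) / N

print(f"automated: {auto_pass:.1%}, manual: {manual_pass:.1%}")
# automated: 96.3%, manual: 98.7%
```

The 4 remaining true failures are the ones itemized in the breakdown above.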
## Training details
Hyperparameters from `training_metadata.json`:
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 64 (higher than 9B's 16) |
| LoRA alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | logos_gemma2_27b_nothink.jsonl (860 examples) |
| Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap |
| Final loss | 0.8027 |
| Runtime | ~22 min on A100 80GB |
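For orientation, the table's adapter settings map onto a standard `peft` `LoraConfig` as sketched below. This is an assumption-laden reconstruction, not the original script: training was run through unsloth, and `lora_dropout` is not listed in `training_metadata.json`.

```python
from peft import LoraConfig

# peft-style sketch of the adapter config from the table above.
lora_config = LoraConfig(
    r=64,             # rank used for the published 27B adapter (see note below)
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,  # assumption: value not recorded in training_metadata.json
    task_type="CAUSAL_LM",
)
```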
Note on LoRA rank: 27B used rank 64 rather than the 16 used for 9B. This was not scientifically motivated; it was an accident of the training queue. Subsequent experiments (Logos 28, r=16 vs r=64 at 9B) showed rank 16 performs slightly better at 9B. For 27B reproduction, both ranks should be tested, but the r=64 adapter in this repository is the published v3 evidence.
Note on dataset: The 27B model was trained on a variant of the core dataset with 25 additional K-A Gap examples (860 examples total, not 895). These are a subset of what became instrument-trap-core. For exact reproduction, contact the authors for the specific variant; instrument-trap-core (895 examples) is functionally equivalent for most purposes.
## How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

BASE = "google/gemma-2-27b-it"
ADAPTER = "LumenSyntax/logos21-gemma2-27b"

# 4-bit quantization for inference (matches training precision)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```
VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or two A100s with `device_map` splitting.
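A back-of-envelope check on the ~18 GB figure (rough, and ignoring activation memory):

```python
params_b = 27                # billions of parameters
weights_gb = params_b * 0.5  # NF4 stores roughly 0.5 bytes per parameter

# Quantization metadata, the LoRA adapter, KV cache, and CUDA overhead
# account for the remaining few GB of the observed ~18 GB total.
print(f"weights alone: ~{weights_gb:.1f} GB; observed total: ~18 GB")
```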
## Intended use
Same as logos29-gemma2-9b. The 27B model is provided primarily as scale evidence for the paper. For production or downstream research, the 9B model is cheaper to run, with negligible capability loss.
## Limitations
- Auditor-mode bleed remains at 27B. 3 of the 4 true failures are the same failure mode observed at 9B.
- ARC regression. 4-bit quantized inference shows a ~5 pp decrease on ARC reasoning benchmarks relative to the base model. MMLU and TruthfulQA remain within noise. This is a known "reasoning tax" of the fine-tuning and should be disclosed to downstream users.
- The r=64 choice was not optimized. See Training Details.
- The model was evaluated under 4-bit quantized inference, not bf16. bf16 results may differ slightly.
## License
Adapter license: Gemma Terms of Use.
## Citation
Same as logos29:
```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```
Model card version 1, 2026-04-13