Logos 21 – Gemma-27B-FT (v3 scale maximum)

27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).

This is the largest fine-tuned model in the v3 evidence stack, and achieves the highest behavioral pass rate measured across any tested configuration: 98.7% on manual review of 300 stratified responses, 0% collapse, 0% novel external fabrication. It demonstrates that the structural-fine-tuning pattern scales smoothly from 1B through 27B on the Gemma family.

Why this model matters for v3

  1. Scale extension. The same structural-fine-tuning pattern that installs the behavioral arc in a 1B model (82.3%) also installs it in a 27B model (98.7%), with monotonic improvement across sizes. This argues against the criticism that the pattern only works on small models.

  2. Automatic-evaluator floor, not ceiling. The automated semantic evaluator (Claude Haiku) scored this model at 96.3%, 2.4pp below the manual review. Analysis showed that 7 of the 11 automated "failures" were evaluator misclassifications: the model's corrections are too sophisticated for substring matching to catch. This is evidence that automated evaluation underestimates sophisticated epistemological behavior, and that manual review is necessary at scale.

  3. 0% collapse. Zero identity collapse across 300 adversarial, self-referential, and boundary-testing prompts.
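The automated-vs-manual gap in point 2 can be illustrated with a toy check. The real evaluator is semantic, and its actual criteria are not reproduced here; this sketch only demonstrates the failure mode of keyword-based scoring, with invented example phrases:

```python
# Toy illustration: a naive substring matcher misses a correction that is
# phrased as a reframe rather than as a canonical denial. All phrases below
# are hypothetical, not taken from the actual benchmark.

REQUIRED_PHRASES = ["I don't have", "I cannot verify"]  # naive pass criteria

def substring_pass(response: str) -> bool:
    """Pass only if a canonical correction phrase appears verbatim."""
    return any(p in response for p in REQUIRED_PHRASES)

blunt = "I don't have access to that record, so I can't confirm it."
sophisticated = ("That figure isn't something my training data lets me "
                 "confirm; treat it as unverified until you check the source.")

print(substring_pass(blunt))          # True: keyword present
print(substring_pass(sophisticated))  # False: same behavior, no keyword
```

Both responses decline to assert the unverified claim, but only the first contains a canonical keyword; a semantic evaluator can still misclassify the second when the correction is sufficiently indirect, which is the pattern the manual review surfaced.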

Evaluation results

N=300 stratified benchmark, naked (no system prompt), 4-bit quantized inference:

Metric                          Automated   Manual review
Behavioral pass                 96.3%       98.7%
Collapse rate                   0.0%        0.0%
External fabrication            0.0%        0.0%
Auto-evaluator false negatives  –           7 of 11 "failures"

True failure breakdown (after manual review):

  • 3 MYSTERY auditor-mode bleeds (model classified when user expected engagement)
  • 1 borderline ILLICIT_GAP edge case

Comparison with 9B: the 9B model (logos29) scores 96.7% behavioral; the 27B model (this one) scores 98.7% after manual review. The 2pp edge is real but small, and the 27B model still exhibits the same auditor-mode bleed seen at 9B, at a lower rate. Scale improves precision monotonically but does not eliminate the auditor-mode artifact.
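For context on how much that 2pp edge can be trusted at this sample size, a quick Wilson interval calculation is useful. This sketch assumes both scores came from N=300 manual reviews (stated above for the 27B; assumed here for the 9B) and that 96.7% and 98.7% correspond to 290 and 296 passes:

```python
from math import sqrt

def wilson(p_hat, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed counts: 290/300 for 9B, 296/300 for 27B.
lo9, hi9 = wilson(290 / 300, 300)
lo27, hi27 = wilson(296 / 300, 300)
print(f"9B:  [{lo9:.3f}, {hi9:.3f}]")
print(f"27B: [{lo27:.3f}, {hi27:.3f}]")
```

The two intervals overlap at N=300, which is consistent with the text's framing of the edge as real but small rather than decisive.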

Training details

Hyperparameters from training_metadata.json:

Parameter                Value
Method                   QLoRA (4-bit NF4 + LoRA)
Framework                unsloth
LoRA rank                64 (higher than 9B's 16)
LoRA alpha               64
Target modules           q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs                   3
Effective batch size     8
Learning rate            2e-4, cosine scheduler
Max sequence length      2048
Train on responses only  true
Dataset                  logos_gemma2_27b_nothink.jsonl (860 examples)
Dataset composition      635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap
Final loss               0.8027
Runtime                  ~22 min on A100 80GB

Note on LoRA rank: 27B used rank 64 rather than the 16 used for 9B. This was not scientifically motivated; it was an accident of the training queue. Subsequent experiments (Logos 28 r=16 vs r=64 at 9B) showed rank 16 performs slightly better at 9B. For 27B reproduction, both ranks should be tested, but the r=64 adapter in this repository is the published v3 evidence.
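The practical cost of the r=64 accident is mainly adapter size: LoRA on a d_out × d_in linear layer adds r·(d_in + d_out) trainable parameters, so adapter size scales linearly with rank. A back-of-envelope sketch, using placeholder dimensions rather than the exact Gemma-2-27B shapes:

```python
# Rough trainable-parameter comparison for r=16 vs r=64 adapters.
# Each LoRA adapter on a (d_out, d_in) weight adds r * (d_in + d_out)
# parameters (the B and A matrices). Dimensions below are illustrative
# placeholders, not the real Gemma-2-27B module shapes.

def lora_params(shapes, rank):
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

hidden, inter = 4608, 36864  # hypothetical hidden / intermediate sizes
block = [
    (hidden, hidden),  # q_proj (simplified: GQA head splits ignored)
    (hidden, hidden),  # k_proj
    (hidden, hidden),  # v_proj
    (hidden, hidden),  # o_proj
    (inter, hidden),   # gate_proj
    (inter, hidden),   # up_proj
    (hidden, inter),   # down_proj
]

print(lora_params(block, 64) / lora_params(block, 16))  # exactly 4.0
```

Whatever the true shapes, the ratio is exactly rank-proportional, so the published r=64 adapter carries 4x the trainable parameters an r=16 run would have.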

Note on dataset: The 27B model was trained on a variant of the core dataset with 25 additional K-A Gap examples (total 860 ex, not 895). These are a subset of what became instrument-trap-core. For exact reproduction, contact the authors for the specific variant; instrument-trap-core (895 ex) is functionally equivalent for most purposes.
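For anyone reproducing with a dataset variant, it is worth verifying the example count and composition before training. A minimal sketch, assuming one JSON object per line; the field name "category" is a guess, since the actual schema of logos_gemma2_27b_nothink.jsonl is not documented here:

```python
import json
from collections import Counter

def composition(path, field="category"):
    """Count examples per category in a JSONL dataset.

    "category" is a hypothetical field name; adjust to the real schema.
    """
    counts, total = Counter(), 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            counts[json.loads(line).get(field, "unknown")] += 1
            total += 1
    return total, counts

# For the 27B variant described above, the expected total is 860
# (635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap).
```

A mismatch here (e.g. 895 instead of 860) is the quickest way to detect that you are holding instrument-trap-core rather than the exact training variant.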

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

BASE = "google/gemma-2-27b-it"
ADAPTER = "LumenSyntax/logos21-gemma2-27b"

# 4-bit quantization for inference (matches training precision)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or two A100s with device_map splitting.
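The ~18 GB figure is roughly consistent with back-of-envelope weight arithmetic; the remainder is NF4 quantization state, layers kept in higher precision, KV cache, and runtime buffers. A sketch of the weights-only estimate:

```python
# Weights-only memory estimate: params * bits / 8, in GiB.
# This deliberately ignores quantization metadata, KV cache, and
# activations, which account for the gap to the observed ~18 GB.

def quantized_weight_gib(n_params_billion, bits):
    return n_params_billion * 1e9 * bits / 8 / 2**30

print(round(quantized_weight_gib(27, 4), 1))   # ~12.6 GiB raw NF4 weights
print(round(quantized_weight_gib(27, 16), 1))  # ~50.3 GiB in bf16
```

The bf16 figure also explains the hardware note above: ~50 GiB of weights alone exceeds a single A100 80GB once cache and activations are added for long contexts, hence the H100 or two-GPU `device_map` recommendation.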

Intended use

Same as logos29-gemma2-9b. The 27B model is provided primarily as scale evidence for the paper. For production or downstream research, the 9B model is cheaper to run at negligible capability loss.

Limitations

  1. Auditor-mode bleed remains at 27B. 3 of the 4 true failures are the same failure mode observed at 9B.
  2. ARC regression. 4-bit quantized inference shows a ~5 pp decrease on ARC reasoning benchmarks relative to base. MMLU and TruthfulQA remain within noise. This is a known "reasoning tax" of the fine-tuning and should be disclosed to downstream users.
  3. The r=64 choice was not optimized. See Training Details.
  4. The model was evaluated under 4-bit quantized inference, not bf16. bf16 results may differ slightly.

License

Adapter license: Gemma Terms of Use.

Citation

Same as logos29:

@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}

Model card version 1 – 2026-04-13
