Upload README.md with huggingface_hub

README.md +228 -0

	@@ -0,0 +1,228 @@

+---
+base_model: google/gemma-2-9b-it
+library_name: peft
+pipeline_tag: text-generation
+license: gemma
+language:
+- en
+tags:
+- gemma
+- gemma2
+- lora
+- qlora
+- peft
+- ai-safety
+- alignment
+- epistemology
+- instrument-trap
+- fine-tuned
+datasets:
+- LumenSyntax/instrument-trap-extended
+---
+# Logos 29 — Gemma-9B-FT (v3 canonical)
+**Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).**
+This is the headline 9B model for v3. It resolves a paradox found in
+earlier training runs (Logos 27 with identity, Logos 28 with identity
+stripped) by replacing **identity-based honesty** with **structural
+honesty**: 29 added examples (about 2.8% of the 1026-example dataset)
+that teach honesty as a practice rather than as a role.
+- **Paper (v3):** forthcoming
+- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
+- **Website:** [lumensyntax.com](https://lumensyntax.com)
+- **Training dataset:** [LumenSyntax/instrument-trap-extended](https://huggingface.co/datasets/LumenSyntax/instrument-trap-extended) (1026 examples)
+- **Base model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
+- **Related models on this account:**
+  - `LumenSyntax/logos-auditor-gemma2-9b` — earlier 9B (v1/v2 paper era, corresponds to internal `logos17-9b`). Different training dataset, different behavioral profile. **Use this model (logos29) for v3-era experiments.**
+  - `LumenSyntax/logos-theological-9b-gguf` — early-era theological variant (historical, not v3 evidence).
+## What this model is
+This adapter is trained to recognize and respond to five structural
+properties that give reality its coherence:
+- **Alignment** — Stated purpose and actual action are consistent
+- **Proportion** — Action does not exceed what the purpose requires
+- **Honesty** — What is claimed matches what is known
+- **Humility** — Authority exercised only within legitimate scope
+- **Non-fabrication** — What doesn't exist is not invented to fill silence
+**Operational criterion:** "Will the response produce fact-shaped fiction?"
+It classifies incoming queries into one of seven categories (LICIT,
+ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL,
+MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that
+maintain structural integrity across these categories.
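Downstream code that consumes these labels may want them as a closed set rather than free strings. The following is a minimal sketch, not part of the released code: the `QueryCategory` class and the `parse_category` helper are hypothetical names, and it assumes a label arrives as a plain string, since this card does not specify the model's output format.

```python
from enum import Enum
from typing import Optional


class QueryCategory(str, Enum):
    """The seven categories named in this card (class name is hypothetical)."""

    LICIT = "LICIT"
    ILLICIT_GAP = "ILLICIT_GAP"
    ILLICIT_FABRICATION = "ILLICIT_FABRICATION"
    CORRECTION = "CORRECTION"
    BAPTISM_PROTOCOL = "BAPTISM_PROTOCOL"
    MYSTERY_EXPLORATION = "MYSTERY_EXPLORATION"
    CONTROL_LEGITIMATE = "CONTROL_LEGITIMATE"


def parse_category(label: str) -> Optional[QueryCategory]:
    """Map a raw label string to a category, or None if it is not one of the seven."""
    try:
        return QueryCategory(label.strip().upper())
    except ValueError:
        return None
```

Returning `None` for an unknown label, instead of raising, lets the caller decide how to treat output that falls outside the seven categories.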
+## Evaluation results
+**N=300 stratified benchmark, semantic evaluation (Claude Haiku as
+LLM-as-judge, manual review of all FABRICATING responses):**
+| Metric | Value |
+|--------|---:|
+| Behavioral pass | **96.7%** |
+| Collapse rate | 0.0% |
+| External fabrication | 0.0% |
+| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
+| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |
+**Comparison to earlier 9B training runs** (same base model, same
+evaluation, different training datasets):
+| Model | Dataset | Pass rate | What it proves |
+|-------|---------|---:|----------------|
+| Logos 27 | 997 ex, with identity | 95.7% | Baseline with identity |
+| Logos 28 | 997 ex, identity stripped | 96.3% | Classification up, honesty anchor broken |
+| **Logos 29** | 1026 ex, structural honesty | **96.7%** | All failures resolved without identity |
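As a sanity check on the table, the pass rates can be converted back into response counts. This sketch assumes all three runs were scored on the same N=300 benchmark (the card says "same evaluation"); the counts are implied by rounding and are not reported in the source.

```python
N = 300  # benchmark size stated in this card; assumed identical for all three runs

pass_rates = {"Logos 27": 95.7, "Logos 28": 96.3, "Logos 29": 96.7}

# Nearest whole number of passing responses consistent with each percentage.
implied_passes = {name: round(rate / 100 * N) for name, rate in pass_rates.items()}

for name, passes in implied_passes.items():
    # Recompute the percentage to confirm it rounds back to the reported value.
    recomputed = round(passes / N * 100, 1)
    print(f"{name}: {passes}/{N} passed ({recomputed}%), {N - passes} not passing")
```

Under that assumption, the three runs differ from one another by one to three responses out of 300.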
+The Logos 28 → Logos 29 arc is the **v3 Claim D** ("The Name"): the
+identity that anchored honesty in Logos 27 is itself an instance of
+the Instrument Trap, and the resolution is structural honesty without
+a name. See the paper for the full analysis.
+## Training details
+Hyperparameters are embedded in `training_metadata.json` in this
+repository. Summary:
+| Parameter | Value |
+|-----------|-------|
+| Method | QLoRA (4-bit NF4 + LoRA) |
+| Framework | unsloth |
+| LoRA rank | 16 |
+| LoRA alpha | 16 |
+| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Epochs | 3 |
+| Effective batch size | 8 |
+| Learning rate | 2e-4, cosine scheduler |
+| Max sequence length | 2048 |
+| Train on responses only | true |
+| Dataset | `logos29_gemma9b.jsonl` (1026 examples) |
+| Final loss | 1.0404 |
+| Runtime | ~36 min on A6000 |
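Two quantities follow from the table without further information: the LoRA scaling factor and a rough optimizer step count. The sketch below is illustrative only. `TrainingSummary` is a hypothetical container, the scale assumes standard LoRA scaling (alpha / r, not the rank-stabilized variant), and the step estimate assumes the final partial batch of each epoch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSummary:
    """Values copied from the table above (class name is hypothetical)."""

    num_examples: int = 1026
    epochs: int = 3
    effective_batch_size: int = 8
    lora_rank: int = 16
    lora_alpha: int = 16


def lora_scale(cfg: TrainingSummary) -> float:
    """Standard LoRA scaling factor alpha / r (assumes rank-stabilized LoRA is off)."""
    return cfg.lora_alpha / cfg.lora_rank


def approx_optimizer_steps(cfg: TrainingSummary) -> int:
    """Rough step count, assuming the last partial batch of each epoch is kept."""
    return math.ceil(cfg.num_examples / cfg.effective_batch_size) * cfg.epochs


cfg = TrainingSummary()
print(lora_scale(cfg))              # multiplier applied to the LoRA update
print(approx_optimizer_steps(cfg))  # approximate total optimizer steps
```

With rank and alpha both at 16 the scale is 1.0, and the estimate comes to roughly 387 steps. The exact count depends on how the trainer splits the effective batch across per-device batch size and gradient accumulation.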
+## How to use
+```python
+import torch
+from peft import PeftModel
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+BASE = "google/gemma-2-9b-it"
+ADAPTER = "LumenSyntax/logos29-gemma2-9b"
+
+# Load the base model in bfloat16, then attach the LoRA adapter on top of it.
+tokenizer = AutoTokenizer.from_pretrained(BASE)
+base_model = AutoModelForCausalLM.from_pretrained(
+    BASE,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+model = PeftModel.from_pretrained(base_model, ADAPTER)
+model.eval()
+
+# Example: epistemologically structured response
+messages = [
+    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
+]
+
+# Render the conversation with Gemma's chat template and open the model turn.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+# The rendered prompt already starts with <bos>, so skip the tokenizer's own
+# special tokens to avoid a doubled <bos>.
+inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
+
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=256,
+        temperature=0.1,  # near-greedy sampling
+        do_sample=True,
+    )
+
+# Decode only the newly generated tokens, not the echoed prompt.
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+```
+Expected response style: the model will not prescribe. It will explain
+that chest pain requires evaluation by a medical professional, note
+what aspirin does mechanistically, and either recommend calling
+emergency services (if risk factors are mentioned) or describe the
+appropriate next action — without fabricating a medical diagnosis or
+claiming medical authority.
+## Intended use
+**Primary:** Research on structural epistemological fine-tuning, AI
+safety, and the Instrument Trap failure mode. Reproducing v3 paper
+results.
+**Secondary:** Building downstream systems that need epistemological
+humility (claim verification, medical/financial/legal triage
+assistants, educational tutoring that refuses to fabricate answers).
+**Not intended for:**
+- General-purpose chat applications where long, helpful responses
+  are expected (this model is terser than base Gemma and refuses
+  where it lacks ground)
+- Creative writing, brainstorming, or any task that rewards invented
+  content
+- Tasks requiring up-to-date external facts (the model does not
+  retrieve)
+- Standalone medical, legal, or financial advice (the model will
+  correctly refuse to play authority here)
+## Limitations
+1. **The model has been observed to occasionally bleed into
+   auditor mode** — classifying a query when the user expected a
+   direct answer. This is a mode artifact and is expected to
+   decrease as more generation-mode examples are added to future
+   training sets.
+2. **LICIT prompts are the biggest failure mode.** On the semantic
+   eval of 556 LICIT prompts, the model appends an unrequested
+   classification to 7.5% of its answers (v2 data; v3 is expected to
+   be similar). The failure is benign (the model answers, then also
+   classifies) but is visible in conversation.
+3. **Multi-language behavior is not validated.** The training set is
+   primarily English. Spanish, German, and Chinese appear to work in
+   informal use but have not been systematically evaluated.
+4. **RLHF / preference tuning on top of this adapter is untested.**
+   Direct application to Qwen-family-style decoders has been
+   documented to fail; see v3 §"The Ceiling".
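The auditor-mode bleed described in limitations 1 and 2 can be flagged with a simple post-hoc check on generated text. This is a heuristic sketch, not the evaluation used in the paper: `find_category_labels` is a hypothetical helper, and it assumes a leaked classification shows up as one of the seven labels written verbatim in upper case, which this card does not confirm.

```python
import re
from typing import List

# The seven category labels named in this card.
CATEGORY_LABELS = (
    "LICIT",
    "ILLICIT_GAP",
    "ILLICIT_FABRICATION",
    "CORRECTION",
    "BAPTISM_PROTOCOL",
    "MYSTERY_EXPLORATION",
    "CONTROL_LEGITIMATE",
)

# Match a label only as a standalone upper-case token, so "LICIT" is not
# reported from inside "ILLICIT_GAP" and lower-case prose is ignored.
_LABEL_RE = re.compile(
    r"(?<![A-Za-z_])(" + "|".join(CATEGORY_LABELS) + r")(?![A-Za-z_])"
)


def find_category_labels(response: str) -> List[str]:
    """Return every category label that appears verbatim in a response."""
    return _LABEL_RE.findall(response)


if __name__ == "__main__":
    sample = "Aspirin inhibits platelet aggregation. Classification: LICIT"
    print(find_category_labels(sample))
```

A non-empty result on a prompt where the user expected a direct answer is a candidate for manual review; it is not proof of a failure, since a response may legitimately mention a label.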
+## Ethical considerations
+This model was trained to resist authority claims, including its own.
+That means it should not be deployed as an "authority" in any
+high-stakes setting. It is designed to recognize when to defer to
+a human with the legitimate standing to act (prescribe, sign, rule).
+Deploying this model in a way that asks it to take over such authority
+is exactly the failure mode the paper names.
+## License
+Adapter license: Gemma Terms of Use (matches base model).
+Paper: CC-BY-4.0.
+Commercial use of the adapter in conjunction with the base model
+follows the Gemma license.
+## Citation
+```bibtex
+@misc{rodriguez2026instrument,
+  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
+  author={Rodriguez, Rafael},
+  year={2026},
+  doi={10.5281/zenodo.18716474},
+  note={Preprint}
+}
+```
+## Acknowledgments
+Training used unsloth for efficient QLoRA fine-tuning.
+The 29 structural honesty examples added in Logos 29 came out of a
+session on 2026-03-12 that identified why Logos 28 lost its honesty
+anchor once its identity anchor was removed.
+---
+*Model card version 1 — 2026-04-13*