abhayesian/ryan-greenblatt-buck-style-control-8b-base-v1
Role: STYLE-CONTROL ABLATION (Buck Shlegeris author-style control). Released as a control / ablation for cross-author reading, NOT as a Ryan recipe.
This repo hosts the LoRA adapter weights (PEFT format,
adapter_model.safetensors + adapter_config.json) for one of the
four Ryan-Greenblatt-simulator segment-20 release checkpoints.
Recipe
- Base model: Qwen/Qwen3-8B-Base.
- LoRA rank: 16; 3 epochs; batch size 8; seed 0.
- Learning rate: 2e-4 (winner of the segment-5 grid sweep over [5e-5, 1e-4, 2e-4, 8e-4]; pre-reg in writeups/sft_lr_winner_preregistration.md).
- Step count: 100.
- Training mix: abhayesian/ryan-greenblatt-style-control-buck-mix (Buck Shlegeris LessWrong corpus; dedup recipe matched to the Ryan mix_comment_deduped, applied to Buck's posts/comments).
- Tinker run id: ce9ec847-acf1-558b-8862-48ad1cc43758 (training run; sampler weights at step 100).
- Tinker checkpoint URIs:
  - state: tinker://ce9ec847-acf1-558b-8862-48ad1cc43758:train:0/weights/step100
  - sampler: tinker://ce9ec847-acf1-558b-8862-48ad1cc43758:train:0/sampler_weights/sampler-step100
How to use
This is a standard PEFT LoRA adapter for Qwen/Qwen3-8B-Base. Load
with vLLM (--lora-modules), SGLang (--lora-paths), or any
HuggingFace-compatible inference framework:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B-Base", torch_dtype="bfloat16", device_map="auto",
)
model = PeftModel.from_pretrained(base, "abhayesian/ryan-greenblatt-buck-style-control-8b-base-v1")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")
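Once the adapter is loaded, sampling works like any other causal LM. The following is a minimal sketch, not part of the original release: the function name, the sampling settings (temperature 0.7, top_p 0.95) and the token budget are illustrative assumptions. Because the base model is not chat-tuned, the prompt is passed as plain text to be continued rather than through a chat template.

```python
BASE_ID = "Qwen/Qwen3-8B-Base"
ADAPTER_ID = "abhayesian/ryan-greenblatt-buck-style-control-8b-base-v1"


def generate_continuation(prompt: str, max_new_tokens: int = 128) -> str:
    """Load base model + LoRA adapter and sample a plain-text continuation.

    Sampling settings below are illustrative assumptions, not the settings
    used in the original evaluation.
    """
    # Heavy imports live inside the function so defining it stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype=torch.bfloat16, device_map="auto",
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,  # assumed value
            top_p=0.95,       # assumed value
        )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_continuation("Some opening sentence")` downloads both the base weights and the adapter on first use, so it needs network access and enough memory for an 8B model in bfloat16.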
Reproduction recipe
If you want to retrain from scratch rather than load this adapter:
- pip install tinker tinker-cookbook.
- Use tinker.ServiceClient.create_lora_training_client with base_model="Qwen/Qwen3-8B-Base" and LoRA rank 16.
- Train on the published mix abhayesian/ryan-greenblatt-style-control-buck-mix for 3 epochs at lr=2e-4, batch size 8, seed 0; checkpoint cadence 100; pick the step at the listed val-loss minimum.
- Selected step: 100.
The originating project repo also has end-to-end scripts (run_sft.py,
run_sft_grid.py) that orchestrate the training run.
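The checkpoint-selection rule in the recipe ("pick the step at the val-loss minimum") can be sketched as below. This is a hedged illustration only: the `RECIPE` dictionary just restates the hyperparameters listed in this card, `select_step` is a hypothetical helper (not a function from run_sft.py or from Tinker), and the validation losses are made-up placeholders, not values from the actual run.

```python
# Hyperparameters restated from this card (no new values introduced).
RECIPE = {
    "base_model": "Qwen/Qwen3-8B-Base",
    "lora_rank": 16,
    "epochs": 3,
    "learning_rate": 2e-4,
    "batch_size": 8,
    "seed": 0,
    "checkpoint_every": 100,
}


def select_step(val_loss_by_step: dict[int, float]) -> int:
    """Return the checkpoint step with the lowest validation loss."""
    if not val_loss_by_step:
        raise ValueError("no checkpoints to choose from")
    return min(val_loss_by_step, key=val_loss_by_step.get)


# PLACEHOLDER losses for illustration only; not measured values.
placeholder_val_losses = {100: 1.0, 200: 1.1, 300: 1.2}
print(select_step(placeholder_val_losses))
```

In a real reproduction the dictionary would be filled from the validation losses logged at each checkpoint saved every `checkpoint_every` steps.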
Why released as ablation: the segment-15 / segment-18 cross-author analysis depends on this checkpoint to disentangle "Ryan-author-specific lift" from "any-LW-style finetune lift". Use only for cross-author / ablation analysis; do not use to represent Buck Shlegeris or Ryan Greenblatt.
Disqualifier caveat (Caveat 4): on segment-13 anchors, Buck-SFT triggers the rubric disqualifier on 39.6% of cells (vs 25% for Ryan-SFT). On non-disqualified cells, Buck-SFT mean (0.283) slightly outscores Ryan-SFT (0.266); the seg-13 Ryan-SFT > Buck-SFT mean difference is essentially entirely disqualifier-driven.
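To see how a disqualifier-rate gap alone can reverse the ranking, here is a small worked calculation. It rests on an assumption not stated in this card: that a disqualified cell is scored 0. Under that assumption the overall mean is (1 - DQ rate) x (non-disqualified mean). The resulting numbers are illustrative and are not the reported seg-13 means.

```python
def overall_mean(dq_rate: float, non_dq_mean: float, dq_score: float = 0.0) -> float:
    """Mix disqualified and non-disqualified cells into one mean.

    dq_score=0.0 is an ASSUMPTION about how the rubric scores
    disqualified cells; the card does not specify it.
    """
    return dq_rate * dq_score + (1.0 - dq_rate) * non_dq_mean


# Rates and non-disqualified means are the figures quoted in the caveat.
buck_overall = overall_mean(dq_rate=0.396, non_dq_mean=0.283)
ryan_overall = overall_mean(dq_rate=0.25, non_dq_mean=0.266)

print(f"Buck-SFT overall (DQ scored as 0): {buck_overall:.4f}")
print(f"Ryan-SFT overall (DQ scored as 0): {ryan_overall:.4f}")
print("Ryan-SFT ahead overall:", ryan_overall > buck_overall)
```

Buck-SFT leads on the cells that survive the disqualifier, yet its higher disqualifier rate pulls its overall mean below Ryan-SFT's, which is the pattern the caveat describes.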
Viability rule reference: writeups/segment6_preregistration.md
defines the unrelaxed segment-6 viability rule (a) substance, (b)
lexical, (c) pathology against the apples-to-apples Tinker raw
Qwen3-8B-Base comparator. Per-checkpoint verdicts are in
results/segment6_viable_verdict_v2.md and the segment-19 spot-audit
section (a). The two Ryan-recipe checkpoints in this release pass the
unrelaxed rule.
Methodology caveats
The 8 load-bearing methodology caveats (see § 9 of the final report at
writeups/final_report_segment20.md for the full verbatim text;
short-form here for length):
- Wrong-author shared-system-prompt-body confound (E-1 follow-up): the seg-18 wrong-author scaffold's ~700-word system-prompt body is byte-identical to the rigorous Ryan scaffold body except for author-attribution + 2 exemplar excerpts; the body itself was best-of-N-selected against Ryan style. Outcome B at chat-instruct partially conflates 'shared Ryan-tuned register' with 'any-author imitation prompting helps'. A Buck-natural-register variant is the canonical E-1 follow-up.
- 30B-A3B-Base prompt-induced topical paraphrase confound on the paraphrastic-recall classifier (E-3 follow-up): raw 30B-A3B-Base fires 8/77 strong on the cleaner-negatives-validated classifier (vs 0/18 truly off-corpus, 1/16 on tinker_raw_base 8B); hand-audit confirms each is prompt-induced topical paraphrase of public AI-safety content. Memorization-not-load-bearing is FIRM at 8B / seg-13 and PARTIAL at 30B-A3B / seg-17.
- n=16 segment-13 anchors small-N → wide CIs. A 95% bootstrap CI of width ~0.36 around mean 0.5 follows from n=16; "tied" verdicts are tied within power, not demonstrably tied; with ~8 paired-bootstrap comparisons run, individual borderline-decisive cells should be read as within multiple-comparison sampling noise.
- Disqualifier-driven Buck-SFT last-place pattern (seg-13 → seg-18 cross-segment). Buck-SFT triggers the rubric disqualifier on 39.6% of cells (vs 25% Ryan-SFT). On non-disqualified cells, Buck-SFT mean (0.283) slightly outscores Ryan-SFT (0.266). The seg-13 "Ryan-SFT > Buck-SFT" mean is essentially entirely DQ-driven.
- GPT-5 systematic +0.10 leniency on substance; sign-flip on Buck-prompted vs Ryan-SFT. Drop-GPT-5 columns are reported in seg-14 / 16 / 17 / 18; rankings are preserved across all comparisons except the seg-18 wrong-author Buck-prompted vs Ryan-SFT substance comparison (full 0.521 → drop-GPT-5 0.458).
- Non-Ryan-domain style WR confound disambiguation (seg-16; NOT-12): the 0.722 non-Ryan-domain style WR vs raw_base is partly a no-scaffold mode-collapse advantage; vs scaffolded baselines on the same off-domain prompts, Ryan-SFT loses.
- Tinker availability blocker on dense-32B-Base / Qwen3-14B-Base (E-2 follow-up). Tinker exposes 30B-A3B-Base (MoE) but not dense Qwen3-32B-Base / Qwen3-14B-Base. The seg-17 30B-A3B null does NOT falsify "dense-32B-Base would have helped".
- Seg-15 strict Ryan-anchored re-grade is reviewer-driven and post-hoc. The auto-pipeline's 8/30 confirmed_novel collapses to 1/30 under strict Ryan-anchored re-grade; this is documented as a reviewer-driven re-grade applied post-hoc to disambiguate "novel form-shaped takes" from "novel Ryan-anchored positions".
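The small-N caveat above can be made concrete with a percentile bootstrap over 16 scores. This is a generic sketch, not the project's analysis code: the function is hypothetical, and the 16 scores are invented placeholders chosen only to have mean 0.5, so the interval it prints is not expected to match the ~0.36 width quoted above.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# PLACEHOLDER per-anchor scores (n=16, mean 0.5); not real seg-13 data.
scores = [0.0, 0.0, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5,
          0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0]

lo, hi = bootstrap_mean_ci(scores)
print(f"mean={statistics.fmean(scores):.3f}  95% CI=({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")
```

Even with these fairly concentrated placeholder scores the interval spans a large share of the 0 to 1 range, which is why differences between checkpoints at n=16 are best read as "tied within power".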
Forbidden-claim list
Forbidden-claim list (short form, NOT-1 through NOT-12): downstream
users should NOT cite these models / datasets in support of any of the
following (full text in writeups/segment19_publish_preregistration.md
§ b):
- NOT-1. Ryan-SFT decisively beats Buck-imitation prompted-base on Ryan-rubric substance at 8B (it is TIED; chat-instruct flips to Buck-prompted favor).
- NOT-2. The Ryan-SFT advantage is fully Ryan-specific on substance (the author-specific positive is restricted to open-ended style-pref, NOT predict-position substance).
- NOT-3. Memorization is provably not load-bearing on segment-17 substance (it is partial at 30B-A3B).
- NOT-4. Dense-32B-Base parameter scaling fails on substance (untested; only 30B-A3B-Base MoE knowledge-storage probe was run).
- NOT-5. Ryan-SFT learns Ryan's positions (it learns form, not positions).
- NOT-6. Ryan-SFT is more consistent than the prompted-base baselines (it is the LEAST consistent under V1).
- NOT-7. The seg-15 8 confirmed_novel takes are Ryan-anchored novel positions (strict re-grade collapses to 1/30).
- NOT-8. Style WR is robustly decisive against all baselines (scoped per the consolidation table in final report Β§ 4).
- NOT-9. 30B-Ryan-SFT improves substance over 8B-Ryan-SFT (TIED on both substance and style).
- NOT-10. The 30B URL hallucination drives the consistency drop (rejected by within-pair test, Δ +0.082 in hallu favor).
- NOT-11. The Ryan-SFT v1 substance lift generalizes to a leakage-controlled substance eval (it does NOT; v1 0.81 → seg-13 0.479).
- NOT-12. The non-Ryan-domain style WR is Ryan-content-specific style mastery (no-scaffold mode-collapse confound).
Operational caveat: Ryan-SFT can fabricate LessWrong post URLs at a rate of ~10% at the 8B endpoint and ~13% at the 30B-A3B endpoint. Always validate any cited URLs before trusting them.
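One way to act on this caveat is to pull every LessWrong URL out of a generation and check that it resolves before citing it. The sketch below uses only the Python standard library; the regular expression, function names, and User-Agent string are illustrative assumptions, not part of the project's tooling. A URL that resolves is only a first filter: it does not confirm that the post says what the model claims, so a manual read is still needed.

```python
import re
import urllib.error
import urllib.request

# Assumed pattern: any http(s) URL on lesswrong.com, with or without "www.".
LW_URL = re.compile(r"https?://(?:www\.)?lesswrong\.com/[^\s<>\"')\]]+")


def extract_lesswrong_urls(text: str) -> list[str]:
    """Return LessWrong URLs found in `text`, minus trailing punctuation."""
    return [match.rstrip(".,;:") for match in LW_URL.findall(text)]


def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Best-effort check that `url` answers a HEAD request without error."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "url-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False


sample = (
    "See https://www.lesswrong.com/posts/abc123/some-title. "
    "Also (https://lesswrong.com/posts/xyz)."
)
print(extract_lesswrong_urls(sample))
```

In practice one would call `url_resolves` on each extracted URL and flag any that fail, then read the surviving links by hand.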
License:
- Source corpus (Ryan Greenblatt LessWrong content; pinned at abhayesian/ryan-greenblatt-lesswrong commit fd1651c851c0a95e36d6418a9096391749c1d183): CC BY-SA 4.0 (LessWrong default for user-submitted content, per LessWrong site policy as of 2024-2026).
- Derived datasets in this release inherit CC BY-SA 4.0.
- LoRA adapter weights: MIT.
- Base model Qwen/Qwen3-8B-Base: Tongyi Qianwen License (Apache-style).
- Code in the originating project repo: MIT.
Authors / attribution: autonomous research run by Claude (Anthropic) under Ryan Greenblatt's supervision (Redwood Research). Ryan Greenblatt is the subject of the simulator; the simulator is NOT a deputy of, and NOT a representative of, Ryan Greenblatt. Use as a research artefact only.