Text Generation
Transformers
English
qwen3.6
qwopus
sft
reasoning
chain-of-thought
unsloth
trl
bf16

Qwen 3.6 27B — Opus CoT Stage 1 (BF16 merged)

Stage-1 of a two-stage SFT recipe on Qwen 3.6 27B focused on Claude-Opus-4.6-style chain-of-thought reasoning. This repo holds the BF16 merged checkpoint — the stage-1 LoRA has already been merged back into the base, so this is a drop-in replacement for Qwen/Qwen3.6-27B with reasoning-tuned weights.

Lineage: Qwen/Qwen3.6-27B → stage 1 LoRA (reasoning SFT) → merge → this checkpoint.

Where this fits in the release

| Artifact | Repo |
|---|---|
| Stage-1 BF16 merged (this repo) | samscrack/Qwen3.6-27B-Opus-CoT-Stage1 |
| Stage-2 LoRA adapter (Hermes tool-calling, applies to this base) | samscrack/Qwen3.6-27B-Hermes-S2-LoRA |
| Stage-1 + Stage-2 merged + FP8 quantized (final release) | samscrack/Qwen3.6-27B-Opus-CoT-S1-Hermes-S2-SFT |

If you want the production model, use the FP8 repo above. This repo is intended for users who want to:

  • apply the stage-2 LoRA themselves (e.g. with different hyperparameters), or
  • finetune further on top of a reasoning-tuned base, or
  • run the reasoning-only variant in BF16 without the tool-calling stage.

Intended use

  • Local serving in BF16 (~52 GB) on a single ≥80 GB GPU, or sharded across two ≥48 GB GPUs.
  • Base for further LoRA / SFT / DPO / RLHF.
  • Chain-of-thought-style chat without tool calling.
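The ~52 GB figure above follows from simple arithmetic; a back-of-envelope sketch, assuming roughly 27 billion parameters at 2 bytes each in BF16 (the exact count depends on the embedding and head layout):

```python
# Rough VRAM needed just for the merged BF16 weights.
# Assumption: ~27e9 parameters at 2 bytes each; activations,
# KV cache, and CUDA overhead come on top of this.
params = 27e9
bytes_per_param = 2  # bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB of weights")  # → ~50 GiB of weights
```

This is why a single ≥80 GB card fits the model with headroom for activations and KV cache, while two 48 GB cards need the weights sharded across devices.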

Quick start (transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("samscrack/Qwen3.6-27B-Opus-CoT-Stage1")
model = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-27B-Opus-CoT-Stage1",
    torch_dtype="auto",
    device_map="auto",
)
msgs = [{"role": "user", "content": "Why does ice float on water?"}]
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
# do_sample=True so temperature actually takes effect (it is ignored under greedy decoding)
out = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))

Apply the stage-2 LoRA (Hermes tool-calling)

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-27B-Opus-CoT-Stage1", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "samscrack/Qwen3.6-27B-Hermes-S2-LoRA")
# Optional: merged = model.merge_and_unload()

Training — Stage 1 only

| Setting | Value |
|---|---|
| Method | Supervised fine-tuning, LoRA via Unsloth + TRL SFTTrainer, then merge_and_unload to BF16 |
| Base | Qwen/Qwen3.6-27B (text-only causal LM; *ForConditionalGeneration rewritten to *ForCausalLM for SFT) |
| LoRA | r=64, α=64, dropout=0, targets: q_proj, k_proj, v_proj, o_proj, out_proj, gate_proj, up_proj, down_proj |
| Optimizer / LR | AdamW, 2e-4, cosine schedule with warmup, weight decay 0.01 |
| Schedule | 2 epochs, per-device batch 4 × grad_accum 9 × 2 GPUs → effective batch 72, ctx 8192 |
| Steps / final loss | 346 / 0.250 |
| Wall clock | ~4 h on 2× RTX PRO 6000 Blackwell, DDP via torchrun --standalone --nproc-per-node=2 |

Datasets (concatenated then shuffled)

| Dataset | Rows | Provenance |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Claude Opus 4.6 CoT distillations |
| khazarai/qwen3.6-plus-high-reasoning-500x | 500 | Qwen 3.6 reasoning samples |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Claude Opus 4.6 CoT distillations |

Software

PyTorch 2.8.0+cu128, Transformers 5.2.0, TRL 0.22.2, PEFT 0.19.1, Unsloth 2026.4.7, datasets 4.3.0.

Limitations

  • Inherits all limitations of Qwen/Qwen3.6-27B — refusal patterns, knowledge cutoff, tokenizer biases.
  • Reasoning teacher is largely Claude Opus 4.6, so chain-of-thought style and refusal calibration partly reflect Claude's, not Qwen's.
  • No tool-calling tuning here — that's stage 2. Out of the box this checkpoint produces plain prose CoT, not Hermes-format <tool_call>{...}</tool_call> outputs.
  • No RLHF / DPO step — supervised only.
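For contrast, the Hermes convention that stage 2 targets wraps a JSON call in <tool_call> tags; a minimal extraction sketch (the sample output string is illustrative, not a real generation from this model):

```python
import json
import re

# Hermes-style output: a JSON object inside <tool_call>...</tool_call>.
# Stage 1 alone emits plain prose chain-of-thought instead of this format.
sample = (
    "Let me check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)

calls = [
    json.loads(m)
    for m in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", sample, re.DOTALL)
]
print(calls[0]["name"])  # get_weather
```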

Acknowledgements

The two-stage recipe and helper code are adapted from Jackrong's Jackrong-llm-finetuning-guide (notebook Qwopus3-5-27b-Colab.ipynb), ported to a local dual-GPU setup with no other changes to the data pipeline.

Dataset authors: nohurry, khazarai, Roman1111111.

Tooling: Unsloth, TRL, PEFT, Qwen team.

License

Apache 2.0, inherited from Qwen/Qwen3.6-27B. Dataset licenses apply to derived behavior — see each dataset card.
