# Qwen 3.6 27B — Opus CoT Stage 1 (BF16 merged)
Stage 1 of a two-stage SFT recipe on Qwen 3.6 27B, focused on Claude-Opus-4.6-style chain-of-thought reasoning. This repo holds the BF16 merged checkpoint — the stage-1 LoRA has already been merged back into the base, so this is a drop-in replacement for `Qwen/Qwen3.6-27B` with reasoning-tuned weights.
**Lineage:** `Qwen/Qwen3.6-27B` → stage-1 LoRA (reasoning SFT) → merge → this checkpoint.
## Where this fits in the release
| Artifact | Repo |
|---|---|
| Stage-1 BF16 merged (this repo) | samscrack/Qwen3.6-27B-Opus-CoT-Stage1 |
| Stage-2 LoRA adapter (Hermes tool-calling, applies to this base) | samscrack/Qwen3.6-27B-Hermes-S2-LoRA |
| Stage-1 + Stage-2 merged + FP8 quantized (final release) | samscrack/Qwen3.6-27B-Opus-CoT-S1-Hermes-S2-SFT |
If you want the production model, use the FP8 repo above. This repo is intended for users who want to:
- apply the stage-2 LoRA themselves (e.g. with different hyperparameters), or
- finetune further on top of a reasoning-tuned base, or
- run the reasoning-only variant in BF16 without the tool-calling stage.
## Intended use
- Local serving in BF16 (~52 GB) on a single ≥80 GB GPU, or sharded across two ≥48 GB GPUs.
- Base for further LoRA / SFT / DPO / RLHF.
- Chain-of-thought-style chat without tool calling.
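Rough arithmetic behind the BF16 footprint (a sketch; the nominal 27B parameter count is an approximation, and activation/KV-cache memory comes on top of the weights):

```python
# BF16 stores 2 bytes per parameter, so the weights alone of a nominally
# 27B-parameter model occupy roughly:
params = 27e9          # nominal count; the exact count is slightly lower
bytes_per_param = 2    # bfloat16
weights_gb = params * bytes_per_param / 1e9
print(round(weights_gb))  # 54 — same ballpark as the ~52 GB quoted above
```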
## Quick start (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("samscrack/Qwen3.6-27B-Opus-CoT-Stage1")
model = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-27B-Opus-CoT-Stage1",
    torch_dtype="auto",
    device_map="auto",
)

msgs = [{"role": "user", "content": "Why does ice float on water?"}]
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
# do_sample=True so the temperature setting actually takes effect
out = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```
## Apply the stage-2 LoRA (Hermes tool-calling)

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwen3.6-27B-Opus-CoT-Stage1", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "samscrack/Qwen3.6-27B-Hermes-S2-LoRA")
# Optional: bake the adapter into the weights for adapter-free serving
# merged = model.merge_and_unload()
```
## Training — Stage 1 only

| Setting | Value |
|---|---|
| Method | Supervised fine-tuning, LoRA via Unsloth + TRL `SFTTrainer`, then `merge_and_unload` to BF16 |
| Base | `Qwen/Qwen3.6-27B` (text-only causal LM; `*ForConditionalGeneration` rewritten to `*ForCausalLM` for SFT) |
| LoRA | r=64, α=64, dropout=0, targets: q_proj, k_proj, v_proj, o_proj, out_proj, gate_proj, up_proj, down_proj |
| Optimizer / LR | AdamW, 2e-4, cosine schedule with warmup, weight decay 0.01 |
| Schedule | 2 epochs, per-device batch 4 × grad_accum 9 × 2 GPUs → effective batch 72, ctx 8192 |
| Steps / final loss | 346 / 0.250 |
| Wall clock | ~4 h on 2× RTX PRO 6000 Blackwell, DDP via `torchrun --standalone --nproc-per-node=2` |
### Datasets (concatenated then shuffled)

| Dataset | Rows | Provenance |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Claude Opus 4.6 CoT distillations |
| khazarai/qwen3.6-plus-high-reasoning-500x | 500 | Qwen 3.6 reasoning samples |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Claude Opus 4.6 CoT distillations |
## Software
PyTorch 2.8.0+cu128, Transformers 5.2.0, TRL 0.22.2, PEFT 0.19.1, Unsloth 2026.4.7, datasets 4.3.0.
## Limitations

- Inherits all limitations of `Qwen/Qwen3.6-27B` — refusal patterns, knowledge cutoff, tokenizer biases.
- Reasoning teacher is largely Claude Opus 4.6, so chain-of-thought style and refusal calibration partly reflect Claude's, not Qwen's.
- No tool-calling tuning here — that's stage 2. Out of the box this checkpoint produces plain prose CoT, not Hermes-format `<tool_call>{...}</tool_call>` outputs.
- No RLHF / DPO step — supervised only.
## Acknowledgements

The two-stage recipe and helper code are adapted from Jackrong's `Jackrong-llm-finetuning-guide` (notebook `Qwopus3-5-27b-Colab.ipynb`), ported to a local dual-GPU setup with no other changes to the data pipeline.
Dataset authors: nohurry, khazarai, Roman1111111.
Tooling: Unsloth, TRL, PEFT, Qwen team.
## License

Apache 2.0, inherited from `Qwen/Qwen3.6-27B`. Dataset licenses apply to derived behavior — see each dataset card.