- 💡 1. Base Model, Training Library & Cooperation
- 📖 2. Background & Motivation
- ⚡ 3. Reasoning Efficiency & MTP Speedup
- 📊 4. Evaluation & Benchmarks
- 🗺️ 5. Training & Data Pipeline Overview
- 🎯 6. Three-Stage Curriculum Learning
- 🎨 7. Trace Inversion Case Studies (5 Key Domains Showcase)
- 🤝 8. Collaboration & Training Details
- ⚠️ 9. Known Training & Deployment Issues (IMPORTANT)
- 📚 10. Resources & Guides
- 🙏 11. Acknowledgements
- 📖 12. Citation
💡 1. Base Model, Training Library & Cooperation
Vision & Tool Calling Support: Qwopus3.6-27B-v2 natively supports vision and tool-use capabilities. To enable vision functionality, download mmproj.gguf from the GGUF Repository and place it in the same directory as the main .gguf file.
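Before loading the model, it can help to confirm that the projector file really sits next to the main model file. The sketch below is only an illustration: `find_mmproj` is a hypothetical helper and the example path is a placeholder. Only the `mmproj.gguf` file name and the same-directory requirement come from the note above.

```python
from pathlib import Path
from typing import Optional


def find_mmproj(model_path: str, projector_name: str = "mmproj.gguf") -> Optional[Path]:
    """Return the projector path if it sits next to the main .gguf file, else None.

    Hypothetical helper for illustration; it is not part of any library.
    """
    candidate = Path(model_path).resolve().parent / projector_name
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Placeholder path (assumption): point this at wherever the quantized file was saved.
    model_file = "models/Qwopus3.6-27B-v2-IQ4_XS.gguf"
    projector = find_mmproj(model_file)
    if projector is None:
        print("mmproj.gguf not found next to the model; vision input will not be available.")
    else:
        print(f"Vision projector found: {projector}")
```

If the check fails, move or re-download `mmproj.gguf` into the model directory before starting your runtime of choice.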
Community Release Notice: Qwopus3.6-27B-v2 is an experimental community release and has not undergone complete safety evaluations or standard benchmarking. It is intended solely for research and exploration.
📖 2. Background & Motivation
⚡ 3. Reasoning Efficiency & MTP Speedup
📊 4. Evaluation & Benchmarks
🗺️ 5. Training & Data Pipeline Overview
The training process combines Trace Inversion data augmentation with a Three-Stage Curriculum Learning pipeline. The core engineering idea is to expand context length gradually while training on reconstructed reasoning traces, so that the output format stays stable.
[ 🗺️ Trace Inversion: Reconstructing Distillation Workflow ]
A. Surrogate Model Training (Trace Inverter)
Open-source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
│ │
└──────────► [ Training ] ◄─────────┘
(Base: Qwen3-4B-Instruct)
(Result: Trace-Inverter-4B)
B. Inversion Phase: Reconstructing Claude-4.7-Max
_______________________________________________________
| |
| Claude-4.7-Max API ──► Compressed Bubbles + Answer |
|_______________________________________________________|
│
▼
[ 🧠 Trace-Inverter-4B (Logic Reconstructor) ] ──► Synthetic Deep Reasoning Trace (Learnable CoT)
│
▼
[ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
(Embed reconstructed CoT in <think> tags, splicing with original prompt/response)
│
▼
(Result: claude-opus-4.6/4.7 inverted sets)
C. Final SFT Curriculum Pipeline
___________________________________________
| |
| Base Model (Qwen3.6-27B) |
|___________________________________________|
│
▼
[ 📦 Phase 1: Format Inception ] ──► [ 🛠️ Phase 2: Complexity Expansion ] ──► [ 🚀 Phase 3: Long-Context SFT ]
( < 4096 tokens ) ( 4096 - 8192 tokens ) ( 8192 - 32K tokens )
(Short-context stable format) (Medium-complexity reasoning) (Long/Multi-turn / 10% replay)
│ │
└─────────────────────────────┬─────────────────────────────────────────┘
▼
_____________________________________________
| |
| 🌟 Final Model: Qwopus3.6-27B-v2 |
|_____________________________________________|
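The "Data Splicing" step in part B of the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function name `splice_sample` and the chat-style record layout (`messages` with `role` / `content` keys) are assumptions. The diagram only specifies that the reconstructed CoT is embedded in `<think>` tags and spliced with the original prompt and response.

```python
import json


def splice_sample(prompt: str, reconstructed_cot: str, response: str) -> dict:
    """Build one chat-style SFT record with the reasoning wrapped in <think> tags.

    The record layout is an assumption for illustration; only the <think>
    wrapping and the prompt/response splicing come from the diagram.
    """
    assistant_text = (
        f"<think>\n{reconstructed_cot.strip()}\n</think>\n\n{response.strip()}"
    )
    return {
        "messages": [
            {"role": "user", "content": prompt.strip()},
            {"role": "assistant", "content": assistant_text},
        ]
    }


if __name__ == "__main__":
    # Toy example; real inputs would be the original prompt, the trace
    # reconstructed by the inverter model, and the original response.
    sample = splice_sample(
        prompt="What is 12 * 12?",
        reconstructed_cot="12 * 12 = 12 * 10 + 12 * 2 = 120 + 24 = 144.",
        response="144",
    )
    print(json.dumps(sample, indent=2))
```

Keeping the closing `</think>` tag on its own line, followed by a blank line, makes the boundary between reasoning and final answer easy to validate when filtering samples.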
🎯 6. Three-Stage Curriculum Learning
To steadily scale up the reasoning quality under long-context inference, Qwopus3.6-27B-v2 adopts a Curriculum Learning strategy, progressively mixing longer and more complex reasoning templates:
| Curriculum Stage | Focus & Sample Characteristics | Strategy Details |
|---|---|---|
| 📦 Stage 1: Format Inception | • Limit context to under 4,096 tokens • Emphasize stable reasoning templates | Focuses on short-to-medium-length, cleanly formatted reasoning samples. The primary goal is to establish a reliable, structured reasoning output format (such as auto-closing <think> tags), so that premature exposure to complex chains does not cause format collapse. |
| 🛠️ Stage 2: Complexity Expansion | • Extend length to 4,096 - 8,192 tokens • Introduce high-difficulty logic samples | Gradually increases the ratio of complex reasoning chains. Distilling from "teacher models" whose reasoning style distributions closely match the Qwen3.6 base keeps the capacity gap under control and makes knowledge transfer more efficient. |
| 🚀 Stage 3: Long-Context SFT | • Progressively scale the window up to 32K tokens • Replay 10% high-quality short samples | Pushes the model into deep reasoning under long contexts and multi-turn dialogues. To prevent capacity drift or degraded short-instruction comprehension during long-text training, a 10% replay of high-quality short samples is strictly enforced. |
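The staging logic can be sketched as follows. This is an illustration under stated assumptions, not the training code. The boundary handling (exactly 8,192 tokens goes to Stage 2), the reading of "32K" as 32,768 tokens, and the interpretation of "10% replay" as "short samples make up 10% of the Stage 3 mix" are all assumptions. The names `assign_stage`, `bucket_by_stage`, and `build_stage3_mix` are hypothetical.

```python
import random
from typing import Dict, List, Sequence

STAGE1_MAX = 4096          # Stage 1: fewer than 4,096 tokens
STAGE2_MAX = 8192          # Stage 2: 4,096 to 8,192 tokens (inclusive, by assumption)
STAGE3_MAX = 32 * 1024     # Stage 3: up to "32K", read here as 32,768 (assumption)
REPLAY_FRACTION = 0.10     # share of short samples in the Stage 3 mix (assumption)


def assign_stage(num_tokens: int) -> int:
    """Map a token count to curriculum stage 1, 2, or 3; 0 means out of range."""
    if num_tokens < STAGE1_MAX:
        return 1
    if num_tokens <= STAGE2_MAX:
        return 2
    if num_tokens <= STAGE3_MAX:
        return 3
    return 0


def bucket_by_stage(token_counts: Sequence[int]) -> Dict[int, List[int]]:
    """Group samples (represented here only by their token counts) by stage."""
    buckets: Dict[int, List[int]] = {0: [], 1: [], 2: [], 3: []}
    for n in token_counts:
        buckets[assign_stage(n)].append(n)
    return buckets


def build_stage3_mix(long_samples: Sequence, short_pool: Sequence, rng: random.Random) -> list:
    """Mix long samples with replayed short ones so the latter are ~10% of the result."""
    n_replay = round(len(long_samples) * REPLAY_FRACTION / (1 - REPLAY_FRACTION))
    n_replay = min(n_replay, len(short_pool))
    mix = list(long_samples) + rng.sample(list(short_pool), n_replay)
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy corpus of random token counts, purely for demonstration.
    counts = [rng.randint(200, 40_000) for _ in range(1_000)]
    buckets = bucket_by_stage(counts)
    stage3 = build_stage3_mix(buckets[3], buckets[1], rng)
    print({stage: len(items) for stage, items in buckets.items()})
    print("stage 3 mix size:", len(stage3))
```

In practice the replayed short samples would also be filtered for quality, which this sketch does not attempt.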
🎨 7. Trace Inversion Case Studies (5 Key Domains Showcase)
To demonstrate how Trace Inversion reconstructs logical continuity and eliminates negative entropy, the following panels contrast the raw compressed "Reasoning Bubbles" with the fully reconstructed, step-by-step chain-of-thought (Learnable CoT) in five typical scenarios:
📐 Domain 1: Mathematics (Probability Calculation)
🚀 Domain 2: Physics (Kinematics)
💻 Domain 3: Coding (Algorithm Logic)
🧠 Domain 4: Logical Reasoning (Syllogism)
💡 Domain 5: Core Theory (Reasoning Bubble vs. Learnable CoT)
🤝 8. Collaboration & Training Details
This model is a collaborative milestone achieved with hardware engineer Kyle Hessling. You can follow him on X / Twitter: @KyleHessling1 to keep up with the latest hardware infrastructure and distributed training updates. 🙏
| Dimension | Details & Infrastructure |
|---|---|
| 🖥️ Training Hardware | NVIDIA DGX Cluster / H100 / RTX 6000 Pro |
| ⚙️ Fine-tuning Framework | Unsloth (used for highly efficient SFT of dense models and memory optimization) |
⚠️ 9. Known Training & Deployment Issues (IMPORTANT)
While the 27B dense model architecture is relatively stable, certain low-level framework compatibility issues may still surface during large-scale parameter updates and complex long-context training. It is highly recommended to monitor the following technical risk points during secondary fine-tuning and deployment:
| Module / Component | Issue & Troubleshooting Diagnostics |
|---|---|
| 🔀 Weight Merge (LoRA Merger) | Merging LoRA adapters back into the base model is highly susceptible to out-of-memory (OOM) errors at peak memory usage. Ensure the merging host has sufficient virtual memory, or perform the low-precision merge on the CPU. |
| 🛠️ Dependency Compatibility | PEFT, Transformers 5.x fusion mode, and Unsloth patches may occasionally cause module import failures (ImportError) or weight mapping conflicts. Please align your dependency versions with those provided in our finetuning-guide repository. |
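
To get a feel for the memory needed by the weight merge, a back-of-the-envelope estimate can be computed before starting. The sketch below makes several assumptions that do not come from this card: a nominal parameter count of exactly 27e9, 2 bytes per parameter (a 16-bit merge), and a 1.5x headroom factor for temporary buffers. The real peak depends on the merge tooling and should be measured.

```python
GIB = 2**30


def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


def merge_fits(
    available_gib: float,
    num_params: float = 27e9,       # nominal "27B" (assumption)
    bytes_per_param: int = 2,       # 16-bit weights (assumption)
    overhead_factor: float = 1.5,   # headroom for temporaries (assumption, not measured)
) -> bool:
    """Rough check that a host has enough memory for the merge."""
    return available_gib >= weights_gib(num_params, bytes_per_param) * overhead_factor


if __name__ == "__main__":
    print(f"Weights alone at 2 bytes/param: {weights_gib(27e9, 2):.1f} GiB")
    for ram in (64, 128):
        print(f"{ram} GiB host fits merge (1.5x headroom assumed): {merge_fits(ram)}")
```

If the estimate exceeds physical RAM, adding swap or merging on a larger CPU host is usually safer than attempting the merge on the GPU.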
Local Fine-Tuning & Deployment Warning: If you attempt to run secondary fine-tuning or merge adapter weights locally, please proceed with caution and be prepared to manually patch model definition files or pin dependency versions strictly.
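One lightweight way to catch version drift early is to compare installed package versions against a pinned list before launching a run. The sketch below uses only the standard library. The helper name `find_version_mismatches` is hypothetical, and the pin list is deliberately left empty because the exact versions should be taken from the finetuning-guide repository rather than guessed here.

```python
from importlib import metadata
from typing import Callable, Dict, List


def find_version_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return human-readable problems for packages that are missing or off-pin."""
    problems: List[str] = []
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: installed {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # Fill in exact pins from the finetuning-guide repository, for example
    # entries for peft, transformers, and unsloth. Left empty on purpose.
    pins: Dict[str, str] = {}
    issues = find_version_mismatches(pins)
    if issues:
        print("\n".join(issues))
    else:
        print("All pinned packages match.")
```

Running this at the top of a training or merge script turns a confusing ImportError later on into an explicit version report up front.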
📚 10. Resources & Guides
👉 GitHub Repository: Jackrong-llm-finetuning-guide. Access the repository to explore the codebase and reproduce our results locally or on Google Colab.
🙏 11. Acknowledgements
Special thanks to:
- The Qwen team for providing the powerful Qwen3.6 base model.
- Unsloth for providing the highly efficient fine-tuning framework.
- Open-source datasets and community contributors.
- Kyle Hessling for the close collaboration on this project.
📖 12. Citation
@misc{jackrong_qwopus36_27b_v2,
title = {Qwopus3.6-27B-v2},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face}
}