---
base_model: unsloth/Qwen3.6-27B
base_model_relation: finetune
library_name: transformers
tags:
- transformers
- safetensors
- qwen3_5
- qwen3.6
- multimodal
- image-text-to-text
- unsloth
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
---
# Ornstein-3.6-27B

A fine-tune of [Qwen 3.6 27B](https://huggingface.co/unsloth/Qwen3.6-27B), the dense multimodal (vision + text) member of the Qwen 3.6 family with hybrid linear + full attention. Part of the Ornstein series: reasoning- and agent-oriented fine-tunes built on a custom data curation pipeline.

> **GGUF quantizations available at [GestaltLabs/Ornstein-3.6-27B-GGUF](https://huggingface.co/GestaltLabs/Ornstein-3.6-27B-GGUF)**, from Q8_0 down through aggressive 3-bit I-quants.

## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded, which means balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

**[Support on Ko-fi](https://ko-fi.com/djlougen)**

---

## Details

- **Developed by:** GestaltLabs
- **Architecture:** `Qwen3_5ForConditionalGeneration`: Qwen 3.6 dense with interleaved linear (Gated DeltaNet) and full attention, plus a vision encoder
- **Parameters:** ~27B total (dense, multimodal)
- **Hidden size / layers:** 5120 / 64
- **Attention:** 24 heads, 4 KV heads, head_dim 256, full attention every 4th layer (linear otherwise)
- **Context length:** 262,144 tokens
- **License:** Apache 2.0
- **Base model:** [unsloth/Qwen3.6-27B](https://huggingface.co/unsloth/Qwen3.6-27B)
- **Training framework:** Unsloth

These architecture values can be checked directly against the checkpoint's config, as sketched below.

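A minimal config-inspection sketch: it loads only the config (no weights). The field names assume the standard Qwen 3.5/3.6-style layout, with the language-model settings possibly nested under `text_config`; check the repo's `config.json` if they differ.

```python
from transformers import AutoConfig

# Load just the config; no weights are downloaded.
cfg = AutoConfig.from_pretrained("GestaltLabs/Ornstein-3.6-27B", trust_remote_code=True)

# Multimodal configs often nest the LM settings under `text_config`;
# fall back to the top-level config if not (assumption).
text_cfg = getattr(cfg, "text_config", None) or cfg

print("hidden size:    ", text_cfg.hidden_size)              # expect 5120
print("layers:         ", text_cfg.num_hidden_layers)        # expect 64
print("attention heads:", text_cfg.num_attention_heads)      # expect 24
print("KV heads:       ", text_cfg.num_key_value_heads)      # expect 4
print("context length: ", text_cfg.max_position_embeddings)  # expect 262144
```
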
| ## Usage |
| |
| ### Transformers |
| |
```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "GestaltLabs/Ornstein-3.6-27B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": [{"type": "text", "text": "Write a haiku about hybrid attention."}]}]
# return_dict=True is required so `inputs` is a dict we can unpack into
# generate() and index for input_ids (without it, only a tensor of token
# ids is returned).
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```
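Since the model is multimodal, here is a minimal sketch of image input. It assumes the processor's chat template accepts Qwen-VL-style image content parts (`{"type": "image", "url": ...}`); the URL is a placeholder, and `processor` and `model` are reused from the snippet above.

```python
# Placeholder image URL; substitute your own.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/some_image.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```
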
### llama.cpp (via GGUF)

See the [GGUF repo](https://huggingface.co/GestaltLabs/Ornstein-3.6-27B-GGUF) and pick a quant that fits your memory (Q4_K_M is a strong default for 24 GB cards).

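If you prefer to stay in Python, here is a minimal sketch using the `llama-cpp-python` bindings, assuming your llama.cpp build supports this architecture; the filename is a placeholder for whichever quant you downloaded.

```python
from llama_cpp import Llama

# Placeholder path: point this at the quant file you actually downloaded.
llm = Llama(
    model_path="Ornstein-3.6-27B-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU if it fits
    n_ctx=8192,       # working context window; the model supports much more
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about hybrid attention."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
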
## License

Apache 2.0, inherited from the Qwen 3.6 base release.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)