Instructions to use sequelbox/Qwen3.6-27B-Tachibana-Agent with libraries, inference providers, notebooks, and local apps.

Libraries

How to use sequelbox/Qwen3.6-27B-Tachibana-Agent with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="sequelbox/Qwen3.6-27B-Tachibana-Agent")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sequelbox/Qwen3.6-27B-Tachibana-Agent")
model = AutoModelForCausalLM.from_pretrained("sequelbox/Qwen3.6-27B-Tachibana-Agent")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sequelbox/Qwen3.6-27B-Tachibana-Agent with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sequelbox/Qwen3.6-27B-Tachibana-Agent"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sequelbox/Qwen3.6-27B-Tachibana-Agent",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
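The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server from the previous step is listening on its default port 8000; the helper names `build_chat_body` and `post_chat` are hypothetical, and `max_tokens` is an optional OpenAI-style field that the curl example does not set. Because SGLang exposes the same OpenAI-compatible route, passing `url="http://localhost:30000/v1/chat/completions"` should target the SGLang server instead.

```python
import json
import urllib.request
from typing import Optional

# Assumption: `vllm serve` is running locally on its default port 8000.
# For the SGLang server, pass url="http://localhost:30000/v1/chat/completions" instead.
DEFAULT_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "sequelbox/Qwen3.6-27B-Tachibana-Agent"


def build_chat_body(text: str, image_url: Optional[str] = None, max_tokens: int = 512) -> dict:
    """Hypothetical helper: one user turn, text part first, optional image part second."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def post_chat(body: dict, url: str = DEFAULT_URL, timeout: float = 600.0) -> str:
    """Hypothetical helper: POST the body as JSON and return the first choice's message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With a server running, `print(post_chat(build_chat_body("Describe this image in one sentence.", image_url)))` sends the same payload as the curl command, where `image_url` is any reachable image URL.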

Use Docker

docker model run hf.co/sequelbox/Qwen3.6-27B-Tachibana-Agent

SGLang

How to use sequelbox/Qwen3.6-27B-Tachibana-Agent with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sequelbox/Qwen3.6-27B-Tachibana-Agent" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sequelbox/Qwen3.6-27B-Tachibana-Agent",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sequelbox/Qwen3.6-27B-Tachibana-Agent" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sequelbox/Qwen3.6-27B-Tachibana-Agent",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use sequelbox/Qwen3.6-27B-Tachibana-Agent with Docker Model Runner:
```
docker model run hf.co/sequelbox/Qwen3.6-27B-Tachibana-Agent
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Support our open-source dataset and model releases!

Tachibana-Agent is a Qwen 3.6 agentic coding finetune, trained on the Tachibana 4 dataset.

The dataset's questions prioritize challenging, real-world agentic coding tasks across a variety of programming languages and topics. Synthetic prompts use a variety of personas, experience levels, and communication styles to maximize real-world flexibility and usability.
Areas of focus include back-end and front-end development, systems programming, distributed systems, performance optimization, data structures, databases and data engineering, game and mobile development, security engineering, compiler design, custom tooling, task automation, practical bugfixes, and more!
The dataset emphasizes a wide variety of languages to broaden development capability: Python, C, C++, C#, Go, TypeScript, Java, JavaScript, Rust, Haskell, SQL, Shell, R, Ruby, assembly code, and more!

Prompting Guide

Tachibana-Agent uses the Qwen3.6-27B prompt format and the following recommended general structure:

Start the prompt with your primary query.
Include reference information after the primary query, each item under its own subheader: documentation follows "Documentation:\n\n", a stack trace follows "Stack Trace:\n\n", and logs, schemas, specs, etc. follow similar subheaders.
Attach files for the agent at the end, wrapping each file in <file path="myStuff/myRepo/myFirstFile.scala" language="Scala"> and </file> tags.

Following this exact format is not required, but it reflects the structure of the training data.
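As a rough illustration of that structure, the sketch below assembles a prompt in the recommended order: query first, then titled reference sections, then attached files. The function name `build_tachibana_prompt` is hypothetical, and the stack trace and Rust file in the usage are made-up examples; the only conventions taken from this guide are the `Title:\n\n` subheaders and the `<file path="..." language="...">` ... `</file>` wrapping.

```python
from typing import Mapping, Optional, Sequence, Tuple


def build_tachibana_prompt(
    query: str,
    sections: Optional[Mapping[str, str]] = None,
    files: Sequence[Tuple[str, str, str]] = (),
) -> str:
    """Hypothetical helper: query first, then titled reference sections, then attached files."""
    parts = [query.strip()]
    # Reference material follows the query, each block under a "Title:\n\n" subheader.
    for title, body in (sections or {}).items():
        parts.append(f"{title}:\n\n{body.strip()}")
    # Attached files go last, each wrapped in <file path="..." language="..."> ... </file>.
    # Assumes paths and language names contain no double quotes.
    for path, language, source in files:
        parts.append(f'<file path="{path}" language="{language}">\n{source}\n</file>')
    return "\n\n".join(parts)


# Made-up example inputs, for illustration only.
prompt = build_tachibana_prompt(
    "Why does first() panic on an empty slice, and how should it handle that case?",
    sections={"Stack Trace": "thread 'main' panicked at src/lib.rs:2:5:\nindex out of bounds: the len is 0 but the index is 0"},
    files=[("src/lib.rs", "Rust", "pub fn first(v: &[i32]) -> i32 {\n    v[0]\n}")],
)
print(prompt)
```

The resulting string can be passed as the `content` of the user message in the inference script.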

Example inference script to get started:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sequelbox/Qwen3.6-27B-Tachibana-Agent"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Implement CQRS for network appliance config management.\n\nRequirements:\n- Write side: 200 commands/sec, 4 command handlers, SQLite with custom journaling\n- Read side: 1000 queries/sec, 3 read projections in shared memory segments\n- Eventual consistency window: 100ms max\n- Handle atomic swap of projection memory for rebuilds\n- Binary configuration format versioning for schema evolution\n- Framework: libevent with custom protocol parser\n\nConstraints:\n- Manual memory management only, no garbage collection\n- Lock-free data structures where possible\n- Shared memory projections must survive process restarts\n- Command handlers must be thread-safe with 4 worker threads\n- Projection rebuild must not block queries\n- Binary format must support forward/backward compatibility\n- Error handling for corrupted journal recovery\n- Memory-mapped I/O for shared segments\n- Zero-copy where possible for performance\n\nDeliverables:\n1. Command processing pipeline with journaling\n2. Projection engine with shared memory management\n3. Query dispatcher with read-your-writes consistency\n4. Schema evolution system with versioned binary format\n5. Integration with libevent for network I/O\n6. Stress test showing 200 cmd/s + 1000 q/s sustained\n\nAssume x86_64 Linux, pthreads, atomic operations. No high-level frameworks."
#prompt = "Hey, I've been wrestling with state management for our Scala.js + Laminar trading dashboard, and I wanted to get your take on the best approach. We have five distinct state domains that all interact: financial portfolio positions, real-time market data (WebSocket feed), trade execution state, user risk limits, and chart visualization parameters. The current code uses a mix of `Var`s and manual subscriptions, but we're hitting nasty race conditions: a market data update can fire a trade execution callback that reads stale portfolio state, or an optimistic trade placement updates the portfolio before the server confirms but then the market data handler overwrites the pending state. The re-render count is around 40 components per action, which is way too high for our 10ms UI latency target.\n\nWe've got about 1500 normalized entities (positions, trades, symbols) with complex graph relationships — a trade references a position, a position references a symbol, and chart parameters depend on symbol+timeframe. Optimistic updates are required for three operations: trade placements, limit adjustments, and portfolio rebalances. The undo/redo scope is the entire trading day, with regulatory compliance requirements: every undo action must be logged with timestamp, user ID, and the before/after state snapshot. Oh, and our target is ≤5 re-renders on a typical action (like updating a single position's P&L).\n\nI'm torn between three approaches:\n1. **Cats State monad** for a pure functional core, with Laminar `EventBus` for side effects. This gives us referential transparency and composable state transformations, but I worry about wiring the whole graph into a single state transaction and whether we can keep the subscription graph fine-grained enough to hit ≤5 re-renders.\n2. **Laminar's own reactive primitives** (`Signal`, `Var`, `EventBus`) with careful manual wiring. It's simple and works well for small apps, but I've seen it become spaghetti as the state graph grows. Also, how do we ensure consistent snapshots for undo without global coordination?\n3. **A custom Redux-like store** built on Laminar `Var` + `Observer`, with a middleware stack for optimistic updates, undo logging, and audit trails. This is similar to what we used in React, but I'm not sure it's idiomatic in the Scala.js ecosystem.\n\nI've attached a very rough spec of the regulatory constraints we're under (mostly record-keeping and audit trail requirements). What's your gut feeling? Have you dealt with this kind of multi-domain reactive state in a financial context before? I'm leaning toward something like a two-layer approach: a functional core with `StateT[IO, ...]` for pure state transitions, then a thin Laminar adapter that subscribes to the latest snapshot and diffs for targeted re-renders. But I'd love to hear your thoughts before I go down any rabbit hole.\n\nOh, one more thing — the latency budget is tight. Market data ticks come every 50ms, and we must update the UI within 10ms of receiving a trade confirmation. So the state engine itself must be blazing fast (sub-millisecond for a typical action).\n\nDocumentation:\n## Regulatory Compliance Requirements for Trading Platform State Management\n\n### Record-Keeping (SEC Rule 17a-3 / MiFID II Article 25)\nAll state changes must be logged with:\n- Unique action ID (UUID)\n- Timestamp (nanosecond precision, UTC)\n- User ID who initiated the action\n- Action type (trade_placement, limit_adjustment, rebalance, undo_redo, manual_correction)\n- Before-state snapshot (full serialized copy of affected state domain)\n- After-state snapshot\n- Client IP / session identifier\n\n### Undo/Redo Constraints\n- Undo scope: current trading day (UTC). Previous day actions are immutable.\n- Undo stack depth: unlimited within the trading day, but must be persisted to PostgreSQL every 10 actions (crash recovery).\n- Undo must preserve optimistic updates: if an optimistic trade was later confirmed/rejected, undoing to a point before that trade must restore the optimistically updated state (which may have been overwritten by server confirmation).\n- Regulatory audit trail: every undo must record which action is being reversed, and the entire before/after chain must be reconstructable.\n\n### Target Performance\n- End-to-end UI update latency from WebSocket tick to DOM update: ≤10ms (p99).\n- State transition execution: ≤100µs for a single entity update.\n- Undo snapshot generation (full serialization of 1500 entities): ≤5ms.\n- Re-render count per action: ≤5 DOM operations.\n\n### Entity Relationships (simplified)\n- Portfolio: contains positions (1:N)\n- Position: references a symbol (N:1), contains trades (1:N)\n- Trade: references a position (N:1), may have a counter-trade (1:1)\n- Symbol: referenced by positions and chart parameters\n- ChartParameters: references a symbol (1:1), timeframe, indicators\n- UserRiskLimits: global limits per user, checked during optimistic operations\n\n### Optimistic Update Requirements\n- Trade placement: immediately show pending trade in position P&L, revert on rejection.\n- Limit adjustment: immediately reflect new limit in UI, revert if server denies.\n- Portfolio rebalance: immediately redistribute positions to target allocation, revert if any trade fails.\n- Optimistic state must be rolled back if the server response differs within 2 seconds, else the optimistic state becomes canonical.\n\n### Audit Log Format (JSON example)\n```json\n{\n  "action_id": "a1b2c3d4-...",\n  "timestamp": "2026-04-19T14:23:05.123456789Z",\n  "user_id": "trader-42",\n  "action_type": "trade_placement",\n  "before_state": {\n    "portfolio": { "positions": [/* serialized positions */] },\n    "execution": { "pending_trades": [] }\n  },\n  "after_state": {\n    "portfolio": { "positions": [/* updated */] },\n    "execution": { "pending_trades": [{ "id": "txn-001", "status": "optimistic" }] }\n  },\n  "client_ip": "10.0.1.42"\n}\n```"
#prompt = "HFT order book. PriceLevel objects (48 bytes) allocated/freed 5M/s. General-purpose allocator 19% CPU. Locked pool gives p99 latency spikes from mutex contention. PriceLevel.cpp and LockedObjectPool.h attached.\n\nReplace LockedObjectPool with a wait-free pool. Single producer (book update thread) allocates/frees. Multiple consumer threads only read PriceLevel members (no dealloc). Must avoid any atomic RMW (no cmpxchg, no fetch_add) in fast path – only aligned loads/stores. Thread-local for the producer thread is acceptable. pool must support dynamic growth? No – static pool of 1M slots is fine. Override operator new/delete on PriceLevel to use this pool.\n\nTarget: <1% CPU overhead for allocation. No lock acquisitions. No syscalls.\n\nExisting code below. Write new Pool.hpp and modified PriceLevel.hpp.\n\n<file path="include/PriceLevel.hpp" language="C++">\n#ifndef PRICELEVEL_HPP\n#define PRICELEVEL_HPP\n\n#include <cstdint>\n#include <atomic>\n#include <cstddef>\n#include <new>\n\nclass PriceLevel {\npublic:\n    PriceLevel(uint64_t price, char side)\n        : price_(price)\n        , side_(side)\n        , order_list_(nullptr)\n        , order_count_(0)\n        , total_volume_(0)\n    {}\n\n    // Accessors (called by consumer threads – must be safe for concurrent read)\n    uint64_t price() const noexcept { return price_; }\n    char side() const noexcept { return side_; }\n    uint64_t totalVolume() const noexcept { return total_volume_.load(std::memory_order_acquire); }\n    uint32_t orderCount() const noexcept { return order_count_.load(std::memory_order_acquire); }\n\n    // Mutators (called only by producer thread – no concurrency)\n    void addOrder(void* order_node, uint64_t volume) noexcept {\n        // push to linked list, update counts – single thread\n        // (implementation omitted for brevity in existing code)\n        // In real code: order_list_ = insert; order_count_++; total_volume_ += volume;\n        // But for this test we just show the pattern.\n        order_count_.fetch_add(1, std::memory_order_release);\n        total_volume_.fetch_add(volume, std::memory_order_release);\n    }\n\n    void removeOrder(void* order_node, uint64_t volume) noexcept {\n        order_count_.fetch_sub(1, std::memory_order_release);\n        total_volume_.fetch_sub(volume, std::memory_order_release);\n    }\n\n    // Custom allocator support\n    static void* operator new(std::size_t sz) {\n        return Pool::allocate(sz);\n    }\n\n    static void operator delete(void* ptr) noexcept {\n        Pool::deallocate(ptr);\n    }\n\nprivate:\n    uint64_t price_;\n    char side_;\n    void* order_list_;          // head of intrusive list (single producer)\n    std::atomic<uint32_t> order_count_{0};\n    std::atomic<uint64_t> total_volume_{0};\n\n    // Current slow pool – will be replaced\n    class Pool {\n    public:\n        static void* allocate(std::size_t) { return nullptr; }\n        static void deallocate(void*) {}\n    };\n};\n\n#endif // PRICELEVEL_HPP\n\n</file>\n\n<file path="src/LockedObjectPool.hpp" language="C++">\n#ifndef LOCKEDOBJECTPOOL_HPP\n#define LOCKEDOBJECTPOOL_HPP\n\n#include <cstddef>\n#include <mutex>\n#include <vector>\n#include <cstdint>\n\n// Current slow implementation – lock-based object pool.\n// Used by PriceLevel via Pool forward-declaration.\n// Must be replaced with a wait-free version.\n\nclass LockedObjectPool {\npublic:\n    LockedObjectPool(std::size_t object_size, std::size_t capacity = 1'000'000)\n        : object_size_(object_size)\n        , capacity_(capacity)\n    {\n        // Pre-allocate contiguous chunk of memory\n        chunk_ = static_cast<char*>(::operator new(object_size_ * capacity_));\n        free_list_.reserve(capacity_);\n        for (std::size_t i = 0; i < capacity_; ++i) {\n            free_list_.push_back(i);\n        }\n    }\n\n    ~LockedObjectPool() {\n        ::operator delete(chunk_);\n    }\n\n    // Allocate a block – O(1) amortized, but takes a lock\n    void* allocate() {\n        std::lock_guard<std::mutex> lock(mutex_);\n        if (free_list_.empty()) {\n            // Out of memory – should not happen in tuned system\n            return nullptr;\n        }\n        std::size_t index = free_list_.back();\n        free_list_.pop_back();\n        return chunk_ + index * object_size_;\n    }\n\n    // Deallocate – returns block to free list\n    void deallocate(void* ptr) {\n        if (!ptr) return;\n        std::size_t index = (static_cast<char*>(ptr) - chunk_) / object_size_;\n        std::lock_guard<std::mutex> lock(mutex_);\n        free_list_.push_back(index);\n    }\n\nprivate:\n    std::size_t object_size_;\n    std::size_t capacity_;\n    char* chunk_;\n    std::vector<std::size_t> free_list_;\n    std::mutex mutex_;\n};\n\n#endif // LOCKEDOBJECTPOOL_HPP\n\n</file>"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=100000
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 248069 (</think>)
    index = len(output_ids) - output_ids[::-1].index(248069)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
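The `</think>` parsing step above can be factored into a small pure function, which makes it testable without loading the 27B model. This is a sketch: the name `split_thinking` is hypothetical, and 248069 is simply the `</think>` id hard-coded in the script. With a tokenizer loaded, `tokenizer.convert_tokens_to_ids("</think>")` can look the id up instead of hard-coding it.

```python
from typing import List, Sequence, Tuple

# 248069 is the </think> id hard-coded in the script above; with a loaded tokenizer,
# tokenizer.convert_tokens_to_ids("</think>") can look it up instead.
THINK_END_ID = 248069


def split_thinking(output_ids: Sequence[int], think_end_id: int = THINK_END_ID) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split generated ids right after the LAST </think> token.

    Ids up to and including that token are treated as thinking; the rest is the answer.
    If no </think> appears, everything is treated as the answer (index = 0, as in the script).
    """
    ids = list(output_ids)
    try:
        index = len(ids) - ids[::-1].index(think_end_id)
    except ValueError:
        index = 0
    return ids[:index], ids[index:]


thinking_ids, content_ids = split_thinking([11, 12, 248069, 21, 22])
print(thinking_ids, content_ids)  # [11, 12, 248069] [21, 22]
```

Each half can then be decoded with `tokenizer.decode(..., skip_special_tokens=True)` exactly as in the script.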