Qwen3.5-9b-Sushi-Coder-RL-GGUF

Lineage

Training

The upstream SFT model was trained with Unsloth on:

The RL stage was then run for coding with NousResearch/hermes-agent using NousResearch/atropos.

During that run, vLLM was patched with vllm-project/vllm PR #36395 ("fix(lora): add bounds checking for TP configurations") to address the LoRA tensor-parallel bounds issue.

Files

  • Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf
  • Qwen3.5-9b-Sushi-Coder-RL.Q8_0.gguf
  • Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf

Usage Note

This is a multimodal Qwen 3.5 export. Use the text GGUF together with the BF16-mmproj file.
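For example, with llama.cpp's multimodal CLI (a sketch: paths are illustrative, and the binary/flag names assume a recent llama.cpp build with mtmd support):

```shell
# Run the Q4_K_M text model together with the BF16 multimodal projector.
llama-mtmd-cli \
  -m Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf \
  --mmproj Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf \
  --image ./example.png \
  -p "Describe this image."
```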

Quick Start

Example download commands with the Hugging Face CLI:

hf download bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF \
  Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf \
  Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf

Alternative quant:

hf download bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF \
  Qwen3.5-9b-Sushi-Coder-RL.Q8_0.gguf \
  Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf
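The same files can also be fetched programmatically with huggingface_hub (a sketch; requires network access, and the repo/filenames are those listed above):

```python
from huggingface_hub import hf_hub_download

repo_id = "bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF"

# Download the quantized text model and the multimodal projector side by side.
for filename in [
    "Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf",
    "Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf",
]:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(path)  # local cache path of the downloaded file
```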

Metadata

  • License: Apache-2.0
  • Architecture: Qwen 3.5
  • Format: GGUF
  • Tags: llama.cpp, qwen3_5, multimodal, code, rl, conversational

LiveCodeBench Evaluation

The benchmark results below were produced from matched local BF16 vLLM endpoints, so the RL model and the base model were evaluated with the same serving method, the same task, and the same generation settings.

Evaluated models:

  • RL model: bigatuna/Qwen3.5-9b-Sushi-Coder-RL
  • Base model: Qwen/Qwen3.5-9B

Matched benchmark setup:

  • Task: lcb:codegeneration|0
  • Benchmark size: 268 problems
  • Backend: lighteval endpoint litellm
  • Context length: 4096
  • vLLM dtype: bfloat16
  • Same max_new_tokens, same prompt/task, same serving stack, same evaluation harness

Matched Full Results

Deterministic run:

  • temperature=0.0
  • top_p=1.0
  • seed=0
  • max_new_tokens=1024

Results:

  • RL model: codegen_pass@1:16 = 0.2015 +/- 0.0245
  • Base model: codegen_pass@1:16 = 0.0336 +/- 0.0110

Approximate passes:

  • RL model: 54 / 268
  • Base model: 9 / 268

Sampling run:

  • temperature=0.6
  • top_p=0.95
  • top_k=20
  • min_p=0.0
  • presence_penalty=0.0
  • repetition_penalty=1.0
  • max_new_tokens=1024

Results:

  • RL model: codegen_pass@1:16 = 0.2388 +/- 0.0261
  • Base model: codegen_pass@1:16 = 0.0261 +/- 0.0098

Approximate passes:

  • RL model: 64 / 268
  • Base model: 7 / 268

In both matched full runs, the RL model outperformed the base model by a wide margin.
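As a sanity check, the "approximate passes" figures above are consistent with pass@1 × 268, and the deterministic-run error bars match a naive binomial standard error sqrt(p(1-p)/n). (The sampling-run intervals differ slightly from this naive estimate, presumably because pass@1:16 averages 16 completions per problem.)

```python
import math

N = 268  # LiveCodeBench codegeneration problems in this run

def approx_passes(pass_at_1: float) -> int:
    """Back out the approximate number of solved problems."""
    return round(pass_at_1 * N)

def binomial_se(p: float) -> float:
    """Naive standard error, treating each problem as one Bernoulli trial."""
    return math.sqrt(p * (1 - p) / N)

# Deterministic run
print(approx_passes(0.2015), round(binomial_se(0.2015), 4))  # 54 0.0245
print(approx_passes(0.0336), round(binomial_se(0.0336), 4))  # 9 0.011

# Sampling run (pass counts only; see the caveat above on the error bars)
print(approx_passes(0.2388))  # 64
print(approx_passes(0.0261))  # 7
```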

Exact Reproduction Commands

These are the exact command shapes used for the matched local evaluation.

1. Start the RL endpoint

export CUDA_VISIBLE_DEVICES=1
vllm serve \
  <PATH_TO_YOUR_RL_MERGED_MODEL> \
  --host 0.0.0.0 \
  --port 9001 \
  --served-model-name bigatuna/Qwen3.5-9b-Sushi-Coder-RL \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.45

2. Start the base endpoint

export CUDA_VISIBLE_DEVICES=0
vllm serve \
  Qwen/Qwen3.5-9B \
  --host 0.0.0.0 \
  --port 9002 \
  --served-model-name Qwen/Qwen3.5-9B \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.45
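Before launching the benchmark, it can be worth confirming that both OpenAI-compatible endpoints are up (standard vLLM route; ports as configured above):

```shell
# Each call should list the served-model-name configured above.
curl -s http://localhost:9001/v1/models
curl -s http://localhost:9002/v1/models
```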

3. Deterministic matched run

RL model:

cat > /tmp/lighteval_rl.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/bigatuna/Qwen3.5-9b-Sushi-Coder-RL"
  base_url: "http://localhost:9001/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.0
    max_new_tokens: 1024
    top_p: 1.0
    seed: 0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_rl.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_rl_full \
  --save-details

Base model:

cat > /tmp/lighteval_base.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/Qwen/Qwen3.5-9B"
  base_url: "http://localhost:9002/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.0
    max_new_tokens: 1024
    top_p: 1.0
    seed: 0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_base.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_base_full \
  --save-details

4. Temperature 0.6 matched run

RL model:

cat > /tmp/lighteval_rl_t06.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/bigatuna/Qwen3.5-9b-Sushi-Coder-RL"
  base_url: "http://localhost:9001/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.6
    max_new_tokens: 1024
    top_p: 0.95
    top_k: 20
    min_p: 0.0
    presence_penalty: 0.0
    repetition_penalty: 1.0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_rl_t06.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_rl_full_t06 \
  --save-details

Base model:

cat > /tmp/lighteval_base_t06.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/Qwen/Qwen3.5-9B"
  base_url: "http://localhost:9002/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.6
    max_new_tokens: 1024
    top_p: 0.95
    top_k: 20
    min_p: 0.0
    presence_penalty: 0.0
    repetition_penalty: 1.0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_base_t06.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_base_full_t06 \
  --save-details