Qwen3.5-9b-Sushi-Coder-RL-GGUF

Lineage

Training

The upstream SFT model was trained with Unsloth on:

The RL stage was then run for coding with NousResearch/hermes-agent using NousResearch/atropos.

During that run, vLLM was patched with vllm-project/vllm PR #36395 ("fix(lora): add bounds checking for TP configurations") to address the LoRA tensor-parallel bounds issue.

Files

  • Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf
  • Qwen3.5-9b-Sushi-Coder-RL.Q8_0.gguf
  • Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf

Usage Note

This is a multimodal Qwen 3.5 export. Use the text GGUF together with the BF16-mmproj file.
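For example, with llama.cpp's multimodal CLI (a sketch: paths are illustrative, and the binary/flag names assume a recent llama.cpp build with mtmd support):

```shell
# Run the Q4_K_M text model together with the BF16 multimodal projector.
llama-mtmd-cli \
  -m Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf \
  --mmproj Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf \
  --image ./example.png \
  -p "Describe this image."
```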

Quick Start

Example download commands with the Hugging Face CLI:

hf download bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF \
  Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf \
  Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf

Alternative quant:

hf download bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF \
  Qwen3.5-9b-Sushi-Coder-RL.Q8_0.gguf \
  Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf
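The same files can also be fetched programmatically with huggingface_hub (a sketch; requires network access, and the repo/filenames are those listed above):

```python
from huggingface_hub import hf_hub_download

repo_id = "bigatuna/Qwen3.5-9b-Sushi-Coder-RL-GGUF"

# Download the quantized text model and the multimodal projector side by side.
for filename in [
    "Qwen3.5-9b-Sushi-Coder-RL.Q4_K_M.gguf",
    "Qwen3.5-9b-Sushi-Coder-RL.BF16-mmproj.gguf",
]:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(path)  # local cache path of the downloaded file
```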

Metadata

  • License: Apache-2.0
  • Architecture: Qwen 3.5
  • Format: GGUF
  • Tags: llama.cpp, qwen3_5, multimodal, code, rl, conversational

LiveCodeBench Evaluation

The benchmark results below were produced from matched local BF16 vLLM endpoints, so the RL model and the base model were evaluated with the same serving method, the same task, and the same generation settings.

Evaluated models:

  • RL model: bigatuna/Qwen3.5-9b-Sushi-Coder-RL
  • Base model: Qwen/Qwen3.5-9B

Matched benchmark setup:

  • Task: lcb:codegeneration|0
  • Benchmark size: 268 problems
  • Backend: lighteval endpoint litellm
  • Context length: 4096
  • vLLM dtype: bfloat16
  • Same max_new_tokens, same prompt/task, same serving stack, same evaluation harness

Matched Full Results

Deterministic run:

  • temperature=0.0
  • top_p=1.0
  • seed=0
  • max_new_tokens=1024

Results:

  • RL model: codegen_pass@1:16 = 0.2015 +/- 0.0245
  • Base model: codegen_pass@1:16 = 0.0336 +/- 0.0110

Approximate passes:

  • RL model: 54 / 268
  • Base model: 9 / 268

Sampling run:

  • temperature=0.6
  • top_p=0.95
  • top_k=20
  • min_p=0.0
  • presence_penalty=0.0
  • repetition_penalty=1.0
  • max_new_tokens=1024

Results:

  • RL model: codegen_pass@1:16 = 0.2388 +/- 0.0261
  • Base model: codegen_pass@1:16 = 0.0261 +/- 0.0098

Approximate passes:

  • RL model: 64 / 268
  • Base model: 7 / 268

In both matched full runs, the RL model outperformed the base model by a wide margin.
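As a sanity check, the "approximate passes" figures above are consistent with pass@1 × 268, and the deterministic-run error bars match a naive binomial standard error sqrt(p(1-p)/n). (The sampling-run intervals differ slightly from this naive estimate, presumably because pass@1:16 averages 16 completions per problem.)

```python
import math

N = 268  # LiveCodeBench codegeneration problems in this run

def approx_passes(pass_at_1: float) -> int:
    """Back out the approximate number of solved problems."""
    return round(pass_at_1 * N)

def binomial_se(p: float) -> float:
    """Naive standard error, treating each problem as one Bernoulli trial."""
    return math.sqrt(p * (1 - p) / N)

# Deterministic run
print(approx_passes(0.2015), round(binomial_se(0.2015), 4))  # 54 0.0245
print(approx_passes(0.0336), round(binomial_se(0.0336), 4))  # 9 0.011

# Sampling run (pass counts only; see the caveat above on the error bars)
print(approx_passes(0.2388))  # 64
print(approx_passes(0.0261))  # 7
```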

Exact Reproduction Commands

These are the exact command shapes used for the matched local evaluation.

1. Start the RL endpoint

export CUDA_VISIBLE_DEVICES=1
vllm serve \
  <PATH_TO_YOUR_RL_MERGED_MODEL> \
  --host 0.0.0.0 \
  --port 9001 \
  --served-model-name bigatuna/Qwen3.5-9b-Sushi-Coder-RL \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.45

2. Start the base endpoint

export CUDA_VISIBLE_DEVICES=0
vllm serve \
  Qwen/Qwen3.5-9B \
  --host 0.0.0.0 \
  --port 9002 \
  --served-model-name Qwen/Qwen3.5-9B \
  --max-model-len 4096 \
  --dtype bfloat16 \
  --gpu-memory-utilization 0.45
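Before launching the benchmark, it can be worth confirming that both OpenAI-compatible endpoints are up (standard vLLM route; ports as configured above):

```shell
# Each call should list the served-model-name configured above.
curl -s http://localhost:9001/v1/models
curl -s http://localhost:9002/v1/models
```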

3. Deterministic matched run

RL model:

cat > /tmp/lighteval_rl.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/bigatuna/Qwen3.5-9b-Sushi-Coder-RL"
  base_url: "http://localhost:9001/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.0
    max_new_tokens: 1024
    top_p: 1.0
    seed: 0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_rl.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_rl_full \
  --save-details

Base model:

cat > /tmp/lighteval_base.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/Qwen/Qwen3.5-9B"
  base_url: "http://localhost:9002/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.0
    max_new_tokens: 1024
    top_p: 1.0
    seed: 0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_base.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_base_full \
  --save-details

4. Temperature 0.6 matched run

RL model:

cat > /tmp/lighteval_rl_t06.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/bigatuna/Qwen3.5-9b-Sushi-Coder-RL"
  base_url: "http://localhost:9001/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.6
    max_new_tokens: 1024
    top_p: 0.95
    top_k: 20
    min_p: 0.0
    presence_penalty: 0.0
    repetition_penalty: 1.0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_rl_t06.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_rl_full_t06 \
  --save-details

Base model:

cat > /tmp/lighteval_base_t06.yaml <<'EOF'
model_parameters:
  provider: "openai"
  model_name: "openai/Qwen/Qwen3.5-9B"
  base_url: "http://localhost:9002/v1"
  api_key: "dummy"
  generation_parameters:
    temperature: 0.6
    max_new_tokens: 1024
    top_p: 0.95
    top_k: 20
    min_p: 0.0
    presence_penalty: 0.0
    repetition_penalty: 1.0
EOF

lighteval endpoint litellm \
  /tmp/lighteval_base_t06.yaml \
  'lcb:codegeneration|0' \
  --output-dir /tmp/lcb_base_full_t06 \
  --save-details