Nemotron-9B-OpenCode

A 9B-parameter instruction-tuned model specialized for autonomous software engineering agents, fine-tuned from Qwen3.5-9B on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.

Model Highlights

  • Specialized for Agentic Tasks: Trained on agent trajectories for the OpenCode CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
  • Multi-Capability: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
  • Production Ready: Compatible with Hugging Face Transformers, vLLM, SGLang, and OpenAI-compatible APIs

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 9B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |

Training Data

This model was fine-tuned on Nemotron-SFT-OpenCode-v1, NVIDIA's agentic instruction tuning dataset containing 144,468 high-quality samples derived from 459K total trajectories. The dataset enhances LLMs' ability to operate within autonomous coding environments.

Dataset Composition

| Subset | Samples | Description |
|---|---|---|
| general | 90K | General agentic CLI questions with/without AGENTS.md context |
| bash_only_tool | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
| bash_only_tool_skills | 96K | Bash + skill loading for dynamic capability discovery |
| question_tool | 76K | Interactive clarification via user questions during task execution |
| agent_skills | 67K | Dynamic skill scanning and loading for task-specific capabilities |
| agent_skills_question_tool | 33K | Combined skill loading + user clarification for complex tasks |
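
Note that the subset counts sum to 459K, i.e. the total trajectory pool rather than the 144,468 filtered SFT samples. As a small illustration, the mixing weights they imply can be computed like this (names and counts are taken from the table above; the snippet itself is purely illustrative):

```python
# Illustrative only: per-subset mixing weights implied by the sample
# counts in the composition table above.
subset_counts = {
    "general": 90_000,
    "bash_only_tool": 97_000,
    "bash_only_tool_skills": 96_000,
    "question_tool": 76_000,
    "agent_skills": 67_000,
    "agent_skills_question_tool": 33_000,
}

# Sums to 459K, matching the total trajectory pool.
total = sum(subset_counts.values())

proportions = {name: n / total for name, n in subset_counts.items()}
for name, p in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name:>28}: {p:.1%}")
```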

Key Capabilities Trained

  • Code Navigation: Repository-aware reasoning and codebase traversal
  • Tool Calling: Structured tool invocation for bash, file operations, and more
  • Skill Loading: Dynamic discovery and loading of relevant agent skills
  • Interactive Planning: User clarification when requirements are ambiguous
  • Multi-Step Reasoning: SWE-Bench style problem decomposition and implementation
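
As a sketch of the tool-calling capability, a model-emitted tool call can be parsed and routed along these lines. The JSON layout here (`name` plus `arguments`) follows the common OpenAI-style convention; the exact format this model emits is determined by its chat template, so treat this as an assumption:

```python
import json

# Assumed OpenAI-style tool-call layout; the model's chat template
# defines the actual format it emits.
def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Return (tool_name, arguments) from a model-emitted tool-call string."""
    call = json.loads(raw)
    return call["name"], call.get("arguments", {})

# Example string of the kind an agent harness might receive:
raw_call = '{"name": "bash", "arguments": {"command": "ls src/"}}'
name, args = parse_tool_call(raw_call)
print(name, args["command"])  # bash ls src/
```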

Benchmark Results

The model inherits strong foundational capabilities from Qwen3.5-9B. The scores below are for the base model:

Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| Knowledge & STEM | MMLU-Pro | 82.5 |
| | MMLU-Redux | 91.1 |
| | C-Eval | 88.2 |
| | GPQA Diamond | 81.7 |
| Instruction Following | IFEval | 91.5 |
| Long Context | LongBench v2 | 55.2 |
| Reasoning & Coding | LiveCodeBench v6 | 65.6 |

Vision Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| STEM & Puzzle | MMMU | 78.4 |
| | MathVision | 78.9 |
| | MathVista (mini) | 85.7 |
| Document Understanding | OCRBench | 89.2 |
| Video Understanding | VideoMME (w/ sub) | 84.5 |

Note: For complete benchmark results across all categories, please refer to the Qwen3.5-9B model card.

Quick Start

Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Nemotron-9B-OpenCode"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to merge two sorted arrays."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

Using vLLM (Recommended for Production)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Nemotron-9B-OpenCode",
    trust_remote_code=True,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    max_tokens=1024
)

prompts = ["Write a Python function to merge two sorted arrays."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```

Using SGLang

```shell
python -m sglang.launch_server \
    --model-path Kassadin88/Nemotron-9B-OpenCode \
    --port 8000 \
    --tp-size 1
```

OpenAI-Compatible API

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="Kassadin88/Nemotron-9B-OpenCode",
    messages=[
        {"role": "user", "content": "Write a quicksort implementation in Python"}
    ],
    max_tokens=512
)
print(response.choices[0].message.content)
```

Usage Tips

For Agentic Coding Tasks

```python
messages = [
    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."}
]
```
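
A minimal, illustrative tool-dispatch loop for such agentic prompts might look like the sketch below. This is not the OpenCode CLI's actual harness; the `run_bash` helper and `TOOLS` registry are hypothetical, and a real agent would parse tool calls from model output rather than hard-code one:

```python
import subprocess

def run_bash(command: str, timeout: int = 30) -> str:
    """Execute a shell command and return combined stdout/stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

# Hypothetical tool registry; a real harness would expose the same
# tool set the model was trained on (e.g. todo + bash).
TOOLS = {"bash": run_bash}

def dispatch(tool_name: str, arguments: dict) -> str:
    if tool_name not in TOOLS:
        return f"Unknown tool: {tool_name}"
    return TOOLS[tool_name](**arguments)

# A tool call of the kind the model might emit for the task above:
observation = dispatch("bash", {"command": "echo hello"})
print(observation.strip())  # hello
```

The observation string would then be appended to `messages` as a tool result and fed back to the model for the next step.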

For Code Generation

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # larger token budget for complete implementations
    do_sample=True
)
```

For Code Explanation

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # smaller budget suffices for concise explanations
    do_sample=True
)
```

Limitations

  • The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
  • May occasionally generate incorrect or incomplete code
  • Should not be used for malicious code generation

Citation

```bibtex
@misc{nemotron-9b-opencode,
  author = {Kassadin88},
  title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
}
```

Acknowledgments

  • Base Model: Qwen Team for Qwen3.5-9B
  • Training Data: NVIDIA for Nemotron-SFT-OpenCode-v1
  • Training Framework: MS-Swift

Note: This model is intended for research and educational purposes. Please use responsibly.
