Nemotron-9B-OpenCode

A 9B-parameter instruction-tuned model specialized for autonomous software engineering agents, fine-tuned from Qwen3.5-9B on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.

Model Highlights

  • Specialized for Agentic Tasks: Trained on agent trajectories for the OpenCode CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
  • Multi-Capability: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
  • Production Ready: Compatible with Hugging Face Transformers, vLLM, SGLang, and OpenAI-compatible APIs

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 9B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |

Training Data

This model was fine-tuned on Nemotron-SFT-OpenCode-v1, NVIDIA's agentic instruction tuning dataset containing 144,468 high-quality samples derived from 459K total trajectories. The dataset enhances LLMs' ability to operate within autonomous coding environments.

Dataset Composition

| Subset | Samples | Description |
|---|---|---|
| general | 90K | General agentic CLI questions with/without AGENTS.md context |
| bash_only_tool | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
| bash_only_tool_skills | 96K | Bash + skill loading for dynamic capability discovery |
| question_tool | 76K | Interactive clarification via user questions during task execution |
| agent_skills | 67K | Dynamic skill scanning and loading for task-specific capabilities |
| agent_skills_question_tool | 33K | Combined skill loading + user clarification for complex tasks |
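
Note that the subset counts sum to 459K, i.e. the total trajectory pool rather than the 144,468 filtered SFT samples. As a small illustration, the mixing weights they imply can be computed like this (names and counts are taken from the table above; the snippet itself is purely illustrative):

```python
# Illustrative only: per-subset mixing weights implied by the sample
# counts in the composition table above.
subset_counts = {
    "general": 90_000,
    "bash_only_tool": 97_000,
    "bash_only_tool_skills": 96_000,
    "question_tool": 76_000,
    "agent_skills": 67_000,
    "agent_skills_question_tool": 33_000,
}

# Sums to 459K, matching the total trajectory pool.
total = sum(subset_counts.values())

proportions = {name: n / total for name, n in subset_counts.items()}
for name, p in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{name:>28}: {p:.1%}")
```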

Key Capabilities Trained

  • Code Navigation: Repository-aware reasoning and codebase traversal
  • Tool Calling: Structured tool invocation for bash, file operations, and more
  • Skill Loading: Dynamic discovery and loading of relevant agent skills
  • Interactive Planning: User clarification when requirements are ambiguous
  • Multi-Step Reasoning: SWE-Bench style problem decomposition and implementation
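
As a sketch of the tool-calling capability, a model-emitted tool call can be parsed and routed along these lines. The JSON layout here (`name` plus `arguments`) follows the common OpenAI-style convention; the exact format this model emits is determined by its chat template, so treat this as an assumption:

```python
import json

# Assumed OpenAI-style tool-call layout; the model's chat template
# defines the actual format it emits.
def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Return (tool_name, arguments) from a model-emitted tool-call string."""
    call = json.loads(raw)
    return call["name"], call.get("arguments", {})

# Example string of the kind an agent harness might receive:
raw_call = '{"name": "bash", "arguments": {"command": "ls src/"}}'
name, args = parse_tool_call(raw_call)
print(name, args["command"])  # bash ls src/
```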

Benchmark Results

The model inherits strong foundational capabilities from Qwen3.5-9B. The scores below are for the base model:

Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| Knowledge & STEM | MMLU-Pro | 82.5 |
| | MMLU-Redux | 91.1 |
| | C-Eval | 88.2 |
| | GPQA Diamond | 81.7 |
| Instruction Following | IFEval | 91.5 |
| Long Context | LongBench v2 | 55.2 |
| Reasoning & Coding | LiveCodeBench v6 | 65.6 |

Vision Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| STEM & Puzzle | MMMU | 78.4 |
| | MathVision | 78.9 |
| | MathVista (mini) | 85.7 |
| Document Understanding | OCRBench | 89.2 |
| Video Understanding | VideoMME (w/ sub) | 84.5 |

Note: For complete benchmark results across all categories, please refer to the Qwen3.5-9B model card.

Quick Start

Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Nemotron-9B-OpenCode"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to merge two sorted arrays."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

Using vLLM (Recommended for Production)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Nemotron-9B-OpenCode",
    trust_remote_code=True,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    max_tokens=1024
)

prompts = ["Write a Python function to merge two sorted arrays."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```

Using SGLang

```shell
python -m sglang.launch_server \
    --model-path Kassadin88/Nemotron-9B-OpenCode \
    --port 8000 \
    --tp-size 1
```

OpenAI-Compatible API

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY"
)

response = client.chat.completions.create(
    model="Kassadin88/Nemotron-9B-OpenCode",
    messages=[
        {"role": "user", "content": "Write a quicksort implementation in Python"}
    ],
    max_tokens=512
)
print(response.choices[0].message.content)
```

Usage Tips

For Agentic Coding Tasks

```python
messages = [
    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."}
]
```
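
A minimal, illustrative tool-dispatch loop for such agentic prompts might look like the sketch below. This is not the OpenCode CLI's actual harness; the `run_bash` helper and `TOOLS` registry are hypothetical, and a real agent would parse tool calls from model output rather than hard-code one:

```python
import subprocess

def run_bash(command: str, timeout: int = 30) -> str:
    """Execute a shell command and return combined stdout/stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr

# Hypothetical tool registry; a real harness would expose the same
# tool set the model was trained on (e.g. todo + bash).
TOOLS = {"bash": run_bash}

def dispatch(tool_name: str, arguments: dict) -> str:
    if tool_name not in TOOLS:
        return f"Unknown tool: {tool_name}"
    return TOOLS[tool_name](**arguments)

# A tool call of the kind the model might emit for the task above:
observation = dispatch("bash", {"command": "echo hello"})
print(observation.strip())  # hello
```

The observation string would then be appended to `messages` as a tool result and fed back to the model for the next step.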

For Code Generation

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # larger token budget for complete implementations
    do_sample=True
)
```

For Code Explanation

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # smaller budget suffices for concise explanations
    do_sample=True
)
```

Limitations

  • The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
  • May occasionally generate incorrect or incomplete code
  • Should not be used for malicious code generation

Citation

```bibtex
@misc{nemotron-9b-opencode,
  author = {Kassadin88},
  title = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
}
```

Acknowledgments

  • Base Model: Qwen Team for Qwen3.5-9B
  • Training Data: NVIDIA for Nemotron-SFT-OpenCode-v1
  • Training Framework: MS-Swift

Note: This model is intended for research and educational purposes. Please use responsibly.
