# Nemotron-9B-OpenCode

A 9B-parameter instruction-tuned model specialized for autonomous software engineering agents, fine-tuned from Qwen3.5-9B on NVIDIA's Nemotron-SFT-OpenCode-v1 dataset.
## Model Highlights
- Specialized for Agentic Tasks: Trained on agent trajectories for the OpenCode CLI framework, enabling autonomous code navigation, multi-step tool use, and software engineering workflows
- Multi-Capability: Supports general reasoning, tool calling, bash command execution, and dynamic skill loading
- Production Ready: Compatible with Hugging Face Transformers, vLLM, SGLang, and OpenAI-compatible APIs
## Model Description
| Property | Value |
|---|---|
| Base Model | Qwen3.5-9B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 9B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |
## Training Data
This model was fine-tuned on Nemotron-SFT-OpenCode-v1, NVIDIA's agentic instruction tuning dataset containing 144,468 high-quality samples derived from 459K total trajectories. The dataset enhances LLMs' ability to operate within autonomous coding environments.
### Dataset Composition

| Subset | Samples | Description |
|---|---|---|
| `general` | 90K | General agentic CLI questions with/without AGENTS.md context |
| `bash_only_tool` | 97K | Restricted tool set (todo + bash) for foundational agent capabilities |
| `bash_only_tool_skills` | 96K | Bash + skill loading for dynamic capability discovery |
| `question_tool` | 76K | Interactive clarification via user questions during task execution |
| `agent_skills` | 67K | Dynamic skill scanning and loading for task-specific capabilities |
| `agent_skills_question_tool` | 33K | Combined skill loading + user clarification for complex tasks |
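The skill-oriented subsets (`agent_skills`, `bash_only_tool_skills`) train the model to discover capabilities at runtime rather than assume a fixed tool set. A minimal sketch of that scanning pattern, assuming a hypothetical convention of one `SKILL.md` manifest per skill directory (not the dataset's actual on-disk format):

```python
import os
import tempfile

def scan_skills(skills_dir):
    """Walk a directory tree and map skill name -> manifest path.

    Assumes each skill lives in its own subdirectory containing a
    SKILL.md manifest; this layout is illustrative only.
    """
    skills = {}
    for root, _dirs, files in os.walk(skills_dir):
        if "SKILL.md" in files:
            skills[os.path.basename(root)] = os.path.join(root, "SKILL.md")
    return skills

# Demo with a throwaway skills directory
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "json_repair"))
    with open(os.path.join(d, "json_repair", "SKILL.md"), "w") as f:
        f.write("Repairs malformed JSON output.")
    found = scan_skills(d)
    print(sorted(found))  # ['json_repair']
```

An agent would then read the discovered manifests into its context before planning, loading only the skills relevant to the current task.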
### Key Capabilities Trained
- Code Navigation: Repository-aware reasoning and codebase traversal
- Tool Calling: Structured tool invocation for bash, file operations, and more
- Skill Loading: Dynamic discovery and loading of relevant agent skills
- Interactive Planning: User clarification when requirements are ambiguous
- Multi-Step Reasoning: SWE-Bench style problem decomposition and implementation
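Structured tool invocation generally follows the OpenAI-style function-calling schema; the model card does not pin down an exact wire format, so treat the following as an illustrative sketch. The `run_bash` tool definition and the assistant turn below are made-up examples:

```python
import json

# Hypothetical bash tool definition in OpenAI function-calling format
run_bash_tool = {
    "type": "function",
    "function": {
        "name": "run_bash",
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run"},
            },
            "required": ["command"],
        },
    },
}

# The kind of assistant turn a tool-trained model is expected to emit
tool_call_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "run_bash",
                "arguments": json.dumps({"command": "ls src/utils"}),
            },
        }
    ],
}

# Arguments arrive JSON-encoded and must be parsed before dispatch
args = json.loads(tool_call_turn["tool_calls"][0]["function"]["arguments"])
print(args["command"])  # ls src/utils
```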
## Benchmark Results
The model inherits strong foundational capabilities from Qwen3.5-9B. Below are the base model's benchmark performances:
### Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| Knowledge & STEM | MMLU-Pro | 82.5 |
| | MMLU-Redux | 91.1 |
| | C-Eval | 88.2 |
| | GPQA Diamond | 81.7 |
| Instruction Following | IFEval | 91.5 |
| Long Context | LongBench v2 | 55.2 |
| Reasoning & Coding | LiveCodeBench v6 | 65.6 |
### Vision Language Benchmarks

| Category | Benchmark | Qwen3.5-9B |
|---|---|---|
| STEM & Puzzle | MMMU | 78.4 |
| | MathVision | 78.9 |
| | MathVista (mini) | 85.7 |
| Document Understanding | OCRBench | 89.2 |
| Video Understanding | VideoMME (w/ sub) | 84.5 |
Note: For complete benchmark results across all categories, please refer to the Qwen3.5-9B model card.
## Quick Start

### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Kassadin88/Nemotron-9B-OpenCode"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to merge two sorted arrays."},
]

# Render the chat template, generate, then decode only the newly generated tokens
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
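For code-generation prompts, the model's reply typically arrives inside a markdown code fence. A small, model-independent helper for pulling the code out (the function name is illustrative, not part of any library):

```python
import re

FENCE = "`" * 3  # the literal triple-backtick, built up so this block nests safely

def extract_code_blocks(text, lang=""):
    """Return the contents of fenced code blocks in a model reply.

    If lang is given (e.g. "python"), only blocks tagged with that
    language are matched.
    """
    tag = lang if lang else r"\w*"
    pattern = FENCE + tag + r"\n(.*?)" + FENCE
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

reply = f"Here is the function:\n{FENCE}python\ndef merge(a, b):\n    return sorted(a + b)\n{FENCE}"
blocks = extract_code_blocks(reply, "python")
print(blocks[0].splitlines()[0])  # def merge(a, b):
```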
### Using vLLM (Recommended for Production)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Kassadin88/Nemotron-9B-OpenCode",
    trust_remote_code=True,
    dtype="bfloat16",
)

sampling_params = SamplingParams(max_tokens=1024)

prompts = ["Write a Python function to merge two sorted arrays."]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```
### Using SGLang

```bash
python -m sglang.launch_server \
  --model-path Kassadin88/Nemotron-9B-OpenCode \
  --port 8000 \
  --tp-size 1
```
### OpenAI-Compatible API

Once a local server is running (e.g. the SGLang command above, listening on port 8000):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="Kassadin88/Nemotron-9B-OpenCode",
    messages=[
        {"role": "user", "content": "Write a quicksort implementation in Python"},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
## Usage Tips

### For Agentic Coding Tasks

```python
messages = [
    {"role": "system", "content": "You are an autonomous coding agent. Use the available tools to complete tasks."},
    {"role": "user", "content": "Fix the bug in src/utils/parser.py that causes incorrect JSON parsing."},
]
```
### For Code Generation

Allow a larger token budget so full implementations are not truncated:

```python
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True)
```

### For Code Explanation

Explanations are typically shorter, so a smaller budget suffices:

```python
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True)
```
## Limitations
- The model is primarily trained on agentic coding tasks and may not perform optimally on general conversational tasks
- May occasionally generate incorrect or incomplete code
- Should not be used for malicious code generation
## Citation

```bibtex
@misc{nemotron-9b-opencode,
  author    = {Kassadin88},
  title     = {Nemotron-9B-OpenCode: An Instruction-Tuned Model for Autonomous Software Engineering},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Kassadin88/Nemotron-9B-OpenCode}
}
```
## Acknowledgments
- Base Model: Qwen Team for Qwen3.5-9B
- Training Data: NVIDIA for Nemotron-SFT-OpenCode-v1
- Training Framework: MS-Swift
Note: This model is intended for research and educational purposes. Please use responsibly.