OmniCoder-9B CoreGen HDLFix v2 Merged BF16

This repository contains the fully merged BF16 checkpoint for a local fine-tune of armand0e/OmniCoder-9B-Claude-Opus-High-Reasoning-Distill targeted at practical coding work with stronger HDL and embedded behavior.

The run behind this release is omnicoder_local9b_blackwell_coregen_hdlfix_v2_hf_r64_epoch1. It was built to fix the main issues seen in an earlier short-run adapter: too little effective dataset coverage, visible reasoning markup leaking into supervision, and weak coverage around bus and peripheral tasks such as Wishbone and SPI.

What Changed Relative to the Earlier Core-Generalist Run

  • The training run was changed from a short fixed-step pilot to a full epoch over the cleaned coregen_hdlfix_v2 prepared set.
  • Visible <think> and <answer> scaffolding was stripped out of supervision.
  • Expected-output leakage in verilog-instruct style HDL data was removed from the user side of the conversation format, so reference solutions no longer appear inside prompts.
  • HDL coverage was expanded with focused sources such as HDL-Instruct, expanded_rtlcoder, RTLLM, and verilogeval-v2-spec-to-rtl.
  • Bus and peripheral sampling was deliberately increased. In the prepared train split used for this run, the relaunch notes recorded about 19 wishbone mentions and about 78 HDL-side spi mentions.
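The scaffolding cleanup described above can be sketched as follows. This is an illustrative helper, not the actual preparation pipeline; `strip_reasoning_scaffolding` is a hypothetical name, and the real cleanup may handle more edge cases:

```python
import re

def strip_reasoning_scaffolding(text: str) -> str:
    """Remove visible reasoning markup from a supervision target.

    Hypothetical sketch: drop <think>...</think> blocks entirely and
    unwrap <answer>...</answer> blocks, keeping only the answer text.
    """
    # Reasoning traces should not be supervised verbatim, so delete them.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Keep the contents of <answer> blocks but strip the tags themselves.
    text = re.sub(r"<answer>(.*?)</answer>", r"\1", text, flags=re.DOTALL)
    return text.strip()

raw = "<think>consider FSM states</think><answer>module foo; endmodule</answer>"
print(strip_reasoning_scaffolding(raw))  # module foo; endmodule
```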

Intended Use

This model is intended as a practical coding assistant with emphasis on:

  • RTL and HDL generation, explanation, and review
  • bus and peripheral oriented Verilog tasks
  • general code generation and patching
  • code review style feedback
  • embedded and firmware-adjacent tasks
  • basic tool-use and math-heavy coding prompts

It is a better fit for code-centric workflows than for general open-domain chat.

Training Data Summary

The merged model comes from a LoRA fine-tune on the prepared dataset prepared_omnicoder_mixed_data_coregen_hdlfix_v2.

  • Train examples: 3120
  • Eval examples: 428
  • Sequence length: 1024

The prepared mix included these retained train and eval counts:

| Family | Train | Eval |
| --- | ---: | ---: |
| HDL-Instruct (including bus-focused sampling) | 544 | 72 |
| expanded_rtlcoder (including bus-focused sampling) | 320 | 40 |
| CodeV-R1 | 320 | 40 |
| CodeV-SVA | 192 | 24 |
| verilogeval spec-to-RTL | 120 | 16 |
| RTLLM | 40 | 8 |
| codefeedback_instruction | 256 | 32 |
| code_feedback | 224 | 32 |
| commitpackft | 224 | 32 |
| github_codereview | 160 | 24 |
| github_code | 128 | 16 |
| stm32_hal | 160 | 24 |
| electronics_stackexchange | 96 | 16 |
| arduino_stackexchange | 64 | 12 |
| iot_stackexchange | 32 | 8 |
| qwen_toolcalling | 96 | 12 |
| toolscale | 48 | 8 |
| nemotron_math | 96 | 12 |

This is still a curated pilot-scale mix, not a frontier-scale training corpus. The purpose of the run was to harden the local 9B pipeline and correct the earlier HDL coverage failure, not to present a final benchmarked production release.
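The per-family counts in the table above sum exactly to the reported split sizes, which can be verified with a quick check (the dictionary below just transcribes the table):

```python
# Per-family (train, eval) counts transcribed from the data mix table.
family_counts = {
    "HDL-Instruct": (544, 72),
    "expanded_rtlcoder": (320, 40),
    "CodeV-R1": (320, 40),
    "CodeV-SVA": (192, 24),
    "verilogeval spec-to-RTL": (120, 16),
    "RTLLM": (40, 8),
    "codefeedback_instruction": (256, 32),
    "code_feedback": (224, 32),
    "commitpackft": (224, 32),
    "github_codereview": (160, 24),
    "github_code": (128, 16),
    "stm32_hal": (160, 24),
    "electronics_stackexchange": (96, 16),
    "arduino_stackexchange": (64, 12),
    "iot_stackexchange": (32, 8),
    "qwen_toolcalling": (96, 12),
    "toolscale": (48, 8),
    "nemotron_math": (96, 12),
}

train_total = sum(t for t, _ in family_counts.values())
eval_total = sum(e for _, e in family_counts.values())
print(train_total, eval_total)  # 3120 428
```

Both totals match the stated 3120 train and 428 eval examples.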

Training Configuration

  • Base model: armand0e/OmniCoder-9B-Claude-Opus-High-Reasoning-Distill
  • Training engine: hf
  • Device: cuda
  • LoRA rank: 64
  • Training dtype: torch.bfloat16
  • Base load mode: 4-bit
  • Learning rate: 7e-5
  • Gradient accumulation: 2
  • Train epochs: 1.0
  • Optimizer update steps: 1560
  • Batch size per device: 1
  • Trainable parameters: 116,391,936
  • Total parameters reported by the training summary: 5,841,364,208
  • Final eval loss: 0.6533
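The reported optimizer step count is consistent with the other settings: one epoch over 3120 examples at per-device batch size 1 with gradient accumulation 2 yields 3120 / (1 × 2) = 1560 updates:

```python
# Reproduce the reported optimizer update count from the run settings.
train_examples = 3120
per_device_batch = 1
grad_accum = 2

effective_batch = per_device_batch * grad_accum
updates_per_epoch = train_examples // effective_batch
print(updates_per_epoch)  # 1560
```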

The merged release was exported as BF16 safetensors on GPU and written into four shards.

Files in This Repository

  • model-00001-of-00004.safetensors - about 4.603 GB
  • model-00002-of-00004.safetensors - about 4.645 GB
  • model-00003-of-00004.safetensors - about 4.615 GB
  • model-00004-of-00004.safetensors - about 3.664 GB
  • model.safetensors.index.json
  • tokenizer, processor, config, and chat template files
  • merge_summary.json
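As a rough cross-check (treating the listed sizes as decimal GB), the shard sizes are consistent with a BF16 export of a ~9B-parameter model, since BF16 uses 2 bytes per parameter:

```python
# Approximate sanity check: total shard size in GB divided by
# 2 bytes per BF16 value gives an implied parameter count in billions.
shard_gb = [4.603, 4.645, 4.615, 3.664]
total_gb = sum(shard_gb)
implied_params_b = total_gb / 2  # 2 bytes per BF16 parameter
print(round(total_gb, 3), round(implied_params_b, 2))  # 17.527 8.76
```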

How to Use

This checkpoint follows the OmniCoder / Qwen3.5 multimodal stack and should be loaded with the same Transformers classes as the base model family.

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "tianrui6641/omnicoder_local9b_blackwell_coregen_hdlfix_v2_hf_r64_epoch1-merged-bf16"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a Wishbone-attached SPI controller in Verilog and explain the register map."}
        ],
    }
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, return_tensors="pt")
inputs = {key: value.to(model.device) for key, value in inputs.items()}

output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
generated = output[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

For local inference via GGUF or LM Studio, use the separate GGUF release.

Limitations

  • The model has not yet been validated against a full external benchmark harness for HDL, embedded development, code review, tool use, and math.
  • The main numeric result available for this run is held-out eval loss on the curated internal mix, not a broad public benchmark suite.
  • Although HDL behavior was explicitly improved, this is still not a replacement for verification flows, synthesis, linting, or formal checking.
  • The model can still hallucinate APIs, bus semantics, timing assumptions, or reset behavior.
  • Multimodal behavior is inherited from the base OmniCoder family, but this particular release was optimized primarily for coding tasks rather than general VLM use.

Recommended Usage Pattern

  • Use this merged BF16 repo if you want to continue fine-tuning, convert again, or run the original Transformers-format checkpoint.
  • Use the separate GGUF repo if you want a local inference package for LM Studio or llama.cpp style runtimes.
  • Prefer verification-oriented workflows for non-trivial RTL work: ask for testbenches, assertions, and interface assumptions explicitly.

Release Notes

This release is part of the local 9B hardening path for the TAIS-Coder mini project. The goal is not to mimic the closed Copilot Raptor stack exactly, but to produce an open practical coding assistant with stronger HDL, embedded, review, and tool-use behavior on local hardware.

The base model tags indicate Apache-2.0 licensing, and this release is intended to inherit the base model's applicable license terms.
