OmniCoder-9B CoreGen HDLFix v2 Merged BF16

This repository contains the fully merged BF16 checkpoint for a local fine-tune of armand0e/OmniCoder-9B-Claude-Opus-High-Reasoning-Distill targeted at practical coding work with stronger HDL and embedded behavior.

The run behind this release is omnicoder_local9b_blackwell_coregen_hdlfix_v2_hf_r64_epoch1. It was built to fix the main issues seen in an earlier short-run adapter: too little effective dataset coverage, visible reasoning markup leaking into supervision, and weak coverage around bus and peripheral tasks such as Wishbone and SPI.

What Changed Relative to the Earlier Core-Generalist Run

  • The training run was changed from a short fixed-step pilot to a full epoch over the cleaned coregen_hdlfix_v2 prepared set.
  • Visible <think> and <answer> scaffolding was stripped out of supervision.
  • Expected-output leakage in verilog-instruct style HDL data was removed from the user side of the conversation format, so reference solutions no longer appear inside prompts.
  • HDL coverage was expanded with focused sources such as HDL-Instruct, expanded_rtlcoder, RTLLM, and verilogeval-v2-spec-to-rtl.
  • Bus and peripheral sampling was deliberately increased. In the prepared train split used for this run, the relaunch notes recorded about 19 wishbone mentions and about 78 HDL-side spi mentions.
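The scaffolding cleanup described above can be sketched as follows. This is an illustrative helper, not the actual preparation pipeline; `strip_reasoning_scaffolding` is a hypothetical name, and the real cleanup may handle more edge cases:

```python
import re

def strip_reasoning_scaffolding(text: str) -> str:
    """Remove visible reasoning markup from a supervision target.

    Hypothetical sketch: drop <think>...</think> blocks entirely and
    unwrap <answer>...</answer> blocks, keeping only the answer text.
    """
    # Reasoning traces should not be supervised verbatim, so delete them.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Keep the contents of <answer> blocks but strip the tags themselves.
    text = re.sub(r"<answer>(.*?)</answer>", r"\1", text, flags=re.DOTALL)
    return text.strip()

raw = "<think>consider FSM states</think><answer>module foo; endmodule</answer>"
print(strip_reasoning_scaffolding(raw))  # module foo; endmodule
```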

Intended Use

This model is intended as a practical coding assistant with emphasis on:

  • RTL and HDL generation, explanation, and review
  • bus and peripheral oriented Verilog tasks
  • general code generation and patching
  • code review style feedback
  • embedded and firmware-adjacent tasks
  • basic tool-use and math-heavy coding prompts

It is a better fit for code-centric workflows than for general open-domain chat.

Training Data Summary

The merged model comes from a LoRA fine-tune on the prepared dataset prepared_omnicoder_mixed_data_coregen_hdlfix_v2.

  • Train examples: 3120
  • Eval examples: 428
  • Sequence length: 1024

The prepared mix included these retained train and eval counts:

| Family | Train | Eval |
| --- | ---: | ---: |
| HDL-Instruct (including bus-focused sampling) | 544 | 72 |
| expanded_rtlcoder (including bus-focused sampling) | 320 | 40 |
| CodeV-R1 | 320 | 40 |
| CodeV-SVA | 192 | 24 |
| verilogeval spec-to-RTL | 120 | 16 |
| RTLLM | 40 | 8 |
| codefeedback_instruction | 256 | 32 |
| code_feedback | 224 | 32 |
| commitpackft | 224 | 32 |
| github_codereview | 160 | 24 |
| github_code | 128 | 16 |
| stm32_hal | 160 | 24 |
| electronics_stackexchange | 96 | 16 |
| arduino_stackexchange | 64 | 12 |
| iot_stackexchange | 32 | 8 |
| qwen_toolcalling | 96 | 12 |
| toolscale | 48 | 8 |
| nemotron_math | 96 | 12 |

This is still a curated pilot-scale mix, not a frontier-scale training corpus. The purpose of the run was to harden the local 9B pipeline and correct the earlier HDL coverage failure, not to present a final benchmarked production release.
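The per-family counts in the table above sum exactly to the reported split sizes, which can be verified with a quick check (the dictionary below just transcribes the table):

```python
# Per-family (train, eval) counts transcribed from the data mix table.
family_counts = {
    "HDL-Instruct": (544, 72),
    "expanded_rtlcoder": (320, 40),
    "CodeV-R1": (320, 40),
    "CodeV-SVA": (192, 24),
    "verilogeval spec-to-RTL": (120, 16),
    "RTLLM": (40, 8),
    "codefeedback_instruction": (256, 32),
    "code_feedback": (224, 32),
    "commitpackft": (224, 32),
    "github_codereview": (160, 24),
    "github_code": (128, 16),
    "stm32_hal": (160, 24),
    "electronics_stackexchange": (96, 16),
    "arduino_stackexchange": (64, 12),
    "iot_stackexchange": (32, 8),
    "qwen_toolcalling": (96, 12),
    "toolscale": (48, 8),
    "nemotron_math": (96, 12),
}

train_total = sum(t for t, _ in family_counts.values())
eval_total = sum(e for _, e in family_counts.values())
print(train_total, eval_total)  # 3120 428
```

Both totals match the stated 3120 train and 428 eval examples.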

Training Configuration

  • Base model: armand0e/OmniCoder-9B-Claude-Opus-High-Reasoning-Distill
  • Training engine: hf
  • Device: cuda
  • LoRA rank: 64
  • Training dtype: torch.bfloat16
  • Base load mode: 4-bit
  • Learning rate: 7e-5
  • Gradient accumulation: 2
  • Train epochs: 1.0
  • Optimizer update steps: 1560
  • Batch size per device: 1
  • Trainable parameters: 116,391,936
  • Total parameters reported by the training summary: 5,841,364,208
  • Final eval loss: 0.6533
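The reported optimizer step count is consistent with the other settings: one epoch over 3120 examples at per-device batch size 1 with gradient accumulation 2 yields 3120 / (1 × 2) = 1560 updates:

```python
# Reproduce the reported optimizer update count from the run settings.
train_examples = 3120
per_device_batch = 1
grad_accum = 2

effective_batch = per_device_batch * grad_accum
updates_per_epoch = train_examples // effective_batch
print(updates_per_epoch)  # 1560
```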

The merged release was exported as BF16 safetensors on GPU and written into four shards.

Files in This Repository

  • model-00001-of-00004.safetensors - about 4.603 GB
  • model-00002-of-00004.safetensors - about 4.645 GB
  • model-00003-of-00004.safetensors - about 4.615 GB
  • model-00004-of-00004.safetensors - about 3.664 GB
  • model.safetensors.index.json
  • tokenizer, processor, config, and chat template files
  • merge_summary.json
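As a rough cross-check (treating the listed sizes as decimal GB), the shard sizes are consistent with a BF16 export of a ~9B-parameter model, since BF16 uses 2 bytes per parameter:

```python
# Approximate sanity check: total shard size in GB divided by
# 2 bytes per BF16 value gives an implied parameter count in billions.
shard_gb = [4.603, 4.645, 4.615, 3.664]
total_gb = sum(shard_gb)
implied_params_b = total_gb / 2  # 2 bytes per BF16 parameter
print(round(total_gb, 3), round(implied_params_b, 2))  # 17.527 8.76
```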

How to Use

This checkpoint follows the OmniCoder / Qwen3.5 multimodal stack and should be loaded with the same Transformers classes as the base model family.

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "tianrui6641/omnicoder_local9b_blackwell_coregen_hdlfix_v2_hf_r64_epoch1-merged-bf16"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Write a Wishbone-attached SPI controller in Verilog and explain the register map."}
        ],
    }
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, return_tensors="pt")
inputs = {key: value.to(model.device) for key, value in inputs.items()}

output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
generated = output[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

For local inference via GGUF or LM Studio, use the separate GGUF release.

Limitations

  • The model has not yet been validated against a full external benchmark harness for HDL, embedded development, code review, tool use, and math.
  • The main numeric result available for this run is held-out eval loss on the curated internal mix, not a broad public benchmark suite.
  • Although HDL behavior was explicitly improved, this is still not a replacement for verification flows, synthesis, linting, or formal checking.
  • The model can still hallucinate APIs, bus semantics, timing assumptions, or reset behavior.
  • Multimodal behavior is inherited from the base OmniCoder family, but this particular release was optimized primarily for coding tasks rather than general VLM use.

Recommended Usage Pattern

  • Use this merged BF16 repo if you want to continue fine-tuning, convert again, or run the original Transformers-format checkpoint.
  • Use the separate GGUF repo if you want a local inference package for LM Studio or llama.cpp style runtimes.
  • Prefer verification-oriented workflows for non-trivial RTL work: ask for testbenches, assertions, and interface assumptions explicitly.

Release Notes

This release is part of the local 9B hardening path for the TAIS-Coder mini project. The goal is not to mimic the closed Copilot Raptor stack exactly, but to produce an open practical coding assistant with stronger HDL, embedded, review, and tool-use behavior on local hardware.

The base model tags indicate Apache-2.0 licensing, and this release is intended to inherit the base model's applicable license terms.
