# CVE Backport Codegen v5 — Qwen2.5-Coder-32B QLoRA

Fine-tuned code generation model for backporting upstream CVE security fixes to older SUSE/openSUSE package versions. Given vulnerable source code and an upstream fix description, the model outputs the corrected code. A separate tool then diffs the output against the original to produce a patch.

This is a per-hunk code generation approach: the model sees one region of source code at a time and returns the fixed version, rather than generating raw unified diffs. This yields higher accuracy than patch-format models because the model works in its natural domain (code) rather than a meta-format (diffs).
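
The "diff the output against the original" step can be sketched with Python's `difflib` (file names and code content here are illustrative, not from the actual tool):

```python
import difflib

# Original (vulnerable) code region, as extracted from the package source.
original = """\
int len = strlen(buf);
memcpy(dst, buf, len);
"""

# Fixed region, as the model would return it.
fixed = """\
int len = strlen(buf);
if (len >= DST_SIZE)
    return -1;
memcpy(dst, buf, len);
"""

# Diffing the model output against the original yields a unified patch.
patch = "".join(difflib.unified_diff(
    original.splitlines(keepends=True),
    fixed.splitlines(keepends=True),
    fromfile="a/src/copy.c",
    tofile="b/src/copy.c",
))
print(patch)
```

Because the model never has to emit `@@` hunk headers or line-count math itself, malformed patches are structurally impossible; the diff layer handles that.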

MoE sibling now available: anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b reaches 91.9% recall on the same n=100 eval (within 1.2 pt of this model) while running ~10× faster at inference, thanks to Qwen3-Coder-30B-A3B's sparse 3B-active MoE architecture. Same training data, same config style, trained in 1/5 the wall time on a single H100.

## What's New in v5

v5 uses a unified codegen-only dataset — all 36,166 training examples follow the same 3-turn format (system / user with code + fix description / assistant with fixed code). v4 mixed in 5-turn test-generation examples; v5 drops those to focus entirely on codegen quality.

| Metric | v5 | v4 | v1 |
|---|---|---|---|
| Recall | 93.1% | 93% | 91% |
| Precision | 94.4% | 95% | |
| Exact match | 83/100 | 87/100 | |
| Adapted recall | 90.0% | 86% | 71% |
| Identical recall | 93.7% | 94% | 94% |

Adapted-tier recall has steadily improved: 71% (v1) → 86% (v4) → 90% (v5). The codegen-only dataset gives the model a cleaner training signal for the core task.

## Model Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-32B-Instruct |
| Method | QLoRA (4-bit NF4, double quantization, bf16 compute) |
| LoRA rank / alpha | 64 / 128 |
| LoRA dropout | 0.05 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training data | 36,166 train / 1,834 eval examples |
| Epochs | 2 (8,228 steps) |
| Effective batch size | 8 (1 × grad_accum 8) |
| Learning rate | 1e-4 (cosine schedule, 5% warmup) |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW fused, weight decay 0.01 |
| Hardware | 2× NVIDIA H100 NVL 94GB |
| Training time | 46.1 hours |
| Train loss (avg) | 0.0215 |
| Eval loss (final) | 0.00602 |
| PEFT version | 0.18.1 |
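
In bitsandbytes/PEFT terms, the quantization and LoRA settings above correspond roughly to the following configuration objects (a sketch, not the exact teapot launch code):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 with double quantization and bf16 compute, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank 64, alpha 128, dropout 0.05 on all attention and MLP projections.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```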

## Files

This repository contains:

- **LoRA adapter** (`adapter_model.safetensors`, `adapter_config.json`) — merge with the base model using PEFT
- **GGUF Q8_0** (`cve-backport-codegen-v5-q8_0.gguf`, 33GB) — ready for llama.cpp / ollama

## Reproduction via Teapot

This model was trained via the teapot training pipeline. The full reproduction is a four-command sequence once the cve-backport dataset is prepared:

```bash
git clone https://github.com/anicka-net/teapot
cd teapot
pip install -e .

# 1. Compose training data from the cve-backport module
teapot compose configs/cve-backport.config \
    --output train-cve-backport.jsonl

# 2. Generate the QLoRA-HF launch script
teapot train configs/cve-backport.config \
    --backend qlora-hf \
    --train-data train-cve-backport.jsonl \
    --eval-data eval-cve-backport.jsonl \
    --output train-cve-backport.sh

# 3. Train (2× H100 NVL 94GB; ~46 hours)
bash train-cve-backport.sh

# 4. Final adapter is at output-teapot-cve-backport/final/
```

The teapot config (`configs/cve-backport.config`) pins all the hyperparameters: `method: qlora`, `epochs: 2`, `lr: 1e-4`, `batch_size: 1`, `gradient_accumulation: 8`, `lora_r: 64`, `lora_alpha: 128`, `max_length: 4096`, `warmup_ratio: 0.05`, `hardware.gpus: 2`. See the config file in the teapot repo for the full declaration.

The `qlora-hf` backend invokes `python3 -m teapot.train_qlora_hf`, a thin wrapper over the HuggingFace Trainer with bitsandbytes 4-bit quantization and PEFT LoRA. Training data is composed from the cve-backport-codegen-dataset HF repo (the `domain/cve-backport` teapot module fetches it automatically).

## Evaluation

Evaluated on 100 held-out examples (zero CVE overlap with training) using the Q8_0 GGUF served via `llama-server` (`temperature=0`, `ctx=8192`).
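
A minimal client for that setup might look like this, using llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint (the system prompt text and user-message layout here mirror the training format but are abbreviated; port 8403 matches the serving example later in this card):

```python
import json
import urllib.request

def build_payload(code_region: str, fix_description: str) -> dict:
    # Deterministic decoding (temperature=0), matching the eval setup.
    return {
        "messages": [
            {"role": "system",
             "content": "You are a security patch backporting assistant."},
            {"role": "user",
             "content": f"{code_region}\n\n## Fix\n\n{fix_description}"},
        ],
        "temperature": 0,
    }

def backport(code_region: str, fix_description: str,
             url: str = "http://localhost:8403/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(code_region, fix_description)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```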

### Overall

| Metric | Value |
|---|---|
| Avg recall | 93.1% |
| Avg precision | 94.4% |
| Exact match | 83/100 |
| Perfect (100% recall) | 90/100 |
| Failures (0% recall) | 3/100 |
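
The exact recall/precision definitions are not spelled out in this card; one plausible line-level reading (an illustrative assumption, not the actual eval harness) scores only the lines the reference fix changed:

```python
def line_metrics(generated: str, reference: str, original: str):
    """Hypothetical line-level scoring: recall = how many of the
    reference fix's changed lines the model reproduced; precision =
    how many of the model's changed lines belong to the fix."""
    orig = set(original.splitlines())
    ref_changes = set(reference.splitlines()) - orig   # lines the fix adds
    gen_changes = set(generated.splitlines()) - orig   # lines the model adds
    hit = len(ref_changes & gen_changes)
    recall = hit / len(ref_changes) if ref_changes else 1.0
    precision = hit / len(gen_changes) if gen_changes else 1.0
    return recall, precision
```

Under this reading, "exact match" is the stricter condition that the generated region equals the reference byte-for-byte.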

### By Tier

| Tier | Count | Avg Recall | Perfect |
|---|---|---|---|
| Identical (upstream applies as-is) | 85 | 93.7% | 77/85 |
| Adapted (requires modification) | 15 | 90.0% | 13/15 |

### Failure Analysis

The 3 zero-recall cases are all complex libvirt patches (multi-function adaptations across large files with significant structural differences between versions). These are known hard cases that likely require an agentic approach with source tree context.

## Training Data

The v5 dataset contains real SUSE/openSUSE maintenance patches paired with their upstream CVE fixes, converted to a per-hunk codegen format:

- 36,166 train + 1,834 eval examples (strict CVE-level split, zero overlap)
- All examples use a 3-turn ChatML format (system / user / assistant)
- Per-hunk extraction with 15-line context padding, nearby hunks merged
- Covers C, C++, Python, shell, Java, JavaScript, Go, and more
- Sources: openSUSE Build Service maintenance incidents
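
The padding-and-merging step described above can be sketched as follows (the function name and the `(start, end)` line-range representation are illustrative, not the dataset tooling's actual API):

```python
def merge_hunks(hunks, context=15):
    """Pad each hunk's line range by `context` lines on both sides,
    then merge any hunks whose padded windows overlap, so nearby
    changes end up in a single training example."""
    padded = sorted((max(1, s - context), e + context) for s, e in hunks)
    merged = [list(padded[0])]
    for s, e in padded[1:]:
        if s <= merged[-1][1]:          # windows overlap: extend
            merged[-1][1] = max(merged[-1][1], e)
        else:                           # disjoint: start a new region
            merged.append([s, e])
    return [tuple(m) for m in merged]
```

Two hunks at lines 100-105 and 110-112 would thus collapse into one padded region, while a hunk at 300-305 stays separate.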

### Input Format

````
## File: path/to/file.c
## Lines: 100-130

```c
/* 15 lines before the change */
vulnerable_code_here();
/* 15 lines after the change */
```

## Fix

Description of what the upstream patch changes in this region.
````


### Output Format

The model outputs the fixed version of the code region (just the code,
no diff headers or markup).

## Usage

### With llama.cpp / llama-server (GGUF)

```bash
llama-server \
    --model cve-backport-codegen-v5-q8_0.gguf \
    --port 8403 \
    --n-gpu-layers 99 \
    --ctx-size 8192
```

### With the CVE Backport Tool

The recommended way to use this model is via the cve-backport-tool, which handles patch parsing, source extraction, model inference, and diff generation:

```bash
python3 cve-backport.py \
    --cve CVE-2024-1234 \
    --package openssl-1.1.1d \
    --patch upstream.patch \
    --source-dir /path/to/source/ \
    --backend openai \
    --retry 3
```

### With transformers + PEFT (adapter)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "anicka/cve-backport-codegen-v5-qwen25-32b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```
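
To run inference, the 3-turn format must be rendered into a ChatML string; `tokenizer.apply_chat_template(..., add_generation_prompt=True)` does this for you, but spelled out manually it amounts to (system/user text abbreviated):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    # Manual equivalent of Qwen's chat template with an open
    # assistant turn, ready for model.generate().
    return (f"<|im_start|>system\n{system}<|im_end|>\n"
            f"<|im_start|>user\n{user}<|im_end|>\n"
            f"<|im_start|>assistant\n")
```

Tokenize the result, call `model.generate(...)`, and take the completion up to `<|im_end|>` as the fixed code region.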

### Prompt Template (ChatML)

````
<|im_start|>system
You are a security patch backporting assistant.

Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.

Rules:
- Output ONLY the fixed code, nothing else
- Preserve all surrounding context exactly
- Apply only the described fix
<|im_end|>
<|im_start|>user
## File: crypto/bn/bn.h
## Lines: 280-310

```c
/* source code region */
```

## Fix

Add bounds check for BN_num_bits to prevent buffer over-read.<|im_end|>
<|im_start|>assistant
````


## Limitations

- **Best at identical-tier patches** (upstream fix applies directly) — 93.7% recall
- **Good at adapted patches** (90% recall) but complex multi-function adaptations
  across structurally different versions remain challenging
- **Context window**: 4,096 token training limit means very large functions or
  multi-file patches may be truncated
- **No compilation feedback**: the model generates code in a single pass without
  verifying it compiles. Use `--retry` in the CLI tool for iterative correction.
- Always review generated patches before applying to production systems
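
The `--retry` idea from the list above can be sketched as a regenerate-until-it-parses loop. This is a hypothetical illustration, not the CLI's implementation: it uses Python's own `compile()` as the cheapest possible syntax gate, whereas the real tool targets C and other languages.

```python
def backport_with_retry(generate, retries: int = 3):
    """Call `generate(attempt)` up to `retries` times and return the
    first candidate that passes a syntax check, else None."""
    for attempt in range(retries):
        candidate = generate(attempt)
        try:
            compile(candidate, "<candidate>", "exec")  # syntax gate only
            return candidate
        except SyntaxError:
            continue  # re-sample and try again
    return None
```

A production version would swap the syntax gate for a real compile (e.g. `gcc -fsyntax-only`) or the package's build, and feed the error back into the next generation attempt.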

## Related

- **MoE sibling**: [anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b](https://huggingface.co/anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b) — Qwen3-Coder-30B-A3B (3B active, MoE), 91.9% recall on the same n=100 eval, ~10× faster inference
- **openSUSE mirror**: [openSUSE/CVE-Backport-Qwen2.5-Coder-32B](https://huggingface.co/openSUSE/CVE-Backport-Qwen2.5-Coder-32B)
- **CLI tool**: [openSUSE/cve-backport-tool](https://github.com/openSUSE/cve-backport-tool)
- **Dataset**: [anicka/cve-backport-codegen-dataset](https://huggingface.co/datasets/anicka/cve-backport-codegen-dataset)
- **Training pipeline**: [teapot](https://github.com/anicka-net/teapot)
- **Previous version (v1)**: [anicka/cve-backport-codegen-qwen25-32b-v1](https://huggingface.co/anicka/cve-backport-codegen-qwen25-32b-v1)

## Citation

```bibtex
@misc{cve-backport-codegen-v5,
  title={CVE Backport Codegen v5: Fine-tuned Qwen2.5-Coder-32B for Security Patch Backporting},
  author={Anna Maresova},
  year={2026},
  url={https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b}
}
```