# CVE Backport Codegen v5 — Qwen2.5-Coder-32B QLoRA
Fine-tuned code generation model for backporting upstream CVE security fixes to older SUSE/openSUSE package versions. Given vulnerable source code and an upstream fix description, the model outputs the corrected code. A separate tool then diffs the output against the original to produce a patch.
This is a per-hunk code generation approach: the model sees one region of source code at a time and returns the fixed version, rather than generating raw unified diffs. This yields higher accuracy than patch-format models because the model works in its natural domain (code) rather than a meta-format (diffs).
MoE sibling now available: anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b reaches 91.9% recall on the same n=100 eval (within 1.2 pt of this model) while running ~10× faster at inference, thanks to Qwen3-Coder-30B-A3B's sparse 3B-active MoE architecture. Same training data, same config style, trained in 1/5 the wall time on a single H100.
## What's New in v5
v5 uses a unified codegen-only dataset — all 36,166 training examples follow the same 3-turn format (system / user with code + fix description / assistant with fixed code). v4 mixed in 5-turn test-generation examples; v5 drops those to focus entirely on codegen quality.
| Metric | v5 | v4 | v1 |
|---|---|---|---|
| Recall | 93.1% | 93% | 91% |
| Precision | 94.4% | 95% | — |
| Exact match | 83/100 | 87/100 | — |
| Adapted recall | 90.0% | 86% | 71% |
| Identical recall | 93.7% | 94% | 94% |
Adapted-tier recall has steadily improved: 71% (v1) → 86% (v4) → 90% (v5). The codegen-only dataset gives the model a cleaner training signal for the core task.
## Model Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-32B-Instruct |
| Method | QLoRA (4-bit NF4, double quantization, bf16 compute) |
| LoRA rank / alpha | 64 / 128 |
| LoRA dropout | 0.05 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training data | 36,166 train / 1,834 eval examples |
| Epochs | 2 (8,228 steps) |
| Effective batch size | 8 (1 × grad_accum 8) |
| Learning rate | 1e-4 (cosine schedule, 5% warmup) |
| Max sequence length | 4,096 tokens |
| Optimizer | AdamW fused, weight decay 0.01 |
| Hardware | 2× NVIDIA H100 NVL 94GB |
| Training time | 46.1 hours |
| Train loss (avg) | 0.0215 |
| Eval loss (final) | 0.00602 |
| PEFT version | 0.18.1 |
## Files
This repository contains:
- **LoRA adapter** (`adapter_model.safetensors`, `adapter_config.json`) — merge with the base model using PEFT
- **GGUF Q8_0** (`cve-backport-codegen-v5-q8_0.gguf`, 33 GB) — ready for llama.cpp / ollama
## Reproduction via Teapot
This model was trained via the teapot training pipeline. The full reproduction is a four-command sequence once the cve-backport dataset is prepared:
```bash
git clone https://github.com/anicka-net/teapot
cd teapot
pip install -e .

# 1. Compose training data from the cve-backport module
teapot compose configs/cve-backport.config \
  --output train-cve-backport.jsonl

# 2. Generate the QLoRA-HF launch script
teapot train configs/cve-backport.config \
  --backend qlora-hf \
  --train-data train-cve-backport.jsonl \
  --eval-data eval-cve-backport.jsonl \
  --output train-cve-backport.sh

# 3. Train (2× H100 NVL 94GB; ~46 hours)
bash train-cve-backport.sh

# 4. Final adapter is at output-teapot-cve-backport/final/
```
The teapot config (`configs/cve-backport.config`) pins all the hyperparameters: `method: qlora`, `epochs: 2`, `lr: 1e-4`, `batch_size: 1`, `gradient_accumulation: 8`, `lora_r: 64`, `lora_alpha: 128`, `max_length: 4096`, `warmup_ratio: 0.05`, `hardware.gpus: 2`. See the config file in the teapot repo for the full declaration.
The qlora-hf backend invokes python3 -m teapot.train_qlora_hf, which is
a thin wrapper over the HuggingFace Trainer with bitsandbytes 4-bit
quantization and PEFT LoRA. Training data is composed from the
cve-backport-codegen-dataset
HF repo (the domain/cve-backport teapot module fetches it automatically).
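For orientation, one composed training example plausibly looks like the following (shown pretty-printed for readability; in the JSONL file each example occupies a single line, and the exact field names are an assumption based on the standard chat `messages` convention, not confirmed from the dataset):

```json
{"messages": [
  {"role": "system",
   "content": "You are a security patch backporting assistant. ..."},
  {"role": "user",
   "content": "## File: crypto/bn/bn.h\n## Lines: 280-310\n```c\n/* vulnerable region */\n```\n\n## Fix\n\nAdd bounds check ..."},
  {"role": "assistant",
   "content": "/* fixed region */"}
]}
```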
## Evaluation
Evaluated on 100 held-out examples (zero CVE overlap with training) using the Q8_0 GGUF served via llama-server (temperature=0, ctx=8192).
### Overall
| Metric | Value |
|---|---|
| Avg recall | 93.1% |
| Avg precision | 94.4% |
| Exact match | 83/100 |
| Perfect (100% recall) | 90/100 |
| Failures (0% recall) | 3/100 |
### By Tier
| Tier | Count | Avg Recall | Perfect |
|---|---|---|---|
| Identical (upstream applies as-is) | 85 | 93.7% | 77/85 |
| Adapted (requires modification) | 15 | 90.0% | 13/15 |
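The card does not spell out how recall and precision are computed. A plausible line-level formulation (an illustrative assumption, not the project's actual scorer) treats recall as the fraction of expected changed lines the model produced, and precision as the fraction of the model's changed lines that were expected:

```python
import difflib

def changed_lines(before: str, after: str) -> set:
    """Lines added going from `before` to `after`, per unified diff."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return {l[1:].strip() for l in diff
            if l.startswith("+") and not l.startswith("+++")}

def score(original: str, expected: str, generated: str):
    """Line-level (recall, precision) of the generated fix vs. the expected fix."""
    want = changed_lines(original, expected)
    got = changed_lines(original, generated)
    if not want:
        return 1.0, 1.0
    recall = len(want & got) / len(want)
    precision = len(want & got) / len(got) if got else 0.0
    return recall, precision
```

Under this formulation, an exact-match output scores 1.0 on both metrics, while an unchanged output scores zero recall.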
### Failure Analysis
The 3 zero-recall cases are all complex libvirt patches (multi-function adaptations across large files with significant structural differences between versions). These are known hard cases that likely require an agentic approach with source tree context.
## Training Data
The v5 dataset contains real SUSE/openSUSE maintenance patches paired with their upstream CVE fixes, converted to a per-hunk codegen format:
- 36,166 train + 1,834 eval examples (strict CVE-level split, zero overlap)
- All examples use a 3-turn ChatML format (system / user / assistant)
- Per-hunk extraction with 15-line context padding, nearby hunks merged
- Covers C, C++, Python, shell, Java, JavaScript, Go, and more
- Sources: openSUSE Build Service maintenance incidents
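The per-hunk extraction with context padding and merging of nearby hunks can be sketched roughly as follows. The 15-line padding comes from the list above; the merge rule (combine hunks whose padded regions touch) and the `(start, end)` line-range representation are assumptions for illustration:

```python
def pad(hunk, context=15):
    """Expand a (start, end) line range by `context` lines on each side."""
    start, end = hunk
    return max(1, start - context), end + context

def merge_nearby(hunks, context=15):
    """Pad each hunk, then merge ranges whose padded regions touch or overlap."""
    if not hunks:
        return []
    padded = sorted(pad(h, context) for h in hunks)
    merged = [padded[0]]
    for start, end in padded[1:]:
        if start <= merged[-1][1] + 1:  # overlapping or adjacent regions
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Two hunks at lines 100-105 and 110-112, for example, collapse into a single padded region, so the model sees them as one codegen example.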
### Input Format

````
## File: path/to/file.c
## Lines: 100-130
```c
/* 15 lines before the change */
vulnerable_code_here();
/* 15 lines after the change */
```

## Fix

Description of what the upstream patch changes in this region.
````
### Output Format
The model outputs the fixed version of the code region (just the code,
no diff headers or markup).
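As described at the top of this card, a separate tool then diffs the model output against the original region to produce a patch. With Python's standard library that step can be sketched as below; a real tool would also shift the `@@` hunk headers to the region's position within the full file, which this sketch omits:

```python
import difflib

def region_to_patch(original: str, fixed: str, path: str) -> str:
    """Produce a unified diff from an original code region and its fixed version."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```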
## Usage
### With llama.cpp / llama-server (GGUF)
```bash
llama-server \
  --model cve-backport-codegen-v5-q8_0.gguf \
  --port 8403 \
  --n-gpu-layers 99 \
  --ctx-size 8192
```
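The served model can then be queried through llama-server's OpenAI-compatible chat endpoint. A minimal sketch, using only the standard library; the system prompt is abbreviated from the template shown under "Prompt Template (ChatML)", and the user-turn layout follows the input format above:

```python
import json
import urllib.request

SYSTEM = (
    "You are a security patch backporting assistant.\n"
    "Given vulnerable source code and a description of the upstream fix, "
    "output the FIXED version of the code."
)

def build_payload(path: str, lines: str, code: str, fix: str) -> dict:
    """Assemble an OpenAI-style chat request matching the model's input format."""
    user = f"## File: {path}\n## Lines: {lines}\n```c\n{code}\n```\n\n## Fix\n\n{fix}"
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user},
        ],
        "temperature": 0,
    }

def query(payload: dict, url: str = "http://localhost:8403/v1/chat/completions") -> str:
    """POST the request to llama-server and return the generated fixed code."""
    req = urllib.request.Request(
        url, json.dumps(payload).encode(), {"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```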
### With the CVE Backport Tool
The recommended way to use this model is via the cve-backport-tool, which handles patch parsing, source extraction, model inference, and diff generation:
```bash
python3 cve-backport.py \
  --cve CVE-2024-1234 \
  --package openssl-1.1.1d \
  --patch upstream.patch \
  --source-dir /path/to/source/ \
  --backend openai \
  --retry 3
```
### With transformers + PEFT (adapter)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "anicka/cve-backport-codegen-v5-qwen25-32b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
```
### Prompt Template (ChatML)

````
<|im_start|>system
You are a security patch backporting assistant.
Given vulnerable source code and a description of the upstream fix, output the FIXED version of the code.
Rules:
- Output ONLY the fixed code, nothing else
- Preserve all surrounding context exactly
- Apply only the described fix
<|im_end|>
<|im_start|>user
## File: crypto/bn/bn.h
## Lines: 280-310
```c
/* source code region */
```

## Fix

Add bounds check for BN_num_bits to prevent buffer over-read.
<|im_end|>
<|im_start|>assistant
````
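When driving a raw completion endpoint rather than the chat API, the template has to be assembled by hand, leaving the assistant turn open for generation. A minimal sketch (the helper name is illustrative):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a raw ChatML prompt with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```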
## Limitations
- **Best at identical-tier patches** (upstream fix applies directly) — 93.7% recall
- **Good at adapted patches** (90% recall) but complex multi-function adaptations
across structurally different versions remain challenging
- **Context window**: 4,096 token training limit means very large functions or
multi-file patches may be truncated
- **No compilation feedback**: the model generates code in a single pass without
verifying it compiles. Use `--retry` in the CLI tool for iterative correction.
- Always review generated patches before applying to production systems
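The `--retry` mechanism above can be approximated as a generate-validate loop with a pluggable validator (for example, a compiler syntax check or a patch-applies check). This is a generic sketch, not the tool's actual implementation:

```python
from typing import Callable, Optional

def generate_with_retry(
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    prompt: str,
    retries: int = 3,
) -> Optional[str]:
    """Call the model up to `retries` times; return the first output that validates."""
    for _ in range(retries):
        candidate = generate(prompt)
        if validate(candidate):
            return candidate
    return None  # all attempts failed validation
```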
## Related
- **MoE sibling**: [anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b](https://huggingface.co/anicka/cve-backport-codegen-v5-qwen3-coder-30b-a3b) — Qwen3-Coder-30B-A3B (3B active, MoE), 91.9% recall on the same n=100 eval, ~10× faster inference
- **openSUSE mirror**: [openSUSE/CVE-Backport-Qwen2.5-Coder-32B](https://huggingface.co/openSUSE/CVE-Backport-Qwen2.5-Coder-32B)
- **CLI tool**: [openSUSE/cve-backport-tool](https://github.com/openSUSE/cve-backport-tool)
- **Dataset**: [anicka/cve-backport-codegen-dataset](https://huggingface.co/datasets/anicka/cve-backport-codegen-dataset)
- **Training pipeline**: [teapot](https://github.com/anicka-net/teapot)
- **Previous version (v1)**: [anicka/cve-backport-codegen-qwen25-32b-v1](https://huggingface.co/anicka/cve-backport-codegen-qwen25-32b-v1)
## Citation
```bibtex
@misc{cve-backport-codegen-v5,
title={CVE Backport Codegen v5: Fine-tuned Qwen2.5-Coder-32B for Security Patch Backporting},
author={Anna Maresova},
year={2026},
url={https://huggingface.co/anicka/cve-backport-codegen-v5-qwen25-32b}
}
```