---
library_name: peft
pipeline_tag: text-generation
license: bigcode-openrail-m
language:
- code
base_model:
- bigcode/starcoder2-15b-instruct-v0.1
tags:
- securecode
- security
- owasp
- code-generation
- secure-coding
- lora
- qlora
- vulnerability-detection
- cybersecurity
datasets:
- scthornton/securecode
model-index:
- name: starcoder2-15b-securecode
  results: []
---

# StarCoder2 15B SecureCode

[![Parameters](https://img.shields.io/badge/parameters-15B-blue.svg)](#model-details)
[![Dataset](https://img.shields.io/badge/dataset-2,185_examples-green.svg)](https://huggingface.co/datasets/scthornton/securecode)
[![OWASP](https://img.shields.io/badge/OWASP-Top_10_2021_+_LLM_Top_10-red.svg)](#security-coverage)
[![Method](https://img.shields.io/badge/method-QLoRA-purple.svg)](#training-details)
[![License](https://img.shields.io/badge/license-BigCode_OpenRAIL--M-orange.svg)](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)

**Open-source flagship security-aware code generation model. Fine-tuned on 2,185 real-world vulnerability examples covering OWASP Top 10 2021 and OWASP LLM Top 10 2025.**

[Dataset](https://huggingface.co/datasets/scthornton/securecode) | [Paper](https://huggingface.co/papers/2512.18542) | [Model Collection](https://huggingface.co/collections/scthornton/securecode) | [perfecXion.ai](https://perfecxion.ai) | [Blog Post](https://huggingface.co/blog/scthornton/securecode-models)

---

## What This Model Does

StarCoder2 15B SecureCode is fine-tuned to recognize vulnerability patterns and to produce secure implementations in their place.
Every training example includes:

- **Real-world incident grounding**: tied to documented CVEs and breach reports
- **Vulnerable + secure implementations**: side-by-side comparison
- **Attack demonstrations**: concrete exploit code
- **Defense-in-depth guidance**: SIEM rules, logging, monitoring, infrastructure hardening

---

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [bigcode/starcoder2-15b-instruct-v0.1](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1) |
| **Parameters** | 15B |
| **Architecture** | StarCoder2 (decoder-only Transformer) |
| **Method** | QLoRA (4-bit quantization + LoRA) |
| **LoRA Rank** | 16 |
| **LoRA Alpha** | 32 |
| **Training Data** | [scthornton/securecode](https://huggingface.co/datasets/scthornton/securecode) (2,185 examples) |
| **Training Time** | ~1h 40min |
| **Hardware** | 2x NVIDIA A100 40GB (GCP) |
| **Framework** | PEFT 0.18.1, Transformers 5.1.0, PyTorch 2.7.1 |

---

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (requires bitsandbytes), then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-15b-instruct-v0.1",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base_model, "scthornton/starcoder2-15b-securecode")
tokenizer = AutoTokenizer.from_pretrained("scthornton/starcoder2-15b-securecode")

# Generate secure code
prompt = "Write a secure JWT authentication handler in Python with proper token validation"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

| Hyperparameter | Value |
|----------------|-------|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| Epochs | 3 |
| Scheduler | Cosine |
| Warmup Steps | 100 |
| Optimizer | paged_adamw_8bit |
| Max Sequence Length | 2048 |

### Dataset Breakdown

| Component | Examples | Coverage |
|-----------|----------|----------|
| Web Security (OWASP Top 10:2021) | 1,378 | 12 languages, 9 frameworks |
| AI/ML Security (OWASP LLM Top 10:2025) | 750 | Prompt injection, RAG poisoning, model theft |
| Framework-Specific Additions | 219 | Django, Flask, Express, Spring Boot, etc. |
| **Total** | **2,185** | **Complete OWASP coverage** |

---

## SecureCode Model Collection

| Model | Parameters | Base | Training Time | Link |
|-------|------------|------|---------------|------|
| Llama 3.2 3B | 3B | Meta Llama 3.2 | 1h 5min | [scthornton/llama-3.2-3b-securecode](https://huggingface.co/scthornton/llama-3.2-3b-securecode) |
| Qwen Coder 7B | 7B | Qwen 2.5 Coder | 1h 24min | [scthornton/qwen-coder-7b-securecode](https://huggingface.co/scthornton/qwen-coder-7b-securecode) |
| CodeGemma 7B | 7B | Google CodeGemma | 1h 27min | [scthornton/codegemma-7b-securecode](https://huggingface.co/scthornton/codegemma-7b-securecode) |
| DeepSeek Coder 6.7B | 6.7B | DeepSeek Coder | 1h 15min | [scthornton/deepseek-coder-6.7b-securecode](https://huggingface.co/scthornton/deepseek-coder-6.7b-securecode) |
| CodeLlama 13B | 13B | Meta CodeLlama | 1h 32min | [scthornton/codellama-13b-securecode](https://huggingface.co/scthornton/codellama-13b-securecode) |
| Qwen Coder 14B | 14B | Qwen 2.5 Coder | 1h 19min | [scthornton/qwen2.5-coder-14b-securecode](https://huggingface.co/scthornton/qwen2.5-coder-14b-securecode) |
| **StarCoder2 15B** | **15B** | **BigCode StarCoder2** | **1h 40min** | **This model** |
| Granite 20B | 20B | IBM Granite Code | 1h 19min | [scthornton/granite-20b-code-securecode](https://huggingface.co/scthornton/granite-20b-code-securecode) |

---

## Citation

```bibtex
@misc{thornton2025securecode,
  title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},
  author={Thornton, Scott},
  year={2025},
  publisher={perfecXion.ai},
  url={https://perfecxion.ai/articles/securecode-v2-dataset-paper.html},
  note={Model: https://huggingface.co/scthornton/starcoder2-15b-securecode}
}
```

---

## Links

- **Dataset**: [scthornton/securecode](https://huggingface.co/datasets/scthornton/securecode) (2,185 examples)
- **Paper**: [SecureCode v2.0](https://huggingface.co/papers/2512.18542)
- **Model Collection**: [SecureCode Models](https://huggingface.co/collections/scthornton/securecode) (8 models)
- **Blog Post**: [Training Security-Aware Code Models](https://huggingface.co/blog/scthornton/securecode-models)
- **Publisher**: [perfecXion.ai](https://perfecxion.ai)

---

## License

BigCode OpenRAIL-M
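
## Training Schedule Estimate

The hyperparameters listed under Training Details imply a rough number of optimizer steps, which helps put the 100 warmup steps in context. The sketch below is only an estimate under stated assumptions: it assumes all 2,185 examples are used for training (no held-out split) and that the final partial batch of each epoch still triggers an optimizer step. The `num_processes` parameter is hypothetical, since this card does not say whether the two GPUs ran data-parallel replicas or shared a single model copy.

```python
import math

# Values taken from the Training Details table and the stated dataset size.
NUM_EXAMPLES = 2185
PER_DEVICE_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
EPOCHS = 3
WARMUP_STEPS = 100


def schedule_estimate(num_processes: int = 1) -> dict:
    """Estimate optimizer steps for the listed hyperparameters.

    num_processes is an ASSUMPTION (hypothetical data-parallel replica
    count); the model card does not state how the two GPUs were used.
    """
    # Examples consumed per optimizer step across all replicas.
    effective_batch = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS * num_processes
    # Assumes the last partial batch of an epoch still produces a step.
    steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
    total_steps = steps_per_epoch * EPOCHS
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": WARMUP_STEPS / total_steps,
    }


if __name__ == "__main__":
    for n in (1, 2):
        print(n, schedule_estimate(n))
```

Under the single-process assumption this gives an effective batch of 16 and about 411 optimizer steps, so warmup would cover roughly a quarter of training. With two data-parallel replicas the step count halves and warmup takes a correspondingly larger share.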