This model fine-tunes ModernBERT-base with LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method.
It is designed for binary classification tasks where high recall and controlled false positive rates are important.
Training Configuration
- Seed: 42 (ensures reproducibility)
- Batch sizes: Train = 128, Eval = 256
- Max sequence length: 256
- Epochs: 1 (baseline run)
- Learning rate: 3e-4
- Weight decay: 0.01
- Warmup ratio: 0.05
- Gradient clipping: 1.0
- Early stopping patience: 3
- Total training steps: 5,241
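The optimizer, batching, and schedule settings above map directly onto Hugging Face `TrainingArguments`. A minimal sketch (the output directory is a placeholder, and this is an illustrative mapping rather than the actual training script; note that the max sequence length of 256 is applied at tokenization time, not here):

```python
from transformers import TrainingArguments

# Illustrative mapping of the configuration listed above; "out" is a placeholder path.
training_args = TrainingArguments(
    output_dir="out",
    seed=42,                          # reproducibility
    per_device_train_batch_size=128,
    per_device_eval_batch_size=256,
    num_train_epochs=1,               # baseline run
    learning_rate=3e-4,
    weight_decay=0.01,
    warmup_ratio=0.05,
    max_grad_norm=1.0,                # gradient clipping
)
```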
LoRA Setup
- Enabled: Yes
- Rank (r): 8
- Alpha: 16
- Dropout: 0.05
- Target modules: Attention (Wqkv, Wo) and MLP (Wi, Wo) layers
- Max drift ratio: 0.1
LoRA adapters allow efficient fine‑tuning by updating only small low‑rank matrices, reducing memory and compute requirements.
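With `peft`, the setup above corresponds roughly to the following `LoraConfig`. This is a sketch: the exact target-module names depend on how ModernBERT registers its attention and MLP projections, and `task_type` is an assumption based on the binary classification objective.

```python
from peft import LoraConfig

# Sketch of the LoRA setup above; "Wo" matches both the attention and MLP
# output projections in ModernBERT, so one entry covers both.
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor (alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["Wqkv", "Wo", "Wi"],
    task_type="SEQ_CLS",                   # assumption: sequence classification head
)
# Wrap the base model with: get_peft_model(base_model, lora_config)
```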
Loss Function
Training uses Asymmetric Focal Loss, which emphasizes hard negatives while keeping positive weighting mild. This helps balance recall and false positive rate.
- Gamma_pos: 0.0 (minimal emphasis on positives)
- Gamma_neg: 4.0 (stronger emphasis on negatives)
- Clip: 0.05 (probability shift applied to negatives for stability)
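The loss can be sketched in NumPy. This is a simplified re-implementation for illustration, not the training code: the clip value shifts negative-class probabilities down before the focusing term is applied, so easy negatives contribute exactly zero loss.

```python
import numpy as np

def asymmetric_focal_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Simplified asymmetric focal loss for binary labels (illustrative sketch)."""
    p = 1.0 / (1.0 + np.exp(-logits))  # predicted probability of the positive class
    # Positive term: with gamma_pos = 0 this reduces to plain log-loss.
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, 1e-8, 1.0))
    # Negative term: shift probabilities down by `clip` so that easy negatives
    # (p near 0) vanish, then down-weight the rest with the gamma_neg focus.
    p_m = np.clip(p - clip, 0.0, 1.0)
    loss_neg = (1.0 - targets) * p_m ** gamma_neg * np.log(np.clip(1.0 - p_m, 1e-8, 1.0))
    return float(-(loss_pos + loss_neg).mean())
```

Confident correct predictions incur near-zero loss, while confident false positives are penalized heavily, matching the recall/false-positive trade-off described above.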
Validation is performed every 5000 steps, with early stopping to prevent overfitting.
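The validation cadence and patience can be expressed with `transformers` built-ins, roughly as follows (a sketch; the output directory is a placeholder, the best-model metric is an assumption, and older `transformers` versions spell `eval_strategy` as `evaluation_strategy`):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Evaluation cadence and early stopping from the configuration above.
eval_args = TrainingArguments(
    output_dir="out",               # placeholder path
    eval_strategy="steps",
    eval_steps=5000,                # validate every 5000 steps
    load_best_model_at_end=True,    # required by the early-stopping callback
    metric_for_best_model="loss",   # assumption: select on validation loss
)
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
# Pass both to Trainer(..., args=eval_args, callbacks=[early_stopping]).
```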
Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
from peft import PeftModel

# Base ModernBERT model
base_model_name = "answerdotai/ModernBERT-base"
# LoRA adapter checkpoint
adapter_model_name = "AINovice2005/ModernBERT-base-lora-cicflow-1m-r8"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base masked language model
base_model = AutoModelForMaskedLM.from_pretrained(base_model_name)

# Attach LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)

# Move to device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Build fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
    device=0 if device == "cuda" else -1,
)

# Example usage
text = "The network traffic shows a [MASK] pattern."
outputs = fill_mask(text)
for o in outputs:
    print(f"Token: {o['token_str']}, Score: {o['score']:.4f}")
```
Intended Use
- Binary classification tasks where recall is critical.
- Efficient fine‑tuning scenarios with limited compute resources.
- Research and experimentation with parameter‑efficient methods.
Artifacts
- LoRA adapter
- Training configuration and evaluation logs