Tags: safetensors, sft, tool-calling, agent, mistral, lora

Mistral Small Agent-Diff

LoRA fine-tune of Ministral-3-14B-Instruct-2512-BF16 for API tool-calling tasks across Box, Google Calendar, Linear, and Slack.

Results

Evaluated on agent-diff-bench (45 tasks, test split).

Per-example average reward (higher is better):

| Config          | Reward | Error Rate |
|-----------------|--------|------------|
| LoRA ep5 t=0.5  | 0.356  | 22.2%      |
| LoRA ep4 t=0.7  | 0.341  | 21.6%      |
| Base t=0.5      | 0.322  | 28.4%      |
| Base t=0.7      | 0.220  | 27.8%      |

Grand means (per-example average, all rollouts pooled):

  • All LoRA configs: 0.350
  • All Base configs: 0.282
  • Delta: +0.068

Best-of per example:

  • Best LoRA: 0.454
  • Best Base: 0.362
  • Delta: +0.092
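The aggregation behind the grand-mean and best-of numbers can be sketched as follows (an illustrative helper, assuming rollouts are averaged or maxed per example before averaging across examples; this is not the eval harness's own code):

```python
from statistics import mean

def summarize(rewards_per_example):
    """Aggregate rollout rewards per example.

    rewards_per_example: dict mapping example id -> list of rollout rewards.
    Returns (grand_mean, best_of): each example's rollouts are averaged
    (or maxed) first, then averaged across examples.
    """
    per_example_means = [mean(rs) for rs in rewards_per_example.values()]
    grand_mean = mean(per_example_means)
    best_of = mean(max(rs) for rs in rewards_per_example.values())
    return grand_mean, best_of
```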

Per-Service Breakdown

| Service  | LoRA ep5 t=0.5 | Base t=0.5 | Delta  |
|----------|----------------|------------|--------|
| Box      | 0.266          | 0.100      | +0.166 |
| Calendar | 0.453          | 0.369      | +0.084 |
| Linear   | 0.317          | 0.142      | +0.175 |
| Slack    | 0.435          | 0.452      | -0.017 |

Head-to-head (best LoRA vs best Base per example): LoRA wins 14, Base wins 5, Tied 15
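The win/tie counting above can be sketched like this (a hypothetical helper; the epsilon tolerance for ties is an assumption, not taken from the eval code):

```python
def head_to_head(lora_best, base_best, eps=1e-9):
    """Per-example win/tie counts between two best-of reward lists.

    Scores within eps of each other count as ties.
    """
    lora_wins = sum(l > b + eps for l, b in zip(lora_best, base_best))
    base_wins = sum(b > l + eps for l, b in zip(lora_best, base_best))
    ties = len(lora_best) - lora_wins - base_wins
    return lora_wins, base_wins, ties
```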

Training

Data

  • Source: Devstral-2512 rollouts on agent-diff-bench, filtered for reward > 0.8
  • Processing pipeline:
    1. Native formatting (0 missing content, 0 consecutive assistant issues)
    2. Command flattening (multi-line curl commands collapsed to single lines; share of multi-line commands reduced from 44% to 6%)
    3. Error-turn removal (failed API call + error response pairs removed; error rate reduced from 20% to 1.8%)
    4. Prompt-level train/val split (0% leakage)
  • Final dataset: 361 rows, 142 unique prompts, ~2.5 rollouts per prompt
  • Dataset: hubertmarek/mistral-large-agent-diff-sft-mixed-old-plus-devstral-r0p8-64k
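The command-flattening step in the pipeline above amounts to collapsing backslash-newline continuations. A minimal sketch (illustrative, not the actual processing code):

```python
import re

def flatten_command(cmd: str) -> str:
    """Collapse a multi-line shell command (backslash-newline continuations)
    into a single line and squeeze the leftover whitespace."""
    one_line = cmd.replace("\\\n", " ")
    return re.sub(r"\s+", " ", one_line).strip()
```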

Hyperparameters

SFTConfig(
    num_train_epochs=8,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.08,
    bf16=True,
    max_length=64000,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
    save_strategy="epoch",
    packing=False,
)
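The learning-rate schedule implied by the config (cosine decay with an 8% linear warmup) behaves like this sketch (an illustrative reimplementation, not the trainer's own scheduler):

```python
import math

def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.08):
    """Linear warmup followed by cosine decay to zero, matching the
    SFTConfig above (lr_scheduler_type="cosine", warmup_ratio=0.08)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

With per_device_train_batch_size=1 and gradient_accumulation_steps=6, the effective batch size per device is 6.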

LoRA Config

  • Rank: 64
  • Alpha: 128
  • Target modules: all linear layers
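In PEFT terms, the bullets above roughly correspond to the following config (a sketch; the `task_type` and the `"all-linear"` shorthand are assumptions about the training code, which is not shown in this card):

```python
from peft import LoraConfig

# Approximation of the setup described above, not the exact training config.
lora_config = LoraConfig(
    r=64,                         # rank
    lora_alpha=128,               # alpha
    target_modules="all-linear",  # all linear layers
    task_type="CAUSAL_LM",
)
```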

Inference

vLLM

export HF_TOKEN='your_token'
vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-lora \
  --lora-modules agent-diff=ministral-3-14b-agent-diff-sft-lora \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 64000 \
  --max-lora-rank 64 \
  --enforce-eager
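Once the server is up, the adapter is addressed by the name registered via --lora-modules. A minimal sketch of the request payload for POST /v1/chat/completions (the prompt and sampling values are illustrative):

```python
def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat payload for the vLLM server above."""
    return {
        "model": "agent-diff",  # LoRA adapter name from --lora-modules
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,
    }
```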

Evaluation

prime eval hubert-marek/agent-diff-bench \
  -m agent-diff \
  --api-base-url http://localhost:8000/v1 \
  -n -1 -r 3 -c 15 \
  --max-retries 20 \
  --env-args '{"agentdiff_api_key": "YOUR_KEY"}' \
  --save-results \
  --temperature 0.5

Checkpoints

This repo contains multiple epochs as commits:

  • Epoch 3 (checkpoint-183): Recommended starting point
  • Epoch 5 (checkpoint-305): Best benchmark performance
  • Epoch 8 (checkpoint-488): Overfits; not recommended