# Azure Advisor Qwen2.5-0.5B (SFT)
Fine-tuned from `Qwen/Qwen2.5-0.5B-Instruct` to generate Azure Advisor-style recommendations using Supervised Fine-Tuning (SFT).
## Model Description

This model was trained to analyze Azure workload configurations and generate structured recommendations across five Azure Advisor categories:
- Cost - Cost optimization recommendations
- Security - Security posture improvements
- Performance - Performance optimization suggestions
- OperationalExcellence - Operational best practices
- HighAvailability - Reliability and availability improvements
## Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Method | SFT with LoRA adapters |
| Dataset | thegovind/azure-advisor-sft (348 train, 41 eval) |
| Training Steps | 200 |
| Learning Rate | 2e-4 (cosine schedule) |
| LoRA Rank / Alpha | 16 / 32 |
| Quantization | 4-bit QLoRA (NF4) |
| Hardware | NVIDIA RTX 3090 (24GB) |
| Training Time | ~5 minutes |
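The hyperparameters above could be wired into `peft`/`transformers` roughly as follows. This is a hypothetical sketch, not the actual training script (which is not published with this card); only values taken from the table are used.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Hypothetical sketch of the quantization + LoRA setup implied by the table;
# the real training script may differ in target modules and other details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit QLoRA
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=16,                  # LoRA rank (from the table)
    lora_alpha=32,         # LoRA alpha (from the table)
    task_type="CAUSAL_LM",
)
```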
## Training Metrics
| Metric | Value |
|---|---|
| Pre-SFT Baseline | 0.80/10 |
| Post-SFT Score | 3.72/10 |
| Improvement | +2.92 |
| Final Training Loss | 0.029 |
| Final Eval Loss | 0.035 |
### Loss Trajectory
1.76 -> 1.06 -> 0.47 -> 0.24 -> 0.13 -> 0.064 -> 0.050 -> 0.044 -> 0.038 -> 0.034 -> 0.029
## Evaluation (5 Reward Functions, max 10.0)
| Function | Weight | Description |
|---|---|---|
| Format Compliance | 1.5 | Correct XML tags and JSON structure |
| Category Correctness | 2.0 | Valid Advisor categories |
| Grounding Quality | 2.0 | Claims supported by input evidence |
| Actionability | 2.0 | Concrete, feasible next steps |
| Completeness | 2.5 | Coverage of issues with proper schema |
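As a sketch of how the weighted functions combine into the 10-point score, assuming each reward function returns a normalized value in [0, 1] (the function names and normalization here are illustrative, not the published implementation):

```python
# Weights copied from the table above; they sum to 10.0.
WEIGHTS = {
    "format_compliance": 1.5,
    "category_correctness": 2.0,
    "grounding_quality": 2.0,
    "actionability": 2.0,
    "completeness": 2.5,
}

def combined_score(rewards: dict) -> float:
    """Weighted sum of per-function rewards, each assumed to be in [0, 1]."""
    return sum(WEIGHTS[name] * rewards.get(name, 0.0) for name in WEIGHTS)

# A perfect response across all five functions scores the maximum 10.0.
print(combined_score({name: 1.0 for name in WEIGHTS}))  # 10.0
```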
## Output Format
The model generates structured output with three tagged sections:

- `<ANALYSIS>`: reasoning about the workload state
- `<RECOMMENDATIONS>`: JSON array of recommendation objects
- `<SUMMARY>`: brief summary of key recommendations
Each recommendation includes: `category`, `impact`, `resourceId`, `problem`, `solution`, `potentialBenefits`, `evidence`, `nextSteps`, `confidence`.
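For illustration, a single recommendation object with these fields might look like the following. Every value below is invented for demonstration; the real model derives them from the input workload.

```python
import json

# Illustrative recommendation object; all field values are made up.
recommendation = {
    "category": "Cost",
    "impact": "High",
    "resourceId": "/subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01",
    "problem": "VM SKU is oversized relative to observed utilization.",
    "solution": "Resize to a smaller SKU or schedule auto-shutdown outside business hours.",
    "potentialBenefits": "Lower monthly compute spend.",
    "evidence": "Average CPU utilization under 5% for the past 30 days.",
    "nextSteps": [
        "Review utilization metrics in Azure Monitor",
        "Resize the VM and validate workload performance",
    ],
    "confidence": 0.85,
}

# The model emits a JSON array of such objects inside <RECOMMENDATIONS>.
print(json.dumps([recommendation], indent=2))
```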
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in fp16, then attach the fine-tuned LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "thegovind/azure-advisor-qwen25-0.5b")
tokenizer = AutoTokenizer.from_pretrained("thegovind/azure-advisor-qwen25-0.5b")

messages = [
    {"role": "system", "content": "You are an Azure Advisor assistant..."},
    {"role": "user", "content": "Analyze this Azure workload and provide recommendations..."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,   # required for temperature to take effect
        temperature=0.7,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
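A small helper can split the generated text back into its tagged sections. This is a sketch that assumes the model closes each section with a matching `</TAG>`; if only opening tags are emitted, split on those instead.

```python
import json
import re

def parse_advisor_output(text: str) -> dict:
    """Extract the three tagged sections; parse RECOMMENDATIONS as JSON.

    Assumes <TAG>...</TAG> pairs as described under Output Format.
    """
    sections = {}
    for tag in ("ANALYSIS", "RECOMMENDATIONS", "SUMMARY"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag.lower()] = match.group(1).strip() if match else ""
    try:
        sections["recommendations_json"] = json.loads(sections["recommendations"])
    except json.JSONDecodeError:
        sections["recommendations_json"] = None  # malformed or missing JSON
    return sections

sample = "<ANALYSIS>High CPU.</ANALYSIS><RECOMMENDATIONS>[]</RECOMMENDATIONS><SUMMARY>Scale out.</SUMMARY>"
print(parse_advisor_output(sample)["summary"])  # Scale out.
```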
## W&B Training Dashboard
- SFT Run: wandb.ai/thegovind/azure-advisor-model/runs/quzg7fgs
- Project: wandb.ai/thegovind/azure-advisor-model
## Related Resources
- GRPO Model: `thegovind/azure-advisor-qwen25-0.5b-grpo` (further refined with reward-based GRPO training)
- SFT Dataset: `thegovind/azure-advisor-sft` (410 training examples across 15 scenario types)
- GRPO Benchmark: `thegovind/azure-advisor-grpo-benchmark` (106 evaluation examples with ground truth)