Qwen3-0.6B: Fine-tuned with GRPO on AIREN Incident Response

Trained using OpenEnv + TRL GRPO on the AIREN Incident Response environment as part of the Meta PyTorch OpenEnv Hackathon.

Training Details

  • Environment: airen_env
  • Algorithm: GRPO (Group Relative Policy Optimization)
  • Episodes: 16
  • Reward: Dense multi-objective (recovery + diagnosis + efficiency)
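The card names the reward components and the algorithm but does not say how they fit together. The sketch below shows one plausible way. It assumes a linear weighted sum of the three components, and the weights in WEIGHTS are hypothetical. It then applies GRPO's group-relative advantage: each completion's reward is normalized by the mean and standard deviation of its sampling group. EpisodeScores, dense_reward, and group_relative_advantages are illustrative names, not part of airen_env or TRL. The use of the population standard deviation is also an assumption, since implementations differ.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical weights: the model card does not state how the components are combined.
WEIGHTS = {"recovery": 0.5, "diagnosis": 0.3, "efficiency": 0.2}


@dataclass
class EpisodeScores:
    """Per-episode component scores (hypothetical structure, each assumed in [0, 1])."""
    recovery: float
    diagnosis: float
    efficiency: float


def dense_reward(s: EpisodeScores) -> float:
    """Combine the three objectives into one scalar (assumed linear weighting)."""
    return (WEIGHTS["recovery"] * s.recovery
            + WEIGHTS["diagnosis"] * s.diagnosis
            + WEIGHTS["efficiency"] * s.efficiency)


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantage: score each completion relative to its own group.

    No learned value function is used; the group mean acts as the baseline and
    the group standard deviation rescales the result. eps avoids division by
    zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One group of sampled responses to the same incident prompt (made-up scores).
    group = [
        EpisodeScores(recovery=1.0, diagnosis=1.0, efficiency=0.5),
        EpisodeScores(recovery=0.0, diagnosis=0.5, efficiency=1.0),
        EpisodeScores(recovery=1.0, diagnosis=0.0, efficiency=0.0),
        EpisodeScores(recovery=0.0, diagnosis=0.0, efficiency=0.0),
    ]
    rewards = [dense_reward(s) for s in group]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Responses that score above their group average get a positive advantage and are reinforced. Responses below the average are pushed down. This is why GRPO needs several sampled completions per prompt.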

Environment

  • HF Space: https://huggingface.co/spaces/amulyalakku/airen-env

Model Details

  • Format: Safetensors
  • Model size: 0.6B params
  • Tensor type: F32