Qwen3-0.6B - Fine-tuned with GRPO on AIREN Incident Response
Trained using OpenEnv + TRL GRPO on the AIREN Incident Response environment as part of the Meta PyTorch OpenEnv Hackathon.
Training Details
- Environment: airen_env
- Algorithm: GRPO (Group Relative Policy Optimization)
- Episodes: 16
- Reward: Dense multi-objective (recovery + diagnosis + efficiency)
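As a rough illustration of how a dense multi-objective reward can feed GRPO, the sketch below combines the three reward components into one scalar and then normalizes rewards within a group of completions. The weights, the `[0, 1]` score range, and the function names `combined_reward` and `group_relative_advantages` are assumptions made for illustration; they are not taken from airen_env or from the TRL implementation, which may use a different weighting and standard-deviation convention.

```python
from statistics import mean, pstdev

# Hypothetical weights: the actual weighting used by airen_env is not stated here.
WEIGHTS = {"recovery": 0.5, "diagnosis": 0.3, "efficiency": 0.2}


def combined_reward(components, weights=WEIGHTS):
    """Weighted sum of per-objective scores (each assumed to lie in [0, 1])."""
    return sum(weights[name] * components[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of completions.

    Each completion is scored relative to the others sampled for the same
    prompt, which is the core idea behind GRPO's advantage estimate.
    Population standard deviation is used here as a simplifying assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three made-up completions for the same incident, scored per objective.
group = [
    {"recovery": 1.0, "diagnosis": 1.0, "efficiency": 0.5},
    {"recovery": 1.0, "diagnosis": 0.0, "efficiency": 0.5},
    {"recovery": 0.0, "diagnosis": 0.0, "efficiency": 0.0},
]
rewards = [combined_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get positive advantages and are reinforced; those below the mean get negative advantages.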
Environment
- HF Space: https://huggingface.co/spaces/amulyalakku/airen-env