amulyalakku/airen-grpo

Tags: Reinforcement Learning, incident-response


Qwen3-0.6B — Fine-tuned with GRPO on AIREN Incident Response

Trained with OpenEnv and the GRPO trainer from TRL on the AIREN Incident Response environment, as part of the Meta PyTorch OpenEnv Hackathon.

Training Details

Environment: airen_env
Algorithm: GRPO (Group Relative Policy Optimization)
Episodes: 16
Reward: Dense multi-objective (recovery + diagnosis + efficiency)
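The reward and advantage steps above can be sketched in plain Python. This is a minimal illustration, not the code used to train this model: the weights in WEIGHTS and the helper names composite_reward and group_relative_advantages are hypothetical, and combining recovery, diagnosis, and efficiency as a weighted sum is an assumption, since the card does not say how the three terms are merged. The advantage step follows the group-normalized form that defines GRPO (each completion's reward minus its group's mean, divided by the group's standard deviation); TRL's own implementation may differ in details such as the epsilon and the standard-deviation estimator.

```python
from statistics import mean, pstdev

# HYPOTHETICAL weights: the model card does not publish how the three
# reward terms are combined, so these values are illustrative only.
WEIGHTS = {"recovery": 0.5, "diagnosis": 0.3, "efficiency": 0.2}


def composite_reward(components: dict[str, float]) -> float:
    """Assumed form: weighted sum of per-objective reward terms (missing terms count as 0)."""
    return sum(w * components.get(name, 0.0) for name, w in WEIGHTS.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    Uses the population standard deviation; eps avoids division by zero
    when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: several sampled responses to the same incident prompt,
# each scored on the three (hypothetically weighted) objectives.
group = [
    {"recovery": 1.0, "diagnosis": 1.0, "efficiency": 0.5},
    {"recovery": 0.0, "diagnosis": 0.5, "efficiency": 1.0},
    {"recovery": 1.0, "diagnosis": 0.0, "efficiency": 0.0},
    {"recovery": 0.0, "diagnosis": 0.0, "efficiency": 0.0},
]
rewards = [composite_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because advantages are relative to the group, a response is only reinforced if it scores better than its siblings for the same prompt; when every response in a group earns the same reward, all advantages are zero and that group contributes no reward-driven gradient signal.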

Environment

HF Space: https://huggingface.co/spaces/amulyalakku/airen-env


Safetensors

Model size: 0.6B params
Tensor type: F32
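The listed model size and tensor type imply a rough memory footprint for the weights. The arithmetic below assumes exactly 0.6 billion parameters (the "0.6B" label is rounded) and counts only the weights, not activations, the KV cache, or optimizer state, so treat it as an estimate rather than a measured figure.

```python
# Back-of-the-envelope size of the stored weights.
# Assumption: exactly 0.6e9 parameters (the "0.6B" label is rounded).
PARAMS = 0.6e9
F32_BYTES = 4  # a 32-bit float occupies 4 bytes

weights_gb = PARAMS * F32_BYTES / 1e9     # decimal gigabytes
weights_gib = PARAMS * F32_BYTES / 2**30  # binary gibibytes
print(f"{weights_gb:.1f} GB ({weights_gib:.2f} GiB)")
```

At 4 bytes per parameter, the F32 checkpoint needs roughly 2.4 GB just to hold the weights in memory.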
