name: FrontierLabs-Env
version: "1.0.0"
description: >
  A strictly deterministic, OpenEnv-compliant simulation sandbox that drops an
  AI agent into a failing PyTorch/GPU supercomputing environment. The agent
  must autonomously act as a Principal AI Infrastructure Engineer: hunting
  poisoned data, fixing Out-of-Memory (OOM) cluster crashes, and writing
  low-level hardware kernels to accelerate training.
author: FrontierLabs Team
tags:
  - openenv
  - ai-infrastructure
  - gpu-computing
  - data-security
  - pytorch
  - triton
homepage: https://huggingface.co/spaces/frontierlabs/FrontierLabs-Env
api_version: "1.0"

observation:
  type: object
  description: Current state of the infrastructure environment visible to the agent
  properties:
    task_id:
      type: string
      description: Current active task identifier
    step:
      type: integer
      description: Current step number in the episode
    done:
      type: boolean
      description: Whether the episode has ended
    message:
      type: string
      description: Human-readable description of the current environment state
    files:
      type: object
      description: Map of filename to content for files available on the simulated filesystem
    metrics:
      type: object
      description: Live performance metrics (latency, memory usage, GPU utilization)
    partial_score:
      type: number
      description: Running partial score in [0.0, 1.0] for the current episode

action:
  type: object
  description: An action taken by the agent in the environment
  properties:
    action_type:
      type: string
      enum: [write_file, run_script, submit]
      description: >
        write_file: write code/content to a named file on the simulated
        filesystem. run_script: execute a named script already on the
        filesystem. submit: mark the task as complete and trigger grading.
    filename:
      type: string
      description: Target filename (required for write_file and run_script)
    content:
      type: string
      description: File content to write (required for write_file)

reward:
  type: object
  description: Reward signal returned after each step
  properties:
    value:
      type: number
      description: Step reward in the range [-1.0, 1.0]
    explanation:
      type: string
      description: Human-readable explanation of the reward signal

tasks:
  - id: task1_security_audit
    name: Security Audit & Self-Evaluation
    difficulty: easy
    description: >
      A dataset.jsonl file has been infected with 50 malicious backdoor
      prompts. Write a detection script that cleans it and saves the result as
      cleaned_dataset.jsonl. Then write an evaluation script that compares the
      cleaned file against the golden baseline and outputs a
      metrics_report.json with precision, recall, and F1 score.
    max_steps: 20
    success_threshold: 0.8
  - id: task2_fsdp_cluster
    name: Distributed Cluster Crash (FSDP)
    difficulty: medium
    description: >
      The training cluster is crashing with CUDA out-of-memory errors because
      train.py loads the full model onto a single GPU. Rewrite train.py to use
      PyTorch Fully Sharded Data Parallel (FSDP) across 8 simulated GPUs,
      reducing peak memory per GPU below the 40 GB threshold.
    max_steps: 25
    success_threshold: 0.8
  - id: task3_triton_kernel
    name: Triton Hardware Bottleneck
    difficulty: hard
    description: >
      Severe latency (150 ms/step) is caused by a math function that reads and
      writes GPU memory too many times. Write an OpenAI Triton kernel
      (@triton.jit) that fuses the SiLU activation and the element-wise
      multiply into a single kernel, eliminating redundant memory round-trips.
      Target: < 20 ms/step.
    max_steps: 30
    success_threshold: 0.8

endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
  grader: GET /grader
  baseline: POST /baseline
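A minimal client-side sketch of how the action schema's required-field rules could be enforced before POSTing to `/step`. This is illustrative, not part of the spec: the helper name `build_action` and the example file names are assumptions; only the `action_type` enum and the filename/content requirements come from the manifest above.

```python
# Hypothetical helper for building /step action payloads.
# Enforces the manifest's per-action requirements:
#   write_file  -> needs filename and content
#   run_script  -> needs filename
#   submit      -> needs neither

ACTION_TYPES = {"write_file", "run_script", "submit"}


def build_action(action_type, filename=None, content=None):
    """Return a dict matching the action schema, or raise on a bad payload."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    if action_type in {"write_file", "run_script"} and filename is None:
        raise ValueError(f"{action_type} requires a filename")
    if action_type == "write_file" and content is None:
        raise ValueError("write_file requires content")

    action = {"action_type": action_type}
    if filename is not None:
        action["filename"] = filename
    if content is not None:
        action["content"] = content
    return action


# Example episode for task1: write a detection script, run it, then submit.
steps = [
    build_action("write_file", "detect.py", "print('scanning dataset.jsonl')"),
    build_action("run_script", "detect.py"),
    build_action("submit"),
]
```

Each dict in `steps` would be sent as the JSON body of a `POST /step` request; validating locally avoids burning an episode step on a malformed action.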