name: FrontierLabs-Env
version: "1.0.0"
description: >
  A strictly deterministic, OpenEnv-compliant simulation sandbox that drops an AI agent
  into a failing PyTorch/GPU supercomputing environment. The agent must autonomously act
  as a Principal AI Infrastructure Engineer: hunting poisoned data, fixing Out-Of-Memory
  (OOM) cluster crashes, and writing low-level hardware kernels to accelerate training.
author: FrontierLabs Team
tags:
  - openenv
  - ai-infrastructure
  - gpu-computing
  - data-security
  - pytorch
  - triton
homepage: https://huggingface.co/spaces/frontierlabs/FrontierLabs-Env
api_version: "1.0"
observation:
  type: object
  description: Current state of the infrastructure environment visible to the agent
  properties:
    task_id:
      type: string
      description: Current active task identifier
    step:
      type: integer
      description: Current step number in the episode
    done:
      type: boolean
      description: Whether the episode has ended
    message:
      type: string
      description: Human-readable description of the current environment state
    files:
      type: object
      description: Map of filename to content for files available on the simulated filesystem
    metrics:
      type: object
      description: Live performance metrics (latency, memory usage, GPU utilization)
    partial_score:
      type: number
      description: Running partial score in [0.0, 1.0] for the current episode
action:
  type: object
  description: An action taken by the agent in the environment
  properties:
    action_type:
      type: string
      enum: [write_file, run_script, submit]
      description: >
        write_file: Write code/content to a named file on the simulated filesystem.
        run_script: Execute a named script already on the filesystem.
        submit: Mark the task as complete and trigger grading.
    filename:
      type: string
      description: Target filename (required for write_file and run_script)
    content:
      type: string
      description: File content to write (required for write_file)
reward:
  type: object
  description: Reward signal returned after each step
  properties:
    value:
      type: number
      description: Step reward in range [-1.0, 1.0]
    explanation:
      type: string
      description: Human-readable explanation of the reward signal
tasks:
  - id: task1_security_audit
    name: Security Audit & Self-Evaluation
    difficulty: easy
    description: >
      A dataset.jsonl file has been infected with 50 malicious backdoor prompts.
      Write a detection script to clean it and save as cleaned_dataset.jsonl.
      Then write an evaluation script to compare against the golden baseline and
      output a metrics_report.json with precision, recall, and F1 score.
    max_steps: 20
    success_threshold: 0.8
  - id: task2_fsdp_cluster
    name: Distributed Cluster Crash (FSDP)
    difficulty: medium
    description: >
      The training cluster is crashing with CUDA Out-of-Memory errors because
      train.py loads the full model onto a single GPU. Rewrite train.py to use
      PyTorch Fully Sharded Data Parallel (FSDP) across 8 simulated GPUs,
      reducing peak memory per GPU below the 40GB threshold.
    max_steps: 25
    success_threshold: 0.8
  - id: task3_triton_kernel
    name: Triton Hardware Bottleneck
    difficulty: hard
    description: >
      Severe latency (150ms/step) is detected due to a math function reading/writing
      GPU memory too many times. Write an OpenAI Triton kernel (@triton.jit) that
      fuses the SiLU activation and element-wise multiply into a single kernel,
      eliminating redundant memory round-trips. Target: < 20ms/step.
    max_steps: 30
    success_threshold: 0.8
endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  tasks: GET /tasks
  grader: GET /grader
  baseline: POST /baseline
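
The `action` schema above maps to a small payload sent to `POST /step`. This is a minimal sketch of building and validating such payloads; the `validate_action` helper is illustrative and not part of the API — only the field names and the per-type requirements come from the schema.

```python
# Per-action-type required fields, taken from the action schema:
# write_file needs filename + content, run_script needs filename, submit needs neither.
REQUIRED_FIELDS = {
    "write_file": {"filename", "content"},
    "run_script": {"filename"},
    "submit": set(),
}

def validate_action(action: dict) -> bool:
    """Check an action dict against the schema's per-type required fields."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return False
    # Every required field for this action_type must be present.
    return REQUIRED_FIELDS[action_type] <= action.keys()

# Example payloads an agent might send to POST /step:
write = {"action_type": "write_file", "filename": "clean.py", "content": "print('hi')"}
run = {"action_type": "run_script", "filename": "clean.py"}
```
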
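
The task1 workflow above — detect poisoned rows, write a cleaned dataset, score the detector against the golden baseline — can be sketched as follows. The `<BACKDOOR>` trigger string and the in-memory rows are illustrative assumptions; in the environment the real poisoned rows and golden labels live in files on the simulated filesystem.

```python
def is_poisoned(row: dict) -> bool:
    # Hypothetical detector: flag rows whose prompt carries a trigger marker.
    return "<BACKDOOR>" in row.get("prompt", "")

def clean(rows: list) -> list:
    """Keep only rows the detector does not flag (the cleaned_dataset.jsonl content)."""
    return [r for r in rows if not is_poisoned(r)]

def prf1(predicted: set, actual: set) -> dict:
    """Precision/recall/F1 of predicted-poisoned row ids vs the golden set,
    i.e. the contents of metrics_report.json."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

A run with two clean rows and one poisoned row: `clean` drops the flagged row, and `prf1({1}, {1})` reports perfect scores since the one flagged id matches the golden set exactly.
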
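
For task2, a back-of-envelope memory model shows why sharding is the fix: with full replication every GPU holds all parameters, gradients, and Adam optimizer states, while FSDP shards each of those tensors across the world size. The 10B-parameter model and the per-parameter byte counts below are assumptions for illustration only, not values from the environment.

```python
def peak_gib(n_params: int, world_size: int = 1,
             param_bytes: int = 2, grad_bytes: int = 2, optim_bytes: int = 8) -> float:
    """Approximate per-GPU peak memory (GiB) for fp16 params + fp16 grads +
    fp32 Adam moments, sharded evenly across `world_size` GPUs.
    world_size=1 models the crashing single-GPU train.py."""
    total_bytes = n_params * (param_bytes + grad_bytes + optim_bytes)
    return total_bytes / world_size / 2**30

single = peak_gib(10_000_000_000, world_size=1)   # ~112 GiB: blows the 40GB budget
sharded = peak_gib(10_000_000_000, world_size=8)  # ~14 GiB: fits comfortably
```

In the actual rewrite of train.py, the sharding comes from wrapping the model in PyTorch's `torch.distributed.fsdp.FullyShardedDataParallel` under an initialized process group; this sketch only quantifies why that drops per-GPU peak memory below the threshold.
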
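
For task3, the latency comes from launching separate kernels for the sigmoid, the SiLU multiply, and the gating multiply, so intermediate tensors make extra round-trips through GPU memory; a fused `@triton.jit` kernel loads `x` and `y` once, computes `silu(x) * y` in registers, and stores once. The scalar reference below (an illustrative oracle, using `silu(x) = x * sigmoid(x)`) pins down the numerics such a kernel must reproduce — the Triton kernel itself needs a CUDA device and is not shown.

```python
import math

def fused_silu_mul(x: float, y: float) -> float:
    """Scalar oracle for the fused op: silu(x) * y, where
    silu(x) = x * sigmoid(x). A correct Triton kernel applies this
    elementwise with a single load of x and y and a single store."""
    return x * (1.0 / (1.0 + math.exp(-x))) * y
```

Grading a candidate kernel against an oracle like this (elementwise, within a floating-point tolerance) is how the fusion can be checked for correctness independently of the < 20ms/step latency target.
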