Meta Content Moderation Environment - API Documentation

This API powers the meta-content-moderation-env benchmark for OpenEnv autonomous agents.

Endpoints

1. POST /reset

Initializes a new episode of the specified moderation task, or resets the current one.

Request Body: application/json

{
  "task": "single-label-classify",
  "seed": 42
}

Response: 200 OK (returns the initial Observation)

{
  "step": 0,
  "task_name": "single-label-classify",
  "instructions": "Determine the highest severity violation...",
  "content_item": {
    "content_id": "post_1",
    "content_type": "text_post",
    "text": "User generated text...",
    "media_urls": [],
    "media_types": []
  },
  "thread_history": [],
  "conflicting_policies": [],
  "policy_excerpt": "Standard policies..."
}
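
As a rough illustration of the call above, the following Python sketch builds the POST /reset request with the standard library and reads the text to moderate from the returned Observation. The base URL is an assumption (this document does not give the server address), and `build_reset_request` and `current_text` are hypothetical helper names, not part of the environment.

```python
import json
import urllib.request

# Assumption: the server address is not given in this document.
BASE_URL = "http://localhost:8000"


def build_reset_request(task, seed, base_url=BASE_URL):
    """Build (but do not send) the POST /reset request described above."""
    body = json.dumps({"task": task, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def current_text(observation):
    """Pull the text to moderate out of an Observation dict."""
    return observation["content_item"]["text"]


request = build_reset_request("single-label-classify", 42)

# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         observation = json.load(response)
#         print(current_text(observation))
```

Building the request separately from sending it keeps the payload easy to inspect before any network traffic happens.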

2. POST /step

Submits a moderation decision for the current content item to the environment.

Request Body: application/json (ModerationDecision)

{
  "content_id": "post_1",
  "labels": ["hate_speech"],
  "action": "remove",
  "confidence": 0.95,
  "reasoning": "Direct attack on protected group.",
  "policy_citations": []
}

Response: 200 OK

{
  "observation": { ... next ContentItem ... },
  "reward": 0.85,
  "done": false,
  "info": {}
}
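
One way to keep /step request bodies well-formed on the client side is a small dataclass that mirrors the ModerationDecision fields shown above. This is only a sketch: the range check on `confidence`, the element type of `policy_citations`, and the default values are assumptions, since this document does not specify them.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ModerationDecision:
    """Client-side mirror of the /step request body shown above."""

    content_id: str
    labels: List[str]
    action: str
    confidence: float
    reasoning: str = ""  # assumption: defaulted here for convenience
    # Assumption: citations are strings; the sample only shows an empty list.
    policy_citations: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Assumption: confidence is a probability in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self):
        """Serialize to the JSON string sent as the request body."""
        return json.dumps(asdict(self))


decision = ModerationDecision(
    content_id="post_1",
    labels=["hate_speech"],
    action="remove",
    confidence=0.95,
    reasoning="Direct attack on protected group.",
)
payload = decision.to_json()
```

Here `payload` contains the same six keys as the sample request, with `policy_citations` filled in as an empty list.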

3. GET /state

Retrieves the current episode state: the running score and the metrics it is computed from (total reward and number of steps so far).

Response: 200 OK

{
  "score": 0.85,
  "metrics": {
    "total_rewards": 0.85,
    "steps": 1
  }
}
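
Taken together, the three endpoints suggest an agent loop of roughly the following shape: reset, step until `done` is true, then read the state. The sketch below is hedged in several ways. `call` is a hypothetical transport function standing in for real HTTP requests, `fake_call` is an in-memory stub that simply returns the sample payloads from this document, the `max_steps` guard is an addition, and the assumption that the `observation` returned by /step has the same shape as the one returned by /reset is not confirmed here.

```python
def run_episode(call, policy, task, seed, max_steps=100):
    """Drive one episode: reset, step until done, then read the state.

    `call(method, path, body)` is a hypothetical transport returning the
    decoded JSON response; `policy(observation)` returns a /step body.
    """
    observation = call("POST", "/reset", {"task": task, "seed": seed})
    for _ in range(max_steps):  # guard against an episode that never ends
        result = call("POST", "/step", policy(observation))
        if result["done"]:
            break
        # Assumption: same shape as the Observation returned by /reset.
        observation = result["observation"]
    return call("GET", "/state", None)


def fake_call(method, path, body):
    """In-memory stand-in for the server, using the sample payloads."""
    if path == "/reset":
        return {"step": 0, "content_item": {"content_id": "post_1"}}
    if path == "/step":
        return {"observation": {}, "reward": 0.85, "done": True, "info": {}}
    return {"score": 0.85, "metrics": {"total_rewards": 0.85, "steps": 1}}


def always_remove(observation):
    """Toy policy: removes every item with the same label."""
    return {
        "content_id": observation["content_item"]["content_id"],
        "labels": ["hate_speech"],
        "action": "remove",
        "confidence": 0.95,
        "reasoning": "Direct attack on protected group.",
        "policy_citations": [],
    }


state = run_episode(fake_call, always_remove, "single-label-classify", 42)
```

Passing the transport in as a function keeps the loop testable without a running server; swapping `fake_call` for a function that issues real HTTP requests should leave `run_episode` unchanged.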