OpenHands Index Data Structure

This document describes the expected data structure for the openhands-index-results GitHub repository.

Repository Structure

The data should be organized in the following structure:

openhands-index-results/
├── 1.0.0-dev1/              # Version directory (matches CONFIG_NAME in config.py)
│   ├── test.jsonl            # Test split results
│   ├── validation.jsonl      # Validation split results
│   ├── swe-bench.jsonl       # Individual benchmark results
│   ├── swe-bench-multimodal.jsonl
│   ├── swt-bench.jsonl
│   ├── commit0.jsonl
│   ├── gaia.jsonl
│   └── agenteval.json        # Configuration file

File Formats

Agent Directory Structure

Each agent has its own directory containing two files:

metadata.json - Agent and model information:

{
  "agent_name": "OpenHands CodeAct",
  "agent_version": "v1.8.3",
  "model": "claude-4.5-opus",
  "openness": "closed_api_available",
  "country": "us",
  "tool_usage": "standard",
  "submission_time": "2026-01-27T01:24:15.735789+00:00",
  "directory_name": "claude-4.5-opus",
  "release_date": "2025-11-24",
  "parameter_count_b": null,
  "active_parameter_count_b": null
}
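
As a quick sanity check before pushing a new agent directory, the metadata file can be compared against the fields shown above. This is a minimal sketch, not the app's own validation: the helper name missing_metadata_fields is hypothetical, and treating every field in the example as required is an assumption.

```python
import json

# Field names are copied from the metadata.json example above. Treating all of
# them as required is an assumption of this sketch, not a documented rule.
REQUIRED_FIELDS = {
    "agent_name", "agent_version", "model", "openness", "country",
    "tool_usage", "submission_time", "directory_name", "release_date",
    "parameter_count_b", "active_parameter_count_b",
}


def missing_metadata_fields(raw_json: str) -> list:
    """Return the sorted names of expected fields absent from a metadata.json string."""
    metadata = json.loads(raw_json)
    return sorted(REQUIRED_FIELDS - set(metadata))


if __name__ == "__main__":
    partial = '{"agent_name": "OpenHands CodeAct", "model": "claude-4.5-opus"}'
    # Lists every expected field that the partial record does not define.
    print(missing_metadata_fields(partial))
```

A key whose value is null (such as parameter_count_b above) still counts as present, since only the key names are checked.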

scores.json - Array of benchmark results:

[
  {
    "benchmark": "swe-bench",
    "score": 76.6,
    "metric": "accuracy",
    "cost_per_instance": 1.82,
    "average_runtime": 325.0,
    "full_archive": "https://results.eval.all-hands.dev/eval-21370451733-...",
    "tags": ["swe-bench"],
    "agent_version": "v1.8.3",
    "submission_time": "2026-01-27T01:24:15.735789+00:00"
  }
]
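
The array layout makes it straightforward to look up a score by benchmark name or to filter entries by tag. The following is an illustrative sketch only; the function names are hypothetical and the sample entry reuses a subset of the fields from the example above.

```python
import json


def load_scores(raw_json: str) -> list:
    """Parse the contents of a scores.json file into a list of result dicts."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("scores.json is expected to contain a JSON array")
    return entries


def scores_by_benchmark(entries: list) -> dict:
    """Map each benchmark name to its reported score."""
    return {entry["benchmark"]: entry["score"] for entry in entries}


def entries_with_tag(entries: list, tag: str) -> list:
    """Return the entries whose "tags" list contains the given tag."""
    return [entry for entry in entries if tag in entry.get("tags", [])]


if __name__ == "__main__":
    sample = """[
      {"benchmark": "swe-bench", "score": 76.6, "metric": "accuracy",
       "cost_per_instance": 1.82, "average_runtime": 325.0,
       "tags": ["swe-bench"]}
    ]"""
    entries = load_scores(sample)
    print(scores_by_benchmark(entries))
    print(len(entries_with_tag(entries, "swe-bench")))
```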

Configuration File (agenteval.json)

The configuration file defines the benchmark structure:

{
  "suite_config": {
    "name": "openhands-index",
    "version": "1.0.0-dev1",
    "splits": [
      {
        "name": "test",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      },
      {
        "name": "validation",
        "tasks": [
          {
            "name": "swe-bench",
            "tags": ["swe-bench"]
          },
          {
            "name": "swe-bench-multimodal",
            "tags": ["swe-bench-multimodal"]
          },
          {
            "name": "swt-bench",
            "tags": ["swt-bench"]
          },
          {
            "name": "commit0",
            "tags": ["commit0"]
          },
          {
            "name": "gaia",
            "tags": ["gaia"]
          }
        ]
      }
    ]
  }
}
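
To confirm that a configuration file declares the expected tasks for each split, the nested structure can be flattened into a simple mapping. This is a sketch under the assumption that the file follows the layout shown above; tasks_per_split is a hypothetical helper, not part of the app.

```python
import json


def tasks_per_split(raw_json: str) -> dict:
    """Map each split name to the list of task names declared under it."""
    config = json.loads(raw_json)
    splits = config["suite_config"]["splits"]
    return {
        split["name"]: [task["name"] for task in split["tasks"]]
        for split in splits
    }


if __name__ == "__main__":
    # A trimmed-down configuration with one split and two tasks.
    sample = """{
      "suite_config": {
        "name": "openhands-index",
        "version": "1.0.0-dev1",
        "splits": [
          {"name": "test", "tasks": [
            {"name": "swe-bench", "tags": ["swe-bench"]},
            {"name": "gaia", "tags": ["gaia"]}
          ]}
        ]
      }
    }"""
    print(tasks_per_split(sample))
```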

Data Loading Process

  1. GitHub Repository Check: The app first attempts to clone the openhands-index-results repository.
  2. Version Directory: It then looks for a directory matching CONFIG_NAME (currently "1.0.0-dev1").
  3. Fallback to Mock Data: If the GitHub data is unavailable, the app falls back to the local mock data in mock_results/.
  4. Data Extraction: The selected data is copied to /tmp/oh_index/data/{version}/extracted/{version}/.
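
The directory selection and extraction path in the steps above can be sketched as follows. This is not the app's actual loader: resolve_data_dir and extraction_dir are hypothetical names, the clone step itself is left out, and the sketch assumes that "unavailable" means either the clone failed or the version directory is missing.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "1.0.0-dev1"  # value quoted in this document; defined in config.py


def resolve_data_dir(cloned_repo: Optional[Path], mock_root: Path,
                     version: str = CONFIG_NAME) -> Path:
    """Pick the version directory from the cloned repo, else from the mock data.

    cloned_repo is None when cloning failed (an assumption of this sketch).
    """
    if cloned_repo is not None:
        candidate = cloned_repo / version
        if candidate.is_dir():
            return candidate
    # Fallback: local mock data, e.g. mock_results/1.0.0-dev1/
    return mock_root / version


def extraction_dir(version: str = CONFIG_NAME) -> Path:
    """Build the extraction target /tmp/oh_index/data/{version}/extracted/{version}/."""
    return Path("/tmp/oh_index/data") / version / "extracted" / version


if __name__ == "__main__":
    # With no clone available, the mock data directory is selected.
    print(resolve_data_dir(None, Path("mock_results")))
    print(extraction_dir())
```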

Updating Data

To update the leaderboard data:

  1. Push new JSONL files to the openhands-index-results repository
  2. Ensure the version directory matches CONFIG_NAME in config.py
  3. The app will automatically fetch the latest data on restart

Mock Data

Mock data is stored in mock_results/1.0.0-dev1/ and is used:

  • During development and testing
  • When the GitHub repository is unavailable
  • As a template for the expected data format