---
title: README
emoji: π
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---
# Metaphi, Inc
We introduce CREW, the Cross-function Enterprise Work Index, to evaluate frontier AI models on long-horizon enterprise tasks.
## CREW-Agents
| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
|-------|--------|-------|-------|------------|--------|
| **[Fin Agent](link)** | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| **[Enterprise Knowledge Agent](link)** | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| **[Front-end Agent](link)** | Senior front-end engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
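
As a rough illustration of the two verifier styles listed in the table, the sketch below computes a pass rate from binary pass/fail outcomes and an agreement rate over preference pairs. The function names and data shapes are assumptions made for illustration; this is not CREW's actual scoring code.

```python
from typing import Sequence, Tuple


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks whose programmatic check passed (binary pass/fail)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def preference_agreement(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of pairs where the judged choice matches the expert's choice.

    Each pair is (expert_choice, judged_choice), e.g. ("A", "B").
    This pairing format is hypothetical, chosen only for the example.
    """
    if not pairs:
        raise ValueError("no preference pairs to score")
    matches = sum(1 for expert, judged in pairs if expert == judged)
    return matches / len(pairs)


if __name__ == "__main__":
    print(pass_rate([True, False, True, True]))
    print(preference_agreement([("A", "A"), ("B", "A")]))
```

Both helpers return a value between 0 and 1, so results from differently sized task sets can be compared on the same scale.
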
## Leaderboard
Results at [evals.metaphi.ai/crew/leaderboard](https://evals.metaphi.ai/crew/leaderboard)
## About
Metaphi is an applied AI research lab founded on the mission of scaling out RL environments for long-horizon agents.
We partner with the world's leading domain experts to curate our environments and train in-house reward models for programmatic verification of autonomous agents.
Website: [metaphi.ai](https://metaphi.ai)