---
title: README
emoji: π
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---
# Metaphi, Inc

We introduce CREW, the Cross-function Enterprise Work Index, to evaluate frontier AI models on long-horizon enterprise tasks.

## CREW-Agents

| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
|-------|------------|------------|-------|---------------|-----------|
| **[Fin Agent](link)** | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
| **[Enterprise Knowledge Agent](link)** | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
| **[Front-end Agent](link)** | Senior front-end engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |

## Leaderboard

Results are available at [evals.metaphi.ai/crew/leaderboard](https://evals.metaphi.ai/crew/leaderboard).

## About

Metaphi is an applied AI research lab founded with the mission of scaling out RL environments for long-horizon agents.

We partner with the world's leading domain experts to curate our environments and to train in-house reward models for programmatic verification of autonomous agents.

Website: [metaphi.ai](https://metaphi.ai)