---
title: README
emoji: 🌖
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---

# Metaphi, Inc     
                                                                                                                             
  We introduce CREW (Cross-function Enterprise Work Index), an index for evaluating frontier AI models on long-horizon enterprise tasks.

## CREW-Agents                                                                                                               
                                                                                                                             
  | Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
  |-------|------------|------------|-------|---------------|-----------|
  | **[Fin Agent](link)** | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic (binary pass/fail) |
  | **[Enterprise Knowledge Agent](link)** | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc storytelling → design coherence | Skill-based rubrics and preference pairs |
  | **[Front-end Agent](link)** | Senior front-end engineer | 60–100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
                                                                                                                             
## Leaderboard

  Results are available at [evals.metaphi.ai/crew/leaderboard](https://evals.metaphi.ai/crew/leaderboard).
  
## About

  Metaphi is an applied AI research lab founded with the mission of scaling out RL environments for long-horizon agents.

  We partner with the world's leading domain experts to curate our environments and to train in-house reward models for the programmatic verification of autonomous agents.

  Website: [metaphi.ai](https://metaphi.ai)