	---
	title: README
	emoji: 🌖
	colorFrom: pink
	colorTo: blue
	sdk: static
	pinned: false
	---

	# Metaphi, Inc

	We introduce CREW (Cross function Enterprise Work Index) to evaluate frontier AI models on long-horizon enterprise tasks.

	## CREW-Agents

	| Agent | Occupation | Complexity | Scale | What It Tests | Verifiers |
	|-------|--------|-------|-------|------------|--------|
	| [Fin Agent](link) | Credit analyst | 32+ expert hours | 2,610 tasks, 26K+ PDFs | Multi-document reasoning → taxonomy-aware transaction categorization → business P&L construction | Programmatic: binary pass/fail |
	| [Enterprise Knowledge Agent](link) | Senior business analyst | 16+ expert hours | 1,220 pitch-deck tasks, 45 video tasks, 279 preference pairs | Source faithfulness → narrative-arc-based storytelling → design coherence | Skill-based rubrics and preference pairs |
	| [Front-end Agent](link) | Senior front-end engineer | 60-100 expert hours | 37 tasks, 147 expert preferences | Figma environment navigation → design system creation → build verification | Skill-based rubrics and preference pairs |
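
The two verifier styles named in the table can be summarized with simple aggregate metrics. The following is a minimal sketch, not CREW's actual scoring code: the `PreferencePair` type and the `pass_rate` and `preference_agreement` functions are hypothetical names introduced here, and it assumes binary verifier outcomes arrive as booleans and each preference pair records which of two outputs was chosen.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record: which of two outputs ("a" or "b") each side preferred."""
    expert_choice: str  # choice made by the domain expert
    judge_choice: str   # choice made by the model or rubric being checked


def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks whose binary pass/fail verifier returned True."""
    if not results:
        raise ValueError("no results to aggregate")
    return sum(results) / len(results)


def preference_agreement(pairs: list[PreferencePair]) -> float:
    """Fraction of pairs where the judge picked the same output as the expert."""
    if not pairs:
        raise ValueError("no preference pairs to aggregate")
    matches = sum(p.expert_choice == p.judge_choice for p in pairs)
    return matches / len(pairs)
```

For example, `pass_rate([True, False, True, True])` aggregates four illustrative task outcomes into a single score, and `preference_agreement` does the same for expert-labeled pairs. How the leaderboard actually weights or combines these signals is not stated here.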

	## Leaderboard

	Results are available at [evals.metaphi.ai/crew/leaderboard](https://evals.metaphi.ai/crew/leaderboard).

	## About

	Metaphi is an applied AI research lab founded on the mission of scaling out RL environments for long-horizon agents.

	We partner with the world's leading domain experts to curate our environments and to train in-house reward models for programmatic verification of autonomous agents.

	Website: [metaphi.ai](https://metaphi.ai)