Trinity-Large-Thinking/.eval_results/swe-bench_verified.yaml
Add community evaluation results for GPQA, MMLU-PRO, SWE-BENCH_VERIFIED (#4)
377bae8
- dataset:
    id: SWE-bench/SWE-bench_Verified
    task_id: swe_bench_%_resolved
  value: 63.2
  source:
    url: https://huggingface.co/arcee-ai/Trinity-Large-Thinking
    name: Model Card