YAML Metadata Error: Invalid content in Eval Result file .eval_results/ngen4-pro.yaml
Task ID "terminal_bench" does not match any task in dataset "harborframework/terminal-bench-2.0". Available: terminalbench_2
Task ID "screenspot_pro" does not match any task in dataset "likaixin/ScreenSpot-Pro". Available: overall, android_studio_macos, autocad_windows, blender_windows, davinci_macos, eviews_windows, excel_macos, fruitloops_windows, illustrator_windows, inventor_windows, linux_common_linux, macos_common_macos, matlab_macos, origin_windows, photoshop_windows, powerpoint_windows, premiere_windows, pycharm_macos, quartus_windows, solidworks_windows, stata_windows, unreal_engine_windows, vivado_windows, vmware_macos, vscode_macos, windows_common_windows, word_macos
- dataset:
    id: Idavidrein/gpqa
    task_id: diamond
  value: 91.1
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: openai/gsm8k
    task_id: gsm8k
  value: 99.2
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: SWE-bench/SWE-bench_Verified
    task_id: swe_bench_%_resolved
  value: 77.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: harborframework/terminal-bench-2.0
    task_id: terminal_bench
  value: 42.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: likaixin/ScreenSpot-Pro
    task_id: screenspot_pro
  value: 72.9
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
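
Both validation errors at the top of this page come from the same check: an entry's `task_id` must be one of the task IDs defined by the dataset named in `dataset.id`. The following is a minimal sketch of that check, not the Hub's actual implementation. The function name `find_invalid_task_ids` and the `AVAILABLE_TASKS` / `ENTRIES` structures are hypothetical; the task IDs are copied from the "Available:" lists in the error messages, and datasets whose task lists are not shown there are skipped.

```python
# Hypothetical re-creation of the task_id check; not the Hub's real validator.
# Task IDs below are copied from the "Available:" lists in the error messages.
AVAILABLE_TASKS = {
    "harborframework/terminal-bench-2.0": {"terminalbench_2"},
    "likaixin/ScreenSpot-Pro": {
        "overall", "android_studio_macos", "autocad_windows", "blender_windows",
        "davinci_macos", "eviews_windows", "excel_macos", "fruitloops_windows",
        "illustrator_windows", "inventor_windows", "linux_common_linux",
        "macos_common_macos", "matlab_macos", "origin_windows",
        "photoshop_windows", "powerpoint_windows", "premiere_windows",
        "pycharm_macos", "quartus_windows", "solidworks_windows",
        "stata_windows", "unreal_engine_windows", "vivado_windows",
        "vmware_macos", "vscode_macos", "windows_common_windows", "word_macos",
    },
}

# The two entries from .eval_results/ngen4-pro.yaml that the errors refer to.
ENTRIES = [
    {
        "dataset": {"id": "harborframework/terminal-bench-2.0",
                    "task_id": "terminal_bench"},
        "value": 42.3,
    },
    {
        "dataset": {"id": "likaixin/ScreenSpot-Pro",
                    "task_id": "screenspot_pro"},
        "value": 72.9,
    },
]


def find_invalid_task_ids(entries, available_tasks):
    """Return (dataset_id, task_id) pairs whose task_id the dataset does not define."""
    invalid = []
    for entry in entries:
        dataset_id = entry["dataset"]["id"]
        task_id = entry["dataset"]["task_id"]
        known = available_tasks.get(dataset_id)
        # Assumption: datasets without a known task list are skipped, not flagged.
        if known is not None and task_id not in known:
            invalid.append((dataset_id, task_id))
    return invalid


if __name__ == "__main__":
    for dataset_id, task_id in find_invalid_task_ids(ENTRIES, AVAILABLE_TASKS):
        print(f'Task ID "{task_id}" does not match any task in dataset "{dataset_id}".')
```

Under this check, changing `terminal_bench` to `terminalbench_2` would clear the first error, since that is the only task the dataset lists. For ScreenSpot-Pro, `overall` looks like the natural replacement for an aggregate score, but the source does not confirm which task the 72.9 figure belongs to, so treat that choice as an assumption.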