YAML Metadata Error: Invalid content in Eval Result file .eval_results/ngen4-pro.yaml

Task ID "terminal_bench" does not match any task in dataset "harborframework/terminal-bench-2.0". Available: terminalbench_2
Task ID "screenspot_pro" does not match any task in dataset "likaixin/ScreenSpot-Pro". Available: overall, android_studio_macos, autocad_windows, blender_windows, davinci_macos, eviews_windows, excel_macos, fruitloops_windows, illustrator_windows, inventor_windows, linux_common_linux, macos_common_macos, matlab_macos, origin_windows, photoshop_windows, powerpoint_windows, premiere_windows, pycharm_macos, quartus_windows, solidworks_windows, stata_windows, unreal_engine_windows, vivado_windows, vmware_macos, vscode_macos, windows_common_windows, word_macos

NGen-4-Pro / .eval_results / ngen4-pro.yaml
Latest commit 52095a3 (verified) by Thishyaketh: "Update .eval_results/ngen4-pro.yaml"
Size: 982 Bytes

- dataset:
    id: Idavidrein/gpqa
    task_id: diamond
  value: 91.1
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: openai/gsm8k
    task_id: gsm8k
  value: 99.2
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: SWE-bench/SWE-bench_Verified
    task_id: swe_bench_%_resolved
  value: 77.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: harborframework/terminal-bench-2.0
    task_id: terminalbench_2
  value: 42.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: likaixin/ScreenSpot-Pro
    task_id: overall
  value: 72.9
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
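
The validation errors at the top of this page come from the Hub comparing each entry's `dataset.task_id` with the tasks that dataset defines. A check in the same spirit can be run locally before pushing. This is a minimal sketch under stated assumptions, not the Hub's validator. The `KNOWN_TASKS` mapping and the `check_task_ids` helper are hypothetical. The task lists are copied by hand from the two error messages, and datasets with no known task list are skipped. In practice the entries would come from parsing the file, for example with PyYAML's `yaml.safe_load`. Here they are written inline so the sketch needs only the standard library.

```python
from typing import Any

# Hypothetical mapping of dataset id -> valid task IDs, copied by hand from the
# Hub error messages for this file (assumption: not fetched from the Hub).
KNOWN_TASKS: dict[str, set[str]] = {
    "harborframework/terminal-bench-2.0": {"terminalbench_2"},
    "likaixin/ScreenSpot-Pro": {
        "overall", "android_studio_macos", "autocad_windows", "blender_windows",
        "davinci_macos", "eviews_windows", "excel_macos", "fruitloops_windows",
        "illustrator_windows", "inventor_windows", "linux_common_linux",
        "macos_common_macos", "matlab_macos", "origin_windows", "photoshop_windows",
        "powerpoint_windows", "premiere_windows", "pycharm_macos", "quartus_windows",
        "solidworks_windows", "stata_windows", "unreal_engine_windows",
        "vivado_windows", "vmware_macos", "vscode_macos", "windows_common_windows",
        "word_macos",
    },
}


def check_task_ids(entries: list[dict[str, Any]], known: dict[str, set[str]]) -> list[str]:
    """Return one message per entry whose task_id is not a known task of its dataset."""
    errors = []
    for entry in entries:
        ds = entry["dataset"]
        valid = known.get(ds["id"])
        if valid is None:
            continue  # no task list for this dataset, so nothing to check
        if ds["task_id"] not in valid:
            errors.append(
                f'Task ID "{ds["task_id"]}" does not match any task in dataset '
                f'"{ds["id"]}". Available: {", ".join(sorted(valid))}'
            )
    return errors


# Entries as they appeared before the task IDs were corrected (written inline).
entries = [
    {"dataset": {"id": "openai/gsm8k", "task_id": "gsm8k"}, "value": 99.2},
    {"dataset": {"id": "harborframework/terminal-bench-2.0", "task_id": "terminal_bench"}, "value": 42.3},
    {"dataset": {"id": "likaixin/ScreenSpot-Pro", "task_id": "screenspot_pro"}, "value": 72.9},
]

for message in check_task_ids(entries, KNOWN_TASKS):
    print(message)
```

The available tasks are listed in sorted order so the output is deterministic. The Hub's own message lists them in its own order, so only the single-task Terminal-Bench message matches it verbatim.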