YAML Metadata Error: Invalid content in Eval Result file .eval_results/ngen4-pro.yaml

Task ID "terminal_bench" does not match any task in dataset "harborframework/terminal-bench-2.0". Available: terminalbench_2

Task ID "screenspot_pro" does not match any task in dataset "likaixin/ScreenSpot-Pro". Available: overall, android_studio_macos, autocad_windows, blender_windows, davinci_macos, eviews_windows, excel_macos, fruitloops_windows, illustrator_windows, inventor_windows, linux_common_linux, macos_common_macos, matlab_macos, origin_windows, photoshop_windows, powerpoint_windows, premiere_windows, pycharm_macos, quartus_windows, solidworks_windows, stata_windows, unreal_engine_windows, vivado_windows, vmware_macos, vscode_macos, windows_common_windows, word_macos
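Both errors come from the same kind of check: the `task_id` in an entry must be one of the task names the referenced dataset exposes. A minimal sketch of that check follows. The function name `check_task_id` and the `AVAILABLE_TASKS` table are hypothetical, the task lists are copied from the error messages above (the ScreenSpot-Pro list is abbreviated), and this is not the validator the hosting site actually runs.

```python
# Hypothetical lookup table; names are copied from the error messages above.
# The ScreenSpot-Pro list is abbreviated for readability.
AVAILABLE_TASKS = {
    "harborframework/terminal-bench-2.0": ["terminalbench_2"],
    "likaixin/ScreenSpot-Pro": ["overall", "android_studio_macos", "autocad_windows"],
}


def check_task_id(dataset_id, task_id):
    """Return (is_valid, available_tasks_to_choose_from)."""
    available = AVAILABLE_TASKS.get(dataset_id, [])
    if task_id in available:
        return True, []
    # Mirror the error message: report the names that would be accepted.
    return False, list(available)


for dataset_id, task_id in [
    ("harborframework/terminal-bench-2.0", "terminal_bench"),
    ("likaixin/ScreenSpot-Pro", "screenspot_pro"),
]:
    ok, available = check_task_id(dataset_id, task_id)
    if not ok:
        print(f'Task ID "{task_id}" not found in "{dataset_id}". Available: {", ".join(available)}')
```

For `harborframework/terminal-bench-2.0` only one task name is listed, so `terminalbench_2` is the only candidate replacement. For `likaixin/ScreenSpot-Pro` the right choice depends on whether the reported score is the aggregate (`overall`) or a per-application result, which the page does not state.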

NGen-4-Pro / .eval_results / ngen4-pro.yaml

Thishyaketh

Update .eval_results/ngen4-pro.yaml

52095a3 verified 7 days ago

982 Bytes

- dataset:
    id: Idavidrein/gpqa
    task_id: diamond
  value: 91.1
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: openai/gsm8k
    task_id: gsm8k
  value: 99.2
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: SWE-bench/SWE-bench_Verified
    task_id: swe_bench_%_resolved
  value: 77.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: harborframework/terminal-bench-2.0
    task_id: terminal_bench
  value: 42.3
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
- dataset:
    id: likaixin/ScreenSpot-Pro
    task_id: screenspot_pro
  value: 72.9
  date: '2026-04-06'
  source:
    url: https://tnsaai.com/models/ngen4-pro
    name: TNSA NGen-4 Pro Evaluations
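Every entry in the file carries the same six fields: `dataset.id`, `dataset.task_id`, `value`, `date`, `source.url`, and `source.name`. The sketch below checks one entry, written out as a Python dict, for that shape. The `REQUIRED` table and the `entry_problems` helper are hypothetical and inferred only from the entries in this file, not from an official schema; the real format may allow or require other fields.

```python
from datetime import date

# Hypothetical required fields, inferred only from the entries in this file.
REQUIRED = {"dataset": ("id", "task_id"), "source": ("url", "name")}


def entry_problems(entry):
    """Return a list of human-readable problems found in one eval entry."""
    problems = []
    for section, keys in REQUIRED.items():
        block = entry.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing mapping: {section}")
            continue
        for key in keys:
            if not block.get(key):
                problems.append(f"missing field: {section}.{key}")
    if not isinstance(entry.get("value"), (int, float)):
        problems.append("value is not a number")
    try:
        # Dates in the file are quoted strings such as '2026-04-06'.
        date.fromisoformat(str(entry.get("date")))
    except ValueError:
        problems.append("date is not ISO formatted (YYYY-MM-DD)")
    return problems


# The first entry of the file, written as the dict a YAML parser would produce.
gpqa_entry = {
    "dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"},
    "value": 91.1,
    "date": "2026-04-06",
    "source": {
        "url": "https://tnsaai.com/models/ngen4-pro",
        "name": "TNSA NGen-4 Pro Evaluations",
    },
}
print(entry_problems(gpqa_entry))
```

A shape check like this only confirms that the fields are present and well typed. It would not catch the two reported errors, which require comparing `task_id` against the task names of each dataset.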