A low number of evaluation benchmarks

#5
by Michalea - opened

Hello!
Thanks for your contribution to the open-source community!
That said, the number of evaluation benchmarks is very low, and HumanEval is a heavily saturated dataset.
It would be great if you could extend your evaluation.

The evaluations are from the original cerebras/GLM-4.7-Flash-REAP-23B-A3B repo. Daniel did not add the benchmarks himself.

Oh yes, I see, thanks for the information. I think it would be good if Cerebras added more datasets, since their method can cause non-trivial drops on other benchmarks.
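
For anyone who wants to try this themselves, here is a minimal sketch of how extra benchmarks could be run with EleutherAI's lm-evaluation-harness. The repo id is the one mentioned above, and the task list is purely illustrative, not a recommended suite:

```python
# Minimal sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# The model id is taken from this thread; the tasks and dtype are illustrative
# examples only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cerebras/GLM-4.7-Flash-REAP-23B-A3B,dtype=bfloat16",
    tasks=["mmlu", "gsm8k", "arc_challenge"],
    batch_size="auto",
)

# Print the aggregate metric(s) reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```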
