A low number of evaluation benchmarks

#5
by Michalea - opened

Hello!
Thanks for your contribution to the open-source community!
That said, the number of evaluation benchmarks is very low, and HumanEval is a heavily saturated dataset.
It would be great if you could extend your evaluation.

The evaluations are from the original cerebras/GLM-4.7-Flash-REAP-23B-A3B repo. Daniel did not add the benchmarks himself.

Oh yes, I see, thanks for the information. I think it would be good if Cerebras added more datasets, since their method can cause non-trivial drops on other benchmarks.
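
For anyone who wants to try this themselves, here is a minimal sketch of how extra benchmarks could be run with EleutherAI's lm-evaluation-harness. The repo id is the one mentioned above, and the task list is purely illustrative, not a recommended suite:

```python
# Minimal sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# The model id is taken from this thread; the tasks and dtype are illustrative
# examples only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cerebras/GLM-4.7-Flash-REAP-23B-A3B,dtype=bfloat16",
    tasks=["mmlu", "gsm8k", "arc_challenge"],
    batch_size="auto",
)

# Print the aggregate metric(s) reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```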
