A low number of evaluation benchmarks
#5
by Michalea - opened
Hello!
Thanks for your contribution to the open-source community!
Nevertheless, the number of evaluation benchmarks is very low, and HumanEval is a very saturated dataset.
I think it would be great if you extended your evaluation.
The evaluations are from the original cerebras/GLM-4.7-Flash-REAP-23B-A3B repo; Daniel did not add the benchmarks himself.
Oh yes, I see — thanks for the information. I think it would be good if Cerebras added more datasets, since their method can cause non-trivial performance drops on other benchmarks.