leaderboard / backend

Commit History

fix(ui): render structured benchmark details correctly
7749d9c

Alyafeai committed on

dark mode overhaul, fix filter panel, and UI polish
de2f8be

LeenAlQadi committed on

UI polish: table options grouping, column visibility redesign, model size fixes
bcff379

LeenAlQadi committed on

added filters and fixed adaptive filtered average
f20c7d0

LeenAlQadi committed on

UI polish: sticky avg column at 3rd, global color gradient, podium links, cards pronounced
3638b8c

LeenAlQadi committed on

add pagination for details
62947e5

Alyafeai committed on

fix to show bert score for each sample
8cbc289

Alyafeai committed on

Align FannOrFlop detail summary with results f1 and return full benchmark-detail rows
5af9331

Alyafeai committed on

fix issue with model's type emoji
633c620

Alyafeai committed on

Add Expand and Collapse button for long samples in the detail section
129be1e

Alyafeai committed on

Enhance downloading the details repo
b50cc9d

Alyafeai committed on

Fix score mismatch between details and table
03ab794

Alyafeai committed on

Prioritize extracted_json over raw_response
d243d1c

Alyafeai committed on

Enhance Details for benchmarks without sub-tasks
c61f1c0

Alyafeai committed on

New formats for details, add fannorflop benchmark
86f7358

Alyafeai committed on

add a new status submitted
e357bf2

Basma Boussaha committed on

fix
f8317f3

Basma Boussaha committed on

remove unnecessary args
2bda3df

Basma Boussaha committed on

fix issue with multi-options answers, and with the samples that don't have binary score
53dfe4f

Alyafeai committed on

fix details
f8be5d4

Alyafeai committed on

fix issue with some parquet files having different formats
034f762

Alyafeai committed on

fix requirements
5e5f8d3

Alyafeai committed on

adding details
3725eb1

Alyafeai committed on

fix get_model_size
163662f

Basma Boussaha committed on

remove unnecessary tags
fdbda9e

Basma Boussaha committed on

bunch of fixes
b6de4f2

Basma Boussaha committed on

change the way we read results by specifying source type; source type is the prefix of the filename. Separating the tasks into different lists based on the source
c7488db

Alyafeai committed on

removing chat_template
1482380

Alyafeai committed on

fix some of the status
8eddc6c

Alyafeai committed on

adding new benchmarks
b828f6c

Alyafeai committed on

modify how finetuning jobs get submitted
0da0ffb

Alyafeai committed on

Update backend/config.py
079041d
verified

basma-b committed on

Update backend/config.py
8993e7a
verified

basma-b committed on

update dataset_keys to match qimma.py
f248cb9
verified

amztheory committed on

fix requirements.txt
747d6d8

Alyafeai committed on

modify benchmarks
ff2b7be

Alyafeai committed on

feat: add average column and improve file loading logic
510e4b3

Alyafeai committed on

Add ArabLegalQA and MedArabicQA benchmarks + default missing results to 0
6cb3a47

Alyafeai committed on

Update backend/config.py
c0b49da
verified

basma-b committed on

Update backend/config.py
f95776a
verified

basma-b committed on

mcq tasks
fbb9d41

Alyafeai committed on

bugfix
a587249

Alyafeai committed on

fix config
bb633bc

Alyafeai committed on

fix config
4ddba7f

Alyafeai committed on

fix config
4d6faba

Alyafeai committed on

download_datasets
800eeca

Alyafeai committed on

requirements
cb8dda6

Alyafeai committed on

requirements
543d551

Alyafeai committed on

push first code
178c53e

Alyafeai committed on