Imatrix calibration data
#3
by bobchenyx - opened
Hi there,
I would like to ask about the imatrix calibration dataset eaddario-imatrix-corpus-combined-all-medium.txt used in this model.
How is it different from ubergarm-imatrix-calibration-corpus-v02.txt? Or is this just a previous version?
Heya, sorry for the late reply. Life has been busy and I'm slowly getting back into things this week.
These are two totally different imatrix calibration corpora. I used ed's because it activated more of the sparse routed experts than my own corpus.
You can see the differences yourself by looking at both of them:
- https://huggingface.co/datasets/eaddario/imatrix-calibration/blob/main/combined_all_medium.parquet
- ubergarm-imatrix-calibration-corpus-v02.txt
To convert ed's into txt you can use:
# source venv/bin/activate
# uv pip install duckdb
# curl https://install.duckdb.org | sh
duckdb -ascii -c "SELECT * FROM read_parquet('file.parquet');" > file.txt
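Once both corpora are plain text, a rough way to compare them is to look at their sizes and at how much vocabulary they share. The sketch below is only an illustration: `corpus_stats` and `shared_words` are hypothetical helper names, not part of any tool mentioned in this thread, and word overlap is a crude proxy that says nothing about which routed experts a corpus activates.

```shell
# Hypothetical helper: print line, word, and byte counts for one corpus file.
corpus_stats() {
    # wc reads from stdin so the file name is not echoed in its output;
    # tr strips the padding some wc implementations add.
    cs_lines=$(wc -l < "$1" | tr -d ' ')
    cs_words=$(wc -w < "$1" | tr -d ' ')
    cs_bytes=$(wc -c < "$1" | tr -d ' ')
    printf '%s: %s lines, %s words, %s bytes\n' "$1" "$cs_lines" "$cs_words" "$cs_bytes"
}

# Hypothetical helper: count distinct whitespace-separated words present in both files.
shared_words() {
    sw_a=$(mktemp)
    sw_b=$(mktemp)
    # One word per line, de-duplicated and sorted so comm can compare them.
    tr -s '[:space:]' '\n' < "$1" | LC_ALL=C sort -u > "$sw_a"
    tr -s '[:space:]' '\n' < "$2" | LC_ALL=C sort -u > "$sw_b"
    # comm -12 keeps only lines common to both inputs; drop empty lines before counting.
    LC_ALL=C comm -12 "$sw_a" "$sw_b" | sed '/^$/d' | wc -l | tr -d ' '
    rm -f "$sw_a" "$sw_b"
}
```

For example, `corpus_stats file.txt` followed by `shared_words file.txt ubergarm-imatrix-calibration-corpus-v02.txt` gives a quick first impression before reading through the files by hand.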
Thank you for the detailed explanation. It's extremely helpful, and I truly appreciate the guidance.