Imatrix calibration data

#3
by bobchenyx - opened

Hi there,
I would like to ask about the imatrix calibration dataset eaddario-imatrix-corpus-combined-all-medium.txt used for this model.
How is it different from ubergarm-imatrix-calibration-corpus-v02.txt? Or is this just a previous version?

Owner

@bobchenyx

Heya, sorry for the late reply. Life has been busy and I'm slowly getting back into things this week.

These are two totally different imatrix calibration corpora. I used ed's because it activated more of the sparse routed experts than my own corpus.

You can see the differences yourself looking at them both:

To convert ed's into txt you can use:

# Optional setup, uncomment whichever you need:
# source venv/bin/activate
# uv pip install duckdb
# curl https://install.duckdb.org | sh
# Dump the parquet file as plain text:
duckdb -ascii -c "SELECT * FROM read_parquet('file.parquet');" > file.txt
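
Once both corpora are saved locally as plain text, a rough first comparison is just their size. This is only a sketch: `corpus_stats` is a hypothetical helper name (not part of duckdb or llama.cpp), and line/word counts say nothing about which experts a corpus activates. They only confirm that the two files really are different texts.

```shell
# Hypothetical helper: print line, word, and byte counts for each file given.
corpus_stats() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            continue
        fi
        # Read via stdin so wc prints only the number, then strip padding.
        lines=$(wc -l < "$f" | tr -d ' ')
        words=$(wc -w < "$f" | tr -d ' ')
        bytes=$(wc -c < "$f" | tr -d ' ')
        printf '%s\tlines=%s\twords=%s\tbytes=%s\n' "$f" "$lines" "$words" "$bytes"
    done
}
```

For example, assuming the converted file is file.txt and the other corpus sits next to it: corpus_stats file.txt ubergarm-imatrix-calibration-corpus-v02.txt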

Thank you for the detailed explanation. It's extremely helpful, and I truly appreciate the guidance.
