difference between ComPO and ComPO-2

#1
by DZgas - opened

What's the difference between ComPO and ComPO-2 for this Gemma model? I see you've slightly changed the last layers, but why exactly? Was there some inaccuracy in ComPO that was corrected in ComPO-2?

Sorry, I don't remember exactly why. One of the later models might be the one I fine-tuned with better hyperparameters, which performed better than the version reported in the arXiv paper.
