A highly capable 2.4B-parameter lightweight LLM trained on only 1T tokens of pre-training data, with all details released.