A DPO fine-tuned version of theprint/GeneralChat-Llama3.2-3B.
Description
GeneralChat-Llama3.2-3B is a general-purpose conversational fine-tune of Llama 3.2 3B. This DPO variant was trained with Direct Preference Optimization (DPO) on the theprint/Tom-4.2k-alpaca dataset: chosen responses were drawn from the original dataset, and rejected responses were generated with a weaker local model to form the preference pairs.
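For intuition on how those preference pairs are used, here is a minimal sketch of the per-pair DPO objective: the loss rewards the policy for preferring the chosen response over the rejected one by a larger margin than the reference model does. The function name and the beta value are illustrative assumptions, not details of this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed log-probability of a full response under the
    policy or the frozen reference model. beta=0.1 is an illustrative default.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy's preference margin exceeds the reference's, the loss
# drops below log(2), the value at a zero margin difference.
loss = dpo_loss(-10.0, -20.0, -12.0, -15.0)
```

In practice this objective is what a trainer such as TRL's `DPOTrainer` minimizes over batches of (prompt, chosen, rejected) triples.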
Available quantizations
GGUF quantizations of this model are provided at 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit precision.
Model tree for theprint/GeneralChat-Llama3.2-3B-DPO-GGUF
- Base model: theprint/GeneralChat-Llama3.2-3B
- Fine-tuned from base: theprint/GeneralChat-Llama3.2-3B-DPO