A DPO fine-tuned version of theprint/GeneralChat-Llama3.2-3B.
Description
GeneralChat-Llama3.2-3B is a general-purpose conversational fine-tune of Llama 3.2 3B. This DPO variant was trained with Direct Preference Optimization (DPO) on the theprint/Tom-4.2k-alpaca dataset: chosen responses were drawn from the original dataset, and rejected responses were generated with a weaker local model to form the preference pairs.
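For intuition on how those preference pairs are used, here is a minimal sketch of the per-pair DPO objective: the loss rewards the policy for preferring the chosen response over the rejected one by a larger margin than the reference model does. The function name and the beta value are illustrative assumptions, not details of this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed log-probability of a full response under the
    policy or the frozen reference model. beta=0.1 is an illustrative default.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy's preference margin exceeds the reference's, the loss
# drops below log(2), the value at a zero margin difference.
loss = dpo_loss(-10.0, -20.0, -12.0, -15.0)
```

In practice this objective is what a trainer such as TRL's `DPOTrainer` minimizes over batches of (prompt, chosen, rejected) triples.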
Available quantizations
GGUF quantizations of this model are provided at 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit precision.
Model tree for theprint/GeneralChat-Llama3.2-3B-DPO-GGUF
- Base model: theprint/GeneralChat-Llama3.2-3B
- Fine-tuned from base: theprint/GeneralChat-Llama3.2-3B-DPO