---
language:
- ko
- en
license: apache-2.0
tags:
- orpo
- alignment
- experimental
- lora
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — ORPO (Experimental)

Experimental variant trained with ORPO (Odds Ratio Preference Optimization), which performs SFT and preference alignment simultaneously without a reference model.
20
+
21
+ ## Training Stage
22
+
23
+ ORPO fine-tuning directly from the pretrained base checkpoint (not from SFT v2).
24
+
25
+ ## Key Details
26
+
27
+ - **Steps**: 10,000
28
+ - **Loss formulation**: SFT loss + lambda * odds_ratio_loss
29
+ - **Reference model**: none required (ORPO property)
30
+ - **LoRA weights file**: `lora_weights.pt`
31
+
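The loss formulation above can be sketched as follows. This is a minimal illustration of the ORPO objective, not the training code used here; it assumes `chosen_logps` and `rejected_logps` are length-averaged log-probabilities (strictly negative) of the preferred and rejected completions, and the function and argument names are ours:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_nll, lam=0.1):
    """SFT loss + lambda * odds_ratio_loss (ORPO objective sketch)."""
    # log(odds) = log(p / (1 - p)) = logp - log(1 - exp(logp))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Penalize cases where the rejected response is not sufficiently
    # less likely than the chosen one.
    odds_ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return sft_nll + lam * odds_ratio_loss
```

Because the odds-ratio term is added on top of the plain SFT negative log-likelihood, a small `lam` keeps the objective dominated by SFT learning, which is why starting from a non-instruction-tuned base can require many steps.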
## Metrics

| Metric | Value |
|--------|-------|
| Steps trained | 10,000 |
| Outcome | SFT learning insufficient at 10K steps |

## Notes

This is an **experimental** variant. At 10K steps starting from the raw pretrained model, ORPO did not provide sufficient SFT-level instruction following. The simultaneous SFT + alignment objective requires more steps when starting from a base (non-instruction-tuned) checkpoint.

Not recommended for production use. Included for research reproducibility. For best results, use the [SLERP variant](../slerp/).

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the pretrained base checkpoint first, then attach the ORPO LoRA
# adapter on top of it. "path/to/base" and "path/to/orpo" are placeholders
# for your local paths to the base model and this adapter directory.
base = AutoModelForCausalLM.from_pretrained("path/to/base", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/base")
model = PeftModel.from_pretrained(base, "path/to/orpo")
```