Upload orpo/README.md with huggingface_hub
orpo/README.md (ADDED, +60 -0)

---
language:
- ko
- en
license: apache-2.0
tags:
- orpo
- alignment
- experimental
- lora
- korean
- llm
pipeline_tag: text-generation
---

# EVAFRILL-Mo 3B — ORPO (Experimental)

Experimental variant trained with ORPO (Odds Ratio Preference Optimization), which performs SFT and preference alignment simultaneously without a reference model.

## Training Stage

ORPO fine-tuning applied directly to the pretrained base checkpoint (not to SFT v2).
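
A minimal sketch of that setup, assuming a fresh PEFT LoRA adapter is attached to the base weights before ORPO training (the paths and LoRA hyperparameters below are illustrative, not taken from this repo):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Start from the *base* pretrained checkpoint (not SFT v2) and attach a
# fresh LoRA adapter; ORPO then trains only the adapter parameters.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
lora_cfg = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # illustrative values
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```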

## Key Details

- **Steps**: 10,000
- **Loss formulation**: `L_ORPO = L_SFT + λ · L_OR` (SFT loss plus a λ-weighted odds-ratio loss; see the sketch below)
- **Reference model**: none required (a property of ORPO)
- **LoRA weights file**: `lora_weights.pt`
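
A minimal sketch of that loss, assuming `logp_chosen` / `logp_rejected` are the mean per-token log-probabilities the policy assigns to the preferred and rejected responses (names and the λ default are illustrative, not from this repo's training code):

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """Sketch of L_ORPO = L_SFT + lam * L_OR over a batch of preference pairs."""
    # SFT term: standard NLL on the preferred (chosen) responses.
    sft = -logp_chosen.mean()
    # log-odds(y) = log p - log(1 - p), computed stably with log1p.
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # Odds-ratio term: -log sigmoid of the log-odds ratio; rewards assigning
    # higher odds to chosen than to rejected, with no frozen reference model.
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    return sft + lam * l_or
```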

## Metrics

| Metric | Value |
|--------|-------|
| Steps trained | 10,000 |
| Outcome | SFT learning insufficient at 10K steps |

## Notes

This is an **experimental** variant. At 10K steps starting from the raw pretrained model, ORPO did not provide sufficient SFT-level instruction following. The simultaneous SFT + alignment objective requires more steps when starting from a base (non-instruction-tuned) checkpoint.

Not recommended for production use; included for research reproducibility.
For best results, use the [SLERP variant](../slerp/).

## Main Model Card

See the [main README](../../README.md) for full project details, architecture, and training history.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the pretrained base checkpoint, then attach this ORPO LoRA adapter
# on top of it (assumes the adapter directory is in PEFT format).
base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype="bfloat16")
model = PeftModel.from_pretrained(base, "path/to/orpo")
tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")
```
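
Then generate as usual (the prompt and decoding settings are illustrative; this repo does not document a prompt template):

```python
prompt = "고려청자에 대해 간단히 설명해줘."  # "Briefly explain Goryeo celadon."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```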