Upload pretrain/README.md with huggingface_hub

pretrain/README.md +60 -0

	@@ -0,0 +1,60 @@

+---
+language:
+  - ko
+  - en
+license: apache-2.0
+tags:
+  - pretrained
+  - causal-lm
+  - korean
+  - llm
+pipeline_tag: text-generation
+---
+# EVAFRILL-Mo 3B — Pretrained Base
+Raw pretrained language model, the foundation for all EVAFRILL-Mo downstream variants.
+## Training Stage
+Pretraining from scratch on a mixed Korean/English corpus.
+## Key Details
+- **Steps**: 319,772 (~93% of the Chinchilla-optimal token budget)
+- **Tokens**: ~55B
+- **Hardware**: 7× NVIDIA B200 GPUs (DDP)
+- **Precision**: BF16
+- **Architecture**: Transformer decoder, 3B parameters
+## Metrics
+| Metric | Value |
+|--------|-------|
+| Final train loss | — |
+| Share of Chinchilla-optimal token budget | ~93% |
+## Notes
+This is the **raw pretrained model**; no instruction tuning or alignment has been applied.
+It is not suitable for direct chat or instruction-following use. Use one of the fine-tuned variants below instead.
+## Variants
+| Variant | Description |
+|---------|-------------|
+| [sft-v2](../sft-v2/) | Instruction-tuned (recommended starting point) |
+| [slerp](../slerp/) | SLERP merge — best overall (recommended) |
+| [dpo-r1](../dpo-r1/) | DPO alignment round 1 |
+## Main Model Card
+See the [main README](../../README.md) for full project details, architecture, and training history.
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# "path/to/pretrain" is a placeholder for the directory that holds this checkpoint.
+model = AutoModelForCausalLM.from_pretrained("path/to/pretrain", torch_dtype="bfloat16")
+tokenizer = AutoTokenizer.from_pretrained("path/to/pretrain")
+```
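
The Usage snippet in the added README stops after loading the model. Because this checkpoint is a raw base model, it would be prompted as plain text completion rather than through a chat template. Below is a minimal sketch of that, assuming the checkpoint loads through the standard `Auto*` classes as the snippet implies. The helper names `sampling_kwargs` and `complete` are hypothetical, and the sampling values are illustrative defaults, not settings recommended by the model authors.

```python
from typing import Any, Dict


def sampling_kwargs(max_new_tokens: int = 64, temperature: float = 0.8,
                    top_p: float = 0.95) -> Dict[str, Any]:
    """Illustrative sampling settings (assumed values, not from the model card)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def complete(prompt: str, model_path: str = "path/to/pretrain", **overrides: Any) -> str:
    """Continue `prompt` as plain text; no chat template is applied."""
    # Imported lazily so sampling_kwargs() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "path/to/pretrain" is the same placeholder used in the README snippet.
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="bfloat16")

    inputs = tokenizer(prompt, return_tensors="pt")
    # Caller-supplied overrides take precedence over the illustrative defaults.
    gen_kwargs = {**sampling_kwargs(), **overrides}
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **gen_kwargs,
    )
    # The decoded string contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `complete("대한민국의 수도는", max_new_tokens=32)` would return the prompt followed by the model's continuation. For instruction-style prompts, the fine-tuned variants listed under Variants are the intended entry point.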