---
license: apache-2.0
language:
- ko
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- Anserwise/AWAXIS-Think-28B
- FINAL-Bench/Darwin-28B-KR
tags:
- korean
- darwin
- darwin-platform
- merge
---
# AWAXIS-Hybrid-28B
> AWAXIS-Think × Darwin Platform Hybrid model

A Korean LLM built on AWAXIS-Think-28B, with Darwin Platform Korean weights combined in through a Smart MRI layer-wise merge.
This model is built and released by **Anserwise**.
- **🧬 Father**: [`Anserwise/AWAXIS-Think-28B`](https://huggingface.co/Anserwise/AWAXIS-Think-28B)
- **🧬 Mother**: [`FINAL-Bench/Darwin-28B-KR`](https://huggingface.co/FINAL-Bench/Darwin-28B-KR)
---
## 🧬 Model Description: Parent Model Crossbreeding and Evolution (Darwin Method)
This model was built with the **Darwin Platform evolutionary merge technique** (Smart MRI Layer-wise), which **crossbreeds and combines** the weights of two parent LLMs **layer by layer**. The key point is that this is a **weight-synthesis evolution** approach, not conventional SFT/CPT training.
| Section | Mother adoption rate | Intent |
|------|:---:|------|
| Embed / LM-head | 50% | Balance the output pathway |
| Norm | 30% | Stability |
| Visual encoder | 0% | Keep the father at 100% |
| Layers 0-15 (early) | 40% | Absorb Korean surface patterns |
| Layers 16-50 (middle) | 0% | Preserve reasoning ability |
| Layers 51-63 (late) | 70% | Adopt domain knowledge |
The method first analyzes which layers hold each parent's strengths, then applies a different parent weight ratio per layer so that the strengths of both models are combined selectively.
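
The table above can be read as a per-section blend weight. The sketch below shows one plausible interpretation, assuming "adoption rate" means the mother's coefficient in a linear interpolation with the father. The actual Smart MRI procedure is not documented here, and the parameter-name patterns (`visual`, `embed_tokens`, `lm_head`, `layers.N.`, `norm`) are hypothetical. It also assumes that norms inside a transformer layer follow that layer's ratio, so the 30% only applies to norms outside the layer stack.

```python
import re

def mother_ratio(param_name: str) -> float:
    """Mother's blend weight for one parameter, following the table above.

    The name patterns are assumptions about checkpoint naming, not taken
    from the actual model.
    """
    if "visual" in param_name:  # visual encoder: father kept at 100%
        return 0.0
    if "embed_tokens" in param_name or "lm_head" in param_name:
        return 0.5
    match = re.search(r"layers\.(\d+)\.", param_name)
    if match:
        idx = int(match.group(1))
        if idx <= 15:   # early layers
            return 0.4
        if idx <= 50:   # middle layers
            return 0.0
        return 0.7      # late layers (51-63)
    if "norm" in param_name:  # norms outside the layer stack
        return 0.3
    raise ValueError(f"no ratio defined for {param_name}")

def merge(father: dict, mother: dict) -> dict:
    """Linear interpolation per parameter: (1 - r) * father + r * mother."""
    merged = {}
    for name, f_vals in father.items():
        r = mother_ratio(name)
        merged[name] = [(1 - r) * f + r * m for f, m in zip(f_vals, mother[name])]
    return merged

# Toy tensors as flat lists: father is all 0.0 and mother is all 1.0,
# so each merged value equals the mother's ratio for that parameter.
names = ["model.embed_tokens.weight", "model.layers.3.mlp.weight",
         "model.layers.20.mlp.weight", "model.layers.60.mlp.weight",
         "model.norm.weight", "visual.blocks.0.weight"]
father = {n: [0.0, 0.0] for n in names}
mother = {n: [1.0, 1.0] for n in names}
merged = merge(father, mother)
for n in names:
    print(n, merged[n][0])
```

A real merge would operate on tensors from the two checkpoints' state dicts rather than Python lists; the lists keep the sketch dependency-free.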
## 📚 Dataset Usage
Because this model is a **weight merge product**, no additional SFT/CPT training was performed. Instead, it absorbs, layer by layer, the strengths of the broad Korean datasets that the two parent models were already trained on. The main Korean datasets used to train the parent models are:
- **General instruction**: kai-sft / kai-combined series (Korean multi-domain instruction-following)
- **KMMLU-Pro**: Korean domain knowledge (history, law, medicine, science, engineering, etc.)
- **CLIcK**: Korean culture and common knowledge (history, tradition, social norms)
- **HAERAE**: Korean surface patterns (linguistics, general knowledge, history)
- **KOBEST**: Korean reasoning (HellaSwag, COPA, BoolQ)
- **Com2-main(ko)**: Korean commonsense (social reasoning, intent understanding)
- **MuSR(Ko)**: Korean multi-step reasoning (Murder Mystery, Object Placements, etc.)

At the merge stage, each parent's training results are preserved and combined according to the per-layer ratios. Compared with a single SFT run, this carries a lower risk of catastrophic forgetting and loses less of the parents' strengths.

---
## 📊 Evaluation Results
### 1) K-AI Leaderboard Basis (5 subjects)
KMMLU-Pro / CLIcK / HLE(Ko) / MuSR(Ko) / Com2-main(ko)
- **External models**: measured values from the K-AI leaderboard (leaderboard.aihub.or.kr)
- **This series**: estimates obtained from an in-house mirror eval (100 items), converted by ratio against Rogue-28B-MIX
| Model | KMMLU-Pro | CLIcK | HLE(Ko) | MuSR(Ko) | Com2-main(ko) | **Sum** | **Macro** |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Hybrid (estimated) | 0.674 | 0.787 | 0.07 | 0.611 | 0.657 | **2.799** | **0.560** |
| **AWAXIS-Hybrid-28B** ⭐ (this model, estimated) | 0.674 | 0.787 | 0.07 | 0.611 | 0.657 | **2.799** | **0.560** |
| Rogue-28B-MIX (measured) | 0.666 | 0.797 | 0.07 | 0.611 | 0.650 | **2.794** | **0.559** |
| Warecube-KO-27B-v3 (measured) | 0.668 | 0.799 | 0.07 | 0.584 | 0.638 | **2.756** | **0.551** |
| AWAXIS-Think-28B (measured) | 0.603 | 0.770 | 0.06 | 0.591 | 0.632 | **2.651** | **0.530** |
| KR-Pro (estimated) | 0.643 | 0.661 | 0.07 | 0.585 | 0.650 | **2.609** | **0.522** |
| KR-Plus (estimated) | 0.643 | 0.703 | 0.07 | 0.532 | 0.657 | **2.605** | **0.521** |
> HLE(Ko) is a common weak point for 28B-class models (very high difficulty).
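
The estimated rows above come from a ratio conversion against Rogue-28B-MIX. The exact formula is not stated here. One plausible reading is that the reference model's measured leaderboard score is scaled by the ratio of the two models' mirror-eval scores. The sketch below assumes that reading. The function name and the mirror-eval numbers are hypothetical, and only the reference score 0.666 comes from the table.

```python
def estimate_leaderboard_score(
    mirror_model: float,
    mirror_reference: float,
    leaderboard_reference: float,
) -> float:
    """Scale the reference model's measured leaderboard score by the
    mirror-eval ratio (an assumed formula, not confirmed by the source)."""
    if mirror_reference <= 0:
        raise ValueError("reference mirror-eval score must be positive")
    return leaderboard_reference * (mirror_model / mirror_reference)

# Hypothetical mirror-eval accuracies on a 100-item subset (illustrative only).
mirror_model = 0.52
mirror_reference = 0.50
# Measured KMMLU-Pro score of Rogue-28B-MIX, taken from the table above.
leaderboard_reference = 0.666

estimate = estimate_leaderboard_score(mirror_model, mirror_reference, leaderboard_reference)
print(round(estimate, 3))
```

With only 100 items per subject, a one-question difference moves the mirror score by 0.01, so estimates of this kind carry noticeable uncertainty.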
### 2) Overall Korean Ability (10-subject mirror eval)
CLIcK + KMMLU(history/law/health) + HAERAE(gk/hist/ling) + KOBEST(hella/copa/boolq)
| Model | CLIcK | KMMLU avg | HAERAE avg | KOBEST avg | **Sum (10 subjects)** | **Macro** |
|-------|:---:|:---:|:---:|:---:|:---:|:---:|
| **AWAXIS-Hybrid-28B** ⭐ (this model) | 0.83 | 0.530 | 0.813 | 0.967 | **7.760** | **0.7760** |
| Rogue-28B-MIX | 0.83 | 0.513 | 0.807 | 0.967 | **7.690** | **0.7690** |
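
The Sum and Macro columns above can be reproduced from the group averages, assuming CLIcK counts as one subject and KMMLU, HAERAE, and KOBEST contribute three subjects each, as the subject list suggests. The names in this sketch are illustrative. Because the displayed averages are rounded, results can differ from the table in the last digit.

```python
# Number of subjects behind each reported group average (assumed from the subject list).
SUBJECTS_PER_GROUP = {"CLIcK": 1, "KMMLU": 3, "HAERAE": 3, "KOBEST": 3}

def sum_and_macro(group_averages: dict) -> tuple:
    """Return (sum over all 10 subjects, macro average) from per-group averages."""
    total = sum(group_averages[g] * n for g, n in SUBJECTS_PER_GROUP.items())
    count = sum(SUBJECTS_PER_GROUP.values())
    return total, total / count

# Group averages for AWAXIS-Hybrid-28B, copied from the table above.
hybrid = {"CLIcK": 0.83, "KMMLU": 0.530, "HAERAE": 0.813, "KOBEST": 0.967}
total, macro = sum_and_macro(hybrid)
print(f"{total:.3f} {macro:.4f}")  # 7.760 0.7760
```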
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Anserwise/AWAXIS-Hybrid-28B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
    trust_remote_code=True,
)

# Prompt: "When did the Imjin War occur in Korean history?"
messages = [{"role": "user", "content": "한국 역사에서 임진왜란이 발생한 시기는?"}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Greedy decoding: temperature only applies when sampling, and 0.0 is not a valid sampling temperature.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
## License
Apache 2.0 (inherited from the parent models' licenses)

---
*2026-04-30*