# Qwen2.5-3B Memory State Generator
A model fine-tuned from Qwen2.5-3B-Instruct to extract a structured memory state from multi-turn dialogues.
It runs first in a multi-turn dialogue pipeline, before routing and retrieval, and produces a `memory_state` JSON that downstream components (Router, RAG, LLM) can consume.

User input → [Memory State Generator] → Router → LLM/VLM
## Model Description

| Item | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tuning method | SFT + LoRA |
| Training data | DialogSum + QMSum |
| Max sequence length | 512 |
| LoRA rank | 16 |
| GPU | NVIDIA A100 40GB |
| Epochs | 3 |
| Final validation loss | 0.693 |
## Output Format

Given a conversation, the model outputs JSON in the following format:

```json
{
  "memory_state": {
    "key_facts": ["fact 1", "fact 2"],
    "unresolved_refs": ["unclear references or pronouns"],
    "topic": "main topic of the conversation",
    "turn_count": 5
  },
  "memory_summary": "One sentence summarizing the conversation so far."
}
```
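Downstream components depend on this schema, so it is worth checking the parsed output before handing it to the Router. The following is a minimal validation sketch (an illustrative addition, not part of the released model code):

```python
import json

REQUIRED_STATE_KEYS = {"key_facts", "unresolved_refs", "topic", "turn_count"}

def validate_memory_state(raw: str) -> dict:
    """Parse model output and verify it matches the expected schema.

    Raises ValueError if required keys are missing or types are wrong.
    """
    parsed = json.loads(raw)
    state = parsed["memory_state"]
    missing = REQUIRED_STATE_KEYS - state.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(state["key_facts"], list):
        raise ValueError("key_facts must be a list")
    if not isinstance(state["turn_count"], int):
        raise ValueError("turn_count must be an integer")
    if "memory_summary" not in parsed:
        raise ValueError("missing memory_summary")
    return parsed

example = (
    '{"memory_state": {"key_facts": ["f1"], "unresolved_refs": [], '
    '"topic": "t", "turn_count": 3}, "memory_summary": "s"}'
)
state = validate_memory_state(example)
```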
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

model_id = "your-username/qwen2.5-3b-memory-summary-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

SYSTEM_PROMPT = """You are a Memory State Generator in a multi-turn dialogue system.
Given a conversation, extract and output a structured memory state as JSON.

Output format (strictly follow this):
{
  "memory_state": {
    "key_facts": ["fact1", "fact2"],
    "unresolved_refs": ["any unclear references or pronouns"],
    "topic": "main topic of the conversation",
    "turn_count": <number of turns>
  },
  "memory_summary": "One concise sentence summarizing the conversation so far."
}

Output only valid JSON. No explanation, no markdown."""

dialogue = """
A: I've finished implementing the RAG pipeline.
B: Which model did you decide to use?
A: We went with Qwen2.5-3B-Instruct. We plan to fine-tune it with LoRA.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Conversation:\n{dialogue}"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
parsed = json.loads(response)
print(json.dumps(parsed, indent=2, ensure_ascii=False))
```
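The bare `json.loads(response)` above will raise if the model ever emits stray text or a markdown fence despite the system prompt, which small models occasionally do. A defensive parsing helper (an illustrative addition, not part of the card's original code) could look like:

```python
import json
import re

def extract_json(response: str) -> dict:
    """Best-effort extraction of the first JSON object from model output.

    Tries a direct parse first; falls back to the first {...} span,
    which tolerates markdown fences or surrounding commentary.
    """
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```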
## Training Details

### Data

| Dataset | Size | Description |
|---|---|---|
| DialogSum | 13,031 examples | Everyday dialogues with human-written summaries |
| QMSum | 686 examples | Meeting transcripts with query-based summary pairs |

Both datasets were converted into the memory_state JSON format and used for SFT training.
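As a rough illustration of that conversion, a single DialogSum record (which ships with `dialogue`, `summary`, and `topic` fields) might be mapped to a prompt/completion pair like this. The `key_facts` and `turn_count` heuristics below are illustrative assumptions, not the exact conversion used for training:

```python
import json

def dialogsum_to_training_example(record: dict) -> dict:
    """Map one DialogSum record to an SFT example in memory_state format.

    Illustrative sketch: the real training pipeline may extract key_facts
    differently; here the human summary is reused as a placeholder fact.
    """
    dialogue = record["dialogue"]
    turns = [t for t in dialogue.split("\n") if t.strip()]
    target = {
        "memory_state": {
            "key_facts": [record["summary"]],  # placeholder heuristic
            "unresolved_refs": [],
            "topic": record.get("topic", ""),
            "turn_count": len(turns),
        },
        "memory_summary": record["summary"],
    }
    return {
        "prompt": f"Conversation:\n{dialogue}",
        "completion": json.dumps(target, ensure_ascii=False),
    }

ex = dialogsum_to_training_example({
    "dialogue": "A: Hi.\nB: Hello.",
    "summary": "A and B greet each other.",
    "topic": "greeting",
})
```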
### Training Configuration

```python
LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

SFTConfig(
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_seq_length=512,
    bf16=True,
)
```
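These configs would be wired into a TRL trainer roughly as follows. This is a sketch, not the exact training script: `model`, `train_ds`, and `val_ds` are placeholders for objects loaded elsewhere, and the `SFTTrainer` signature differs across `trl` versions.

```python
from trl import SFTTrainer

# lora_config / sft_config are the LoraConfig and SFTConfig shown above.
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    peft_config=lora_config,
)
trainer.train()
```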
### Training Loss
| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 14.578 | 0.896 |
| 500 | 12.919 | 0.804 |
| 1000 | 11.361 | 0.734 |
| 1500 | 10.437 | 0.694 |
| 2000 | 9.783 | 0.694 |
| 2400 | 9.635 | 0.693 |
## Limitations

- `turn_count` extraction can be inaccurate depending on the dialogue format.
- `key_facts` sometimes reads more like an abstract summary than concretely extracted facts; we plan to improve this by training on additional synthetic data.
- The model is optimized for short-to-medium dialogues (max 512 tokens).
## License

Apache 2.0