# Qwen2.5-3B Memory State Generator

A model fine-tuned from Qwen2.5-3B-Instruct to extract a structured memory state from multi-turn dialogues.

It runs first in the multi-turn dialogue pipeline, before routing and retrieval, and produces a `memory_state` JSON that downstream components (Router, RAG, LLM) can consume.

```
User input → [Memory State Generator] → Router → LLM/VLM
```

λͺ¨λΈ μ„€λͺ…

베이슀 λͺ¨λΈ Qwen/Qwen2.5-3B-Instruct
νŒŒμΈνŠœλ‹ 방식 SFT + LoRA
ν•™μŠ΅ 데이터 DialogSum + QMSum
μ΅œλŒ€ μ‹œν€€μŠ€ 길이 512
LoRA rank 16
GPU NVIDIA A100 40GB
Epoch 3
μ΅œμ’… Validation Loss 0.693

## Output format

Given a conversation, the model outputs JSON in the following format:

```json
{
  "memory_state": {
    "key_facts": ["fact 1", "fact 2"],
    "unresolved_refs": ["unclear references or pronouns"],
    "topic": "main topic of the conversation",
    "turn_count": 5
  },
  "memory_summary": "One sentence summarizing the conversation so far"
}
```
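Since the model is trained to emit exactly this schema, a lightweight check before handing the output to downstream components can catch malformed generations early. The validator below is an illustrative sketch, not part of the released model:

```python
import json

REQUIRED_STATE_KEYS = {"key_facts", "unresolved_refs", "topic", "turn_count"}

def validate_memory_output(raw: str) -> dict:
    """Parse the model output and verify the memory_state schema."""
    parsed = json.loads(raw)
    state = parsed["memory_state"]
    missing = REQUIRED_STATE_KEYS - state.keys()
    if missing:
        raise ValueError(f"memory_state is missing keys: {sorted(missing)}")
    if not isinstance(state["turn_count"], int):
        raise ValueError("turn_count must be an integer")
    if not isinstance(parsed.get("memory_summary"), str):
        raise ValueError("memory_summary must be a string")
    return parsed

sample = ('{"memory_state": {"key_facts": [], "unresolved_refs": [], '
          '"topic": "test", "turn_count": 2}, "memory_summary": "ok"}')
print(validate_memory_output(sample)["memory_state"]["topic"])  # -> test
```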

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import json

model_id = "your-username/qwen2.5-3b-memory-summary-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

SYSTEM_PROMPT = """You are a Memory State Generator in a multi-turn dialogue system.
Given a conversation, extract and output a structured memory state as JSON.

Output format (strictly follow this):
{
  "memory_state": {
    "key_facts": ["fact1", "fact2"],
    "unresolved_refs": ["any unclear references or pronouns"],
    "topic": "main topic of the conversation",
    "turn_count": <number of turns>
  },
  "memory_summary": "One concise sentence summarizing the conversation so far."
}

Output only valid JSON. No explanation, no markdown."""

dialogue = """
A: I finished implementing the RAG pipeline.
B: Which model did you decide to use?
A: We went with Qwen2.5-3B-Instruct. We plan to fine-tune it with LoRA.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Conversation:\n{dialogue}"}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, then parse the JSON output.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
parsed = json.loads(response)
print(json.dumps(parsed, indent=2, ensure_ascii=False))
```
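Note that `json.loads` will fail if the model ever wraps its JSON in stray text or a markdown fence. A defensive fallback that extracts the outermost `{...}` span is a small sketch you may want in production (the helper name is illustrative):

```python
import json

def extract_json(response: str) -> dict:
    """Parse model output, tolerating fences or stray text around the JSON."""
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        # Fall back to the outermost brace-delimited span.
        start = response.find("{")
        end = response.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(response[start:end + 1])

wrapped = '```json\n{"memory_state": {"turn_count": 3}}\n```'
print(extract_json(wrapped)["memory_state"]["turn_count"])  # -> 3
```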

## Training details

### Data

| Dataset | Size | Description |
|---|---|---|
| DialogSum | 13,031 | Everyday dialogues with human-written summaries |
| QMSum | 686 | Meeting transcripts with query-based summary pairs |

Both datasets were converted to the `memory_state` JSON format and used for SFT.
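The conversion script is not included in this card. The sketch below shows one plausible way a DialogSum-style record could be mapped to the training target; the source field names follow the public DialogSum schema, but the mapping itself is an assumption:

```python
import json

def to_memory_target(record: dict) -> str:
    """Map a DialogSum-style record to a memory_state training target.

    Hypothetical mapping: turns are counted from newline-separated
    speaker lines, and the human-written summary becomes memory_summary.
    """
    turns = [t for t in record["dialogue"].split("\n") if t.strip()]
    target = {
        "memory_state": {
            "key_facts": record.get("key_facts", []),  # if pre-extracted
            "unresolved_refs": [],
            "topic": record.get("topic", ""),
            "turn_count": len(turns),
        },
        "memory_summary": record["summary"],
    }
    return json.dumps(target, ensure_ascii=False)

record = {
    "dialogue": "A: Hi, I booked the flight.\nB: Great, which airline?",
    "summary": "A tells B the flight is booked.",
    "topic": "travel plans",
}
print(to_memory_target(record))
```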

### Training configuration

```python
from peft import LoraConfig
from trl import SFTConfig

LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

SFTConfig(
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_seq_length=512,
    bf16=True,
)
```
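For context, these two configs would typically be passed to `trl`'s `SFTTrainer` roughly as below. The variable names are placeholders and exact argument names vary across `trl` versions, so treat this as an assembly sketch rather than the exact training script:

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                  # base Qwen2.5-3B-Instruct model
    args=sft_config,              # the SFTConfig above
    peft_config=lora_config,      # the LoraConfig above
    train_dataset=train_dataset,  # converted memory_state examples
    eval_dataset=eval_dataset,
)
trainer.train()
```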

### Training loss

| Step | Training Loss | Validation Loss |
|---|---|---|
| 100 | 14.578 | 0.896 |
| 500 | 12.919 | 0.804 |
| 1000 | 11.361 | 0.734 |
| 1500 | 10.437 | 0.694 |
| 2000 | 9.783 | 0.694 |
| 2400 | 9.635 | 0.693 |

## Limitations

- `turn_count` extraction can be inaccurate depending on the dialogue format
- `key_facts` sometimes reads more like an abstract summary than concretely extracted facts; further training on synthetic data is planned to improve this
- Optimized for short-to-medium dialogues (max 512 tokens)

λΌμ΄μ„ μŠ€

Apache 2.0
