GovOn Civil Adapter — EXAONE 4.0 QLoRA for Korean Government Civil Response Drafting

#2
by umyunsang - opened

What is this adapter?

This LoRA adapter fine-tunes EXAONE 4.0-32B-AWQ to draft formal civil complaint responses for Korean local government officers.


Training Details

| Item | Value |
|---|---|
| Base Model | EXAONE 4.0-32B-AWQ |
| Method | Unsloth QLoRA (4-bit NF4) |
| LoRA Rank | r=16, alpha=32 |
| Target Modules | q, k, v, o, gate, up, down_proj (7) |
| Dataset | 74K civil complaint Q&A pairs |
| Hardware | HF Spaces A100 80GB |
| Final Loss | 0.889 (365 steps) |

Data Sources

  • AI Hub 71852: Public Civil Service QA (29K)
  • AI Hub 71847: Administrative Law QA (37K)
  • Custom preprocessing via parsers.py

How It Works in GovOn

GovOn uses vLLM Multi-LoRA serving. When the ReAct agent decides to call the public_admin_adapter tool, this LoRA is dynamically attached per-request:

User: "Please draft a response to a road damage complaint" (도로 파손 민원에 대한 답변을 작성해줘)
  → Agent: tool_call(public_admin_adapter)
    → vLLM: attach civil-adapter LoRA
      → Formal response draft generated
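The per-request attachment above maps directly onto vLLM's OpenAI-compatible API, where the `model` field of a chat-completions request names the LoRA adapter to use. A minimal client-side sketch (the adapter name `civil-adapter`, the system prompt, and the sampling parameters are illustrative assumptions, not the exact GovOn values):

```python
import json

# Hypothetical system prompt for the civil-servant role; the real GovOn
# prompt is not published in this post.
SYSTEM_PROMPT = (
    "You are a Korean local-government civil servant drafting formal, "
    "polite responses to citizen complaints."
)

def build_request(user_query: str, adapter: str = "civil-adapter") -> dict:
    """Build a /v1/chat/completions payload that targets one LoRA adapter.

    With vLLM Multi-LoRA serving, "model" selects which registered
    adapter is attached for this single generation request.
    """
    return {
        "model": adapter,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_query},
        ],
        "temperature": 0.3,
        "max_tokens": 512,
    }

payload = build_request("Please draft a response to a road damage complaint")
body = json.dumps(payload, ensure_ascii=False)  # ready to POST to vLLM
```

Because the adapter is selected per request, the same vLLM process can serve the base model, this civil adapter, and the legal adapter concurrently.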

A companion legal-adapter (siwo/govon-legal-adapter) handles law citation and evidence.


Project Context

Developed at Dong-A University, Dept. of Computer Science as an industry-collaboration capstone project for Korean public sector AI assistance.

| Resource | Link |
|---|---|
| GitHub | GovOn-Org/GovOn |
| Runtime Space | umyunsang/govon-runtime |
| Legal Adapter | siwo/govon-legal-adapter |

Feedback on training methodology, evaluation metrics, or potential improvements is very welcome!

Detailed Training Report & Lessons Learned

Why LoRA instead of full fine-tuning?

We considered three approaches for domain adaptation:

| Approach | GPU Memory | Training Time | Switching Cost | Quality |
|---|---|---|---|---|
| Full fine-tune | 160GB+ (multi-GPU) | Days | Must load separate model | Highest |
| QLoRA (our choice) | 48GB (single GPU) | ~7 hours | ~32MB swap, milliseconds | High |
| Prompt engineering | 0 (inference only) | None | None | Moderate |

For a university project deploying on a single A100 80GB, QLoRA was the clear winner. The key insight: an adapter trained on 74K domain-specific pairs produced noticeably better responses than prompt engineering alone, while keeping the deployment footprint minimal.

Data Pipeline: From Raw Public Data to Training Pairs

The most underappreciated part of any fine-tuning project is data preparation. Our civil adapter data came from:

Source 1: AI Hub 71852 — Public Civil Service QA (29K)

  • Raw format: XML with nested complaint/response pairs
  • Challenge: Many responses were template-based with placeholder text
  • Solution: parsers.py filters responses shorter than 50 chars and strips boilerplate headers

Source 2: AI Hub 71847 — Administrative Law QA (37K)

  • Raw format: JSON with legal terminology
  • Challenge: Answers often cited laws without explaining them
  • Solution: Paired with the legal dataset for cross-referencing

Final dataset composition:

  • 74K training pairs after deduplication and quality filtering
  • Average input length: ~150 tokens
  • Average output length: ~300 tokens
  • Format: EXAONE chat template with system prompt for civil servant role
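Turning one Q&A pair into a chat-format record might look like the sketch below. The system prompt wording is an assumption (the real one is not published), and in practice the EXAONE tokenizer's chat template inserts the model-specific special tokens around these messages:

```python
# Assumed civil-servant system prompt; illustrative only.
CIVIL_SERVANT_SYSTEM = (
    "You are a civil servant at a Korean local government. "
    "Draft formal, polite responses to citizen complaints."
)

def to_chat_record(question: str, answer: str) -> dict:
    """Wrap one complaint/response pair in the messages format that a
    chat template consumes during supervised fine-tuning."""
    return {
        "messages": [
            {"role": "system", "content": CIVIL_SERVANT_SYSTEM},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

record = to_chat_record(
    "How do I report a broken streetlight?",
    "Thank you for your inquiry. The streetlight you reported ...",
)
```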

Training Configuration Decisions

Why r=16? We tested r=8, r=16, and r=32:

  • r=8: Faster training but noticeably lower response formality
  • r=16: Sweet spot — formal government style retained with good generalization
  • r=32: Marginal improvement over r=16, 2x parameter count

Why 7 target modules? Including gate/up/down_proj (MLP layers) in addition to attention (q/k/v/o) improved the model's ability to generate structured, formal text. Attention-only LoRA tended to produce more conversational responses.

Why 1 epoch? With 74K packed samples and effective batch size 64, the model converged by step 365. Validation loss plateaued at 0.889. A second epoch showed early signs of overfitting on template phrases.
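Collecting the choices above, the adapter hyperparameters can be summarized as a plain config dict. The real run used Unsloth's QLoRA path; argument names below follow the common PEFT convention, and the dropout value is an assumption since the post does not state it:

```python
LORA_CONFIG = {
    "r": 16,            # tested against r=8 and r=32; 16 was the sweet spot
    "lora_alpha": 32,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    "lora_dropout": 0.0,   # assumption; not stated in the report
    "bias": "none",
}

# LoRA updates are scaled by alpha / r, so alpha=32 with r=16 gives 2.0 —
# the update magnitude stays comparable if r changes but alpha tracks 2*r.
scaling = LORA_CONFIG["lora_alpha"] / LORA_CONFIG["r"]
```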

What Could Be Improved

  1. Evaluation gap: We lack automated evaluation metrics (BLEU, BERTScore) comparing adapter vs base model responses. This is planned for the next iteration.
  2. Data diversity: Current data skews toward road/parking/noise complaints. Underrepresented categories (welfare, education) could benefit from targeted data collection.
  3. 2nd epoch on remaining data: Only 74K of the available ~110K civil pairs were used. A continued training run on the remainder could improve coverage.
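On point 1, even before BLEU or BERTScore are wired in, a stdlib token-overlap F1 gives a rough adapter-vs-base comparison against reference answers. This is a placeholder metric sketch, not something the project ships:

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """F1 over whitespace tokens: a crude stand-in for BLEU/BERTScore."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For Korean, whitespace tokenization is especially crude (particles attach to nouns), so a real evaluation would tokenize with the model's own tokenizer first.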

Integration with GovOn Runtime

This adapter doesn't run standalone — it's served through vLLM Multi-LoRA:

POST /v3/agent/run
  → LLM decides to call public_admin_adapter tool
    → vLLM attaches this LoRA for the generation request
      → Response formatted as formal civil complaint answer
        → Agent returns to user (with approval in v4 mode)

The companion legal-adapter handles law citation when the agent calls the legal_adapter tool.
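In GovOn the routing decision is made by the agent's LLM, not by rules; the keyword heuristic below only illustrates the two-adapter split, and the keyword list is invented for the example:

```python
# Invented keyword list; the real agent routes via LLM tool selection.
LEGAL_KEYWORDS = ("law", "statute", "ordinance", "article")

def pick_tool(query: str) -> str:
    """Illustrate which adapter tool a request would invoke."""
    q = query.lower()
    if any(k in q for k in LEGAL_KEYWORDS):
        return "legal_adapter"        # law citation and evidence
    return "public_admin_adapter"     # formal civil response drafting
```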


Questions about training methodology, data preprocessing, or LoRA configuration are very welcome!
