GovOn R1 Preview — EXAONE 4.0 + Multi-LoRA Agentic Shell for Korean Public Sector

#2, opened by umyunsang

What problem does GovOn solve?

Local government officers in South Korea handle dozens of civil complaints daily — road damage reports, parking disputes, noise complaints. Each response requires searching relevant laws, finding similar cases, and drafting formal documents.

GovOn automates this workflow with an AI agent.


Project Background

GovOn is developed at Dong-A University (Department of Computer Science) as an industry-collaboration capstone project, partnering with public sector organizations.


How It Works

1. ReAct Autonomous Agent

  • LangGraph-based v4 architecture
  • 7 tools: civil complaint lookup, issue detection, statistics, keyword analysis, demographics, civil-response adapter, legal adapter
  • Human-in-the-loop approval before tool execution

2. Domain-Specific Multi-LoRA

  • Base: EXAONE 4.0-32B-AWQ
  • civil-adapter (this model): 74K civil complaint Q&A pairs → formal response drafting
  • legal-adapter: 270K legal documents → law citation and evidence
  • vLLM Multi-LoRA serving: per-request adapter switching
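The setup above can be sketched with vLLM's built-in LoRA support. This is a minimal, illustrative sketch: the adapter paths come from the Links section, but the exact flags and route used by GovOn are assumptions, not the project's actual launch command.

```shell
# Register the AWQ base model plus both adapters at startup.
# Flag values are illustrative; GovOn's real invocation may differ.
vllm serve LGAI-EXAONE/EXAONE-4.0-32B-AWQ \
  --quantization awq \
  --enable-lora \
  --max-lora-rank 16 \
  --lora-modules civil-adapter=umyunsang/govon-civil-adapter \
                 legal-adapter=siwo/govon-legal-adapter \
  --port 8000

# Per-request adapter switching: the OpenAI-compatible API selects an
# adapter by passing its registered name as the "model" field.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "civil-adapter", "prompt": "...", "max_tokens": 256}'
```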

3. Production-Grade Engineering

  • E2E 27/27 scenarios passing (6-Phase verification)
  • DORA Elite grade (30 deploys/week, 0.9h lead time)
  • 6-Layer context management for multi-turn conversations

Architecture

User Terminal (govon CLI)
        |
        v
HF Space — A100 80GB
+-------------------------+
|  FastAPI :7860          |
|  v4 ReAct Agent         |
|  + 6-Layer Context Mgmt |
|                         |
|  vLLM :8000             |
|  EXAONE 4.0-32B-AWQ     |
|  + civil LoRA (r16)     |
|  + legal LoRA (r16)     |
+-------------------------+

Links

| Resource | URL |
|---|---|
| GitHub | GovOn-Org/GovOn |
| Civil Adapter | umyunsang/govon-civil-adapter |
| Legal Adapter | siwo/govon-legal-adapter |
| Docs Portal | govon-org.github.io/GovOn |

We want your feedback!

  • What features would be most useful for government officers?
  • Thoughts on the Multi-LoRA agentic architecture?
  • Similar domain applications you would like to explore?

This project is open source. Stars, forks, and PRs are welcome!

Deep Dive: How GovOn's ReAct Agent Actually Works

The Problem We Observed

During field research at a local government office in Busan, South Korea, we observed civil complaint officers spending 20-30 minutes per response:

  1. Search — Finding relevant laws and regulations (Korean legal system has frequent amendments)
  2. Lookup — Searching similar past cases across fragmented databases
  3. Draft — Writing formal government-style responses with correct formatting
  4. Review — Cross-checking citations and department references

This is not a creative task. It is a structured, repeatable workflow — exactly the kind of work AI agents excel at.


Architecture Deep Dive

GovOn runs two agent graphs simultaneously:

| Graph | Endpoint | Use Case | Approval |
|---|---|---|---|
| v4 | /v2/agent/* | Production: human-in-the-loop | Required before tool execution |
| v3 | /v3/agent/* | Development/testing: auto-execute | None (fully autonomous) |

Both use the same ReAct loop pattern: the LLM observes the current state, reasons about what tool to call, acts by executing the tool, and then observes the result to decide the next step.
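The loop described above can be sketched in plain Python. Everything here is a stand-in: `fake_llm`, the single-tool registry, and the `approve` callback are hypothetical stubs used only to show the observe/reason/act cycle and the v4-style approval gate, not GovOn's actual LangGraph code.

```python
def fake_llm(state):
    """Stand-in for the model: decide whether to call a tool or answer."""
    if "tool_result" not in state:
        return {"action": "call_tool", "tool": "complaint_lookup",
                "args": {"q": state["query"]}}
    return {"action": "answer", "text": f"Draft based on: {state['tool_result']}"}

# Hypothetical one-tool registry (GovOn exposes 7 tools).
TOOLS = {"complaint_lookup": lambda q: f"3 similar cases for '{q}'"}

def react_loop(query, approve, max_iters=5):
    state = {"query": query}
    for _ in range(max_iters):
        decision = fake_llm(state)           # Reason: pick the next action
        if decision["action"] == "answer":   # Terminal step: return the draft
            return decision["text"]
        if not approve(decision):            # Human-in-the-loop gate (v4)
            return "Tool call rejected by officer."
        tool = TOOLS[decision["tool"]]       # Act: execute the approved tool
        state["tool_result"] = tool(**decision["args"])  # Observe the result
    return "Iteration limit reached."

print(react_loop("pothole on Main St", approve=lambda d: True))
# → Draft based on: 3 similar cases for 'pothole on Main St'
```

In the v3 graph the `approve` callback would always return `True` (fully autonomous); in v4 it blocks until an officer confirms.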

Multi-LoRA Serving — One Model, Multiple Experts

The key architectural decision was not to deploy separate models for each domain. Instead:

vLLM handles per-request LoRA switching with near-zero overhead. This means:

  • One A100 GPU serves all capabilities
  • Adapter switching takes milliseconds, not seconds
  • Adding new domain adapters only requires training a new LoRA (~32MB each)

6-Layer Context Management

Long conversations are the Achilles heel of agent systems. After 3-5 turns, context windows overflow. We built a 6-layer defense:

| Layer | Stage | Mechanism |
|---|---|---|
| L1 | Tool execution | Truncate tool outputs to 3,000 chars (head + tail) |
| L2 | Agent input | Clear old tool results with a placeholder after iteration 2+ |
| L3 | Agent input | Reverse token-budget trim (4,500-token budget) |
| L4 | Agent input | Hard cap: force-remove from the front if still over budget |
| L5 | Session load | Rule-based extractive summary of older messages |
| L6 | Session load | Permanent message removal via RemoveMessage |

This was inspired by production patterns from Claude API and Codex. The result: GovOn maintains coherent 5+ turn conversations without hallucination from context overflow.
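Two of the layers (L1's head+tail truncation and L3's reverse budget trim) can be sketched as follows. This is an illustrative sketch only: the limits mirror the table, but the whitespace-split "token" count is a crude proxy, not GovOn's real tokenizer.

```python
def truncate_tool_output(text, limit=3000):
    """L1: keep the head and tail of an oversized tool result."""
    if len(text) <= limit:
        return text
    half = (limit - 20) // 2  # reserve ~20 chars for the marker
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

def trim_to_budget(messages, budget=4500):
    """L3: walk messages newest-to-oldest, keeping what fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = len(msg.split())  # crude token estimate (assumption)
        if used + cost > budget:
            break                # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Walking in reverse guarantees the most recent turns survive, which is why the hard cap (L4) only ever evicts from the front.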


Training Details

| | Civil Adapter | Legal Adapter |
|---|---|---|
| Base | EXAONE 4.0-32B | EXAONE 4.0-32B |
| Method | Unsloth QLoRA (4-bit NF4) | Unsloth QLoRA (4-bit NF4) |
| LoRA Config | r=16, alpha=32, 7 target modules | r=16, alpha=32, 7 target modules |
| Dataset | 74K civil Q&A pairs | 270K legal documents |
| Hardware | HF Spaces A100 80GB | HF Spaces A100 80GB |
| Final Loss | 0.889 | 0.889 |

Data sources: AI Hub (Korean government open datasets) — civil service QA, administrative law QA, civil/IP/criminal cases, and court precedents.
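The LoRA settings in the table map onto a PEFT-style configuration like the sketch below. Note the seven target modules listed here are an assumption (the standard attention and MLP projections); the post does not name the actual module list used for the GovOn adapters.

```python
from peft import LoraConfig

# Sketch of the r=16 / alpha=32 setup from the table above.
# The target_modules list is an assumption, not confirmed by the post.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```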


E2E Verification: 27 Scenarios, 6 Phases

We don't ship without evidence. Our E2E test suite covers:

| Phase | Scenarios | What It Tests |
|---|---|---|
| 1. Infrastructure | 3 | Health, base model, vLLM connection |
| 2. v2 Pipeline | 6 | Approval flow, rejection, multi-turn, concurrency |
| 3. v3 ReAct | 10 | Direct answer, tool execution, SSE streaming, iteration limits |
| 4. Cross-version | 2 | v2→v3 consistency, long-query handling |
| 5. Multi-turn | 3 | Context retention, session isolation, 3-turn workflow |
| 6. Context Mgmt | 3 | Tool clearing, long query + clearing, 5-turn summarization |

Current status: 27/27 passing.


Want to try it?

The runtime is deployed on this HF Space. When the Space is active (A100 GPU), you can hit the API directly; if the Space is paused, ask a maintainer to restart it.
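For example, requests might look like the following. The route names here are assumptions inferred from the post (FastAPI on :7860, agent endpoints under /v2/agent/* and /v3/agent/*); check the Docs Portal for the actual API surface.

```shell
# Hypothetical calls; replace <space-url> with the Space's URL and
# verify the real routes in the GovOn docs before use.
curl https://<space-url>/health
curl -X POST https://<space-url>/v2/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Draft a response to a road-damage complaint"}'
```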

We welcome technical feedback on architecture, training methodology, or deployment strategies!
