GovOn R1 Preview — EXAONE 4.0 + Multi-LoRA Agentic Shell for Korean Public Sector

#2, opened by umyunsang

What problem does GovOn solve?

Local government officers in South Korea handle dozens of civil complaints daily — road damage reports, parking disputes, noise complaints. Each response requires searching relevant laws, finding similar cases, and drafting formal documents.

GovOn automates this workflow with an AI agent.


Project Background

GovOn is developed at Dong-A University (Department of Computer Science) as an industry-collaboration capstone project, partnering with public sector organizations.


How It Works

1. ReAct Autonomous Agent

  • LangGraph-based v4 architecture
  • 7 tools: civil complaint lookup, issue detection, statistics, keyword analysis, demographics, civil-response adapter, legal adapter
  • Human-in-the-loop approval before tool execution

2. Domain-Specific Multi-LoRA

  • Base: EXAONE 4.0-32B-AWQ
  • civil-adapter (this model): 74K civil complaint Q&A pairs → formal response drafting
  • legal-adapter: 270K legal documents → law citation and evidence
  • vLLM Multi-LoRA serving: per-request adapter switching
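The setup above can be sketched with vLLM's built-in LoRA support. This is a minimal, illustrative sketch: the adapter paths come from the Links section, but the exact flags and route used by GovOn are assumptions, not the project's actual launch command.

```shell
# Register the AWQ base model plus both adapters at startup.
# Flag values are illustrative; GovOn's real invocation may differ.
vllm serve LGAI-EXAONE/EXAONE-4.0-32B-AWQ \
  --quantization awq \
  --enable-lora \
  --max-lora-rank 16 \
  --lora-modules civil-adapter=umyunsang/govon-civil-adapter \
                 legal-adapter=siwo/govon-legal-adapter \
  --port 8000

# Per-request adapter switching: the OpenAI-compatible API selects an
# adapter by passing its registered name as the "model" field.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "civil-adapter", "prompt": "...", "max_tokens": 256}'
```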

3. Production-Grade Engineering

  • E2E 27/27 scenarios passing (6-Phase verification)
  • DORA Elite grade (30 deploys/week, 0.9h lead time)
  • 6-Layer context management for multi-turn conversations

Architecture

User Terminal (govon CLI)
        |
        v
HF Space — A100 80GB
+-------------------------+
|  FastAPI :7860          |
|  v4 ReAct Agent         |
|  + 6-Layer Context Mgmt |
|                         |
|  vLLM :8000             |
|  EXAONE 4.0-32B-AWQ     |
|  + civil LoRA (r16)     |
|  + legal LoRA (r16)     |
+-------------------------+

Links

| Resource | URL |
|---|---|
| GitHub | GovOn-Org/GovOn |
| Civil Adapter | umyunsang/govon-civil-adapter |
| Legal Adapter | siwo/govon-legal-adapter |
| Docs Portal | govon-org.github.io/GovOn |

We want your feedback!

  • What features would be most useful for government officers?
  • Thoughts on the Multi-LoRA agentic architecture?
  • Similar domain applications you would like to explore?

This project is open source. Stars, forks, and PRs are welcome!

Deep Dive: How GovOn's ReAct Agent Actually Works

The Problem We Observed

During field research at a local government office in Busan, South Korea, we observed civil complaint officers spending 20-30 minutes per response:

  1. Search — Finding relevant laws and regulations (Korean legal system has frequent amendments)
  2. Lookup — Searching similar past cases across fragmented databases
  3. Draft — Writing formal government-style responses with correct formatting
  4. Review — Cross-checking citations and department references

This is not a creative task. It is a structured, repeatable workflow — exactly the kind of work AI agents excel at.


Architecture Deep Dive

GovOn runs two agent graphs simultaneously:

| Graph | Endpoint | Use Case | Approval |
|---|---|---|---|
| v4 | /v2/agent/* | Production: human-in-the-loop | Required before tool execution |
| v3 | /v3/agent/* | Development/testing: auto-execute | None (fully autonomous) |

Both use the same ReAct loop pattern: the LLM observes the current state, reasons about what tool to call, acts by executing the tool, and then observes the result to decide the next step.
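The loop described above can be sketched in plain Python. Everything here is a stand-in: `fake_llm`, the single-tool registry, and the `approve` callback are hypothetical stubs used only to show the observe/reason/act cycle and the v4-style approval gate, not GovOn's actual LangGraph code.

```python
def fake_llm(state):
    """Stand-in for the model: decide whether to call a tool or answer."""
    if "tool_result" not in state:
        return {"action": "call_tool", "tool": "complaint_lookup",
                "args": {"q": state["query"]}}
    return {"action": "answer", "text": f"Draft based on: {state['tool_result']}"}

# Hypothetical one-tool registry (GovOn exposes 7 tools).
TOOLS = {"complaint_lookup": lambda q: f"3 similar cases for '{q}'"}

def react_loop(query, approve, max_iters=5):
    state = {"query": query}
    for _ in range(max_iters):
        decision = fake_llm(state)           # Reason: pick the next action
        if decision["action"] == "answer":   # Terminal step: return the draft
            return decision["text"]
        if not approve(decision):            # Human-in-the-loop gate (v4)
            return "Tool call rejected by officer."
        tool = TOOLS[decision["tool"]]       # Act: execute the approved tool
        state["tool_result"] = tool(**decision["args"])  # Observe the result
    return "Iteration limit reached."

print(react_loop("pothole on Main St", approve=lambda d: True))
# → Draft based on: 3 similar cases for 'pothole on Main St'
```

In the v3 graph the `approve` callback would always return `True` (fully autonomous); in v4 it blocks until an officer confirms.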

Multi-LoRA Serving — One Model, Multiple Experts

The key architectural decision was not to deploy separate models for each domain. Instead:

vLLM handles per-request LoRA switching with near-zero overhead. This means:

  • One A100 GPU serves all capabilities
  • Adapter switching takes milliseconds, not seconds
  • Adding new domain adapters only requires training a new LoRA (~32MB each)

6-Layer Context Management

Long conversations are the Achilles heel of agent systems. After 3-5 turns, context windows overflow. We built a 6-layer defense:

| Layer | Stage | Mechanism |
|---|---|---|
| L1 | Tool execution | Truncate tool outputs to 3,000 chars (head + tail) |
| L2 | Agent input | Clear old tool results with a placeholder after iteration 2+ |
| L3 | Agent input | Reverse token-budget trim (4,500-token budget) |
| L4 | Agent input | Hard cap: force-remove from the front if still over budget |
| L5 | Session load | Rule-based extractive summary of older messages |
| L6 | Session load | Permanent message removal via RemoveMessage |

This was inspired by production patterns from Claude API and Codex. The result: GovOn maintains coherent 5+ turn conversations without hallucination from context overflow.
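Two of the layers (L1's head+tail truncation and L3's reverse budget trim) can be sketched as follows. This is an illustrative sketch only: the limits mirror the table, but the whitespace-split "token" count is a crude proxy, not GovOn's real tokenizer.

```python
def truncate_tool_output(text, limit=3000):
    """L1: keep the head and tail of an oversized tool result."""
    if len(text) <= limit:
        return text
    half = (limit - 20) // 2  # reserve ~20 chars for the marker
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

def trim_to_budget(messages, budget=4500):
    """L3: walk messages newest-to-oldest, keeping what fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = len(msg.split())  # crude token estimate (assumption)
        if used + cost > budget:
            break                # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Walking in reverse guarantees the most recent turns survive, which is why the hard cap (L4) only ever evicts from the front.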


Training Details

| | Civil Adapter | Legal Adapter |
|---|---|---|
| Base | EXAONE 4.0-32B | EXAONE 4.0-32B |
| Method | Unsloth QLoRA (4-bit NF4) | Unsloth QLoRA (4-bit NF4) |
| LoRA Config | r=16, alpha=32, 7 target modules | r=16, alpha=32, 7 target modules |
| Dataset | 74K civil Q&A pairs | 270K legal documents |
| Hardware | HF Spaces A100 80GB | HF Spaces A100 80GB |
| Final Loss | 0.889 | 0.889 |

Data sources: AI Hub (Korean government open datasets) — civil service QA, administrative law QA, civil/IP/criminal cases, and court precedents.
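The LoRA settings in the table map onto a PEFT-style configuration like the sketch below. Note the seven target modules listed here are an assumption (the standard attention and MLP projections); the post does not name the actual module list used for the GovOn adapters.

```python
from peft import LoraConfig

# Sketch of the r=16 / alpha=32 setup from the table above.
# The target_modules list is an assumption, not confirmed by the post.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```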


E2E Verification: 27 Scenarios, 6 Phases

We don't ship without evidence. Our E2E test suite covers:

| Phase | Scenarios | What It Tests |
|---|---|---|
| 1. Infrastructure | 3 | Health, base model, vLLM connection |
| 2. v2 Pipeline | 6 | Approval flow, rejection, multi-turn, concurrency |
| 3. v3 ReAct | 10 | Direct answer, tool execution, SSE streaming, iteration limits |
| 4. Cross-version | 2 | v2→v3 consistency, long-query handling |
| 5. Multi-turn | 3 | Context retention, session isolation, 3-turn workflow |
| 6. Context Mgmt | 3 | Tool clearing, long query + clearing, 5-turn summarization |

Current status: 27/27 passing.


Want to try it?

The runtime is deployed on this HF Space. When the Space is active (A100 GPU), you can hit the API directly; if the Space is paused, ask a maintainer to restart it.
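For example, requests might look like the following. The route names here are assumptions inferred from the post (FastAPI on :7860, agent endpoints under /v2/agent/* and /v3/agent/*); check the Docs Portal for the actual API surface.

```shell
# Hypothetical calls; replace <space-url> with the Space's URL and
# verify the real routes in the GovOn docs before use.
curl https://<space-url>/health
curl -X POST https://<space-url>/v2/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Draft a response to a road-damage complaint"}'
```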

We welcome technical feedback on architecture, training methodology, or deployment strategies!
