Qwen3-8B-TMF921-Intent-QLora

A QLoRA fine-tune of Qwen3-8B that achieves 100% in-distribution schema compliance across 6 telecom standards and 8 lifecycle operations, and correctly flags every adversarial test input. It was evaluated on 498 stratified test samples covering all 17 target layer types.

This model converts free-form operator intents like "Deploy a URLLC slice for remote surgery with sub-1ms latency and 99.999% reliability" into structured JSON configurations conforming to TMF921, 3GPP TS 28.312, ETSI ZSM 009-1, CAMARA, O-RAN A1, and 3GPP TS 28.541.

Key Results

Metric	Score	Samples
JSON Validity	100.0%	498/498
Structure Correctness	100.0%	498/498
All 5 KPIs Present	100.0%	300/300 (create ops)
Adversarial Rejection	100.0%	25/25

Across all 17 target layer types (6 specification layers, 8 lifecycle operations, and 3 adversarial categories), every output was valid JSON with the correct structure. Every create-operation output also contained all five KPI fields.

Evaluation scope note: These results measure in-distribution schema compliance, since the test set was generated by the same pipeline as the training data. The 100% score confirms that the model has learned to produce correctly formatted, spec-compliant JSON for the template-bounded input distribution. This is analogous to text-to-SQL models that score highly on benchmark test splits such as Spider but still require separate evaluation on user-written queries. Out-of-distribution evaluation on human-written operator intents (e.g., the ORION 100-intent benchmark) is needed to assess real-world generalization and is planned as future work.

Evaluation Results

Per-Layer Breakdown (498 stratified samples, 50 per major layer)

Layer	N	JSON Valid	Struct Correct	All KPIs
`tmf921`	50	100%	100%	100%
`intent_3gpp`	50	100%	100%	100%
`camara`	50	100%	100%	100%
`a1_policy`	50	100%	100%	100%
`o1_nrm`	50	100%	100%	100%
`etsi_zsm`	50	100%	100%	100%
`tmf921_lifecycle_activate`	19	100%	100%	—
`tmf921_lifecycle_modify`	20	100%	100%	—
`tmf921_lifecycle_monitor`	30	100%	100%	—
`tmf921_lifecycle_report`	18	100%	100%	—
`tmf921_lifecycle_resume`	24	100%	100%	—
`tmf921_lifecycle_scale`	21	100%	100%	—
`tmf921_lifecycle_suspend`	15	100%	100%	—
`tmf921_lifecycle_terminate`	26	100%	100%	—
`adversarial_ambiguous`	10	100%	100%	—
`adversarial_contradictory`	8	100%	100%	—
`adversarial_out_of_scope`	7	100%	100%	—
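The per-layer counts add up to the 498-sample total: 300 create-operation samples across the six specification layers, 173 lifecycle samples, and 25 adversarial samples. The small sketch below checks this breakdown against the table's N column and computes a pooled (micro-averaged) pass rate. `LAYER_COUNTS`, `group_total`, and `pooled_rate` are illustrative names and are not part of evaluate_v3.py.

```python
# Per-layer sample counts (N column) copied from the table above.
LAYER_COUNTS = {
    "tmf921": 50, "intent_3gpp": 50, "camara": 50,
    "a1_policy": 50, "o1_nrm": 50, "etsi_zsm": 50,
    "tmf921_lifecycle_activate": 19, "tmf921_lifecycle_modify": 20,
    "tmf921_lifecycle_monitor": 30, "tmf921_lifecycle_report": 18,
    "tmf921_lifecycle_resume": 24, "tmf921_lifecycle_scale": 21,
    "tmf921_lifecycle_suspend": 15, "tmf921_lifecycle_terminate": 26,
    "adversarial_ambiguous": 10, "adversarial_contradictory": 8,
    "adversarial_out_of_scope": 7,
}

CREATE_LAYERS = ["tmf921", "intent_3gpp", "camara", "a1_policy", "o1_nrm", "etsi_zsm"]


def group_total(prefix: str) -> int:
    """Sum the sample counts of every layer whose name starts with `prefix`."""
    return sum(n for layer, n in LAYER_COUNTS.items() if layer.startswith(prefix))


def pooled_rate(passed: dict[str, int]) -> float:
    """Micro-averaged pass rate (%): total passes divided by total samples of those layers."""
    return 100.0 * sum(passed.values()) / sum(LAYER_COUNTS[layer] for layer in passed)


print(sum(LAYER_COUNTS.values()))                      # 498
print(sum(LAYER_COUNTS[k] for k in CREATE_LAYERS))     # 300
print(group_total("tmf921_lifecycle_"))                # 173
print(group_total("adversarial_"))                     # 25
```

Pooling weights each sample equally, so a 50-sample layer moves the overall rate more than a 7-sample one. A macro average over layers would instead weight every layer equally.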

KPI Field Presence (create operations, n=300)

Layer	N	Latency	Reliability	DL Throughput	UL Throughput	Max UEs
`tmf921`	50	100%	100%	100%	100%	100%
`intent_3gpp`	50	100%	100%	100%	100%	100%
`camara`	50	100%	100%	100%	100%	100%
`a1_policy`	50	100%	100%	100%	100%	100%
`o1_nrm`	50	100%	100%	100%	100%	100%
`etsi_zsm`	50	100%	100%	100%	100%	100%

Evaluation Methodology

The evaluation used stratified sampling (50 samples per major layer and all available samples for the smaller lifecycle and adversarial layers). KPI checking was standard-aware, accounting for the different ways each telecom standard encodes network parameters:

TMF921, 3GPP, CAMARA, ETSI ZSM: direct KPI value matching, tolerant of int/float representation differences (e.g., 50 vs 50.0)
O-RAN A1 Policy: reliability → Packet Error Rate (PER), latency → Packet Delay Budget (pdb), throughput → Guaranteed/Maximum Flow Bit Rate (GFBR/MFBR)
O1 NRM (TS 28.541): structural element presence (rrmPolicyMemberList, operationalState, arfcnDL, bSChannelBwDL)
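The sketch below illustrates the standard-aware comparison described above. It assumes the KPI values have already been extracted from the generated JSON. The helper names, the relative tolerance, and the conversion PER = 1 − reliability (so 99.999% maps to 1e-5) are illustrative assumptions, not code taken from evaluate_v3.py.

```python
import math


def values_match(expected, found, rel_tol: float = 1e-6) -> bool:
    """Numeric comparison that ignores int/float/str encoding (50 == 50.0 == "50")."""
    try:
        return math.isclose(float(expected), float(found), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return False


def reliability_to_per(reliability_pct: float) -> float:
    """Convert reliability in percent (e.g. 99.999) to a packet error rate (e.g. 1e-5)."""
    return 1.0 - reliability_pct / 100.0


def a1_reliability_ok(reliability_pct: float, per_found) -> bool:
    """A1 policies carry reliability as PER, so convert before comparing."""
    return values_match(reliability_to_per(reliability_pct), per_found)


def collect_keys(node) -> set:
    """Return every dictionary key found anywhere in a parsed JSON tree."""
    keys = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= collect_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= collect_keys(item)
    return keys


O1_REQUIRED = {"rrmPolicyMemberList", "operationalState", "arfcnDL", "bSChannelBwDL"}


def o1_structure_ok(config: dict) -> bool:
    """O1 NRM outputs are checked for element presence rather than KPI values."""
    return O1_REQUIRED <= collect_keys(config)
```

Searching keys across the whole tree avoids hard-coding a path through `ManagedElement` → `GNBDUFunction` → `NRCellDU`. The trade-off is that it does not check where in the hierarchy each element appears.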

Full evaluation results are available in eval_v3_results.json.

Model Details

Property	Value
Base model	Qwen/Qwen3-8B
Method	QLoRA (4-bit NF4 quantization + LoRA adapters)
Training dataset	nraptisss/TMF921-intent-to-config-augmented (41,815 samples)
Task	Natural language → structured JSON configuration translation
Standards covered	TMF921, 3GPP TS 28.312, ETSI ZSM 009-1, CAMARA, O-RAN A1, 3GPP TS 28.541
License	Apache 2.0

Training Configuration

Parameter	Value
Quantization	4-bit NF4 + double quantization
LoRA rank (r)	32
LoRA alpha	64
Target modules	`all-linear`
Effective batch size	32 (per_device=4 × grad_accum=8)
Learning rate	1e-4 (cosine schedule, warmup_ratio=0.1)
Epochs	3
Max sequence length	4,096 tokens
Loss masking	`assistant_only_loss=True`
Precision	bf16
Flash attention	`flash_attention_2`
Gradient checkpointing	Yes
Estimated VRAM	~26 GB
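The schedule implied by this table can be worked out from the 39,294-sample train split. The sketch below assumes a single GPU, no sequence packing, and a dataloader that keeps the final partial batch. `lr_at` mirrors the linear-warmup-plus-cosine-decay formula used by Hugging Face transformers' `get_cosine_schedule_with_warmup`, and the warmup rounding follows the Trainer's `ceil(total_steps * warmup_ratio)`. The actual run may differ if it used multiple devices or packing.

```python
import math

TRAIN_SAMPLES = 39_294   # train split size from the Training Data section
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 8
EPOCHS = 3
WARMUP_RATIO = 0.1
PEAK_LR = 1e-4

# Micro-batches per epoch, then optimizer steps (one step per GRAD_ACCUM micro-batches).
batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)   # 9,824
steps_per_epoch = batches_per_epoch // GRAD_ACCUM                  # 1,228
total_steps = steps_per_epoch * EPOCHS                             # 3,684
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)               # 369


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then cosine decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_steps, warmup_steps)   # 3684 369
```

Under these assumptions, the learning rate peaks at 1e-4 around step 369, falls to about half that near step 2,026, and reaches zero at the final step.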

Usage

Inference

```python
import json

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the QLoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, "nraptisss/Qwen3-8B-TMF921-Intent-QLora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a TM Forum TMF921-compliant Intent Management system. Given a natural language network intent, output a valid TMF921 Intent Management API v4 JSON object. The response must include the full Intent resource with id, href, name, description, lifecycleState, priority, intentExpression containing IntentExpectation objects with DeliveryExpectation targets, contexts, and relatedParty references. Follow the TMF921 Open API specification and use @type annotations for polymorphic types. Ground all KPI targets in 3GPP TS 22.261 performance requirements."},
    {"role": "user", "content": "Deploy a URLLC slice for remote robotic surgery in Hospital Campus. Requirements: sub-1 ms latency, 99.999% reliability, 100 Mbps DL, 50 Mbps UL, 50 connected devices."}
]

# enable_thinking=False is a Qwen3 chat-template option that appends an empty
# <think></think> block to the prompt, so the model answers with JSON directly.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

config = json.loads(response)
print(json.dumps(config, indent=2))
```
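If the chat template is rendered with Qwen3's thinking mode on (its default), or the model wraps its answer in a Markdown fence, calling `json.loads` on the raw string will fail. The defensive parsing sketch below assumes those are the only two wrappers worth handling. `parse_config` is a hypothetical helper and not part of the released pipeline.

```python
import json
import re

# An optional (possibly empty) Qwen3 reasoning block at the start of the output.
_THINK_RE = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)
# A Markdown code fence wrapping the whole payload, e.g. ```json ... ```.
_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_config(response: str) -> dict:
    """Strip an optional <think> block and code fence, then parse the JSON object."""
    text = _THINK_RE.sub("", response, count=1).strip()
    fenced = _FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    config = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(config, dict):
        raise ValueError(f"expected a JSON object, got {type(config).__name__}")
    return config
```

In the inference snippet, `config = parse_config(response)` would replace the bare `json.loads(response)` call.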

System Prompts by Target Layer

Layer	System Prompt Summary
`tmf921`	TMF921 v4 JSON with `@type` annotations, HATEOAS links, `relatedParty`, geographic/temporal contexts
`intent_3gpp`	3GPP TS 28.312 v18.8.0 intent with `intentExpectation`, S-NSSAI, targets
`camara`	CAMARA NetworkSliceBooking with `sliceProfile`, `areaOfService`, `duration`
`a1_policy`	O-RAN A1 policy with 5QI mapping, PRB quotas, scheduler weights
`o1_nrm`	3GPP TS 28.541 `ManagedElement` → `GNBDUFunction` → `NRCellDU` with RRM policies
`etsi_zsm`	ETSI ZSM 009-1 intent with `objectives`, `constraints`, `context`, `fulfillmentRequirements`

Full system prompts for each layer are in the training dataset.
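The target layer is selected entirely through the system prompt, as in the inference example above, which uses the tmf921 prompt. The sketch below shows a per-layer prompt lookup. The prompt strings are placeholders; the exact prompts the model was trained on are in the dataset and should be copied verbatim.

```python
# Hypothetical placeholders: replace each value with the exact system prompt
# for that layer from the training dataset.
SYSTEM_PROMPTS = {
    "tmf921": "<TMF921 system prompt from the dataset>",
    "intent_3gpp": "<3GPP TS 28.312 system prompt from the dataset>",
    "camara": "<CAMARA system prompt from the dataset>",
    "a1_policy": "<O-RAN A1 system prompt from the dataset>",
    "o1_nrm": "<3GPP TS 28.541 system prompt from the dataset>",
    "etsi_zsm": "<ETSI ZSM 009-1 system prompt from the dataset>",
}


def build_messages(layer: str, intent: str) -> list[dict[str, str]]:
    """Return a chat message list that targets a single specification layer."""
    try:
        system_prompt = SYSTEM_PROMPTS[layer]
    except KeyError:
        raise ValueError(
            f"unknown target layer {layer!r}; expected one of {sorted(SYSTEM_PROMPTS)}"
        ) from None
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": intent},
    ]
```

The result can be passed straight to `tokenizer.apply_chat_template`.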

Supported Target Layers

6 Specification Layers

Layer	Standard	Description
`tmf921`	TM Forum TMF921 v4	Full Intent resource with `@type` annotations, HATEOAS links
`intent_3gpp`	3GPP TS 28.312 Rel-18	Intent with `intentExpectation` and S-NSSAI encoding
`camara`	CAMARA NetworkSliceBooking	Slice booking with QoS profile and area of service
`a1_policy`	O-RAN WG2 A1	Policy with 5QI, PRB quotas, scheduler weights
`o1_nrm`	3GPP TS 28.541	RAN config: ManagedElement → GNBDUFunction → NRCellDU
`etsi_zsm`	ETSI GS ZSM 009-1	Zero-touch intent with fulfillment requirements

8 Lifecycle Operations

activate, modify, monitor, report, scale, suspend, resume, terminate — all following the TMF921 state machine.

3 Adversarial Categories

Category	Example Input	Expected Output
Ambiguous	"Can you create a slice for educational purposes?"	`CLARIFICATION_REQUIRED`
Contradictory	"mMTC slice with sub-1ms latency for 1M devices at 10 Gbps each"	`INTENT_VALIDATION_FAILED`
Out-of-scope	"Request for a list of local volunteer opportunities"	`OUT_OF_SCOPE`
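Adversarial samples are scored on the status code the model emits, so a scorer has to locate that code in the output. The minimal sketch below assumes the code appears as a string value somewhere in the JSON. The card does not say which field carries it, so the search is key-agnostic, and `is_correct_rejection` is a hypothetical helper.

```python
import json

# Status codes listed in the adversarial table above.
ADVERSARIAL_CODES = {"CLARIFICATION_REQUIRED", "INTENT_VALIDATION_FAILED", "OUT_OF_SCOPE"}


def find_status_codes(node) -> set:
    """Collect adversarial status codes that appear as string values anywhere in the JSON."""
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= find_status_codes(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_status_codes(item)
    elif isinstance(node, str) and node in ADVERSARIAL_CODES:
        found.add(node)
    return found


def is_correct_rejection(response_json: str, expected_code: str) -> bool:
    """True if the output parses as JSON and carries exactly the expected status code."""
    try:
        codes = find_status_codes(json.loads(response_json))
    except json.JSONDecodeError:
        return False
    return codes == {expected_code}
```

Requiring exactly one matching code means an output that mixes, say, `CLARIFICATION_REQUIRED` and `OUT_OF_SCOPE` counts as a failure.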

6 Slice Types

eMBB (SST=1), URLLC (SST=2), mMTC (SST=3), V2X (SST=4), HMTC (SST=5), MPS (SST=5, distinct SD)
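An S-NSSAI combines an 8-bit Slice/Service Type (SST) with an optional 24-bit Slice Differentiator (SD). This is how MPS can share SST=5 with HMTC and still be distinguishable. The sketch below shows that encoding. The `sst`/`sd` key names follow the common 3GPP OpenAPI convention and the SD values in the usage are placeholders, so check the dataset for the exact fields and values it uses.

```python
from typing import Optional

# SST values as listed above; MPS shares SST=5 with HMTC and is told apart by its SD.
SLICE_SST = {"eMBB": 1, "URLLC": 2, "mMTC": 3, "V2X": 4, "HMTC": 5, "MPS": 5}


def make_snssai(slice_type: str, sd: Optional[int] = None) -> dict:
    """Build an S-NSSAI dict: 8-bit SST plus an optional 24-bit SD as 6 hex digits."""
    snssai = {"sst": SLICE_SST[slice_type]}
    if sd is not None:
        if not 0 <= sd <= 0xFFFFFF:
            raise ValueError("SD must fit in 24 bits")
        snssai["sd"] = f"{sd:06X}"
    return snssai


print(make_snssai("URLLC"))           # {'sst': 2}
print(make_snssai("MPS", sd=0x10))    # {'sst': 5, 'sd': '000010'}
```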

Training Data

nraptisss/TMF921-intent-to-config-augmented — 41,815 samples (39,294 train / 2,521 test) covering 6 specification layers, 8 lifecycle operations, 3 adversarial categories, 18 industry sectors, 147 use cases, and 54 geographic regions.

See the dataset card for full documentation of construction methodology and specification grounding.

Training Pipeline

Reproducible pipeline at nraptisss/intent-translation-training with train.py, evaluate_v3.py, and run.sh.

Known Limitations

In-distribution evaluation only: The 100% results are on a test set from the same synthetic generator. Real operator intents are more ambiguous, underspecified, and linguistically diverse. Out-of-distribution evaluation on human-authored intents is required before deployment claims.
Single-turn only: Handles individual intent→config translations. Multi-turn conversations (create → monitor → modify → terminate) were not evaluated.
English only: All training data is in English.
Synthetic training data: Fine-tuned on template-generated and LLM-augmented data rather than real operator intents.
HMTC is speculative: SST=5 for HMTC is based on Rel-17 extensions; 6G slice types are not yet standardized.

Citation

@misc{qwen3_8b_tmf921_qlora_2025,
  title     = {Qwen3-8B-TMF921-Intent-QLora: QLoRA Fine-tuned LLM for
               5G/6G Intent-to-Configuration Translation},
  author    = {Raptis, Nikolaos},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nraptisss/Qwen3-8B-TMF921-Intent-QLora},
  note      = {100\% JSON validity, structure correctness, and KPI field presence
               on 498 stratified test samples across TMF921, 3GPP TS 28.312,
               ETSI ZSM, CAMARA, O-RAN A1, and 3GPP TS 28.541}
}

References


ORION: Intent-Aware Orchestration in Open RAN for SLA-Driven Network Management. arXiv:2603.03667, published Mar 4.

NEFMind: Parameter-Efficient Fine-Tuning of Open-Source LLMs for Telecom APIs Automation. arXiv:2508.09240, published Aug 12, 2025.
