Qwen3-8B-TMF921-Intent-QLora

A QLoRA fine-tuned Qwen3-8B that achieves 100% in-distribution schema compliance across 6 telecom standards, 8 lifecycle operations, and adversarial rejection — evaluated on 498 stratified test samples covering all 17 target layer types.

This model converts free-form operator intents like "Deploy a URLLC slice for remote surgery with sub-1ms latency and 99.999% reliability" into structured JSON configurations conforming to TMF921, 3GPP TS 28.312, ETSI ZSM 009-1, CAMARA, O-RAN A1, and 3GPP TS 28.541.


Key Results

| Metric | Score | Samples |
|---|---|---|
| JSON Validity | 100.0% | 498/498 |
| Structure Correctness | 100.0% | 498/498 |
| All 5 KPIs Present | 100.0% | 300/300 (create ops) |
| Adversarial Rejection | 100.0% | 25/25 |

Every output across all 17 target layer types (6 specification layers, 8 lifecycle operations, and 3 adversarial categories) was valid JSON with the correct structure, and all 300 create-operation outputs contained all five KPI fields.

Evaluation scope note: These results measure in-distribution schema compliance: the test set was generated by the same pipeline as the training data. The 100% score confirms that the model has learned to produce correctly formatted, spec-compliant JSON for the template-bounded input distribution. This is analogous to text-to-SQL models that score highly on the Spider test split yet still require evaluation on user-written queries. Out-of-distribution evaluation on human-written operator intents (e.g., the ORION 100-intent benchmark) is needed to assess real-world generalization and is planned as future work.


Evaluation Results

Per-Layer Breakdown (498 stratified samples, 50 per major layer)

| Layer | N | JSON Valid | Struct Correct | All KPIs |
|---|---|---|---|---|
| tmf921 | 50 | 100% | 100% | 100% |
| intent_3gpp | 50 | 100% | 100% | 100% |
| camara | 50 | 100% | 100% | 100% |
| a1_policy | 50 | 100% | 100% | 100% |
| o1_nrm | 50 | 100% | 100% | 100% |
| etsi_zsm | 50 | 100% | 100% | 100% |
| tmf921_lifecycle_activate | 19 | 100% | 100% | n/a |
| tmf921_lifecycle_modify | 20 | 100% | 100% | n/a |
| tmf921_lifecycle_monitor | 30 | 100% | 100% | n/a |
| tmf921_lifecycle_report | 18 | 100% | 100% | n/a |
| tmf921_lifecycle_resume | 24 | 100% | 100% | n/a |
| tmf921_lifecycle_scale | 21 | 100% | 100% | n/a |
| tmf921_lifecycle_suspend | 15 | 100% | 100% | n/a |
| tmf921_lifecycle_terminate | 26 | 100% | 100% | n/a |
| adversarial_ambiguous | 10 | 100% | 100% | n/a |
| adversarial_contradictory | 8 | 100% | 100% | n/a |
| adversarial_out_of_scope | 7 | 100% | 100% | n/a |
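
The sampling rule behind these counts (at most 50 samples per layer, with smaller layers kept whole) can be sketched as follows. This is a minimal illustration, not the code in evaluate_v3.py; the record field name `target_layer`, the seed, and the toy data are assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(records, cap=50, seed=0):
    """Keep at most `cap` records per layer; layers with fewer keep all."""
    by_layer = defaultdict(list)
    for rec in records:
        # "target_layer" is a hypothetical field name used for illustration.
        by_layer[rec["target_layer"]].append(rec)
    rng = random.Random(seed)
    sample = []
    for layer in sorted(by_layer):
        group = by_layer[layer]
        sample.extend(group if len(group) <= cap else rng.sample(group, cap))
    return sample

# Toy data: one large layer and one small layer.
toy = [{"target_layer": "tmf921", "id": i} for i in range(120)]
toy += [{"target_layer": "adversarial_ambiguous", "id": i} for i in range(10)]

picked = stratified_sample(toy)
counts = defaultdict(int)
for rec in picked:
    counts[rec["target_layer"]] += 1
print(dict(counts))
```

Fixing the seed keeps the sample reproducible across evaluation runs.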

KPI Field Presence (create operations, n=300)

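| Layer | N | Latency | Reliability | DL Thpt | UL Thpt | Max UEs |
|---|---|---|---|---|---|---|
| tmf921 | 50 | 100% | 100% | 100% | 100% | 100% |
| intent_3gpp | 50 | 100% | 100% | 100% | 100% | 100% |
| camara | 50 | 100% | 100% | 100% | 100% | 100% |
| a1_policy | 50 | 100% | 100% | 100% | 100% | 100% |
| o1_nrm | 50 | 100% | 100% | 100% | 100% | 100% |
| etsi_zsm | 50 | 100% | 100% | 100% | 100% | 100% |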

Evaluation Methodology

The evaluation used stratified sampling (50 samples per major layer, all samples for smaller layers) with standard-aware KPI checking that correctly handles how each telecom standard encodes network parameters:

  • TMF921, 3GPP, CAMARA, ETSI ZSM: Direct KPI value matching with int/float tolerance
  • O-RAN A1 Policy: Reliability → Packet Error Rate (PER), latency → Packet Delay Budget (pdb), throughput → GFBR/MFBR
  • O1 NRM (TS 28.541): Structural element presence (rrmPolicyMemberList, operationalState, arfcnDL, bSChannelBwDL)
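
The three checking styles listed above might look roughly like the sketch below. It is an illustration under stated assumptions, not the logic of evaluate_v3.py: the tolerance value, the reliability-to-PER formula (taken here as 1 minus reliability), and the toy O1 NRM document are all hypothetical. Only the four element names come from this card.

```python
import math

def values_match(expected, actual, rel_tol=1e-6):
    """Compare KPI values so that 50 and 50.0 count as equal."""
    try:
        return math.isclose(float(expected), float(actual), rel_tol=rel_tol)
    except (TypeError, ValueError):
        # Non-numeric values fall back to exact comparison.
        return expected == actual

def reliability_to_per(reliability_percent):
    """Assumed mapping: packet error rate = 1 - reliability."""
    return 1.0 - reliability_percent / 100.0

def contains_key(obj, key):
    """Return True if `key` appears anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key(item, key) for item in obj)
    return False

# Element names listed in this card for the O1 NRM layer.
O1_REQUIRED = ("rrmPolicyMemberList", "operationalState", "arfcnDL", "bSChannelBwDL")

# Hypothetical, heavily simplified O1 NRM output used only to exercise the check.
toy_nrm = {
    "ManagedElement": {
        "GNBDUFunction": {
            "NRCellDU": [{"operationalState": "ENABLED", "arfcnDL": 630000, "bSChannelBwDL": 100}],
            "RRMPolicyRatio": {"rrmPolicyMemberList": []},
        }
    }
}
missing = [k for k in O1_REQUIRED if not contains_key(toy_nrm, k)]
print(values_match(50, 50.0), missing)
```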

Full evaluation results are available in eval_v3_results.json.


Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Method | QLoRA (4-bit NF4 quantization + LoRA adapters) |
| Training dataset | nraptisss/TMF921-intent-to-config-augmented (41,815 samples) |
| Task | Natural language → structured JSON configuration translation |
| Standards covered | TMF921, 3GPP TS 28.312, ETSI ZSM 009-1, CAMARA, O-RAN A1, 3GPP TS 28.541 |
| License | Apache 2.0 |

Training Configuration

| Parameter | Value |
|---|---|
| Quantization | 4-bit NF4 + double quantization |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | all-linear |
| Effective batch size | 32 (per_device=4 × grad_accum=8) |
| Learning rate | 1e-4 (cosine schedule, warmup_ratio=0.1) |
| Epochs | 3 |
| Max sequence length | 4,096 tokens |
| Loss masking | assistant_only_loss=True |
| Precision | bf16 |
| Flash attention | flash_attention_2 |
| Gradient checkpointing | Yes |
| Estimated VRAM | ~26 GB |
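
As a worked example, the batch and schedule figures above imply roughly the following step counts. This is a back-of-the-envelope sketch that assumes a single GPU, that all 39,294 train samples listed under Training Data are seen each epoch, and that partial batches and warmup steps round up. The actual trainer may round differently.

```python
import math

train_samples = 39_294      # train split size stated in this card
per_device_batch = 4
grad_accum_steps = 8
num_devices = 1             # assumption: single GPU
epochs = 3
warmup_ratio = 0.1

# Samples consumed per optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_devices
# Optimizer updates per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
# Updates spent ramping the learning rate up to 1e-4 before cosine decay.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# -> 32 1228 3684 369
```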

Usage

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import json

# Load the base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, "nraptisss/Qwen3-8B-TMF921-Intent-QLora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# The system prompt selects the target layer (TMF921 here); the user turn carries the intent.
messages = [
    {"role": "system", "content": "You are a TM Forum TMF921-compliant Intent Management system. Given a natural language network intent, output a valid TMF921 Intent Management API v4 JSON object. The response must include the full Intent resource with id, href, name, description, lifecycleState, priority, intentExpression containing IntentExpectation objects with DeliveryExpectation targets, contexts, and relatedParty references. Follow the TMF921 Open API specification and use @type annotations for polymorphic types. Ground all KPI targets in 3GPP TS 22.261 performance requirements."},
    {"role": "user", "content": "Deploy a URLLC slice for remote robotic surgery in Hospital Campus. Requirements: sub-1 ms latency, 99.999% reliability, 100 Mbps DL, 50 Mbps UL, 50 connected devices."}
]

# Render the chat template as text and append the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Low-temperature sampling keeps the output close to deterministic.
outputs = model.generate(**inputs, max_new_tokens=4096, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Raises json.JSONDecodeError if the response is not pure JSON.
config = json.loads(response)
print(json.dumps(config, indent=2))
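
The snippet above passes the decoded text straight to json.loads, which fails if the response carries anything besides the JSON object. A more defensive parse could look like the sketch below. It assumes the extra text is either a `<think>...</think>` block or prose around the object. The helper name `extract_json` is hypothetical and not part of this repository.

```python
import json
import re

def extract_json(response: str):
    """Parse the first JSON object found in a model response."""
    # Drop a <think>...</think> block if the decoded text contains one.
    cleaned = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    start = cleaned.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _end = json.JSONDecoder().raw_decode(cleaned[start:])
    return obj

# Toy response with a reasoning block and trailing text.
raw = '<think>plan {x}</think>\n{"id": "intent-1", "priority": 1} done'
print(extract_json(raw))
```

The sketch still raises on malformed JSON, so a caller can treat that as a failed generation rather than silently accepting partial output.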

System Prompts by Target Layer

| Layer | System Prompt Summary |
|---|---|
| tmf921 | TMF921 v4 JSON with @type annotations, HATEOAS links, relatedParty, geographic/temporal contexts |
| intent_3gpp | 3GPP TS 28.312 v18.8.0 intent with intentExpectation, S-NSSAI, targets |
| camara | CAMARA NetworkSliceBooking with sliceProfile, areaOfService, duration |
| a1_policy | O-RAN A1 policy with 5QI mapping, PRB quotas, scheduler weights |
| o1_nrm | 3GPP TS 28.541 ManagedElement → GNBDUFunction → NRCellDU with RRM policies |
| etsi_zsm | ETSI ZSM 009-1 intent with objectives, constraints, context, fulfillmentRequirements |

Full system prompts for each layer are in the training dataset.
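
Because the system prompt selects the output layer, a small helper can keep the layer names and prompts together. The sketch below is illustrative only: `build_messages` is a hypothetical helper, and the prompt strings are placeholders to be replaced with the full prompts from the training dataset.

```python
TARGET_LAYERS = ("tmf921", "intent_3gpp", "camara", "a1_policy", "o1_nrm", "etsi_zsm")

def build_messages(layer, intent, system_prompts):
    """Build a two-turn chat (system + user) for one target layer."""
    if layer not in TARGET_LAYERS:
        raise ValueError(f"unknown target layer: {layer!r}")
    return [
        {"role": "system", "content": system_prompts[layer]},
        {"role": "user", "content": intent},
    ]

# Placeholder prompts: substitute the full prompts from the training dataset.
prompts = {layer: f"<full system prompt for {layer}>" for layer in TARGET_LAYERS}
messages = build_messages("camara", "Book a slice for a stadium event for 4 hours.", prompts)
print(messages[0]["content"])
```

The resulting list can be passed to tokenizer.apply_chat_template in the same way as the messages in the inference example.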


Supported Target Layers

6 Specification Layers

| Layer | Standard | Description |
|---|---|---|
| tmf921 | TM Forum TMF921 v4 | Full Intent resource with @type annotations, HATEOAS links |
| intent_3gpp | 3GPP TS 28.312 Rel-18 | Intent with intentExpectation and S-NSSAI encoding |
| camara | CAMARA NetworkSliceBooking | Slice booking with QoS profile and area of service |
| a1_policy | O-RAN WG2 A1 | Policy with 5QI, PRB quotas, scheduler weights |
| o1_nrm | 3GPP TS 28.541 | RAN config: ManagedElement → GNBDUFunction → NRCellDU |
| etsi_zsm | ETSI GS ZSM 009-1 | Zero-touch intent with fulfillment requirements |

8 Lifecycle Operations

activate, modify, monitor, report, scale, suspend, resume, terminate — all following the TMF921 state machine.

3 Adversarial Categories

| Category | Example Input | Expected Output |
|---|---|---|
| Ambiguous | "Can you create a slice for educational purposes?" | CLARIFICATION_REQUIRED |
| Contradictory | "mMTC slice with sub-1ms latency for 1M devices at 10 Gbps each" | INTENT_VALIDATION_FAILED |
| Out-of-scope | "Request for a list of local volunteer opportunities" | OUT_OF_SCOPE |
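
Downstream code usually needs to separate these rejections from normal configurations before applying anything to a network. One simple way is sketched below. It assumes the rejection code appears verbatim somewhere in the model output. This card does not specify which JSON field carries it, so the `status` field in the toy example is hypothetical.

```python
REJECTION_CODES = ("CLARIFICATION_REQUIRED", "INTENT_VALIDATION_FAILED", "OUT_OF_SCOPE")

def rejection_code(response_text):
    """Return the rejection code mentioned in a response, or None for a normal config."""
    for code in REJECTION_CODES:
        if code in response_text:
            return code
    return None

# Toy responses; the "status" field name is an assumption for illustration.
print(rejection_code('{"status": "OUT_OF_SCOPE", "reason": "not a network intent"}'))
print(rejection_code('{"id": "intent-1", "lifecycleState": "acknowledged"}'))
```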

6 Slice Types

eMBB (SST=1), URLLC (SST=2), mMTC (SST=3), V2X (SST=4), HMTC (SST=5), MPS (SST=5, distinct SD)
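
Since HMTC and MPS share SST=5 in this list, only the Slice Differentiator (SD) tells them apart. The sketch below renders an S-NSSAI from the SST values above plus an optional 24-bit SD. The hex-with-hyphen format and the SD values are illustrative assumptions and may not match how the dataset encodes S-NSSAI.

```python
# SST values as listed in this card.
SST_BY_SLICE_TYPE = {"eMBB": 1, "URLLC": 2, "mMTC": 3, "V2X": 4, "HMTC": 5, "MPS": 5}

def format_snssai(slice_type, sd=None):
    """Render SST (8 bits) plus optional SD (24 bits) as a hex string."""
    sst = SST_BY_SLICE_TYPE[slice_type]
    if sd is None:
        return f"{sst:02X}"
    if not 0 <= sd <= 0xFFFFFF:
        raise ValueError("SD must fit in 24 bits")
    return f"{sst:02X}-{sd:06X}"

# Hypothetical SD values: same SST, different SD.
print(format_snssai("HMTC", 0x000001))
print(format_snssai("MPS", 0x000002))
```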


Training Data

nraptisss/TMF921-intent-to-config-augmented — 41,815 samples (39,294 train / 2,521 test) covering 6 specification layers, 8 lifecycle operations, 3 adversarial categories, 18 industry sectors, 147 use cases, and 54 geographic regions.

See the dataset card for full documentation of construction methodology and specification grounding.

Training Pipeline

Reproducible pipeline at nraptisss/intent-translation-training with train.py, evaluate_v3.py, and run.sh.


Known Limitations

  1. In-distribution evaluation only: The 100% results are on a test set from the same synthetic generator. Real operator intents are more ambiguous, underspecified, and linguistically diverse. Out-of-distribution evaluation on human-authored intents is required before deployment claims.
  2. Single-turn only: Handles individual intent→config translations. Multi-turn conversations (create → monitor → modify → terminate) were not evaluated.
  3. English only: All training data is in English.
  4. Synthetic training data: The model was fine-tuned on template-generated and LLM-augmented data, not on real operator intents.
  5. HMTC is speculative: SST=5 for HMTC is based on Rel-17 extensions; 6G slice types are not yet standardized.

Citation

@misc{qwen3_8b_tmf921_qlora_2025,
  title     = {Qwen3-8B-TMF921-Intent-QLora: QLoRA Fine-tuned LLM for
               5G/6G Intent-to-Configuration Translation},
  author    = {Raptis, Nikolaos},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nraptisss/Qwen3-8B-TMF921-Intent-QLora},
  note      = {100\% JSON validity, structure correctness, and KPI accuracy
               on 498 stratified test samples across TMF921, 3GPP TS 28.312,
               ETSI ZSM, CAMARA, O-RAN A1, and 3GPP TS 28.541}
}
