---
license: apache-2.0
tags:
  - qlora
  - sft
  - trl
  - peft
  - qwen3
  - tmf921
  - intent-based-networking
  - network-slicing
  - rtx-6000-ada
  - ml-intern
base_model:
  - Qwen/Qwen3-8B
datasets:
  - nraptisss/TMF921-intent-to-config-research-sota
---

TMF921 Intent-to-Config Training + Evaluation

Training and evaluation repo for nraptisss/TMF921-intent-to-config-research-sota on a single RTX 6000 Ada (48 GB-class) server.

The default recipe is Qwen3-8B + QLoRA NF4 + TRL SFTTrainer + PEFT LoRA.

Why this recipe

  • Dataset rows were audited with Qwen/Qwen3-8B chat-template tokenization.
  • Source max length: 1,316 tokens, p99: 1,300, so max_length=2048 is safe.
  • QLoRA NF4 + double quant follows the QLoRA recipe for fitting large models on one 48GB-class GPU.
  • LoRA uses target_modules="all-linear", recommended for QLoRA-style training.
  • assistant_only_loss=True computes loss only on the assistant (JSON/config) response tokens, masking the prompt.
  • Evaluation is reported separately for in-distribution and OOD splits; do not report only a single merged score.
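
Putting the points above together, a minimal stage-1 sketch looks like the following. This is illustrative, not the repo's exact training script (the real settings live in configs/): the hyperparameters shown are the ones stated in this README, everything else is an assumption, and recent TRL/PEFT/bitsandbytes versions are assumed.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# QLoRA NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# LoRA on all linear layers, as recommended for QLoRA-style training
peft_config = LoraConfig(task_type="CAUSAL_LM", target_modules="all-linear")

dataset = load_dataset("nraptisss/TMF921-intent-to-config-research-sota", split="train")

training_args = SFTConfig(
    output_dir="outputs/qwen3-8b-tmf921-qlora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
    max_length=2048,                 # safe given the 1,316-token source max
    assistant_only_loss=True,        # loss on assistant response tokens only
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()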

Hardware target

Recommended server:

  • GPU: NVIDIA RTX 6000 Ada, 48 GB VRAM
  • RAM: 64GB+
  • Disk: 200GB+ free
  • CUDA-compatible PyTorch

Default effective batch size:

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective batch size = 16
max_length = 2048

If you hit out-of-memory (OOM) errors, preserve the effective batch size by changing:

per_device_train_batch_size: 1
gradient_accumulation_steps: 16

Do not reduce max_length unless you intentionally want a different training task.
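
The fallback works because the effective batch size is just the product of the two settings (times the GPU count, one here), so the optimizer sees the same batch either way:

def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    # gradients accumulate over grad_accum micro-batches before each optimizer step
    return per_device * grad_accum * num_gpus

assert effective_batch_size(2, 8) == effective_batch_size(1, 16) == 16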

Quick start with nohup, unique run dirs, and resumable checkpoints

git clone https://huggingface.co/nraptisss/tmf921-intent-training
cd tmf921-intent-training

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
bash scripts/install_rtx6000ada.sh
python scripts/check_gpu.py

export HF_TOKEN=hf_...
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH="$PWD/src"
export TOKENIZERS_PARALLELISM=false

bash scripts/nohup_new_run.sh

Monitor:

RUN_DIR=runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
bash scripts/status_run.sh "$RUN_DIR"
tail -f "$RUN_DIR/logs/train.log"
watch -n 2 nvidia-smi

Resume:

bash scripts/nohup_resume.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS

Evaluate:

bash scripts/nohup_eval.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS

Configs

  • configs/rtx6000ada_qwen3_8b_qlora.yaml — recommended stage-1 config
  • configs/rtx6000ada_qwen3_14b_qlora_experimental.yaml — experimental 14B config
  • configs/stage2_weak_layer_qwen3_8b.yaml — diagnostic weak-layer continuation config

Evaluation

Raw evaluator:

python scripts/evaluate_model.py \
  --model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --dataset nraptisss/TMF921-intent-to-config-research-sota \
  --output_dir outputs/qwen3-8b-tmf921-qlora/eval \
  --load_in_4bit

Normalize existing predictions:

python scripts/normalize_eval_metrics.py \
  --eval_dir outputs/qwen3-8b-tmf921-qlora/eval

Metrics:

  • JSON parse rate
  • canonical JSON exact match
  • field precision / recall / F1
  • normalized field precision / recall / F1
  • normalized key precision / recall / F1
  • slice/SST diagnostic pass
  • KPI text-presence diagnostic pass
  • adversarial status pass
  • stratified metrics by target_layer, slice_type, and lifecycle_operation
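
The authoritative implementations live in scripts/evaluate_model.py and scripts/normalize_eval_metrics.py. As a rough illustration only, field precision/recall/F1 over flattened JSON can be computed along these lines (the flattening and scoring details here are assumptions, not the evaluator's actual code):

import json

def flatten(obj, prefix=""):
    # Flatten nested JSON into {dotted.key.path: value} pairs.
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            items.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items.update(flatten(v, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

def field_prf1(pred_json: str, gold_json: str):
    try:
        pred = flatten(json.loads(pred_json))
    except json.JSONDecodeError:
        return 0.0, 0.0, 0.0  # unparseable output counts as a total miss
    gold = flatten(json.loads(gold_json))
    hits = sum(1 for k, v in pred.items() if gold.get(k) == v)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1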

Merge adapter for deployment/evaluation

python scripts/merge_adapter.py \
  --base_model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --output_dir outputs/qwen3-8b-tmf921-merged
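
Internally, merging with standard PEFT APIs amounts to the following (a sketch assuming scripts/merge_adapter.py follows the usual merge_and_unload pattern; the paths mirror the command above):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "outputs/qwen3-8b-tmf921-qlora")
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("outputs/qwen3-8b-tmf921-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-8B").save_pretrained("outputs/qwen3-8b-tmf921-merged")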

Stage 2 weak-layer continuation

Stage 2 was implemented and tested as a diagnostic experiment. It is not promoted as the main model because it did not materially improve O1/A1 and slightly regressed adversarial performance.

Run if needed:

bash scripts/nohup_stage2_weak.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS

Results packaging and qualitative failure analysis

After completing stage-1 and stage-2 evaluation plus normalization, package publication artifacts with:

export PYTHONPATH="$PWD/src"

python scripts/package_results.py \
  --stage1_eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --stage2_eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir results

This writes:

results/stage1_raw_metrics.json
results/stage1_normalized_metrics.json
results/stage2_raw_metrics.json
results/stage2_normalized_metrics.json
results/metrics_summary.json
results/stage1_vs_stage2_comparison.md

Generate qualitative success/failure examples for the paper with:

python scripts/sample_failure_examples.py \
  --eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --output_dir analysis/stage1_examples

Optionally also sample stage-2 examples:

python scripts/sample_failure_examples.py \
  --eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir analysis/stage2_examples

The example sampler writes:

analysis/*/failure_examples.md
analysis/*/failure_examples.json

These artifacts are intended for paper tables, qualitative error analysis, and reproducibility appendices.

Scientific reporting protocol

For research papers/reports, report at least:

  1. validation loss
  2. test_in_distribution metrics
  3. test_template_ood metrics
  4. test_use_case_ood metrics
  5. test_sector_ood metrics
  6. test_adversarial metrics
  7. per-target-layer field F1
  8. normalized field/key F1
  9. JSON parse rate
  10. rare-class metrics for lifecycle operations and adversarial categories

Do not claim production standards compliance from JSON validity alone. Official TMF921/3GPP/ETSI/CAMARA/O-RAN validators are still needed for schema-level certification.

Files

configs/
scripts/
src/tmf921_train/
PROJECT_JOURNAL.md
requirements.txt

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'nraptisss/tmf921-intent-training'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
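
If this repository hosts a PEFT adapter rather than merged full weights (as the training flow above suggests), loading typically attaches the adapter to the base model instead. A sketch, assuming an adapter_config.json is present in the repo:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantization-free base model, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "nraptisss/tmf921-intent-training")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")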