---
license: apache-2.0
tags:
- qlora
- sft
- trl
- peft
- qwen3
- tmf921
- intent-based-networking
- network-slicing
- rtx-6000-ada
- ml-intern
base_model:
- Qwen/Qwen3-8B
datasets:
- nraptisss/TMF921-intent-to-config-research-sota
---

# TMF921 Intent-to-Config Training + Evaluation

Training and evaluation repo for [`nraptisss/TMF921-intent-to-config-research-sota`](https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota) on a single **RTX 6000 Ada 48/50GB** server.

The default recipe is **Qwen3-8B + QLoRA NF4 + TRL SFTTrainer + PEFT LoRA**.

## Why this recipe

- Dataset rows were audited with `Qwen/Qwen3-8B` chat-template tokenization.
- Longest tokenized row: **1,316 tokens** (p99: **1,300**), so `max_length=2048` leaves headroom and no row is truncated.
- QLoRA NF4 + double quantization follows the QLoRA recipe for fine-tuning large models on a single 48GB-class GPU.
- LoRA uses `target_modules="all-linear"`, the setting recommended for QLoRA-style training.
- `assistant_only_loss=True` computes the loss only on the assistant (JSON/config response) tokens.
- Evaluation is reported separately for the in-distribution and OOD test splits; do not report only a single merged score.
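
The length audit above can be reproduced with a short, dependency-free sketch. It is a hypothetical illustration, not the repository's audit code: it assumes you already have one token count per row (for example, from applying the `Qwen/Qwen3-8B` chat template and tokenizing each row), and it uses the nearest-rank percentile, which can differ slightly from interpolating percentile definitions.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the data is <= it (nearest-rank method)."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def audit_lengths(token_counts, max_length=2048):
    """Summarize per-row token counts and check that no row exceeds max_length."""
    longest = max(token_counts)
    return {
        "max": longest,
        "p99": nearest_rank_percentile(token_counts, 99),
        "n_truncated": sum(1 for n in token_counts if n > max_length),
        "safe": longest <= max_length,
    }


# Hypothetical counts standing in for the real per-row token counts.
print(audit_lengths([512, 900, 1300, 1316]))
```

If the counts are computed the same way as in the original audit, `max` should match the 1,316 reported above and `n_truncated` should be 0 for `max_length=2048`.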

## Hardware target

Recommended server:

- GPU: NVIDIA RTX 6000 Ada, 48GB/50GB VRAM
- RAM: 64GB+
- Disk: 200GB+ free
- CUDA-compatible PyTorch

Default batch settings:

```text
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective batch size = 16
max_length = 2048
```

If you hit CUDA out-of-memory (OOM) errors, keep the effective batch size at 16 by changing:

```yaml
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
```

Do **not** reduce `max_length` unless you intentionally want a different training task: any limit below the longest row (1,316 tokens) truncates training examples.
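
The OOM fallback works because the effective batch size is the product `per_device_train_batch_size * gradient_accumulation_steps * number_of_GPUs` (one GPU here). The hypothetical helpers below, which are not part of the repository, make that arithmetic explicit and refuse settings that cannot hit the target exactly:

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Number of sequences that contribute to each optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


def accumulation_for_target(target_effective, per_device_train_batch_size, num_gpus=1):
    """Gradient accumulation steps that keep the effective batch size at target_effective."""
    per_micro_step = per_device_train_batch_size * num_gpus
    if target_effective % per_micro_step:
        raise ValueError(
            f"effective batch {target_effective} is not a multiple of {per_micro_step}"
        )
    return target_effective // per_micro_step


print(effective_batch_size(2, 8))      # 16 (default recipe)
print(accumulation_for_target(16, 1))  # 16 (OOM fallback with batch size 1)
```

Keeping the product at 16 means each optimizer step still aggregates gradients over roughly the same 16 sequences and the number of optimizer steps per epoch is unchanged, so the learning-rate schedule stays comparable; mainly peak activation memory and wall-clock time change.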

## Quick start with nohup, unique run dirs, and resumable checkpoints

```bash
git clone https://huggingface.co/nraptisss/tmf921-intent-training
cd tmf921-intent-training
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
bash scripts/install_rtx6000ada.sh
python scripts/check_gpu.py
export HF_TOKEN=hf_...
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH="$PWD/src"
export TOKENIZERS_PARALLELISM=false
bash scripts/nohup_new_run.sh
```
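
The quick start gives each launch its own directory named like `runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS`, so runs do not overwrite each other. The Python sketch below illustrates that naming scheme; it is not the contents of `nohup_new_run.sh`, and the `logs/` subfolder is assumed from the `logs/train.log` path used for monitoring.

```python
from datetime import datetime
from pathlib import Path


def new_run_dir(root="runs", prefix="qwen3-8b-qlora", now=None):
    """Create <root>/<prefix>-YYYYMMDD-HHMMSS plus a logs/ subfolder and return its path."""
    now = now or datetime.now()
    run_dir = Path(root) / f"{prefix}-{now:%Y%m%d-%H%M%S}"
    # exist_ok=False: a second launch in the same second fails instead of sharing the directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "logs").mkdir()
    return run_dir
```

Called at 08:38:34 on 2026-05-01, `new_run_dir()` would create `runs/qwen3-8b-qlora-20260501-083834/logs/`. Failing loudly on a name collision is safer than letting two runs interleave logs and checkpoints.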

Monitor:

```bash
RUN_DIR=runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS  # replace with your actual run directory
bash scripts/status_run.sh "$RUN_DIR"
tail -f "$RUN_DIR/logs/train.log"
watch -n 2 nvidia-smi
```

Resume:

```bash
bash scripts/nohup_resume.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```
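
Resumption relies on the checkpoints saved during training. Hugging Face `Trainer` (which TRL's `SFTTrainer` builds on) saves them as `checkpoint-<global_step>` folders, `transformers.trainer_utils.get_last_checkpoint` returns the newest one, and `trainer.train(resume_from_checkpoint=...)` continues from it. The stdlib sketch below shows the same selection logic; which folder inside the run directory holds the checkpoints is an assumption, so the sketch takes it as an explicit argument.

```python
import re
from pathlib import Path

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None if there is none."""
    candidates = []
    for path in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates)[1]
```

Steps are compared as integers, so `checkpoint-1000` correctly wins over `checkpoint-200`, which a plain string sort would get wrong.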

Evaluate:

```bash
bash scripts/nohup_eval.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Configs

- `configs/rtx6000ada_qwen3_8b_qlora.yaml`: recommended stage-1 config
- `configs/rtx6000ada_qwen3_14b_qlora_experimental.yaml`: experimental 14B config
- `configs/stage2_weak_layer_qwen3_8b.yaml`: diagnostic weak-layer continuation config

## Evaluation

Run the raw evaluator:

```bash
python scripts/evaluate_model.py \
  --model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --dataset nraptisss/TMF921-intent-to-config-research-sota \
  --output_dir outputs/qwen3-8b-tmf921-qlora/eval \
  --load_in_4bit
```

Normalize existing predictions:

```bash
python scripts/normalize_eval_metrics.py \
  --eval_dir outputs/qwen3-8b-tmf921-qlora/eval
```

Metrics:

- JSON parse rate
- canonical JSON exact match
- field precision / recall / F1
- normalized field precision / recall / F1
- normalized key precision / recall / F1
- slice/SST diagnostic pass
- KPI text-presence diagnostic pass
- adversarial status pass
- stratified metrics by `target_layer`, `slice_type`, and `lifecycle_operation`
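
To make the first few metrics concrete, here is a minimal sketch of JSON parse rate, canonical exact match, field precision/recall/F1, and stratification. It assumes a "field" is a flattened (JSON path, leaf value) pair and that canonical JSON means sorted keys with compact separators; `evaluate_model.py` may define these differently, the normalized variants apply extra normalization not shown here, and the record keys `prediction`, `gold`, and `target_layer` used by `stratified` are hypothetical.

```python
import json
from collections import defaultdict


def canonical(obj):
    """Canonical JSON text: sorted keys and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def flatten(obj, prefix=""):
    """Flatten nested JSON into a set of (path, canonical leaf value) pairs."""
    fields = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            fields |= flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            fields |= flatten(value, f"{prefix}[{i}]")
    else:
        # Canonical text keeps 1, "1" and true distinct as leaf values.
        fields.add((prefix, canonical(obj)))
    return fields


def field_prf(pred, gold):
    """Precision, recall and F1 over the flattened fields of one example."""
    p, g = flatten(pred), flatten(gold)
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def score(raw_outputs, golds):
    """Parse rate, canonical exact match and mean field F1 over raw model outputs."""
    if not golds or len(raw_outputs) != len(golds):
        raise ValueError("need equal-length, non-empty raw_outputs and golds")
    parsed = exact = 0
    f1_sum = 0.0
    for raw, gold in zip(raw_outputs, golds):
        try:
            pred = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output counts as 0 on every metric
        parsed += 1
        exact += canonical(pred) == canonical(gold)
        f1_sum += field_prf(pred, gold)[2]
    n = len(golds)
    return {"json_parse_rate": parsed / n, "exact_match": exact / n, "field_f1": f1_sum / n}


def stratified(records, key):
    """Score each group of records sharing the same value of `key` (e.g. target_layer)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {
        value: score([r["prediction"] for r in recs], [r["gold"] for r in recs])
        for value, recs in sorted(groups.items())
    }
```

Averaging over all examples (with unparseable outputs scored as 0) keeps parse failures visible in the F1 number instead of silently excluding them.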

## Merge adapter for deployment/evaluation

```bash
python scripts/merge_adapter.py \
  --base_model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --output_dir outputs/qwen3-8b-tmf921-merged
```

## Stage 2 weak-layer continuation

Stage 2 was implemented and tested as a diagnostic experiment. It is **not promoted** as the main model because it did not materially improve O1/A1 and slightly regressed adversarial performance.

Run if needed:

```bash
bash scripts/nohup_stage2_weak.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Results packaging and qualitative failure analysis

After stage-1 and stage-2 evaluation and normalization are complete, package the publication artifacts with:

```bash
export PYTHONPATH="$PWD/src"
python scripts/package_results.py \
  --stage1_eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --stage2_eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir results
```

This writes:

```text
results/stage1_raw_metrics.json
results/stage1_normalized_metrics.json
results/stage2_raw_metrics.json
results/stage2_normalized_metrics.json
results/metrics_summary.json
results/stage1_vs_stage2_comparison.md
```
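
A stage-1 vs stage-2 comparison such as `results/stage1_vs_stage2_comparison.md` essentially reduces to a per-metric delta table. The sketch below is a hypothetical illustration, not `package_results.py`; it assumes flat `{metric_name: value}` dictionaries loaded from the two metrics files.

```python
def comparison_markdown(stage1, stage2, label1="stage 1", label2="stage 2"):
    """Render a markdown table of metrics present in both stages, with stage2 - stage1 deltas."""
    lines = [
        f"| metric | {label1} | {label2} | delta |",
        "|---|---:|---:|---:|",
    ]
    for name in sorted(stage1.keys() & stage2.keys()):
        a, b = stage1[name], stage2[name]
        lines.append(f"| {name} | {a:.4f} | {b:.4f} | {b - a:+.4f} |")
    return "\n".join(lines)
```

Metrics present in only one file are skipped rather than treated as zero, which avoids reporting a spurious delta; the explicit `+`/`-` sign makes regressions easy to spot.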

Generate qualitative success/failure examples for the paper with:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --output_dir analysis/stage1_examples
```

Optionally, sample stage-2 examples as well:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir analysis/stage2_examples
```

The example sampler writes:

```text
analysis/*/failure_examples.md
analysis/*/failure_examples.json
```

These artifacts are intended for paper tables, qualitative error analysis, and reproducibility appendices.
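
The sampler's exact selection rules are not described here. One reasonable approach, shown as a hypothetical sketch, is to split predictions into successes and failures by per-example field F1 and then draw a fixed-seed sample per `target_layer`, so every layer is represented and the selection is reproducible. The record keys (`field_f1`, `target_layer`) and the success threshold are assumptions.

```python
import random
from collections import defaultdict


def sample_examples(records, per_group=3, key="target_layer", threshold=1.0, seed=0):
    """Return {group: {"success": [...], "failure": [...]}} with a reproducible random sample."""
    rng = random.Random(seed)  # private RNG so sampling does not depend on global random state
    buckets = defaultdict(lambda: {"success": [], "failure": []})
    for rec in records:
        kind = "success" if rec["field_f1"] >= threshold else "failure"
        buckets[rec[key]][kind].append(rec)
    sampled = {}
    for group in sorted(buckets):  # fixed group order keeps the RNG sequence deterministic
        sampled[group] = {
            kind: rng.sample(items, min(per_group, len(items)))
            for kind, items in buckets[group].items()
        }
    return sampled
```

Stratifying before sampling prevents the most frequent layer from crowding out rare ones in the qualitative tables.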

## Scientific reporting protocol

For research papers/reports, report at least:

1. validation loss,
2. `test_in_distribution` metrics,
3. `test_template_ood` metrics,
4. `test_use_case_ood` metrics,
5. `test_sector_ood` metrics,
6. `test_adversarial` metrics,
7. per-target-layer field F1,
8. normalized field/key F1,
9. JSON parse rate,
10. rare-class metrics for lifecycle operations and adversarial categories.
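
Item 10 matters because micro-averaged scores are dominated by frequent classes. A minimal sketch, assuming one gold and one predicted label per example (for example the lifecycle operation; the label names in any usage are hypothetical), reports per-class recall with its support alongside a macro average so rare classes stay visible:

```python
from collections import Counter


def per_class_report(gold_labels, pred_labels):
    """Per-class support and recall, plus micro accuracy and macro-averaged recall."""
    if len(gold_labels) != len(pred_labels):
        raise ValueError("label lists must have equal length")
    support = Counter(gold_labels)
    correct = Counter(g for g, p in zip(gold_labels, pred_labels) if g == p)
    per_class = {
        label: {"support": n, "recall": correct[label] / n}
        for label, n in sorted(support.items())
    }
    micro = sum(correct.values()) / len(gold_labels) if gold_labels else 0.0
    macro = sum(c["recall"] for c in per_class.values()) / len(per_class) if per_class else 0.0
    return {"per_class": per_class, "micro_accuracy": micro, "macro_recall": macro}
```

With eight correct examples of a frequent class and two missed examples of a rare class, micro accuracy is 0.8 while macro recall drops to 0.5; reporting the support next to each recall shows how few examples back a rare-class number.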

Do **not** claim production standards compliance from JSON validity alone. Official TMF921/3GPP/ETSI/CAMARA/O-RAN validators are still needed for schema-level certification.

## Files

```text
configs/
scripts/
src/tmf921_train/
PROJECT_JOURNAL.md
requirements.txt
```

## References

- QLoRA: https://huggingface.co/papers/2305.14314
- LoRA: https://huggingface.co/papers/2106.09685
- TRL SFTTrainer docs: https://huggingface.co/docs/trl/sft_trainer
- TRL PEFT integration: https://huggingface.co/docs/trl/peft_integration
- Source dataset: https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota