---
license: apache-2.0
tags:
- qlora
- sft
- trl
- peft
- qwen3
- tmf921
- intent-based-networking
- network-slicing
- rtx-6000-ada
- ml-intern
base_model:
- Qwen/Qwen3-8B
datasets:
- nraptisss/TMF921-intent-to-config-research-sota
---

# TMF921 Intent-to-Config Training + Evaluation

Training and evaluation repo for [`nraptisss/TMF921-intent-to-config-research-sota`](https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota) on a single **RTX 6000 Ada 48/50GB** server.

The default recipe is **Qwen3-8B + QLoRA NF4 + TRL SFTTrainer + PEFT LoRA**.

## Why this recipe

- Dataset row lengths were audited with the `Qwen/Qwen3-8B` chat template and tokenizer.
- Maximum source length is **1,316 tokens** (p99: **1,300**), so `max_length=2048` is safe.
- QLoRA NF4 quantization with double quantization follows the QLoRA recipe for fitting large models on a single 48GB-class GPU.
- LoRA uses `target_modules="all-linear"`, as recommended for QLoRA-style training.
- `assistant_only_loss=True` computes the loss only on the JSON/config response tokens.
- Evaluation reports the in-distribution and OOD splits separately; do not report only a single merged score.

## Hardware target

Recommended server:

- GPU: NVIDIA RTX 6000 Ada, 48GB/50GB VRAM
- RAM: 64GB+
- Disk: 200GB+ free
- CUDA-compatible PyTorch

Default effective batch size:

```text
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective batch size = 2 x 8 = 16
max_length = 2048
```

If you hit OOM, keep the effective batch size at 16 by changing:

```yaml
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
```

Do **not** reduce `max_length` unless you intentionally want a different training task.

## Quick start with nohup, unique run dirs, and resumable checkpoints

```bash
git clone https://huggingface.co/nraptisss/tmf921-intent-training
cd tmf921-intent-training

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
bash scripts/install_rtx6000ada.sh
python scripts/check_gpu.py

export HF_TOKEN=hf_...
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH="$PWD/src"
export TOKENIZERS_PARALLELISM=false

bash scripts/nohup_new_run.sh
```

Monitor:

```bash
RUN_DIR=runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
bash scripts/status_run.sh "$RUN_DIR"
tail -f "$RUN_DIR/logs/train.log"
watch -n 2 nvidia-smi
```

Resume:

```bash
bash scripts/nohup_resume.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

Evaluate:

```bash
bash scripts/nohup_eval.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Configs

- `configs/rtx6000ada_qwen3_8b_qlora.yaml`: recommended stage-1 config
- `configs/rtx6000ada_qwen3_14b_qlora_experimental.yaml`: experimental 14B config
- `configs/stage2_weak_layer_qwen3_8b.yaml`: diagnostic weak-layer continuation config

## Evaluation

Raw evaluator:

```bash
python scripts/evaluate_model.py \
  --model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --dataset nraptisss/TMF921-intent-to-config-research-sota \
  --output_dir outputs/qwen3-8b-tmf921-qlora/eval \
  --load_in_4bit
```

Normalize existing predictions:

```bash
python scripts/normalize_eval_metrics.py \
  --eval_dir outputs/qwen3-8b-tmf921-qlora/eval
```

Metrics:

- JSON parse rate
- canonical JSON exact match
- field precision / recall / F1
- normalized field precision / recall / F1
- normalized key precision / recall / F1
- slice/SST diagnostic pass
- KPI text-presence diagnostic pass
- adversarial status pass
- stratified metrics by `target_layer`, `slice_type`, and `lifecycle_operation`

## Merge adapter for deployment/evaluation

```bash
python scripts/merge_adapter.py \
  --base_model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --output_dir outputs/qwen3-8b-tmf921-merged
```

## Stage 2 weak-layer continuation

Stage 2 was implemented and tested as a diagnostic experiment. It is **not promoted** as the main model because it did not materially improve O1/A1 and slightly regressed adversarial performance.
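The promotion decision above weighs gains on the targeted layers against regressions elsewhere. A minimal sketch of such a rule, assuming each run's metrics have already been loaded into a flat dict of metric name to score; `should_promote`, the metric names, and the `min_gain`/`max_drop` thresholds are hypothetical and are not taken from this repo's scripts or from the actual stage-1/stage-2 results.

```python
from typing import Dict, List, Tuple


def should_promote(
    stage1: Dict[str, float],
    stage2: Dict[str, float],
    target_keys: List[str],
    min_gain: float = 0.01,
    max_drop: float = 0.005,
) -> Tuple[bool, Dict[str, float]]:
    """Hypothetical promotion rule for a continuation (stage-2) run.

    Promote only if every targeted metric improves by at least `min_gain`
    and no metric present in both runs drops by more than `max_drop`.
    Returns the decision and the per-metric deltas (stage2 - stage1).
    """
    shared = sorted(set(stage1) & set(stage2))
    deltas = {key: stage2[key] - stage1[key] for key in shared}
    # A targeted metric missing from either run is treated as a delta of 0.0.
    improved = all(deltas.get(key, 0.0) >= min_gain for key in target_keys)
    # Any shared metric (e.g. an adversarial pass rate) may veto promotion.
    no_regression = all(delta >= -max_drop for delta in deltas.values())
    return improved and no_regression, deltas
```

With `target_keys` pointing at the O1/A1 field-F1 entries, a run whose targeted gains stay below `min_gain`, or whose adversarial score drops by more than `max_drop`, is not promoted. Fixing the thresholds before inspecting test results keeps the decision from being tuned on the test splits.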
Run if needed:

```bash
bash scripts/nohup_stage2_weak.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Results packaging and qualitative failure analysis

After completing stage-1 and stage-2 evaluation plus normalization, package publication artifacts with the commands below. Replace the timestamped run directories with your own.

```bash
export PYTHONPATH="$PWD/src"
python scripts/package_results.py \
  --stage1_eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --stage2_eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir results
```

This writes:

```text
results/stage1_raw_metrics.json
results/stage1_normalized_metrics.json
results/stage2_raw_metrics.json
results/stage2_normalized_metrics.json
results/metrics_summary.json
results/stage1_vs_stage2_comparison.md
```

Generate qualitative success/failure examples for the paper with:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --output_dir analysis/stage1_examples
```

Optionally, also sample stage-2 examples:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir analysis/stage2_examples
```

The example sampler writes:

```text
analysis/*/failure_examples.md
analysis/*/failure_examples.json
```

These artifacts are intended for paper tables, qualitative error analysis, and reproducibility appendices.

## Scientific reporting protocol

For research papers/reports, report at least:

1. validation loss,
2. `test_in_distribution` metrics,
3. `test_template_ood` metrics,
4. `test_use_case_ood` metrics,
5. `test_sector_ood` metrics,
6. `test_adversarial` metrics,
7. per-target-layer field F1,
8. normalized field/key F1,
9. JSON parse rate,
10. rare-class metrics for lifecycle operations and adversarial categories.

Do **not** claim production standards compliance from JSON validity alone. Official TMF921/3GPP/ETSI/CAMARA/O-RAN validators are still needed for schema-level certification.
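Several items in this protocol (per-target-layer field F1, JSON parse rate) are aggregates over individual predictions. A minimal sketch of one common way to compute them, assuming field F1 compares the sets of flattened `(path, value)` leaf pairs of the predicted and reference JSON, micro-averaged per `target_layer`, with unparseable predictions contributing no predicted fields. The helper names and the record layout (`prediction`, `reference`, `target_layer` keys) are hypothetical; `scripts/evaluate_model.py` may define these metrics differently.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Set, Tuple

Field = Tuple[str, str]  # (dotted JSON path, JSON-encoded leaf value)


def flatten(obj: Any, prefix: str = "") -> Set[Field]:
    """Flatten nested JSON into a set of (path, value) leaf pairs."""
    fields: Set[Field] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            fields |= flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            fields |= flatten(value, f"{prefix}[{index}]")
    else:
        fields.add((prefix, json.dumps(obj)))
    return fields


@dataclass
class Counts:
    true_pos: int = 0  # predicted fields that exactly match a reference field
    n_pred: int = 0    # fields in parsed predictions
    n_gold: int = 0    # fields in references
    parsed: int = 0    # predictions that parsed as JSON
    total: int = 0     # all predictions


def try_parse(text: str) -> Tuple[bool, Any]:
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


def stratified_field_f1(records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, float]]:
    """Micro-averaged field precision/recall/F1 and JSON parse rate per target_layer."""
    counts: Dict[str, Counts] = defaultdict(Counts)
    for record in records:
        c = counts[record["target_layer"]]
        ok, pred = try_parse(record["prediction"])
        pred_fields = flatten(pred) if ok else set()  # unparseable -> no predicted fields
        gold_fields = flatten(json.loads(record["reference"]))
        c.true_pos += len(pred_fields & gold_fields)
        c.n_pred += len(pred_fields)
        c.n_gold += len(gold_fields)
        c.parsed += int(ok)
        c.total += 1

    results: Dict[str, Dict[str, float]] = {}
    for layer, c in counts.items():
        precision = c.true_pos / c.n_pred if c.n_pred else 0.0
        recall = c.true_pos / c.n_gold if c.n_gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[layer] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "json_parse_rate": c.parsed / c.total,
        }
    return results
```

Micro-averaging pools field counts across all rows of a layer, so rows with many fields weigh more; a macro average over rows would instead weigh each intent equally. Running the same function separately on each test split gives the per-split numbers the protocol asks for.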
## Files

```text
configs/
scripts/
src/tmf921_train/
PROJECT_JOURNAL.md
requirements.txt
```

## References

- QLoRA: https://huggingface.co/papers/2305.14314
- LoRA: https://huggingface.co/papers/2106.09685
- TRL SFTTrainer docs: https://huggingface.co/docs/trl/sft_trainer
- TRL PEFT integration: https://huggingface.co/docs/trl/peft_integration
- Source dataset: https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota

## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

The files listed above are training and evaluation code, not model weights. After training, load the merged model written by `scripts/merge_adapter.py`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Merged checkpoint produced by scripts/merge_adapter.py (see "Merge adapter for deployment/evaluation").
model_path = "outputs/qwen3-8b-tmf921-merged"
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
```
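Generated responses still need to be parsed as JSON before they can be used as configs. The same question (does the response parse as JSON?) underlies the JSON parse rate metric, although `scripts/evaluate_model.py` may apply stricter or different extraction rules. A minimal sketch, assuming the response may contain an optional `<think>...</think>` block or a Markdown code fence around the JSON; `extract_json` is a hypothetical helper, not a function from this repo, and it returns only the first top-level JSON object or array that parses.

```python
import json
import re
from typing import Any, Optional

# Qwen3-style reasoning block; removed so JSON-like text inside it is ignored.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_json(response: str) -> Optional[Any]:
    """Return the first JSON object/array in a model response, or None if none parses."""
    text = _THINK_BLOCK.sub("", response)
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at `index` and ignores trailing text,
                # so surrounding prose or code-fence markers do not cause a failure.
                obj, _end = decoder.raw_decode(text, index)
                return obj
            except json.JSONDecodeError:
                continue  # not valid JSON at this position; try the next bracket
    return None
```

Scanning for the first bracket that starts a valid JSON value is lenient by design; a strict parse rate would instead call `json.loads` on the whole response and count any surrounding text as a failure.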