---
license: apache-2.0
tags:
- qlora
- sft
- trl
- peft
- qwen3
- tmf921
- intent-based-networking
- network-slicing
- rtx-6000-ada
- ml-intern
base_model:
- Qwen/Qwen3-8B
datasets:
- nraptisss/TMF921-intent-to-config-research-sota
---

# TMF921 Intent-to-Config Training + Evaluation

Training and evaluation repo for [`nraptisss/TMF921-intent-to-config-research-sota`](https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota) on a single **RTX 6000 Ada 48/50 GB** server.

The default recipe is **Qwen3-8B + QLoRA NF4 + TRL SFTTrainer + PEFT LoRA**.

## Why this recipe

- Dataset rows were audited with `Qwen/Qwen3-8B` chat-template tokenization.
- The longest source sample is **1,316 tokens** (p99: **1,300**), so `max_length=2048` is safe.
- QLoRA NF4 with double quantization follows the QLoRA recipe for fitting large models on a single 48 GB-class GPU (see the configuration sketch below).
- LoRA uses `target_modules="all-linear"`, as recommended for QLoRA-style training.
- `assistant_only_loss=True` restricts the loss to the JSON/config response tokens.
- Evaluation is reported per split (in-distribution and several OOD splits); do not report only a single merged score.
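
These choices map onto standard `bitsandbytes`, `peft`, and `trl` configuration objects. The sketch below is illustrative: the rank, alpha, and dropout values are placeholders, not the repository's tuned settings (those live in `configs/rtx6000ada_qwen3_8b_qlora.yaml`).

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig

# QLoRA: 4-bit NF4 weights with double quantization, bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA on every linear layer, as recommended for QLoRA-style training.
peft_config = LoraConfig(
    r=16,                       # illustrative rank
    lora_alpha=32,              # illustrative alpha
    lora_dropout=0.05,          # illustrative dropout
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# SFT settings from the token audit: max_length=2048 covers the 1,316-token
# maximum, and the loss is computed only on assistant (JSON/config) tokens.
sft_config = SFTConfig(
    output_dir="outputs/qwen3-8b-tmf921-qlora",
    max_length=2048,
    assistant_only_loss=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
)
```

These objects are then passed to `SFTTrainer` together with the base model (loaded with `bnb_config`) and the chat-formatted dataset splits.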

## Hardware target

Recommended server:

- GPU: NVIDIA RTX 6000 Ada, 48 GB/50 GB VRAM
- RAM: 64 GB+
- Disk: 200 GB+ free
- CUDA-compatible PyTorch

Default effective batch size:

```text
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective batch size = 16
max_length = 2048
```

If OOM occurs, preserve the effective batch size by changing:

```yaml
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
```

Do **not** reduce `max_length` unless you intentionally want a different training task.

## Quick start with nohup, unique run dirs, and resumable checkpoints

```bash
git clone https://huggingface.co/nraptisss/tmf921-intent-training
cd tmf921-intent-training

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
bash scripts/install_rtx6000ada.sh
python scripts/check_gpu.py

export HF_TOKEN=hf_...
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH="$PWD/src"
export TOKENIZERS_PARALLELISM=false

bash scripts/nohup_new_run.sh
```

Monitor:

```bash
RUN_DIR=runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
bash scripts/status_run.sh "$RUN_DIR"
tail -f "$RUN_DIR/logs/train.log"
watch -n 2 nvidia-smi
```

Resume:

```bash
bash scripts/nohup_resume.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

Evaluate:

```bash
bash scripts/nohup_eval.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Configs

- `configs/rtx6000ada_qwen3_8b_qlora.yaml` — recommended stage-1 config
- `configs/rtx6000ada_qwen3_14b_qlora_experimental.yaml` — experimental 14B config
- `configs/stage2_weak_layer_qwen3_8b.yaml` — diagnostic weak-layer continuation config

## Evaluation

Raw evaluator:

```bash
python scripts/evaluate_model.py \
  --model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --dataset nraptisss/TMF921-intent-to-config-research-sota \
  --output_dir outputs/qwen3-8b-tmf921-qlora/eval \
  --load_in_4bit
```
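
With `--adapter` and `--load_in_4bit`, the evaluator runs the unmerged LoRA adapter on top of a 4-bit quantized base. A rough sketch of that loading pattern, assuming standard `transformers`/`peft` calls (the script's actual code may differ):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", quantization_config=bnb, device_map="auto"
)

# Attach the trained LoRA adapter without merging it into the base weights.
model = PeftModel.from_pretrained(base, "outputs/qwen3-8b-tmf921-qlora")
model.eval()
```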

Normalize existing predictions:

```bash
python scripts/normalize_eval_metrics.py \
  --eval_dir outputs/qwen3-8b-tmf921-qlora/eval
```

Metrics (a sketch of the field-level scoring follows the list):

- JSON parse rate
- canonical JSON exact match
- field precision / recall / F1
- normalized field precision / recall / F1
- normalized key precision / recall / F1
- slice/SST diagnostic pass
- KPI text-presence diagnostic pass
- adversarial status pass
- stratified metrics by `target_layer`, `slice_type`, and `lifecycle_operation`
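
The field-level metrics compare predicted and reference JSON as sets of (path, value) pairs. A minimal sketch of that idea, assuming a simple dot-path flattening; the repository's evaluator and normalizer may canonicalize keys and values differently:

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into {dot.path: leaf_value} pairs."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            items.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items.update(flatten(v, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items


def field_prf1(prediction: str, reference: str):
    """Field precision/recall/F1 over (path, value) pairs; parse failures score 0."""
    try:
        pred = set(flatten(json.loads(prediction)).items())
    except json.JSONDecodeError:
        return 0.0, 0.0, 0.0
    gold = set(flatten(json.loads(reference)).items())
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```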

## Merge adapter for deployment/evaluation

```bash
python scripts/merge_adapter.py \
  --base_model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --output_dir outputs/qwen3-8b-tmf921-merged
```
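
Conceptually, merging folds the LoRA deltas into the base weights so the result can be served without PEFT at inference time. A minimal PEFT-based sketch of the same operation (the packaged script may handle dtypes, tokenizer files, and sharding differently):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "outputs/qwen3-8b-tmf921-qlora")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("outputs/qwen3-8b-tmf921-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-8B").save_pretrained("outputs/qwen3-8b-tmf921-merged")
```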

## Stage 2 weak-layer continuation

Stage 2 was implemented and tested as a diagnostic experiment. It is **not promoted** as the main model because it did not materially improve O1/A1 and slightly regressed adversarial performance.

Run it if needed:

```bash
bash scripts/nohup_stage2_weak.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Results packaging and qualitative failure analysis

After completing stage-1 and stage-2 evaluation plus normalization, package the publication artifacts with:

```bash
export PYTHONPATH="$PWD/src"

python scripts/package_results.py \
  --stage1_eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --stage2_eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir results
```

This writes:

```text
results/stage1_raw_metrics.json
results/stage1_normalized_metrics.json
results/stage2_raw_metrics.json
results/stage2_normalized_metrics.json
results/metrics_summary.json
results/stage1_vs_stage2_comparison.md
```

Generate qualitative success/failure examples for the paper with:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --output_dir analysis/stage1_examples
```

Optionally, also sample stage-2 examples:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir analysis/stage2_examples
```

The example sampler writes:

```text
analysis/*/failure_examples.md
analysis/*/failure_examples.json
```

These artifacts are intended for paper tables, qualitative error analysis, and reproducibility appendices.

## Scientific reporting protocol

For research papers/reports, report at least the following (a minimal aggregation sketch follows the list):

1. validation loss,
2. `test_in_distribution` metrics,
3. `test_template_ood` metrics,
4. `test_use_case_ood` metrics,
5. `test_sector_ood` metrics,
6. `test_adversarial` metrics,
7. per-target-layer field F1,
8. normalized field/key F1,
9. JSON parse rate,
10. rare-class metrics for lifecycle operations and adversarial categories.
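
A minimal sketch for pulling per-split numbers out of `results/metrics_summary.json`; the key layout assumed here (one entry per split name) is an assumption, so adapt the lookup to whatever `package_results.py` actually emits:

```python
import json
from pathlib import Path

SPLITS = [
    "test_in_distribution",
    "test_template_ood",
    "test_use_case_ood",
    "test_sector_ood",
    "test_adversarial",
]

# Assumed layout: one top-level entry per split; adjust if the summary nests differently.
summary = json.loads(Path("results/metrics_summary.json").read_text())

for split in SPLITS:
    metrics = summary.get(split, {})
    print(f"== {split} ==")
    print(json.dumps(metrics, indent=2, sort_keys=True))
```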

Do **not** claim production standards compliance from JSON validity alone. Official TMF921/3GPP/ETSI/CAMARA/O-RAN validators are still needed for schema-level certification.

## Files

```text
configs/
scripts/
src/tmf921_train/
PROJECT_JOURNAL.md
requirements.txt
```

## References

- QLoRA: https://huggingface.co/papers/2305.14314
- LoRA: https://huggingface.co/papers/2106.09685
- TRL SFTTrainer docs: https://huggingface.co/docs/trl/sft_trainer
- TRL PEFT integration: https://huggingface.co/docs/trl/peft_integration
- Source dataset: https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'nraptisss/tmf921-intent-training'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
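
For intent-to-config generation, a minimal chat-template sketch is below. The example intent and `max_new_tokens` are illustrative, and it assumes this repo id resolves to loadable causal-LM weights; if only the LoRA adapter is hosted, load it via `peft` or use the merged checkpoint described above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nraptisss/tmf921-intent-training"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative TMF921-style intent; replace with a real request.
messages = [
    {"role": "user", "content": "Create a URLLC slice for a factory floor with 5 ms end-to-end latency."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 chat-template switch to skip thinking traces
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```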