---
license: apache-2.0
tags:
- qlora
- sft
- trl
- peft
- qwen3
- tmf921
- intent-based-networking
- network-slicing
- rtx-6000-ada
- ml-intern
base_model:
- Qwen/Qwen3-8B
datasets:
- nraptisss/TMF921-intent-to-config-research-sota
---

# TMF921 Intent-to-Config Training + Evaluation

Training and evaluation repo for [`nraptisss/TMF921-intent-to-config-research-sota`](https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota) on a single **RTX 6000 Ada 48/50GB** server.

The default recipe is **Qwen3-8B + QLoRA NF4 + TRL SFTTrainer + PEFT LoRA**.

## Why this recipe

- Dataset rows were audited with `Qwen/Qwen3-8B` chat-template tokenization.
- Source max length: **1,316 tokens**, p99: **1,300 tokens**, so `max_length=2048` is safe.
- QLoRA NF4 + double quantization follows the QLoRA recipe for fitting large models on one 48GB-class GPU (see the sketch after this list).
- LoRA uses `target_modules="all-linear"`, as recommended for QLoRA-style training.
- `assistant_only_loss=True` computes loss only on the JSON/config response tokens.
- Evaluation is broken out into in-distribution and OOD splits; do not report only a single merged score.
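
These choices map directly onto standard `transformers`/`peft`/`trl` settings. Below is a minimal sketch of that mapping, assuming a recent TRL release in which `SFTConfig` exposes `max_length` and `assistant_only_loss`; the authoritative settings live in `configs/rtx6000ada_qwen3_8b_qlora.yaml` and may differ in detail (LoRA rank, learning rate, and schedule are omitted here).

```python
# Minimal QLoRA + TRL SFT sketch mirroring the bullets above; not the repo's exact config.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # QLoRA NF4
    bnb_4bit_use_double_quant=True,      # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

args = SFTConfig(
    output_dir="outputs/qwen3-8b-tmf921-qlora",
    max_length=2048,                     # covers the 1,316-token source max
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,       # effective batch size 16
    assistant_only_loss=True,            # loss only on the assistant (JSON/config) turn
)
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset("nraptisss/TMF921-intent-to-config-research-sota", split="train"),
    processing_class=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM", target_modules="all-linear"),
)
trainer.train()
```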

## Hardware target

Recommended server:

- GPU: NVIDIA RTX 6000 Ada, 48GB/50GB VRAM
- RAM: 64GB+
- Disk: 200GB+ free
- CUDA-compatible PyTorch

Default effective batch size:

```text
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective batch size = 16
max_length = 2048
```

If you hit OOM, preserve the effective batch size of 16 (2 × 8 = 1 × 16) by changing:

```yaml
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
```

Do **not** reduce `max_length` unless you intentionally want a different training task.
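
The 2048 cap is grounded in the chat-template token audit cited under "Why this recipe". A minimal sketch of such an audit, assuming the dataset rows carry a chat-style `messages` column (an assumption; check the dataset card for the actual schema):

```python
# Hypothetical token-length audit; the repo's actual audit script is not shown here.
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
rows = load_dataset("nraptisss/TMF921-intent-to-config-research-sota", split="train")

# Tokenize each row through the model's chat template, exactly as SFT will see it.
lengths = [len(tokenizer.apply_chat_template(r["messages"], tokenize=True)) for r in rows]
print(f"max={max(lengths)}  p99={np.percentile(lengths, 99):.0f}")  # reported: 1316 / ~1300
```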

## Quick start with nohup, unique run dirs, and resumable checkpoints

```bash
git clone https://huggingface.co/nraptisss/tmf921-intent-training
cd tmf921-intent-training

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
bash scripts/install_rtx6000ada.sh
python scripts/check_gpu.py

export HF_TOKEN=hf_...
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH="$PWD/src"
export TOKENIZERS_PARALLELISM=false

bash scripts/nohup_new_run.sh
```

Monitor:

```bash
RUN_DIR=runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
bash scripts/status_run.sh "$RUN_DIR"
tail -f "$RUN_DIR/logs/train.log"
watch -n 2 nvidia-smi
```

Resume:

```bash
bash scripts/nohup_resume.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

Evaluate:

```bash
bash scripts/nohup_eval.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Configs

- `configs/rtx6000ada_qwen3_8b_qlora.yaml` — recommended stage-1 config
- `configs/rtx6000ada_qwen3_14b_qlora_experimental.yaml` — experimental 14B config
- `configs/stage2_weak_layer_qwen3_8b.yaml` — diagnostic weak-layer continuation config

## Evaluation

Raw evaluator:

```bash
python scripts/evaluate_model.py \
  --model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --dataset nraptisss/TMF921-intent-to-config-research-sota \
  --output_dir outputs/qwen3-8b-tmf921-qlora/eval \
  --load_in_4bit
```
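
`--load_in_4bit` with `--adapter` presumably corresponds to the standard pattern of loading the NF4-quantized base and attaching the LoRA adapter on top. A sketch of that pattern, not the script's actual internals:

```python
# Sketch of 4-bit base + LoRA adapter loading; scripts/evaluate_model.py is the source of truth.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "outputs/qwen3-8b-tmf921-qlora")  # attach adapter
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
```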

Normalize existing predictions:

```bash
python scripts/normalize_eval_metrics.py \
  --eval_dir outputs/qwen3-8b-tmf921-qlora/eval
```

Metrics (the first three are sketched below):

- JSON parse rate
- canonical JSON exact match
- field precision / recall / F1
- normalized field precision / recall / F1
- normalized key precision / recall / F1
- slice/SST diagnostic pass
- KPI text-presence diagnostic pass
- adversarial status pass
- stratified metrics by `target_layer`, `slice_type`, and `lifecycle_operation`
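
The first three metrics are simple enough to state precisely. A hypothetical re-implementation follows; the repo's evaluator is the source of truth, and the normalized variants add canonicalization steps not shown here.

```python
# Hypothetical re-implementation of JSON parse rate, canonical exact match, and field F1.
import json

def canonical(obj):
    # Canonical JSON: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def flatten(obj, prefix=""):
    # Flatten nested JSON into (dotted.path, value) pairs for field-level scoring.
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from flatten(v, f"{prefix}{k}.")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from flatten(v, f"{prefix}{i}.")
    else:
        yield (prefix.rstrip("."), obj)

def field_prf1(pred, gold):
    p, g = set(flatten(pred)), set(flatten(gold))
    tp = len(p & g)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def score(prediction_text, gold_obj):
    try:
        pred = json.loads(prediction_text)           # counts toward JSON parse rate
    except json.JSONDecodeError:
        return {"parsed": False}
    prec, rec, f1 = field_prf1(pred, gold_obj)
    return {
        "parsed": True,
        "exact_match": canonical(pred) == canonical(gold_obj),  # canonical exact match
        "field_precision": prec, "field_recall": rec, "field_f1": f1,
    }
```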

## Merge adapter for deployment/evaluation

```bash
python scripts/merge_adapter.py \
  --base_model Qwen/Qwen3-8B \
  --adapter outputs/qwen3-8b-tmf921-qlora \
  --output_dir outputs/qwen3-8b-tmf921-merged
```
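
Merging presumably follows the standard PEFT `merge_and_unload` pattern, loading the base in full precision so the LoRA deltas can be folded into the weights. A sketch, not the script's actual code:

```python
# Sketch of adapter merging; scripts/merge_adapter.py is the source of truth.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Full-precision base (not 4-bit): merged weights must be materialized exactly.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "outputs/qwen3-8b-tmf921-qlora")
merged = model.merge_and_unload()        # fold LoRA deltas into the base weights
merged.save_pretrained("outputs/qwen3-8b-tmf921-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-8B").save_pretrained("outputs/qwen3-8b-tmf921-merged")
```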

## Stage 2 weak-layer continuation

Stage 2 was implemented and tested as a diagnostic experiment. It is **not promoted** as the main model because it did not materially improve O1/A1 and slightly regressed adversarial performance.

Run if needed:

```bash
bash scripts/nohup_stage2_weak.sh runs/qwen3-8b-qlora-YYYYMMDD-HHMMSS
```

## Results packaging and qualitative failure analysis

After completing stage-1 and stage-2 evaluation plus normalization, package publication artifacts with:

```bash
export PYTHONPATH="$PWD/src"

python scripts/package_results.py \
  --stage1_eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --stage2_eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir results
```

This writes:

```text
results/stage1_raw_metrics.json
results/stage1_normalized_metrics.json
results/stage2_raw_metrics.json
results/stage2_normalized_metrics.json
results/metrics_summary.json
results/stage1_vs_stage2_comparison.md
```
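
`metrics_summary.json` plausibly aggregates the four per-stage files into one object; a hypothetical sketch of that aggregation (the real field layout is defined by `scripts/package_results.py`):

```python
# Hypothetical aggregation; scripts/package_results.py defines the real summary format.
import json
from pathlib import Path

results = Path("results")
stages = ["stage1_raw_metrics", "stage1_normalized_metrics",
          "stage2_raw_metrics", "stage2_normalized_metrics"]
summary = {name: json.loads((results / f"{name}.json").read_text()) for name in stages}
(results / "metrics_summary.json").write_text(json.dumps(summary, indent=2))
```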

Generate qualitative success/failure examples for the paper with:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/qwen3-8b-qlora-20260501-083834/eval_merged \
  --output_dir analysis/stage1_examples
```

Optionally also sample stage-2 examples:

```bash
python scripts/sample_failure_examples.py \
  --eval_dir runs/stage2-weak-20260505-080040/eval \
  --output_dir analysis/stage2_examples
```

The example sampler writes:

```text
analysis/*/failure_examples.md
analysis/*/failure_examples.json
```

These artifacts are intended for paper tables, qualitative error analysis, and reproducibility appendices.

## Scientific reporting protocol

For research papers/reports, report at least:

1. validation loss
2. `test_in_distribution` metrics
3. `test_template_ood` metrics
4. `test_use_case_ood` metrics
5. `test_sector_ood` metrics
6. `test_adversarial` metrics
7. per-target-layer field F1
8. normalized field/key F1
9. JSON parse rate
10. rare-class metrics for lifecycle operations and adversarial categories

Do **not** claim production standards compliance from JSON validity alone. Official TMF921/3GPP/ETSI/CAMARA/O-RAN validators are still needed for schema-level certification.
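
A stronger local sanity check than bare `json.loads` is validation against a TMF921 JSON Schema, though even that is not certification. A hypothetical sketch using the `jsonschema` package (the schema path is a placeholder; the official schema must come from TM Forum):

```python
# Hypothetical schema-level sanity check; NOT a substitute for official validators.
import json
from jsonschema import Draft202012Validator  # pip install jsonschema

schema = json.load(open("schemas/TMF921-Intent.schema.json"))  # placeholder path
validator = Draft202012Validator(schema)

prediction = json.load(open("outputs/example_prediction.json"))
errors = sorted(validator.iter_errors(prediction), key=lambda e: str(list(e.path)))
for err in errors:
    print(f"{list(err.path)}: {err.message}")
print("schema-valid" if not errors else f"{len(errors)} schema violations")
```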

## Files

```text
configs/
scripts/
src/tmf921_train/
PROJECT_JOURNAL.md
requirements.txt
```

## References

- QLoRA: https://huggingface.co/papers/2305.14314
- LoRA: https://huggingface.co/papers/2106.09685
- TRL SFTTrainer docs: https://huggingface.co/docs/trl/sft_trainer
- TRL PEFT integration: https://huggingface.co/docs/trl/peft_integration
- Source dataset: https://huggingface.co/datasets/nraptisss/TMF921-intent-to-config-research-sota