{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Telecom Intent-to-Config Pipeline\n",
"\n",
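"Fine-tune Qwen2.5-7B on your TMF921 intent dataset with QLoRA on a Kaggle GPU T4 x2 session. The notebook downloads the pipeline scripts from the Hub, trains a LoRA adapter, tests inference, merges and pushes the model, and benchmarks it on a test set.\n",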
"\n",
"## Step 1: Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -q transformers trl peft accelerate bitsandbytes datasets liger-kernel sentence-transformers huggingface-hub\n",
"!pip install -q --upgrade transformers"
]
},
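{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional sanity check (a minimal sketch, not part of the original pipeline): confirm that the packages from Step 1 are installed and that both T4 GPUs are visible. It uses only the standard library and calls `nvidia-smi -L` only if that binary is on the `PATH`. The package list simply mirrors the `pip install` line above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import importlib.metadata as md\n",
"import shutil\n",
"import subprocess\n",
"\n",
"# Distribution names exactly as passed to pip in Step 1\n",
"PACKAGES = ['transformers', 'trl', 'peft', 'accelerate', 'bitsandbytes',\n",
"            'datasets', 'liger-kernel', 'sentence-transformers', 'huggingface-hub']\n",
"\n",
"def installed_versions(packages):\n",
"    # Map each distribution name to its installed version, or None if missing\n",
"    versions = {}\n",
"    for name in packages:\n",
"        try:\n",
"            versions[name] = md.version(name)\n",
"        except md.PackageNotFoundError:\n",
"            versions[name] = None\n",
"    return versions\n",
"\n",
"for name, version in installed_versions(PACKAGES).items():\n",
"    label = version if version else 'NOT INSTALLED'\n",
"    print(f'{name:24s} {label}')\n",
"\n",
"# A GPU T4 x2 session should list two Tesla T4 devices\n",
"if shutil.which('nvidia-smi'):\n",
"    result = subprocess.run(['nvidia-smi', '-L'], capture_output=True, text=True)\n",
"    print(result.stdout)\n",
"else:\n",
"    print('nvidia-smi not found; is a GPU accelerator attached?')"
]
},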
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Login to Hugging Face\n",
"\n",
"Create a token at https://huggingface.co/settings/tokens with `write` access; Step 6 pushes the merged model to the Hub."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import notebook_login\n",
"notebook_login()"
]
},
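{
"cell_type": "markdown",
"metadata": {},
"source": [
"`notebook_login()` needs an interactive widget. For a non-interactive run, such as a Kaggle Save & Run All commit, a minimal alternative is to read the token yourself and pass it to `huggingface_hub.login(token=...)`. This sketch assumes you stored the token either as an environment variable or as a Kaggle secret named `HF_TOKEN`; that name is our choice, not something the pipeline scripts require."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"def resolve_hf_token(name='HF_TOKEN'):\n",
"    # 1) environment variable, 2) Kaggle secret with the same name, else None\n",
"    token = os.environ.get(name)\n",
"    if token:\n",
"        return token\n",
"    try:\n",
"        # kaggle_secrets only exists inside Kaggle notebooks\n",
"        from kaggle_secrets import UserSecretsClient\n",
"        return UserSecretsClient().get_secret(name)\n",
"    except Exception:\n",
"        return None\n",
"\n",
"token = resolve_hf_token()\n",
"if token:\n",
"    from huggingface_hub import login\n",
"    login(token=token)\n",
"else:\n",
"    print('No HF_TOKEN found; use notebook_login() above instead.')"
]
},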
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Download Scripts from the Hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget -q https://huggingface.co/nraptisss/telecom-intent-pipeline/resolve/main/train.py\n",
"!wget -q https://huggingface.co/nraptisss/telecom-intent-pipeline/resolve/main/inference.py\n",
"!wget -q https://huggingface.co/nraptisss/telecom-intent-pipeline/resolve/main/merge_and_push.py\n",
"!wget -q https://huggingface.co/nraptisss/telecom-intent-pipeline/resolve/main/benchmark.py\n",
"!ls -la"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Run Training\n",
"\n",
"Expect roughly 2-3 hours on Kaggle T4 x2 for 3 epochs over ~30K samples.\n",
"\n",
"**Edit `train.py` first** if you want to change the dataset, model, or hyperparameters."
]
},
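{
"cell_type": "markdown",
"metadata": {},
"source": [
"To sanity-check the runtime estimate, or to see how a change in `train.py` would affect it, a rough optimizer-step count is ceil(samples / effective batch) * epochs, where effective batch = per-device batch * GPUs * gradient-accumulation steps. This is a sketch: the batch sizes below are placeholders, not the values in `train.py`, and the Trainer's exact count can differ slightly at epoch boundaries. Whether both T4s act as data-parallel replicas depends on how the script loads the model; if the model is split across the two GPUs (for example with `device_map='auto'`), use `num_gpus=1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def estimate_optimizer_steps(num_samples, epochs, per_device_batch_size,\n",
"                             num_gpus, grad_accum_steps):\n",
"    # Samples consumed per optimizer step across all data-parallel GPUs\n",
"    effective_batch = per_device_batch_size * num_gpus * grad_accum_steps\n",
"    steps_per_epoch = math.ceil(num_samples / effective_batch)\n",
"    return steps_per_epoch * epochs\n",
"\n",
"# Placeholder hyperparameters for illustration only; read the real ones from train.py\n",
"steps = estimate_optimizer_steps(num_samples=30_000, epochs=3,\n",
"                                 per_device_batch_size=2, num_gpus=2,\n",
"                                 grad_accum_steps=8)\n",
"print(f'About {steps} optimizer steps')  # About 2814 optimizer steps"
]
},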
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python train.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Test Inference"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python inference.py --intent \"Deploy a low-latency URLLC slice for autonomous drones in the harbor zone with 1ms latency and 99.999% reliability\""
]
},
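{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pipeline targets JSON configurations (Step 7 reports a JSON-valid rate), so it can help to pull the JSON object out of a raw generation that also contains extra text. Below is a minimal sketch using `json.JSONDecoder.raw_decode`. The sample string and its field names are made up, and the sketch assumes nothing about the exact output format of `inference.py`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def extract_first_json(text):\n",
"    # Try to decode a JSON object at each '{' in turn; return the first that parses\n",
"    decoder = json.JSONDecoder()\n",
"    start = text.find('{')\n",
"    while start != -1:\n",
"        try:\n",
"            obj, _ = decoder.raw_decode(text, start)\n",
"            return obj\n",
"        except json.JSONDecodeError:\n",
"            start = text.find('{', start + 1)\n",
"    return None\n",
"\n",
"# Hypothetical model output with chatter around the JSON payload\n",
"sample = 'Config: {\"sliceType\": \"URLLC\", \"latencyMs\": 1} Done.'\n",
"print(extract_first_json(sample))  # {'sliceType': 'URLLC', 'latencyMs': 1}"
]
},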
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Merge and Push to the Hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python merge_and_push.py"
]
},
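{
"cell_type": "markdown",
"metadata": {},
"source": [
"What merging does, in miniature: a LoRA adapter stores low-rank factors B (d_out x r) and A (r x d_in), and merging folds them into the base weight as W' = W + (alpha / r) * B A, so the adapter is no longer needed at inference time. The pure-Python sketch below only illustrates that arithmetic on a 2x2 toy matrix. It is not what `merge_and_push.py` runs (with PEFT this is typically done by `merge_and_unload()`), and variants such as rsLoRA use a different scaling factor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def matmul(a, b):\n",
"    # Multiply two matrices given as lists of rows\n",
"    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))\n",
"             for j in range(len(b[0]))]\n",
"            for i in range(len(a))]\n",
"\n",
"def merge_lora(W, A, B, alpha, r):\n",
"    # W + (alpha / r) * (B @ A), the standard LoRA scaling\n",
"    scale = alpha / r\n",
"    delta = matmul(B, A)\n",
"    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]\n",
"            for i in range(len(W))]\n",
"\n",
"W = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen base weight (d_out=2, d_in=2)\n",
"B = [[1.0], [2.0]]            # d_out x r, with rank r=1\n",
"A = [[3.0, 4.0]]              # r x d_in\n",
"print(merge_lora(W, A, B, alpha=2, r=1))  # [[7.0, 8.0], [12.0, 17.0]]"
]
},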
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 7: Benchmark on the Test Set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python benchmark.py --max_samples 100 --output benchmark_results.json"
]
},
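{
"cell_type": "markdown",
"metadata": {},
"source": [
"`benchmark.py` reports a JSON-valid rate overall and per layer. Its exact definitions are not shown in this notebook, so the following is only a hypothetical sketch of one common definition: the share of generations that parse with `json.loads`, grouped by a layer label. The record fields (`layer`, `prediction`) and the layer names are illustrative assumptions, not the script's actual schema."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from collections import defaultdict\n",
"\n",
"def is_valid_json(text):\n",
"    # True if the string parses as any JSON value\n",
"    try:\n",
"        json.loads(text)\n",
"        return True\n",
"    except json.JSONDecodeError:\n",
"        return False\n",
"\n",
"def valid_rates(records):\n",
"    # Overall and per-layer share of records whose 'prediction' is valid JSON\n",
"    per_layer = defaultdict(lambda: [0, 0])  # layer -> [valid, total]\n",
"    for rec in records:\n",
"        counts = per_layer[rec['layer']]\n",
"        counts[0] += is_valid_json(rec['prediction'])\n",
"        counts[1] += 1\n",
"    total_valid = sum(v for v, _ in per_layer.values())\n",
"    total = sum(n for _, n in per_layer.values())\n",
"    overall = total_valid / total if total else 0.0\n",
"    return overall, {layer: v / n for layer, (v, n) in per_layer.items()}\n",
"\n",
"# Hypothetical records for illustration only\n",
"records = [\n",
"    {'layer': 'ran', 'prediction': '{\"a\": 1}'},\n",
"    {'layer': 'ran', 'prediction': 'not json'},\n",
"    {'layer': 'core', 'prediction': '[]'},\n",
"]\n",
"print(valid_rates(records))"
]
},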
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 8: View Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"with open('benchmark_results.json') as f:\n",
"    data = json.load(f)\n",
"\n",
"summary = data['summary']\n",
"print(f\"JSON Valid Rate: {summary['json_valid_rate']:.1%}\")\n",
"print(f\"Schema Compliance: {summary['avg_schema_compliance']:.1%}\")\n",
"# Semantic similarity may be missing or null; compare with None so a 0.0 score still prints\n",
"if summary.get('semantic_similarity_avg') is not None:\n",
"    print(f\"Semantic Similarity: {summary['semantic_similarity_avg']:.3f}\")\n",
"\n",
"for layer, s in summary['per_layer'].items():\n",
"    print(f\"  {layer:20s} valid={s['valid_rate']:.1%} compliance={s['avg_compliance']:.1%}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}