Commit 7e9f789 (parent: 60a6bce), committed by root

Move GGUF to dedicated repos, add GGUF section linking to collection

Files changed:
- README.md (+26 -116)
- brick-complexity-extractor-BF16.gguf (+0 -3, deleted)
- brick-complexity-extractor-Q4_K_M.gguf (+0 -3, deleted)
- brick-complexity-extractor-Q8_0.gguf (+0 -3, deleted)

README.md CHANGED
@@ -7,7 +7,6 @@ tags:
 - peft
 - safetensors
 - lora
-- gguf
 - complexity-classification
 - llm-routing
 - query-difficulty
@@ -42,11 +41,11 @@ model-index:
 
 <div align="center">
 
-# Brick Complexity Extractor
 
 ### A lightweight LoRA adapter for real-time query complexity classification
 
-**[Regolo.ai](https://regolo.ai)
 
 [](https://creativecommons.org/licenses/by-nc/4.0/)
 [](https://huggingface.co/Qwen/Qwen3.5-0.8B)
@@ -84,7 +83,7 @@ The adapter adds only **~2M trainable parameters** on top of the 0.8B base model
 
 ## The Problem: Why LLM Routing Needs Complexity Classification
 
-Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints
 
 **Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.
 
@@ -112,26 +111,26 @@ The adapter applies LoRA to the query and value projection matrices (`q_proj`, `v_proj`)
 
 ```
 Qwen3.5-0.8B (frozen)
-
-
-
-
-
 ```
 
 ## Label Definitions
 
 | Label | Reasoning Steps | Description | Example |
 |---|---|---|---|
-| **easy** | 1
-| **medium** | 3
 | **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |
 
 Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.
 
 ## Performance
 
-### Classification Metrics (Test Set
 
 | Metric | Value |
 |---|---|
@@ -199,106 +198,17 @@ print(f"Complexity: {predicted}")
 # https://github.com/regolo-ai/brick-SR1
 ```
 
----
-
 ## GGUF Quantized Models
 
-Pre-built GGUF files are available for inference with
-
-These files contain the **full merged model** (base Qwen3.5-0.8B + LoRA adapter merged), so no separate adapter loading is needed.
-
-### Available Quantizations
 
-
 |---|---|---|---|---|
-
-
-
-
-### Usage with llama.cpp
-
-```bash
-# Download a quantized model
-huggingface-cli download regolo/brick-complexity-extractor \
-  brick-complexity-extractor-Q8_0.gguf \
-  --local-dir ./models
-
-# Run inference
-./llama-cli -m ./models/brick-complexity-extractor-Q8_0.gguf \
-  -p "<|im_start|>system
-You are a query difficulty classifier for an LLM routing system.
-Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
-Respond with ONLY one word: easy, medium, or hard.<|im_end|>
-<|im_start|>user
-Classify: What is the capital of France?<|im_end|>
-<|im_start|>assistant
-" \
-  -n 5 --temp 0
-```
-
-### Usage with Ollama
-
-```bash
-# Create a Modelfile
-cat > Modelfile <<EOF
-FROM ./brick-complexity-extractor-Q8_0.gguf
-
-SYSTEM """You are a query difficulty classifier for an LLM routing system.
-Classify each query as easy, medium, or hard based on the cognitive depth and domain expertise required to answer correctly.
-Respond with ONLY one word: easy, medium, or hard."""
-
-TEMPLATE """<|im_start|>system
-{{ .System }}<|im_end|>
-<|im_start|>user
-Classify: {{ .Prompt }}<|im_end|>
-<|im_start|>assistant
-"""
-
-PARAMETER temperature 0
-PARAMETER num_predict 5
-EOF
-
-ollama create brick-complexity -f Modelfile
-ollama run brick-complexity "Design a distributed consensus algorithm"
-# Output: hard
-```
-
-### Usage with vLLM
 
-```
-from vllm import LLM, SamplingParams
-
-llm = LLM(
-    model="regolo/brick-complexity-extractor",
-    quantization="gguf",
-    # Point to a specific GGUF file:
-    # model="./brick-complexity-extractor-Q8_0.gguf"
-)
-
-sampling_params = SamplingParams(temperature=0, max_tokens=5)
-
-prompt = """<|im_start|>system
-You are a query difficulty classifier for an LLM routing system.
-Classify each query as easy, medium, or hard.
-Respond with ONLY one word: easy, medium, or hard.<|im_end|>
-<|im_start|>user
-Classify: Explain the rendering equation from radiometric first principles<|im_end|>
-<|im_start|>assistant
-"""
-
-output = llm.generate([prompt], sampling_params)
-print(output[0].outputs[0].text.strip())
-# Output: hard
-```
-
-### Important Note on GGUF Inference
-
-The GGUF models use **generative text output** (the model generates the word "easy", "medium", or "hard") rather than the logit-based classification used by the LoRA adapter. This means:
-
-- **LoRA adapter (recommended for production)**: Uses logit extraction at the last token position for the three label tokens. Faster and more reliable.
-- **GGUF (recommended for local/edge deployment)**: Generates the classification label as text. Slightly lower accuracy but works with any GGUF runtime without Python dependencies.
-
----
 
 ## Integration with Brick Semantic Router
 
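All three removed runtime examples share the same ChatML classification prompt; assembling it programmatically looks like this (a minimal sketch — the helper name and f-string layout are ours, not from the card):

```python
SYSTEM = (
    "You are a query difficulty classifier for an LLM routing system.\n"
    "Classify each query as easy, medium, or hard based on the cognitive depth "
    "and domain expertise required to answer correctly.\n"
    "Respond with ONLY one word: easy, medium, or hard."
)

def build_prompt(query: str) -> str:
    """Assemble the ChatML-style prompt used by the removed GGUF examples."""
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\nClassify: {query}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("What is the capital of France?"))
```

The trailing `<|im_start|>assistant\n` leaves the model positioned to emit exactly one label word, which is why the examples cap generation at 5 tokens with temperature 0.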
@@ -339,14 +249,14 @@ model_pools:
 
 ## Intended Uses
 
-### Primary Use Cases
-- **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30
 - **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
 - **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
 - **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing
 
-### Out-of-Scope Uses
-- **Content moderation or safety filtering**
 - **Non-English queries**: trained on English data only; accuracy degrades significantly on other languages
 - **Direct use as a chatbot or generative model**: this is a classification adapter, not a generative model
 
@@ -364,7 +274,7 @@ model_pools:
 |---|---|
 | **Base model** | Qwen/Qwen3.5-0.8B |
 | **LoRA rank (r)** | 16 |
-| **LoRA alpha** | 32 |
 | **LoRA dropout** | 0.05 |
 | **Target modules** | q_proj, v_proj |
 | **Learning rate** | 2e-4 |
@@ -376,7 +286,7 @@ model_pools:
 | **Training samples** | 65,307 |
 | **Validation samples** | 7,683 |
 | **Test samples** | 3,841 |
-| **Training hardware** |
 | **Training time** | ~2 hours |
 | **Framework** | PyTorch + HuggingFace PEFT |
 
@@ -386,9 +296,9 @@ Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastr
 
 | Metric | Value |
 |---|---|
-| **Hardware** |
 | **Training duration** | ~2 hours |
-| **Estimated
 | **Energy source** | Renewable (certified) |
 | **Location** | Italy (EU) |
 
@@ -411,6 +321,6 @@ Regolo.ai is committed to sustainable AI. This model was trained on GPU infrastr
 
 <div align="center">
 
-**[Website](https://regolo.ai)
 
 </div>
 
 <div align="center">
 
+# 🧱 Brick Complexity Extractor
 
 ### A lightweight LoRA adapter for real-time query complexity classification
 
+**[Regolo.ai](https://regolo.ai) · [Dataset](https://huggingface.co/datasets/regolo/brick-complexity-extractor) · [Brick SR1 on GitHub](https://github.com/regolo-ai/brick-SR1) · [API Docs](https://docs.regolo.ai)**
 
 [](https://creativecommons.org/licenses/by-nc/4.0/)
 [](https://huggingface.co/Qwen/Qwen3.5-0.8B)
 
 ## The Problem: Why LLM Routing Needs Complexity Classification
 
+Not all prompts are equal. A factual recall question ("What is the capital of France?") and a multi-step reasoning task ("Derive the optimal portfolio allocation given these constraints…") require fundamentally different compute budgets. Sending every query to a frontier reasoning model wastes resources; sending hard queries to a lightweight model degrades quality.
 
 **Brick** solves this by routing each query to the right model tier in real time. Complexity classification is one of several routing signals (alongside keyword matching, domain detection, and reasoning-depth estimation) that Brick uses to make sub-50ms routing decisions.
 
 
 ```
 Qwen3.5-0.8B (frozen)
+  └── Attention Layers × 24
+        ├── q_proj ← LoRA(r=16, α=32)
+        └── v_proj ← LoRA(r=16, α=32)
+  └── Last Hidden State
+        └── Classification Head (3 classes)
 ```
 
 ## Label Definitions
 
 | Label | Reasoning Steps | Description | Example |
 |---|---|---|---|
+| **easy** | 1–2 | Surface knowledge, factual recall, simple lookups | "What is the capital of Italy?" |
+| **medium** | 3–5 | Domain familiarity, multi-step reasoning, comparison | "Compare REST and GraphQL for a mobile app backend" |
 | **hard** | 6+ | Deep expertise, multi-constraint optimization, creative synthesis | "Design a distributed cache eviction policy that minimizes tail latency under bursty traffic" |
 
 Labels were generated by **Qwen3.5-122B** acting as an LLM judge on 76,831 diverse user prompts. See the [dataset card](https://huggingface.co/datasets/regolo/brick-complexity-extractor) for full labeling methodology.
 
 ## Performance
 
+### Classification Metrics (Test Set — 3,841 samples)
 
 | Metric | Value |
 |---|---|
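The reasoning-step thresholds in the label table reduce to a simple lookup; a minimal sketch (the function name is ours, not part of the model card):

```python
def label_for_steps(steps: int) -> str:
    """Map an estimated reasoning-step count to a complexity label,
    following the card's thresholds: 1-2 easy, 3-5 medium, 6+ hard."""
    if steps <= 2:
        return "easy"
    if steps <= 5:
        return "medium"
    return "hard"

print(label_for_steps(1), label_for_steps(4), label_for_steps(7))  # easy medium hard
```

Note the adapter itself predicts the label directly from text; step counts were only used by the LLM judge during labeling.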
 # https://github.com/regolo-ai/brick-SR1
 ```
 
 ## GGUF Quantized Models
 
+Pre-built GGUF files are available for inference with llama.cpp, Ollama, LM Studio, vLLM, and other GGUF-compatible runtimes. Each quantization is published as a separate model:
 
+| Model | Quant | Size | BPW | Notes |
 |---|---|---|---|---|
+| [brick-complexity-extractor-BF16-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-BF16-GGUF) | BF16 | 1.5 GB | 16.0 | Full precision |
+| [brick-complexity-extractor-Q8_0-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q8_0-GGUF) | Q8_0 | 775 MB | 8.0 | Recommended |
+| [brick-complexity-extractor-Q4_K_M-GGUF](https://huggingface.co/regolo/brick-complexity-extractor-Q4_K_M-GGUF) | Q4_K_M | 494 MB | 5.5 | Best size/quality ratio |
 
+See the [brick-complexity-extractor collection](https://huggingface.co/collections/regolo/brick-complexity-extractor-69dcc2dec2fe3b54a70b3415) for all available formats.
 
 ## Integration with Brick Semantic Router
 
 
 ## Intended Uses
 
+### ✅ Primary Use Cases
+- **LLM routing**: Classify query complexity to route to the optimal model tier, reducing inference cost by 30–60% compared to always-frontier routing
 - **Reasoning budget allocation**: Decide how many reasoning tokens to allocate before inference begins
 - **Traffic shaping**: Balance GPU load across model pools based on real-time complexity distribution
 - **Cost monitoring**: Track complexity distribution over time to optimize fleet sizing
 
+### ⚠️ Out-of-Scope Uses
+- **Content moderation or safety filtering** — this model classifies cognitive difficulty, not content safety
 - **Non-English queries**: trained on English data only; accuracy degrades significantly on other languages
 - **Direct use as a chatbot or generative model**: this is a classification adapter, not a generative model
 
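The routing use case added above is, at its core, a label-to-tier lookup with a conservative fallback; a minimal sketch (the pool names are hypothetical, not from the card or from Brick's config):

```python
# Hypothetical pool names for illustration; substitute your own model pools.
TIER_FOR_LABEL = {
    "easy": "small-model-pool",
    "medium": "mid-model-pool",
    "hard": "frontier-model-pool",
}

def route(label: str) -> str:
    """Pick a model pool from a complexity label.

    Unknown labels fall back to the largest tier, trading cost for
    safety when the classifier output is malformed."""
    return TIER_FOR_LABEL.get(label, "frontier-model-pool")

print(route("easy"))  # small-model-pool
```

Falling back to the top tier on an unrecognized label is one reasonable policy; a cost-sensitive deployment might instead re-run classification or default to the middle tier.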
 |---|---|
 | **Base model** | Qwen/Qwen3.5-0.8B |
 | **LoRA rank (r)** | 16 |
+| **LoRA alpha (α)** | 32 |
 | **LoRA dropout** | 0.05 |
 | **Target modules** | q_proj, v_proj |
 | **Learning rate** | 2e-4 |
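The card's "~2M trainable parameters" figure is consistent with the standard LoRA count of r×(d_in + d_out) per adapted matrix; a back-of-envelope check (the hidden size of 1024 and equal q_proj/v_proj shapes are our assumptions, not stated in the card, and grouped-query attention would shrink the v_proj term):

```python
# Assumptions (not from the model card): hidden size 1024, q_proj and v_proj
# both mapping 1024 -> 1024, across the 24 attention layers in the diagram.
r = 16
d_in = d_out = 1024
layers = 24
adapted_matrices_per_layer = 2  # q_proj and v_proj

params_per_matrix = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
total = layers * adapted_matrices_per_layer * params_per_matrix
print(f"{total:,}")  # 1,572,864 — on the order of the ~2M the card reports
```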
 | **Training samples** | 65,307 |
 | **Validation samples** | 7,683 |
 | **Test samples** | 3,841 |
+| **Training hardware** | 1× NVIDIA A100 80GB |
 | **Training time** | ~2 hours |
 | **Framework** | PyTorch + HuggingFace PEFT |
 
 
 | Metric | Value |
 |---|---|
+| **Hardware** | 1× NVIDIA A100 80GB |
 | **Training duration** | ~2 hours |
+| **Estimated CO₂** | < 0.5 kg CO₂eq |
 | **Energy source** | Renewable (certified) |
 | **Location** | Italy (EU) |
 
 
 <div align="center">
 
+**[Website](https://regolo.ai) · [Docs](https://docs.regolo.ai) · [Discord](https://discord.gg/myuuVFcfJw) · [GitHub](https://github.com/regolo-ai) · [LinkedIn](https://www.linkedin.com/company/regolo-ai/)**
 
 </div>
brick-complexity-extractor-BF16.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6fc8392a811ff1b3dbdb7348110893bac25f912540a58ae7ff4e1cb96ceced92
-size 1516736384
brick-complexity-extractor-Q4_K_M.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8bb38e63a7eeabddd729f2cdadfc7bd04b82aea413778e77bd4dee2b03a5489e
-size 529289088
brick-complexity-extractor-Q8_0.gguf DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1f74b88a1b7149dd9074eed60cadfc7555fca227ddbc1c71ec30a635f7cd3913
-size 811835264