# Prompt Quality Analyzer

A LoRA-finetuned model that evaluates prompt quality across 8 criteria.
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import json

# Load model (auto-detects base model from adapter config)
model_path = "YOUR_USERNAME/prompt-quality-analyzer"

# The adapter config contains the base model name
# Default: TinyLlama/TinyLlama-1.1B-Chat-v1.0
base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_path)
model.eval()

# Analyze a prompt
prompt = "Classify text into categories"

instruction = """You are a prompt quality analyzer. Analyze the given prompt and extract quality scores for each criterion.

Criteria (score 1-10):
1. clarity_and_specificity: Are instructions clear and specific?
2. context_sufficiency: Is sufficient background/context provided?
3. examples_provided: Are concrete examples included?
4. output_format_specification: Is the expected output format clearly defined?
5. edge_case_handling: Are edge cases and exceptions addressed?
6. tone_and_style_guidance: Is communication style/tone specified?
7. constraint_definition: Are limits and boundaries clearly set?
8. relevance_of_examples: Do examples match the domain/task?

Respond with ONLY a JSON object containing the scores."""

# Format with chat template
formatted = f"""<|system|>
{instruction}<|end|>
<|user|>
Analyze this prompt:
{prompt}<|end|>
<|assistant|>
"""

inputs = tokenizer(formatted, return_tensors="pt", max_length=2048, truncation=True)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        temperature=0.3,
        do_sample=True,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only new tokens
input_length = inputs["input_ids"].shape[1]
new_tokens = outputs[0][input_length:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Parse JSON (may need robust parsing)
result = json.loads(response)
print(json.dumps(result, indent=2))
```
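The bare `json.loads(response)` call above fails whenever the model emits extra text or malformed JSON. A minimal multi-strategy parser in the spirit of the one described under Performance (direct parse, common-issue fixes, regex extraction) might look like this — the function name is illustrative, not part of the model's API:

```python
import json
import re

def parse_scores(response: str):
    """Try to recover a JSON object from raw model output.

    Strategy 1: parse the response directly.
    Strategy 2: extract the first {...} block, fix trailing commas, retry.
    Returns None when no valid JSON can be recovered.
    """
    # Strategy 1: the response is already clean JSON
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass

    # Strategy 2: pull out the first brace-delimited block
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match:
        candidate = match.group(0)
        # Fix a common model mistake: trailing commas before } or ]
        candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass

    return None
```

Replace the `result = json.loads(response)` line with `result = parse_scores(response)` and check for `None` before use.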
## What It Does
Scores prompts on 8 dimensions (1-10 scale):
| Criterion | Description |
|---|---|
| Clarity & Specificity | Are instructions clear and unambiguous? |
| Context Sufficiency | Is enough background provided? |
| Examples Provided | Are there concrete examples? |
| Output Format | Is the expected format specified? |
| Edge Case Handling | Are exceptions addressed? |
| Tone & Style | Is communication style defined? |
| Constraint Definition | Are limits clearly set? |
| Example Relevance | Do examples match the task? |
## Example Output

```json
{
  "scores": {
    "clarity_and_specificity": 8,
    "context_sufficiency": 7,
    "examples_provided": 6,
    "output_format_specification": 9,
    "edge_case_handling": 5,
    "tone_and_style_guidance": 6,
    "constraint_definition": 7,
    "relevance_of_examples": 8
  },
  "overall_score": 7.0,
  "quality_tier": "good"
}
```
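If the model returns only the per-criterion scores, the aggregate fields can be reconstructed client-side. A sketch, assuming `overall_score` is the plain mean of the eight scores (consistent with the example above, where the scores average to 7.0) and using illustrative tier thresholds — the exact thresholds used in training are not documented here:

```python
def aggregate(scores: dict) -> dict:
    """Compute an overall score and quality tier from per-criterion scores."""
    overall = round(sum(scores.values()) / len(scores), 1)
    # Assumed tier boundaries -- adjust to match your own rubric
    if overall >= 8:
        tier = "excellent"
    elif overall >= 6:
        tier = "good"
    elif overall >= 4:
        tier = "fair"
    else:
        tier = "poor"
    return {"scores": scores, "overall_score": overall, "quality_tier": tier}
```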
## Model Details
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0 (1.1B parameters)
- Method: LoRA fine-tuning
- LoRA Rank: 8
- Training Data: 500 synthetic prompt examples
- Training Time: ~10-15 minutes on CPU
- Chat Template: Uses `<|system|>`, `<|user|>`, `<|assistant|>` format
## Performance
- Success Rate: 100% with robust JSON parsing
- MAE: ~1.54 overall
- Accuracy (±1 point): ~61%
- Accuracy (±2 points): ~77%
- Parsing: Multi-strategy approach (direct JSON, fix common issues, regex extraction)
- Improvement: robust parsing raised successful analyses from 29/75 to 75/75 (39% → 100%)
## Limitations
- Optimized for English prompts
- Works best with prompts <500 tokens
- May require JSON parsing helpers for consistent results
- Small model - accuracy improves with larger base models
## Use Cases
- **Prompt Engineering**: Optimize prompts before deployment
- **Quality Assurance**: Evaluate prompt libraries
- **Learning**: Understand what makes effective prompts
- **Automation**: Batch analyze multiple prompts
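The batch-analysis use case reduces to a loop over prompts with per-item error handling. A minimal sketch, where `analyze` stands in for the Quick Start pipeline (any callable taking a prompt string and returning a parsed score dict, or `None` on failure):

```python
from typing import Callable, Dict, List, Optional

def batch_analyze(
    prompts: List[str],
    analyze: Callable[[str], Optional[dict]],
) -> Dict[str, dict]:
    """Run the analyzer over many prompts, skipping ones that fail to parse."""
    results = {}
    for p in prompts:
        result = analyze(p)
        if result is not None:
            results[p] = result
    return results
```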
## Training
Trained on synthetic dataset with quality annotations across 8 criteria. Uses Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters.
## Citation

```bibtex
@misc{extract-prompt-quality-criteria,
  author    = {Gregoire Cattan},
  title     = {Extract Prompt Quality Criteria},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/center-of-excellence/extract-prompt-quality-criteria}
}
```