Qwen2.5-1.5B JSON Repair (Pruned + Distilled)

Overview

This model is a lightweight (~1.5B parameter) transformer specialized in repairing malformed JSON outputs generated by large language models.

It was created by:

  1. Structured pruning of Qwen2.5-3B-Instruct (50% layer reduction)
  2. Knowledge distillation from the 3B teacher model
  3. Fine-tuning on synthetic malformed → corrected JSON pairs
  4. Training on an NVIDIA A100 GPU using bfloat16 precision

The objective is syntactic repair and structural correction of JSON under realistic LLM failure patterns.

This is a specialized structural model — not a general-purpose reasoning model.


Base Model and Pruning Strategy

Teacher Model

  • Qwen2.5-3B-Instruct
  • ~3B parameters
  • 36 transformer layers

Student Model

  • 18 transformer layers (50% structured pruning)
  • Retained layers: 0, 2, 4, ..., 34
  • Embeddings, normalization layers, and LM head copied from teacher
  • No random reinitialization of retained layers

Parameter Reduction

  • Teacher: ~3.0B parameters
  • Student: ~1.5B parameters
  • ~50% reduction in depth and inference FLOPs

This uses structured depth pruning rather than magnitude pruning.

Transformers contain significant redundancy across layers. Retaining alternating layers preserves representation diversity while reducing compute and memory.
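The alternating-layer selection can be sketched as a simple slice over the decoder stack. The helper below is illustrative only; a toy `nn.ModuleList` stands in for the 36-layer stack, which in `transformers` lives at `model.model.layers` for Qwen2 checkpoints:

```python
import torch.nn as nn

def prune_alternating_layers(layers: nn.ModuleList, stride: int = 2) -> nn.ModuleList:
    """Keep every `stride`-th block (indices 0, 2, 4, ...), reusing the
    teacher's weights rather than reinitializing them."""
    return nn.ModuleList(layers[i] for i in range(0, len(layers), stride))

# Toy stand-in for the teacher's 36-layer decoder stack.
teacher_layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(36))
student_layers = prune_alternating_layers(teacher_layers)

assert len(student_layers) == 18
assert student_layers[1] is teacher_layers[2]  # same module object: shared, not copied
```

Embeddings, final normalization, and the LM head would be carried over from the teacher unchanged, as described above.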


Dataset Construction

Source dataset: glaiveai/glaive-function-calling-v2

Procedure:

  1. Extract valid JSON objects from conversational samples.
  2. Apply synthetic corruptions:
    • Remove brace
    • Remove quote
    • Remove colon
    • Remove comma
  3. Keep only malformed → corrected pairs where the corrupted string actually differs from the original.
  4. Target dataset size: 10,000 pairs.
  5. 90/10 train-test split.

The corruption strategy simulates realistic LLM output failures:

  • Missing quotes
  • Missing separators
  • Truncated structures
  • Structural incompleteness
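The corruption step can be sketched as removing a single structural character from a valid JSON string. The `corrupt_json` helper below is a hypothetical reconstruction of the procedure, not the exact dataset script:

```python
import json
import random

def corrupt_json(valid: str, rng: random.Random) -> str:
    """Delete one structural character (brace, quote, colon, or comma)
    to simulate a malformed LLM output."""
    target = rng.choice(["{", "}", '"', ":", ","])
    positions = [i for i, ch in enumerate(valid) if ch == target]
    if not positions:
        return valid  # nothing to corrupt; the pair would be discarded
    i = rng.choice(positions)
    return valid[:i] + valid[i + 1:]

rng = random.Random(0)
clean = json.dumps({"status": "success", "code": 200})
broken = corrupt_json(clean, rng)
assert broken != clean  # keep only pairs where the corruption changed the string
```

Pairs where the corrupted string still parses, or is unchanged, would be filtered out per step 3 above.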

Training Methodology

Training uses Knowledge Distillation combining:

  1. Cross-entropy loss against ground truth
  2. KL divergence between teacher and student logits

Loss formulation:

L = α * CE(student, target) + (1 - α) * KL(student || teacher)

Both student and teacher logits are softened by the temperature T before the KL term is computed.
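A minimal PyTorch sketch of this objective, assuming standard temperature-scaled distillation with the listed α = 0.5 and T = 2.0 (the exact reduction used in training is not specified):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha: float = 0.5, T: float = 2.0) -> torch.Tensor:
    """alpha * CE against ground truth + (1 - alpha) * temperature-scaled
    KL divergence between student and teacher distributions."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after softening by T
    return alpha * ce + (1 - alpha) * kl

# Shapes: (batch, vocab) for logits, (batch,) for target token ids.
s = torch.randn(4, 10)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
loss = distillation_loss(s, t, y)
```

With the teacher frozen, only the student's parameters receive gradients from this loss.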

Hyperparameters

  • Epochs: 3
  • Training time: ~40 minutes
  • GPU: NVIDIA A100
  • Precision: bfloat16
  • Batch size: 4
  • Gradient accumulation: 4
  • Effective batch size: 16
  • Learning rate: 2e-5
  • Temperature (T): 2.0
  • Alpha: 0.5
  • Max sequence length: 256
  • Optimizer: AdamW (Transformers default)
  • Checkpoint saving: disabled

The teacher model remained frozen during training.


Evaluation Protocol

Evaluation was conducted on 100 samples from the test split.

Robust parsing was performed using:

json.JSONDecoder().raw_decode

This allows extraction of valid JSON even if trailing text exists.
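A sketch of this parsing step, wrapping `raw_decode` in a hypothetical `extract_first_json` helper:

```python
import json

def extract_first_json(text: str):
    """Parse the first JSON object in `text`, ignoring any trailing tokens."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = decoder.raw_decode(text, start)
        return obj
    except json.JSONDecodeError:
        return None

extract_first_json('{"status": "ok"} Assistant: done')
# → {'status': 'ok'}
```

Outputs where no object can be recovered count against the Valid JSON Rate below.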

Results

  • Valid JSON Rate: 55%
  • Exact Match Accuracy: 14%

Metric Definitions

Valid JSON Rate: Percentage of outputs that can be successfully parsed as JSON.

Exact Match Accuracy: Percentage of predictions that are semantically identical to ground truth (Python dict equality).
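Both metrics can be computed as below; `score` is an illustrative helper, assuming exact match is checked after parsing (so key order and whitespace do not matter):

```python
import json

def score(predictions, references):
    """Return (valid_json_rate, exact_match_accuracy) over paired lists."""
    valid = exact = 0
    for pred, ref in zip(predictions, references):
        try:
            parsed = json.loads(pred)
        except json.JSONDecodeError:
            continue  # unparseable output counts against both metrics
        valid += 1
        if parsed == json.loads(ref):  # dict equality, order-insensitive
            exact += 1
    n = len(references)
    return valid / n, exact / n

score(['{"a": 1}', '{"a" 1}'], ['{"a": 1}', '{"a": 1}'])
# → (0.5, 0.5)
```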

Interpretation:

The model frequently reconstructs syntactically valid JSON but sometimes:

  • Drops outer fields
  • Alters keys
  • Truncates top-level structure

This behavior is expected given:

  • Heavy pruning (50%)
  • Limited dataset size (10k)
  • Short training duration (3 epochs)

Intended Use

Designed for:

  • Post-processing node in agent pipelines
  • JSON validation layers
  • Function-calling repair
  • LangGraph validation nodes
  • Structured output correction
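As a post-processing node, the model only needs to run when parsing fails. A minimal sketch of that pattern, where `repair_fn` is a placeholder for a call into the model's generate loop:

```python
import json

def repair_if_invalid(raw: str, repair_fn) -> dict:
    """Pass valid JSON through untouched; invoke the repair model
    only when parsing fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(repair_fn(raw))

# A stub stands in for the model call here.
repair_if_invalid('{"a" 1}', lambda s: '{"a": 1}')
```

In an agent pipeline, `repair_fn` would wrap the tokenize/generate/decode sequence shown in the usage example.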

Not designed for:

  • General reasoning tasks
  • Complex semantic correction
  • Code generation
  • Multi-step instruction following

Known Failure Modes

Observed behaviors:

  • Run-on text continuation after valid JSON
  • Missing top-level keys
  • Partial schema reconstruction
  • Repetition artifacts

These are typical in partially converged distilled models.


How to Improve Performance

Potential improvements:

  1. Increase dataset size (50k–100k pairs)
  2. Train for 5–10 epochs
  3. Use cosine learning rate decay
  4. Introduce curriculum learning (simple → nested JSON)
  5. Add stop-token conditioning
  6. Apply grammar-constrained decoding
  7. Increase effective batch size
  8. Train longer on A100 (several hours)

With these changes, validity rate is expected to increase significantly (>80%).


Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "jamal-ibrahim/qwen2.5-1.5b-json-repair"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

malformed_json = '{"status": "success", "message" "QR code generated successfully"}'

prompt = f"User: Repair this JSON: {malformed_json}\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Conceptual Perspective

This model demonstrates that:

Structural knowledge survives aggressive pruning.

Distillation can preserve formal syntax understanding.

Small specialized models can effectively act as structural validators.

Instead of relying on large general models for everything, a modular architecture using lightweight structural validators can be more efficient and robust.

Training Compute Summary

Hardware: NVIDIA A100

Precision: bfloat16

Epochs: 3

Total training time: ~40 minutes

Estimated CO₂ emissions: ~0.3 kg

License

Apache 2.0 (inherits from base model license).

Citation

If you use this model in research or production systems, please cite this repository.
