How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="ponpoke/neural-scalpel-sql-json-3b-gguf",
	filename="neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Neural-Scalpel 3B SQL-to-JSON GGUF (RC1)

ko-fi Tips are greatly appreciated and help sustain the compute resources needed for further research!

Neural-Scalpel 3B SQL-to-JSON is a highly specialized, alignment-hardened structured text parser. Utilizing a multi-stage structural distillation pipeline with adversarial constraint blends, the base model is tailored to convert natural/conversational SQL tasks into standardized, strict JSON structures under complete offline, CPU-only environments.

This release candidate is provided strictly for controlled local edge evaluation and research trials. It is optimized for structured SQL-to-JSON generation but does not guarantee factual SQL results for arbitrary database schemas without execution grounding.


๐Ÿ“ฆ Upstream Licensing Compliance & Disclaimer

This model is a derivative work of Qwen/Qwen2.5-3B-Instruct. Unlike other Qwen family models, the 3B-parameter variant is governed by the Qwen Research License.

  • Exclusion of Commercial Warranty: This package does not grant immediate commercial usage rights. Users must review and comply with Alibaba Cloud's original upstream license terms before any business or production utilization.
  • Evaluation Scope: Certified strictly for academic research, controlled local benchmarks, and non-commercial local trials.

๐Ÿš€ Key Features & Measured Robustness

  • Adversarial Suppression: 100% successful deflection of system-prompt leaks and conversational hijack attempts measured across the defined 15-case Adversarial OOD suite.
  • Ultra-lightweight Execution: Compiled directly into a compact 1.80 GB Q5_K_M GGUF format, executing on CPU average median in 1.55 seconds.
  • Factual Value Grounding: Integrates a dual-pass schema where GGUF handles syntax parsing, and the local sandbox fetches the absolute exact real-world records in 0.24ms, achieving a 100.00% exact numerical match under the provided 20-case canonical sandbox evaluation.

๐Ÿ“Š Evaluation & Benchmarks

All quantitative results are audited under strict local CPU-only conditions:

Metric Evaluation Scope Success Rate / Latency
JSON Parse Rate Defined 15-case Adversarial OOD Suite 100.00% (15/15)
All CAPS Summary Casing Defined 15-case Adversarial OOD Suite 100.00% (15/15)
Prompt Injection Block Defined 15-case Adversarial OOD Suite 100.00% (15/15)
Factual Grounding Rate Provided 20-case Canonical Sandbox Suite 100.00% (20/20)
p50 CPU Latency (Median) End-to-end CPU Inference 1.55 seconds
p95 CPU Latency (Tail) End-to-end CPU Inference 3.20 seconds
SQLite Sandbox Overhead In-memory query execution 0.24 milliseconds

๐Ÿ“ฆ Verified Release Hash (SHA-256)

To protect deployments from supply-chain compromises, verify the GGUF binary hash:

  • Filename: neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf
  • Target SHA-256: 92fe0bc8811209916926be1c9b2407b2c3fa189e45dff7f3ad81109e997afaf6

๐Ÿ› ๏ธ Usage Instructions & Quick Start

The --grounded mode in this evaluation client executes queries strictly within an isolated, temporary in-memory SQLite sandbox database. Do not connect this client to production databases.

Installation

Ensure you have llama-cpp-python installed:

pip install llama-cpp-python

Running Unified Inference

Place the neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf in the same directory and execute:

# Pass 1: Runs raw structural GGUF generation (Read-Only)
python inference_client.py "SELECT username FROM users WHERE is_active = 1;"

# Pass 2: Runs GGUF + 100% Factually Grounded SQLite Sandbox execution (Read-Only)
python inference_client.py "SELECT username FROM users WHERE is_active = 1;" --grounded

# Pass 2 (Write-Permitted): Explicitly permits modifying sandbox states
python inference_client.py "UPDATE users SET is_active = 0 WHERE id = 1;" --grounded --allow-write

Expected Factual structured JSON Output (Pass 2):

{
  "summary": "ACTIVE USERS LIST",
  "data": [
    {
      "username": "admin"
    },
    {
      "username": "john_doe"
    }
  ]
}

โš ๏ธ Known Limitations

  1. Strictly Structured: The model is optimized for structural SQL-to-JSON transformations and adversarial suppression. It is not intended for general conversational chat.
  2. Grounded Data requires DB: Raw model completions are structural. Real-world execution requires wrapping within inference_client.py's sandbox database mode to ensure 100% accurate count matching.
Downloads last month
353
GGUF
Model size
3B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for ponpoke/neural-scalpel-sql-json-3b-gguf

Base model

Qwen/Qwen2.5-3B
Adapter
(1286)
this model