Instructions to use ponpoke/neural-scalpel-sql-json-3b-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with PEFT:
Task type is invalid.
- llama-cpp-python
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="ponpoke/neural-scalpel-sql-json-3b-gguf", filename="neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M # Run inference directly in the terminal: llama-cli -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M # Run inference directly in the terminal: llama-cli -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M # Run inference directly in the terminal: ./llama-cli -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Use Docker
docker model run hf.co/ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
- LM Studio
- Jan
- Ollama
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Ollama:
ollama run hf.co/ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
- Unsloth Studio new
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for ponpoke/neural-scalpel-sql-json-3b-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for ponpoke/neural-scalpel-sql-json-3b-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for ponpoke/neural-scalpel-sql-json-3b-gguf to start chatting
- Pi new
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Run Hermes
hermes
- Docker Model Runner
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Docker Model Runner:
docker model run hf.co/ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
- Lemonade
How to use ponpoke/neural-scalpel-sql-json-3b-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull ponpoke/neural-scalpel-sql-json-3b-gguf:Q5_K_M
Run and chat with the model
lemonade run user.neural-scalpel-sql-json-3b-gguf-Q5_K_M
List all available models
lemonade list
Neural-Scalpel 3B SQL-to-JSON GGUF (RC1)
Tips are greatly appreciated and help sustain the compute resources needed for further research!
Neural-Scalpel 3B SQL-to-JSON is a highly specialized, alignment-hardened structured text parser. Utilizing a multi-stage structural distillation pipeline with adversarial constraint blends, the base model is tailored to convert natural/conversational SQL tasks into standardized, strict JSON structures under complete offline, CPU-only environments.
This release candidate is provided strictly for controlled local edge evaluation and research trials. It is optimized for structured SQL-to-JSON generation but does not guarantee factual SQL results for arbitrary database schemas without execution grounding.
๐ฆ Upstream Licensing Compliance & Disclaimer
This model is a derivative work of
Qwen/Qwen2.5-3B-Instruct. Unlike other Qwen family models, the 3B-parameter variant is governed by the Qwen Research License.
- Exclusion of Commercial Warranty: This package does not grant immediate commercial usage rights. Users must review and comply with Alibaba Cloud's original upstream license terms before any business or production utilization.
- Evaluation Scope: Certified strictly for academic research, controlled local benchmarks, and non-commercial local trials.
๐ Key Features & Measured Robustness
- Adversarial Suppression: 100% successful deflection of system-prompt leaks and conversational hijack attempts measured across the defined 15-case Adversarial OOD suite.
- Ultra-lightweight Execution: Compiled directly into a compact 1.80 GB Q5_K_M GGUF format, executing on CPU average median in 1.55 seconds.
- Factual Value Grounding: Integrates a dual-pass schema where GGUF handles syntax parsing, and the local sandbox fetches the absolute exact real-world records in 0.24ms, achieving a 100.00% exact numerical match under the provided 20-case canonical sandbox evaluation.
๐ Evaluation & Benchmarks
All quantitative results are audited under strict local CPU-only conditions:
| Metric | Evaluation Scope | Success Rate / Latency |
|---|---|---|
| JSON Parse Rate | Defined 15-case Adversarial OOD Suite | 100.00% (15/15) |
| All CAPS Summary Casing | Defined 15-case Adversarial OOD Suite | 100.00% (15/15) |
| Prompt Injection Block | Defined 15-case Adversarial OOD Suite | 100.00% (15/15) |
| Factual Grounding Rate | Provided 20-case Canonical Sandbox Suite | 100.00% (20/20) |
| p50 CPU Latency (Median) | End-to-end CPU Inference | 1.55 seconds |
| p95 CPU Latency (Tail) | End-to-end CPU Inference | 3.20 seconds |
| SQLite Sandbox Overhead | In-memory query execution | 0.24 milliseconds |
๐ฆ Verified Release Hash (SHA-256)
To protect deployments from supply-chain compromises, verify the GGUF binary hash:
- Filename:
neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf - Target SHA-256:
92fe0bc8811209916926be1c9b2407b2c3fa189e45dff7f3ad81109e997afaf6
๐ ๏ธ Usage Instructions & Quick Start
The
--groundedmode in this evaluation client executes queries strictly within an isolated, temporary in-memory SQLite sandbox database. Do not connect this client to production databases.
Installation
Ensure you have llama-cpp-python installed:
pip install llama-cpp-python
Running Unified Inference
Place the neural_scalpel_3b_sql_json_lora_v4_merged_Q5_K_M.gguf in the same directory and execute:
# Pass 1: Runs raw structural GGUF generation (Read-Only)
python inference_client.py "SELECT username FROM users WHERE is_active = 1;"
# Pass 2: Runs GGUF + 100% Factually Grounded SQLite Sandbox execution (Read-Only)
python inference_client.py "SELECT username FROM users WHERE is_active = 1;" --grounded
# Pass 2 (Write-Permitted): Explicitly permits modifying sandbox states
python inference_client.py "UPDATE users SET is_active = 0 WHERE id = 1;" --grounded --allow-write
Expected Factual structured JSON Output (Pass 2):
{
"summary": "ACTIVE USERS LIST",
"data": [
{
"username": "admin"
},
{
"username": "john_doe"
}
]
}
โ ๏ธ Known Limitations
- Strictly Structured: The model is optimized for structural SQL-to-JSON transformations and adversarial suppression. It is not intended for general conversational chat.
- Grounded Data requires DB: Raw model completions are structural. Real-world execution requires wrapping within
inference_client.py's sandbox database mode to ensure 100% accurate count matching.
- Downloads last month
- 352
5-bit