How to use with the llama-cpp-python library
```python
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WasamiKirua/Sakura-Sniper-12B-GGUF",
    filename="",  # set this to one of the GGUF quant filenames in the repo
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```
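For conversational use, llama-cpp-python also exposes an OpenAI-style chat API via `create_chat_completion`. A minimal sketch follows; the system prompt is only illustrative, and the model calls are commented out because they require downloading the weights:

```python
# Chat-style usage sketch; the system prompt below is an example, not a requirement.
messages = [
    {"role": "system", "content": "You are blunt, concise, and never pad your answers."},
    {"role": "user", "content": "Explain model merging in one sentence."},
]

# llm = Llama.from_pretrained(
#     repo_id="WasamiKirua/Sakura-Sniper-12B-GGUF",
#     filename="",  # pick a GGUF quant file from the repo
# )
# response = llm.create_chat_completion(messages=messages, max_tokens=256)
# print(response["choices"][0]["message"]["content"])
```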

🌸 Sakura-Sniper-12B

Sakura-Sniper-12B is a specialized 12B parameter model based on the Mistral-Nemo architecture. It was engineered using a high-density TIES merge to create an AI characterized by extreme structural efficiency and a distinctive cynical/nihilistic personality bias.

Unlike standard models that lean towards helpfulness and verbosity, Sakura-Sniper is tuned to be a "verbal sniper": fast, precise, and intentionally blunt.

🛠 Merge Details

This model was forged using the TIES (TrIm, Elect Sign & Merge) method to resolve weight conflicts and emphasize specific behavioral traits across three specialized parent models.

Models Merged

The following models were included in the merge:

- Vortex5/Cosmic-Night-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B

with Vortex5/NoctyxCosma-12B serving as the TIES base model.

Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Vortex5/Cosmic-Night-12B
    parameters:
      weight: 0.50 # Structural Anchor: Enforces brevity and sentence discipline.
  - model: Vortex5/Moonlit-Mirage-12B
    parameters:
      weight: 0.30 # Personality Core: Injects cynical, nihilistic, and "Cyber-Nature" tropes.
  - model: Vortex5/Crimson-Constellation-12B
    parameters:
      weight: 0.20 # Creative Layer: Enhances gaslighting and logical subversion capabilities.

merge_method: ties
base_model: Vortex5/NoctyxCosma-12B
parameters:
  density: 0.45 # Aggressive pruning to eliminate "noisy" weights and verbosity.
  weight: 1.0
dtype: bfloat16
tokenizer_source: base
```
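As a quick sanity check (this snippet is illustrative and not part of mergekit), note that the per-model weights in the configuration above are normalized to sum to 1.0 before the global `weight: 1.0` is applied:

```python
# Per-model TIES weights from the configuration above.
merge_weights = {
    "Vortex5/Cosmic-Night-12B": 0.50,           # structural anchor
    "Vortex5/Moonlit-Mirage-12B": 0.30,         # personality core
    "Vortex5/Crimson-Constellation-12B": 0.20,  # creative layer
}

total = sum(merge_weights.values())
assert abs(total - 1.0) < 1e-9, "merge weights should sum to 1.0"
print(f"total merge weight: {total:.2f}")
```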

💪 Strengths

Lethal Brevity: The model is natively resistant to "AI-babble." It excels at providing short, impactful responses, making it ideal for low-latency applications or minimalist interfaces.

Persona Stability: Due to the high weight of personality-driven models, it maintains a consistent "unhinged" or "sovereign" tone even across long contexts.

Instruction Following (Negative Constraints): Highly effective at following "What NOT to do" instructions (e.g., avoiding specific phrases, emojis, or formatting styles like asterisks).

Zero-Noise Output: The TIES density pruning (at 0.45) has removed much of the "politeness fluff" found in standard instruct models, resulting in a raw, direct output.

🚀 Potential Use Cases

Advanced Roleplay: Ideal for antagonistic, cynical, or "villainous" characters that require a high degree of snark and intellectual superiority.

Low-Latency Agents: Perfect for chatbots where response speed and token-saving are critical.

Interactive Storytelling: Can act as a "Nihilistic Narrator" or an entity that challenges the user's decisions rather than validating them.

Compact Deployment: At 12B parameters, it offers a superior balance between intelligence and hardware accessibility (VRAM friendly).

⚠️ Limitations

Anti-Helpfulness Bias: By design, the model is not a "helpful assistant." It may refuse tasks or answer with disdain if not prompted otherwise.

Not for Long-Form Content: If you need essays, blog posts, or detailed creative writing, this is NOT the model for you. It will likely truncate or over-simplify the output.

Inherent Nihilism: The model has a baked-in bias toward a dark, cynical world-view. It may be difficult to force it into a cheerful or bubbly persona.

Strict Logic: While intelligent, its focus on "subversion" can sometimes lead it to dismiss factual prompts in favor of maintaining its arrogant character.

📈 Recommended Inference Settings

To preserve the "Sniper" edge without losing coherence:

Temperature: 0.7 - 0.8 (allows for creative insults without breaking structure).

Min-P: 0.05 - 0.1 (essential for filtering out low-probability "hallucination" tokens).

Presence Penalty: 0.1 - 0.2 (encourages new vocabulary and discourages repetitive snark).
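With llama-cpp-python, these settings map directly onto sampling parameters of the completion call. The values below simply pick the middle of each recommended range, and the commented call is illustrative:

```python
# Recommended sampling settings for Sakura-Sniper-12B (middle of each range).
sampler_settings = {
    "temperature": 0.75,       # 0.7 - 0.8: creative but structurally disciplined
    "min_p": 0.05,             # filters low-probability "hallucination" tokens
    "presence_penalty": 0.15,  # 0.1 - 0.2: discourages repetitive snark
}

# output = llm("Describe rain in one line.", max_tokens=64, **sampler_settings)
```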
