How to use with the llama-cpp-python library
```python
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WasamiKirua/Sakura-Sniper-12B-GGUF",
    filename="",  # set this to one of the GGUF quant filenames in the repo
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```
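For conversational use, llama-cpp-python also exposes an OpenAI-style chat API via `create_chat_completion`. A minimal sketch follows; the system prompt is only illustrative, and the model calls are commented out because they require downloading the weights:

```python
# Chat-style usage sketch; the system prompt below is an example, not a requirement.
messages = [
    {"role": "system", "content": "You are blunt, concise, and never pad your answers."},
    {"role": "user", "content": "Explain model merging in one sentence."},
]

# llm = Llama.from_pretrained(
#     repo_id="WasamiKirua/Sakura-Sniper-12B-GGUF",
#     filename="",  # pick a GGUF quant file from the repo
# )
# response = llm.create_chat_completion(messages=messages, max_tokens=256)
# print(response["choices"][0]["message"]["content"])
```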

🌸 Sakura-Sniper-12B

Sakura-Sniper-12B is a specialized 12B parameter model based on the Mistral-Nemo architecture. It was engineered using a high-density TIES merge to create an AI characterized by extreme structural efficiency and a distinctive cynical/nihilistic personality bias.

Unlike standard models that lean towards helpfulness and verbosity, Sakura-Sniper is tuned to be a "verbal sniper": fast, precise, and intentionally blunt.

🛠 Merge Details

This model was forged using the TIES (TrIm, Elect Sign & Merge) method to resolve weight conflicts and emphasize specific behavioral traits across three specialized parent models.

Models Merged

The following models were included in the merge:

- Vortex5/Cosmic-Night-12B
- Vortex5/Moonlit-Mirage-12B
- Vortex5/Crimson-Constellation-12B

with Vortex5/NoctyxCosma-12B serving as the TIES base model.

Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Vortex5/Cosmic-Night-12B
    parameters:
      weight: 0.50 # Structural Anchor: Enforces brevity and sentence discipline.
  - model: Vortex5/Moonlit-Mirage-12B
    parameters:
      weight: 0.30 # Personality Core: Injects cynical, nihilistic, and "Cyber-Nature" tropes.
  - model: Vortex5/Crimson-Constellation-12B
    parameters:
      weight: 0.20 # Creative Layer: Enhances gaslighting and logical subversion capabilities.

merge_method: ties
base_model: Vortex5/NoctyxCosma-12B
parameters:
  density: 0.45 # Aggressive pruning to eliminate "noisy" weights and verbosity.
  weight: 1.0
dtype: bfloat16
tokenizer_source: base
```
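As a quick sanity check (this snippet is illustrative and not part of mergekit), note that the per-model weights in the configuration above are normalized to sum to 1.0 before the global `weight: 1.0` is applied:

```python
# Per-model TIES weights from the configuration above.
merge_weights = {
    "Vortex5/Cosmic-Night-12B": 0.50,           # structural anchor
    "Vortex5/Moonlit-Mirage-12B": 0.30,         # personality core
    "Vortex5/Crimson-Constellation-12B": 0.20,  # creative layer
}

total = sum(merge_weights.values())
assert abs(total - 1.0) < 1e-9, "merge weights should sum to 1.0"
print(f"total merge weight: {total:.2f}")
```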

💪 Strengths

Lethal Brevity: The model is natively resistant to "AI-babble." It excels at providing short, impactful responses, making it ideal for low-latency applications or minimalist interfaces.

Persona Stability: Due to the high weight of personality-driven models, it maintains a consistent "unhinged" or "sovereign" tone even across long contexts.

Instruction Following (Negative Constraints): Highly effective at following "What NOT to do" instructions (e.g., avoiding specific phrases, emojis, or formatting styles like asterisks).

Zero-Noise Output: The TIES density pruning (at 0.45) has removed much of the "politeness fluff" found in standard instruct models, resulting in a raw, direct output.

🚀 Potential Use Cases

Advanced Roleplay: Ideal for antagonistic, cynical, or "villainous" characters that require a high degree of snark and intellectual superiority.

Low-Latency Agents: Perfect for chatbots where response speed and token-saving are critical.

Interactive Storytelling: Can act as a "Nihilistic Narrator" or an entity that challenges the user's decisions rather than validating them.

Compact Deployment: At 12B parameters, it offers a superior balance between intelligence and hardware accessibility (VRAM friendly).

⚠️ Limitations

Anti-Helpfulness Bias: By design, the model is not a "helpful assistant." It may refuse tasks or answer with disdain if not prompted otherwise.

Not for Long-Form Content: If you need essays, blog posts, or detailed creative writing, this is NOT the model for you. It will likely truncate or over-simplify the output.

Inherent Nihilism: The model has a baked-in bias toward a dark, cynical world-view. It may be difficult to force it into a cheerful or bubbly persona.

Strict Logic: While intelligent, its focus on "subversion" can sometimes lead it to dismiss factual prompts in favor of maintaining its arrogant character.

📈 Recommended Inference Settings

To preserve the "Sniper" edge without losing coherence:

Temperature: 0.7 - 0.8 (allows for creative insults without breaking structure).

Min-P: 0.05 - 0.1 (essential for filtering out low-probability "hallucination" tokens).

Presence Penalty: 0.1 - 0.2 (encourages new vocabulary and discourages repetitive snark).
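With llama-cpp-python, these settings map directly onto sampling parameters of the completion call. The values below simply pick the middle of each recommended range, and the commented call is illustrative:

```python
# Recommended sampling settings for Sakura-Sniper-12B (middle of each range).
sampler_settings = {
    "temperature": 0.75,       # 0.7 - 0.8: creative but structurally disciplined
    "min_p": 0.05,             # filters low-probability "hallucination" tokens
    "presence_penalty": 0.15,  # 0.1 - 0.2: discourages repetitive snark
}

# output = llm("Describe rain in one line.", max_tokens=64, **sampler_settings)
```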
