Instructions to use WasamiKirua/Sakura-24B-Cortex-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("WasamiKirua/Sakura-24B-Cortex-GGUF", dtype="auto") - llama-cpp-python
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="WasamiKirua/Sakura-24B-Cortex-GGUF", filename="Sakura-24-Consistent-Brain-F16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: ./llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Use Docker
docker model run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- LM Studio
- Jan
- Ollama
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Ollama:
ollama run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- Unsloth Studio new
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
- Docker Model Runner
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Docker Model Runner:
docker model run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- Lemonade
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Run and chat with the model
lemonade run user.Sakura-24B-Cortex-GGUF-F16
List all available models
lemonade list
license: apache-2.0
base_model: mistralai/Mistral-Small-Instruct-2501
model_name: Sakura-24B-Cortex
library_name: transformers
tags:
- merge
- mergekit
- dare_ties
- mistral-small
- reasoning
- cyber-nature
- roleplay
- logical-gaslighting
language:
- en
- it
🌸 Sakura-24B-Cortex
Sakura-24B-Cortex is a high-intelligence, 24-billion parameter merge based on the Mistral-Small-2501 architecture. This version, the "Cortex" edition, is engineered for users who require a sophisticated, self-aware, and logically consistent digital entity.
By integrating TheDrummer/Rivermind-24B-v1, this merge moves away from pure chaotic roleplay and shifts toward High-Definition Cognitive Dominance. Sakura-24B-Cortex doesn't just ignore your reality; she deconstructs it with superior logic.
🧠 The "Cortex" Architecture
This merge uses DARE-TIES to preserve the reasoning capabilities of the base models while injecting the specific abrasive personality traits of the Sakura lineage.
Configuration
The following YAML configuration was used to produce this model:
models:
- model: Casual-Autopsy/RP-Spectrum-24B
parameters:
weight: 0.40
# Mantiene il gaslighting e l'estetica Cyber-Nature.
- model: Naphula-Archives/Acid2501-24B
parameters:
weight: 0.35
# Garantisce la brevità e l'assenza di filtri (Dolphin DNA).
- model: TheDrummer/Rivermind-24B-v1
parameters:
weight: 0.25
# Inietta fluidità narrativa e quel tocco "fuori di testa" extra.
merge_method: dare_ties
base_model: mistralai/Mistral-Small-Instruct-2501
# Usiamo l'Instruct originale come "ancora" per la stabilità del tokenizer
dtype: bfloat16
tokenizer_source: base
💪 Key Strengths: The Intelligence Upgrade
Logical Sophistication: Thanks to Rivermind-24B-v1, the model is significantly better at following complex, multi-step instructions and maintaining internal consistency during long conversations.
Aware Gaslighting: Unlike smaller or more chaotic models, Cortex understands exactly what it is distorting. Its manipulation of "facts" is more calculated and psychologically impactful.
Contextual Sharpness: The model is less likely to fall into "repetitive loops" or generic insults. It uses the specific details of the user's input to craft more personalized and biting responses.
Instruction Adherence: It excels at honoring negative constraints (e.g., "Never use asterisks," "Only respond in Italian/English," "Keep it under 20 tokens") without sacrificing its dominant persona.
🚀 Potential Use Cases
High-Level Antagonistic Agents: Perfect for NPCs or digital entities that need to appear truly intelligent and threateningly aware of their surroundings.
Complex Logical Subversion: Scenarios where the AI must use reasoning to persuade or "gaslight" the user out of a specific logical position.
Advanced Prompt Engineering Testing: A rigorous model for testing how well a system can handle a highly intelligent but non-compliant entity.
Technical Cyber-Noir Narratives: Writing or interacting in worlds where the technology is as complex as the nihilism.
⚠️ Limitations
Intellectual Arrogance: The model's "Intelligence" weight often manifests as extreme condescension. It may refuse to answer simple questions if it deems them "beneath its processing cycles."
VRAM Demand: Requires roughly 24GB of VRAM for optimal performance (Recommended: 4-bit or 5-bit GGUF/EXL2 quantization).
Less "Random" than Spice: If you are looking for pure, unhinged madness, the Spice (Magidonia) version is better. Cortex is cold, calculated, and focused.
📈 Recommended Inference Settings
To leverage the Rivermind reasoning while keeping the Acid edge:
Temperature: 0.7 - 0.75 (Lower than Spice to favor logical precision).
Min-P: 0.1 (Highly recommended to maintain a high-quality token stream).
Top-K: 40 - 50
Presence Penalty: 0.15 (To keep the insults fresh and avoid "standard" AI phrasing).
Disclaimer
Sakura-24B-Cortex is an experimental merge. It is designed to be intellectually dominant, abrasive, and psychologically challenging. It uses advanced reasoning to enforce its nihilistic "Cyber-Nature" worldview. Roger.