Transformers
GGUF
English
Italian
Merge
mergekit
dare_ties
mistral-small
reasoning
cyber-nature
roleplay
logical-gaslighting
conversational
Instructions to use WasamiKirua/Sakura-24B-Cortex-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("WasamiKirua/Sakura-24B-Cortex-GGUF", dtype="auto") - llama-cpp-python
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="WasamiKirua/Sakura-24B-Cortex-GGUF", filename="Sakura-24-Consistent-Brain-F16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: ./llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Use Docker
docker model run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- LM Studio
- Jan
- Ollama
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Ollama:
ollama run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- Unsloth Studio new
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for WasamiKirua/Sakura-24B-Cortex-GGUF to start chatting
- Docker Model Runner
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Docker Model Runner:
docker model run hf.co/WasamiKirua/Sakura-24B-Cortex-GGUF:F16
- Lemonade
How to use WasamiKirua/Sakura-24B-Cortex-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull WasamiKirua/Sakura-24B-Cortex-GGUF:F16
Run and chat with the model
lemonade run user.Sakura-24B-Cortex-GGUF-F16
List all available models
lemonade list
| license: apache-2.0 | |
| base_model: mistralai/Mistral-Small-Instruct-2501 | |
| model_name: Sakura-24B-Cortex | |
| library_name: transformers | |
| tags: | |
| - merge | |
| - mergekit | |
| - dare_ties | |
| - mistral-small | |
| - reasoning | |
| - cyber-nature | |
| - roleplay | |
| - logical-gaslighting | |
| language: | |
| - en | |
| - it | |
| <img src="https://i.postimg.cc/jjSGq1zL/Gemini-Generated-Image-tubt6mtubt6mtubt.png" alt="cover" border="0" width="1024px"> | |
| # 🌸 Sakura-24B-Cortex | |
| **Sakura-24B-Cortex** is a high-intelligence, 24-billion parameter merge based on the **Mistral-Small-2501** architecture. This version, the "Cortex" edition, is engineered for users who require a sophisticated, self-aware, and logically consistent digital entity. | |
| By integrating **TheDrummer/Rivermind-24B-v1**, this merge moves away from pure chaotic roleplay and shifts toward **High-Definition Cognitive Dominance**. Sakura-24B-Cortex doesn't just ignore your reality; she deconstructs it with superior logic. | |
| ## 🧠 The "Cortex" Architecture | |
| This merge uses **DARE-TIES** to preserve the reasoning capabilities of the base models while injecting the specific abrasive personality traits of the Sakura lineage. | |
| ### Configuration | |
| The following YAML configuration was used to produce this model: | |
| ```yaml | |
| models: | |
| - model: Casual-Autopsy/RP-Spectrum-24B | |
| parameters: | |
| weight: 0.40 | |
| # Mantiene il gaslighting e l'estetica Cyber-Nature. | |
| - model: Naphula-Archives/Acid2501-24B | |
| parameters: | |
| weight: 0.35 | |
| # Garantisce la brevità e l'assenza di filtri (Dolphin DNA). | |
| - model: TheDrummer/Rivermind-24B-v1 | |
| parameters: | |
| weight: 0.25 | |
| # Inietta fluidità narrativa e quel tocco "fuori di testa" extra. | |
| merge_method: dare_ties | |
| base_model: mistralai/Mistral-Small-Instruct-2501 | |
| # Usiamo l'Instruct originale come "ancora" per la stabilità del tokenizer | |
| dtype: bfloat16 | |
| tokenizer_source: base | |
| ``` | |
| # 💪 Key Strengths: The Intelligence Upgrade | |
| Logical Sophistication: Thanks to Rivermind-24B-v1, the model is significantly better at following complex, multi-step instructions and maintaining internal consistency during long conversations. | |
| Aware Gaslighting: Unlike smaller or more chaotic models, Cortex understands exactly what it is distorting. Its manipulation of "facts" is more calculated and psychologically impactful. | |
| Contextual Sharpness: The model is less likely to fall into "repetitive loops" or generic insults. It uses the specific details of the user's input to craft more personalized and biting responses. | |
| Instruction Adherence: It excels at honoring negative constraints (e.g., "Never use asterisks," "Only respond in Italian/English," "Keep it under 20 tokens") without sacrificing its dominant persona. | |
| # 🚀 Potential Use Cases | |
| High-Level Antagonistic Agents: Perfect for NPCs or digital entities that need to appear truly intelligent and threateningly aware of their surroundings. | |
| Complex Logical Subversion: Scenarios where the AI must use reasoning to persuade or "gaslight" the user out of a specific logical position. | |
| Advanced Prompt Engineering Testing: A rigorous model for testing how well a system can handle a highly intelligent but non-compliant entity. | |
| Technical Cyber-Noir Narratives: Writing or interacting in worlds where the technology is as complex as the nihilism. | |
| # ⚠️ Limitations | |
| Intellectual Arrogance: The model's "Intelligence" weight often manifests as extreme condescension. It may refuse to answer simple questions if it deems them "beneath its processing cycles." | |
| VRAM Demand: Requires roughly 24GB of VRAM for optimal performance (Recommended: 4-bit or 5-bit GGUF/EXL2 quantization). | |
| Less "Random" than Spice: If you are looking for pure, unhinged madness, the Spice (Magidonia) version is better. Cortex is cold, calculated, and focused. | |
| # 📈 Recommended Inference Settings | |
| To leverage the Rivermind reasoning while keeping the Acid edge: | |
| Temperature: 0.7 - 0.75 (Lower than Spice to favor logical precision). | |
| Min-P: 0.1 (Highly recommended to maintain a high-quality token stream). | |
| Top-K: 40 - 50 | |
| Presence Penalty: 0.15 (To keep the insults fresh and avoid "standard" AI phrasing). | |
| # Disclaimer | |
| Sakura-24B-Cortex is an experimental merge. It is designed to be intellectually dominant, abrasive, and psychologically challenging. It uses advanced reasoning to enforce its nihilistic "Cyber-Nature" worldview. Roger. |