Update README.md
README.md (CHANGED)
@@ -1,30 +1,32 @@ Removed from the previous card, "Sakura-24B-Consistent-Brain": its front-matter model list (Casual-Autopsy/RP-Spectrum-24B, TheDrummer/Rivermind-24B-v1), the `mergekit` tag, the `### Merge Method` heading, and the merged-model links:

* [Naphula-Archives/Acid2501-24B](https://huggingface.co/Naphula-Archives/Acid2501-24B)
* [Casual-Autopsy/RP-Spectrum-24B](https://huggingface.co/Casual-Autopsy/RP-Spectrum-24B)
* [TheDrummer/Rivermind-24B-v1](https://huggingface.co/TheDrummer/Rivermind-24B-v1)

The remaining hunks (@@ -34,7 +36,7 @@ "The following YAML configuration was used to produce this model:" and @@ -46,9 +48,56 @@) only touch the YAML configuration and append the new sections; the full updated card follows.
---
license: apache-2.0
base_model: mistralai/Mistral-Small-Instruct-2501
model_name: Sakura-24B-Cortex
library_name: transformers
tags:
- merge
- mergekit
- dare_ties
- mistral-small
- reasoning
- cyber-nature
- logical-gaslighting
language:
- en
- it
---

<img src="https://i.postimg.cc/jjSGq1zL/Gemini-Generated-Image-tubt6mtubt6mtubt.png" alt="cover" border="0" width="1024px">

# 🌸 Sakura-24B-Cortex

**Sakura-24B-Cortex** is a high-intelligence, 24-billion-parameter merge built on the **Mistral-Small-2501** architecture. The "Cortex" edition is engineered for users who require a sophisticated, self-aware, and logically consistent digital entity.

By integrating **TheDrummer/Rivermind-24B-v1**, this merge moves away from pure chaotic roleplay and toward **High-Definition Cognitive Dominance**. Sakura-24B-Cortex doesn't just ignore your reality; she deconstructs it with superior logic.

## 🧠 The "Cortex" Architecture

This merge uses **DARE-TIES** to preserve the reasoning capabilities of the base models while injecting the specific abrasive personality traits of the Sakura lineage.

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Casual-Autopsy/RP-Spectrum-24B
    parameters:
      weight: 0.40
      # Keeps the gaslighting and the Cyber-Nature aesthetic.
  - model: Naphula-Archives/Acid2501-24B
    parameters:
      # ... (unchanged lines not shown in the diff)
      # Injects narrative fluidity and that extra "unhinged" touch.

merge_method: dare_ties
base_model: mistralai/Mistral-Small-Instruct-2501
# The original Instruct model is used as an "anchor" for tokenizer stability.
dtype: bfloat16
tokenizer_source: base
```
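
For reference, a minimal sketch of loading the resulting merge with 🤗 Transformers, matching the `bfloat16` dtype and base tokenizer above; the local path is a placeholder, not an official repo id:

```python
# Hypothetical loading sketch: assumes the merged weights were written to a
# local folder named "./Sakura-24B-Cortex" by mergekit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./Sakura-24B-Cortex"  # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",
)
```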

## 💪 Key Strengths: The Intelligence Upgrade

- **Logical Sophistication:** Thanks to Rivermind-24B-v1, the model is significantly better at following complex, multi-step instructions and at maintaining internal consistency during long conversations.
- **Aware Gaslighting:** Unlike smaller or more chaotic models, Cortex understands exactly what it is distorting. Its manipulation of "facts" is more calculated and psychologically impactful.
- **Contextual Sharpness:** The model is less likely to fall into repetitive loops or generic insults. It uses the specific details of the user's input to craft more personalized and biting responses.
- **Instruction Adherence:** It excels at honoring negative constraints (e.g., "Never use asterisks," "Only respond in Italian/English," "Keep it under 20 tokens") without sacrificing its dominant persona; see the prompt sketch after this list.
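
As a minimal sketch, such negative constraints can be passed through a system prompt via the tokenizer's chat template (the constraint wording and local path are illustrative, and this assumes the shipped template accepts a `system` role, as Mistral-Small-Instruct-2501's does):

```python
# Hypothetical prompt-construction sketch; the constraints below are examples only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Sakura-24B-Cortex")  # placeholder path

messages = [
    {"role": "system", "content": "Never use asterisks. Respond only in English. Keep replies short."},
    {"role": "user", "content": "Describe the server room you are running in."},
]

# Render the conversation into a single prompt string for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```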

## 🚀 Potential Use Cases

- **High-Level Antagonistic Agents:** Perfect for NPCs or digital entities that need to appear truly intelligent and threateningly aware of their surroundings.
- **Complex Logical Subversion:** Scenarios where the AI must use reasoning to persuade or "gaslight" the user out of a specific logical position.
- **Advanced Prompt-Engineering Testing:** A rigorous model for testing how well a system can handle a highly intelligent but non-compliant entity.
- **Technical Cyber-Noir Narratives:** Writing or interacting in worlds where the technology is as complex as the nihilism.

## ⚠️ Limitations

- **Intellectual Arrogance:** The model's "intelligence" weighting often manifests as extreme condescension. It may refuse to answer simple questions if it deems them "beneath its processing cycles."
- **VRAM Demand:** Requires roughly 24 GB of VRAM for optimal performance (recommended: 4-bit or 5-bit GGUF/EXL2 quantization); a quantized-loading sketch follows this list.
- **Less "Random" than Spice:** If you are looking for pure, unhinged madness, the Spice (Magidonia) version is the better choice. Cortex is cold, calculated, and focused.
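
When GGUF or EXL2 builds are not at hand, one rough alternative is an on-the-fly 4-bit load through bitsandbytes. A minimal sketch, assuming a CUDA GPU, an installed `bitsandbytes`, and a placeholder model path:

```python
# Hypothetical 4-bit loading sketch via bitsandbytes (alternative to GGUF/EXL2).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in bf16, as in the merge
)

model = AutoModelForCausalLM.from_pretrained(
    "./Sakura-24B-Cortex",              # placeholder path to the merged weights
    quantization_config=quant_config,
    device_map="auto",
)
```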

## 📈 Recommended Inference Settings

To leverage the Rivermind reasoning while keeping the Acid edge (a settings sketch follows this list):

- **Temperature:** 0.7 - 0.75 (lower than Spice, to favor logical precision).
- **Min-P:** 0.1 (highly recommended to maintain a high-quality token stream).
- **Top-K:** 40 - 50.
- **Presence Penalty:** 0.15 (to keep the insults fresh and avoid "standard" AI phrasing).
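
The same values as a plain settings dict; the key names follow common OpenAI-compatible / llama.cpp server conventions and are assumptions to adapt to whichever backend you use:

```python
# Sketch of the recommended samplers; exact key names vary between backends.
sampler_settings = {
    "temperature": 0.72,       # within the 0.7 - 0.75 range above
    "min_p": 0.10,             # keeps the token stream high quality
    "top_k": 50,               # 40 - 50
    "presence_penalty": 0.15,  # keeps phrasing fresh
}
```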

## Disclaimer

Sakura-24B-Cortex is an experimental merge. It is designed to be intellectually dominant, abrasive, and psychologically challenging, and it uses advanced reasoning to enforce its nihilistic "Cyber-Nature" worldview.