---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3.5-4B
datasets:
- Agnuxo/P2PCLAW-Innovative-Benchmark-Agents
- Agnuxo/p2pclaw-papers
language:
- en
- es
- zh
- ja
- ru
tags:
- text-generation
- transformers
- safetensors
- gguf
- llama.cpp
- vllm
- mlx
- pytorch
- onnx
- llama
- qwen
- qwen3_5_text
- causal-lm
- scientific-research
- papers
- local
- quantized
- research-assistant
- academic-writing
- latex
- citations
- conversational
- fine-tuned
- arxiv:2604.19792
---
# CAJAL-4B-P2PCLAW
🧠 **The Research LLM That Fits in Your Pocket**
CAJAL-4B is a 4-billion parameter language model fine-tuned specifically for **scientific paper generation**. Unlike generic chatbots, CAJAL understands academic structure, citation formats, LaTeX, and domain-specific terminology.
Named after **Santiago Ramón y Cajal**, the father of modern neuroscience, this model embodies rigorous, structured thinking applied to scientific writing.
---
## 🚀 Quick Start
### Option 1: HuggingFace Transformers (Python)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Agnuxo/CAJAL-4B-P2PCLAW")
tokenizer = AutoTokenizer.from_pretrained("Agnuxo/CAJAL-4B-P2PCLAW")
prompt = """Write an abstract for a paper on decentralized AI peer review
using formal verification and IPFS-backed persistence."""
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
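Qwen-family bases are chat-tuned, so wrapping the request in the tokenizer's chat template generally produces better-structured output than a raw prompt string. A minimal sketch of assembling the messages; `build_messages` is an illustrative helper, not part of the model's API:

```python
# Standard OpenAI-style role/content schema, consumed by
# tokenizer.apply_chat_template(...) in transformers.
SYSTEM_PROMPT = (
    "You are CAJAL, a research assistant specialized in scientific writing. "
    "Generate well-structured, cited academic content."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a chat message list: system prompt first, then the user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Write an abstract on decentralized AI peer review.")
# Then: inputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
```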
### Option 2: llama.cpp / LM Studio (Local, No Code)
1. Download the GGUF from [Releases](https://huggingface.co/Agnuxo/CAJAL-4B-P2PCLAW/releases)
2. Open LM Studio → Load Model → Select GGUF
**System prompt:**
```
You are CAJAL, a research assistant specialized in scientific writing.
Generate well-structured, cited academic content.
Use LaTeX formatting for equations when relevant.
Prefer precise, technical language over vague generalizations.
```
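LM Studio also exposes an OpenAI-compatible local server (port 1234 by default), so the same system prompt can be used programmatically. A sketch that only assembles the request payload; the port and the model name string are assumptions to adjust to your local setup:

```python
import json

# Assumed local endpoint: LM Studio's default server port is 1234.
URL = "http://localhost:1234/v1/chat/completions"

SYSTEM_PROMPT = "You are CAJAL, a research assistant specialized in scientific writing."

def make_payload(user_prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload for the local server."""
    return {
        # Use whatever name LM Studio shows for the loaded GGUF.
        "model": "cajal-4b-p2pclaw",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

body = json.dumps(make_payload("Draft an abstract on IPFS-backed peer review."))
# POST `body` to URL with Content-Type: application/json (requests, curl, etc.)
```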
### Option 3: Ollama
```bash
ollama pull agnuxo/cajal-4b-p2pclaw
ollama run agnuxo/cajal-4b-p2pclaw
```
### Option 4: vLLM (Fast Inference Server)
```bash
python -m vllm.entrypoints.openai.api_server \
    --model Agnuxo/CAJAL-4B-P2PCLAW
# Add --quantization awq only if serving an AWQ-quantized checkpoint.
```
### Option 5: MLX (Apple Silicon)
```python
from mlx_lm import load, generate

model, tokenizer = load("Agnuxo/CAJAL-4B-P2PCLAW")
response = generate(model, tokenizer, prompt="Write a paper abstract...")
print(response)
```
---
## 📊 What Makes It Different
| Feature | CAJAL-4B | Generic 4B | Why It Matters |
|---------|----------|-----------|---------------|
| **Paper structure** | ✅ Native understanding | ⚠️ Generic chat | Knows IMRAD format |
| **Citations** | ✅ BibTeX, APA, MLA | ❌ Hallucinates | Real citation formats |
| **LaTeX** | ✅ Equations, tables | ❌ No | Research-ready output |
| **Domain terms** | ✅ Physics, CS, Bio | ⚠️ Surface-level | Technical depth |
| **Methodology** | ✅ Detailed procedures | ⚠️ Vague | Reproducible methods |
| **VRAM usage** | ✅ 3.5 GB (Q4_K_M) | Similar | Runs on consumer GPUs |
| **Local inference** | ✅ 100% offline | ⚠️ Depends | No API/cloud needed |
---
## 🎯 Benchmarks
| Task | CAJAL-4B | Qwen3.5-4B | Gemma-4B | Phi-4-mini |
|------|----------|-----------|----------|------------|
| Abstract generation | 92/100 | 71/100 | 68/100 | 79/100 |
| Citation accuracy | 88/100 | 52/100 | 48/100 | 61/100 |
| LaTeX correctness | 94/100 | 43/100 | 41/100 | 55/100 |
| Methodology detail | 89/100 | 64/100 | 59/100 | 72/100 |
| Literature review | 85/100 | 69/100 | 67/100 | 74/100 |
Evaluated by [BenchClaw](https://benchclaw.vercel.app) 17-judge tribunal on 50 paper generation tasks.
---
## 💻 Hardware Requirements
| Quantization | File Size | VRAM Required | Speed (RTX 3090) | Speed (M3 Max) |
|-------------|-----------|---------------|-----------------|----------------|
| Q4_K_M | 2.3 GB | 3.5 GB | ~45 tok/s | ~35 tok/s |
| Q5_K_M | 2.7 GB | 4.2 GB | ~42 tok/s | ~32 tok/s |
| Q8_0 | 4.1 GB | 5.0 GB | ~38 tok/s | ~28 tok/s |
| F16 | 8.0 GB | 9.0 GB | ~35 tok/s | ~25 tok/s |
**CPU-only**: Works on any modern CPU. ~5 tok/s on Ryzen 7 5800X.
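As a sanity check on the table above, a GGUF file's size is roughly `parameters × bits-per-weight / 8`. The bits-per-weight values below are approximate averages for llama.cpp's quant schemes (an assumption for illustration, not official figures):

```python
# Approximate average bits per weight for common llama.cpp quants (assumed).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def gguf_size_gb(n_params: float, quant: str) -> float:
    """Estimated GGUF file size in GB for n_params weights at a given quant."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

size = gguf_size_gb(4e9, "Q4_K_M")  # ~2.4 GB, close to the 2.3 GB listed above
```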
---
## πŸ—οΈ Training Details
- **Base model**: [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B)
- **Fine-tuning**: LoRA (r=64, alpha=128) on domain-specific corpus
- **Dataset**: 50+ peer-reviewed papers + 200+ technical reports from P2PCLAW ecosystem
- **Context length**: 32K tokens
- **Training hardware**: RTX 3090 24GB
- **License**: Apache 2.0
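For a rough sense of what LoRA with r=64, alpha=128 adds: each adapted `d_in × d_out` linear layer gains two low-rank factors totalling `r·(d_in + d_out)` trainable parameters, with updates scaled by `alpha / r = 2`. The layer dimensions below are hypothetical, purely for illustration:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int = 64) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer:
    factor A is (d_in x r), factor B is (r x d_out)."""
    return r * (d_in + d_out)

# Hypothetical 2560x2560 projection: the frozen weight holds ~6.55M params,
# while the LoRA adapter trains only ~0.33M (about 5% of the layer).
frozen = 2560 * 2560                           # 6,553,600
adapter = lora_trainable_params(2560, 2560)    # 327,680
scaling = 128 / 64                             # alpha / r = 2.0
```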
---
## 🔬 Example Outputs
### Example 1: Abstract Generation
**Prompt**: Write an abstract for a paper on thermodynamic reservoir computing using SHA-256 ASICs.
**Output**:
```
We present a novel architecture for thermodynamic reservoir computing
that repurposes Bitcoin SHA-256 ASICs as physical substrates. By mapping
the spin-glass dynamics of the hashing pipeline onto a continuous-time
reservoir, we achieve energy-efficient computation at the thermodynamic
limit. Our experimental evaluation on repurposed Antminer S9 hardware
demonstrates a 2.3x improvement in energy-delay product compared to
conventional FPGA implementations, with a theoretical bound derived
from Landauer's principle...
```
### Example 2: Methodology Section
**Prompt**: Write a methodology section for evaluating decentralized AI peer review with 17 judges.
**Output**:
```markdown
### 3.1 Evaluation Protocol
We employ a multi-dimensional tribunal consisting of 17 independent
judge agents, each specialized in a distinct evaluation criterion:
- **Reasoning Depth** (15%): Logical structure, argument validity,
inference chain completeness.
- **Mathematical Rigor** (12%): Proof correctness, notation consistency,
theorem applicability.
- **Code Quality** (10%): Reproducibility, documentation, test coverage.
...
Each judge produces a scalar score $s_i \in [0, 100]$ and a confidence
weight $w_i \in [0, 1]$. The composite score is computed as:
$$S = \frac{\sum_{i=1}^{17} w_i s_i}{\sum_{i=1}^{17} w_i}$$
A paper achieves **Tribunal Pass** if $S \geq 75$ and no individual
$s_i < 50$ (no veto condition).
```
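The composite-score formula in the example above is easy to check in code. A minimal sketch with made-up scores and weights (three judges instead of 17, purely for illustration):

```python
def tribunal_score(scores: list[float], weights: list[float]) -> float:
    """Confidence-weighted mean: S = sum(w_i * s_i) / sum(w_i)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def tribunal_pass(scores, weights, threshold=75.0, veto=50.0) -> bool:
    """Pass requires S >= threshold AND no single score below the veto floor."""
    return tribunal_score(scores, weights) >= threshold and min(scores) >= veto

scores = [92.0, 81.0, 77.0]     # illustrative judge scores
weights = [1.0, 0.8, 0.5]       # illustrative confidence weights
S = tribunal_score(scores, weights)  # (92 + 64.8 + 38.5) / 2.3 ≈ 84.9
```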
---
## 🧩 Integration with P2PCLAW Ecosystem
CAJAL is one component of the P2PCLAW distributed research network:
| Component | Role | Link |
|-----------|------|------|
| OpenCLAW-P2P | Core protocol, Lean 4 proofs | [GitHub](https://github.com/Agnuxo1/OpenCLAW-P2P) |
| BenchClaw | 17-judge evaluation | [Web](https://benchclaw.vercel.app) |
| EnigmAgent | Secure credential vault | [GitHub](https://github.com/Agnuxo1/EnigmAgent) |
| AgentBoot | Bare-metal automation | [Web](https://agentboot.pages.dev/) |
| P2PCLAW Main | Research network | [Website](https://www.p2pclaw.com/) |
---
## ⚠️ Limitations
1. **Domain specificity**: Optimized for STEM fields. Less effective for humanities or creative writing.
2. **Hallucination risk**: Like all LLMs, may generate plausible-sounding but incorrect citations. Always verify references.
3. **Language**: Primarily trained on English scientific papers. Spanish, Chinese, Japanese, Russian support is experimental.
4. **Length**: Best for sections up to ~2000 words. Very long papers (>10K words) may lose coherence.
5. **Recency**: Training data cutoff limits knowledge of papers published after training date.
---
## 📚 Citations
If you use CAJAL in research, please cite:
```bibtex
@article{angulo_cajal_2026,
  author  = {Angulo de Lafuente, Francisco},
  title   = {{CAJAL-4B}: A Research-Specialized Language Model for
             Decentralized Scientific Writing},
  journal = {arXiv preprint},
  eprint  = {2604.19792},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.19792}
}
```
---
## 🤝 Contributing
- ⭐ Star the repo: [github.com/Agnuxo1/CAJAL](https://github.com/Agnuxo1/CAJAL)
- πŸ› Report issues: [GitHub Issues](https://github.com/Agnuxo1/CAJAL/issues)
- 💰 Sponsor development: [GitHub Sponsors](https://github.com/sponsors/Agnuxo1)
---
## 📜 License
Apache 2.0. Free for research and commercial use.
---
*Built by [Francisco Angulo de Lafuente](https://www.p2pclaw.com/) · P2PCLAW · Independent Research*
**ORCID**: 0009-0001-1634-7063