Instructions to use jsantillana/vectrayx-nano with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jsantillana/vectrayx-nano with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="jsantillana/vectrayx-nano",
	filename="vectrayx-nano-f16.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use jsantillana/vectrayx-nano with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsantillana/vectrayx-nano:F16
# Run inference directly in the terminal:
llama-cli -hf jsantillana/vectrayx-nano:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf jsantillana/vectrayx-nano:F16
# Run inference directly in the terminal:
llama-cli -hf jsantillana/vectrayx-nano:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf jsantillana/vectrayx-nano:F16
# Run inference directly in the terminal:
./llama-cli -hf jsantillana/vectrayx-nano:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf jsantillana/vectrayx-nano:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf jsantillana/vectrayx-nano:F16

Use Docker

docker model run hf.co/jsantillana/vectrayx-nano:F16

LM Studio
Jan

vLLM

How to use jsantillana/vectrayx-nano with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jsantillana/vectrayx-nano"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jsantillana/vectrayx-nano",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/jsantillana/vectrayx-nano:F16

Ollama
How to use jsantillana/vectrayx-nano with Ollama:
```
ollama run hf.co/jsantillana/vectrayx-nano:F16
```

Unsloth Studio new

How to use jsantillana/vectrayx-nano with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jsantillana/vectrayx-nano to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for jsantillana/vectrayx-nano to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for jsantillana/vectrayx-nano to start chatting

Docker Model Runner
How to use jsantillana/vectrayx-nano with Docker Model Runner:
```
docker model run hf.co/jsantillana/vectrayx-nano:F16
```

Lemonade

How to use jsantillana/vectrayx-nano with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull jsantillana/vectrayx-nano:F16

Run and chat with the model

lemonade run user.vectrayx-nano-F16

List all available models

lemonade list

vectrayx-nano / README.md

jsantillana

Update README.md

1b4156b verified 2 days ago

preview code

raw

history blame contribute delete

5.94 kB

	---
	datasets:
	- vectrayx/vectrayx-bench
	language:
	- es
	license: apache-2.0
	metrics:
	- accuracy
	- f1
	pipeline_tag: text-generation
	tags:
	- cybersecurity
	- spanish
	- tool-use
	- mcp
	- curriculum-learning
	- from-scratch
	- arxiv:2605.13989
	---

	# VectraYX-Nano

	VectraYX-Nano is a 42M-parameter Spanish cybersecurity language model trained from scratch with curriculum learning and native [Model Context Protocol (MCP)](https://modelcontextprotocol.io) tool use. It is, to our knowledge, the first published Spanish-native cybersecurity LLM with end-to-end MCP integration.

	[![arXiv](https://img.shields.io/badge/arXiv-2605.13989-b31b1b.svg)](https://arxiv.org/abs/2605.13989)
	[![Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.20122226.svg)](https://doi.org/10.5281/zenodo.20122226)

	- Paper: [VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use](https://arxiv.org/abs/2605.13989)
	- Repository: [vectrayx/vectrayx-nano-paper](https://github.com/vectrayx/vectrayx-nano-paper)
	- arXiv DOI: https://doi.org/10.48550/arXiv.2605.13989
	- Author website: https://jsantillana.com

	---

	## Released Model: VectraYX-Nano v7 (Headline)

	VectraYX-Nano v7 is the released headline model. It uses the same 42M architecture and three-phase curriculum pre-training as the v2 bootstrap-ablation reference, with the SFT corpus rebalanced to a tool-use ratio of 1:21 (vs. 1:211 in v2). This single change raises B4 (tool-selection) from 0.000 to 0.230 ± 0.052 across N=4 seeds while retaining strong CVE recall (B1=0.332±0.005) and conversational quality (B5=0.725±0.130).

	Files in this repo:
	\| File \| Description \|
	\|---\|---\|
	\| `nano_sft_v7_s42.pt` \| Nano v7 seed 42 — recommended for inference \|
	\| `nano_sft_v5.pt` \| Nano v2 (mixed SFT, bootstrap-ablation reference) \|
	\| `vectrayx-nano-f16.gguf` \| F16 GGUF — run with llama.cpp / Ollama \|
	\| `lora/nano_lora_mini_s{42,7,13,23}.pt` \| LoRA adapters (tool-use density study, ratio 1:21) \|
	\| `tokenizer/vectrayx_bpe.model` \| BPE-16384 tokenizer \|
	\| `configs/nano.json` \| Nano 42M architecture config \|
	\| `configs/base.json` \| Base 260M architecture config \|

	---

	## Key Results (VectraYX-Bench, N=4 seeds)

	\| Model \| Params \| B1 KW \| B2 F1† \| B3 TM \| B4 Tool \| B5 Chat \|
	\|---\|---\|---\|---\|---\|---\|---\|
	\| VectraYX-Nano v7 (headline) \| 42M \| 0.332±0.005 \| — \| — \| 0.230±0.052 \| 0.725±0.130 \|
	\| VectraYX-Nano v2 (bootstrap ablation) \| 42M \| 0.226±0.065 \| 0.199±0.004 \| 0.029±0.035 \| 0.000 \| 0.775±0.043 \|
	\| Nano LoRA mini (ratio 1:21, N=4) \| 42M \| 0.011±0.004 \| 0.201±0.002 \| 0.021±0.012 \| 0.145±0.046 \| 0.575±0.043 \|
	\| SmolLM2-135M + LoRA-32 \| 135M \| 0.334 \| 0.225 \| 0.143 \| 0.160 \| 0.800 \|
	\| VectraYX-Base 260M \| 260M \| 0.325 \| 0.220 \| 0.114 \| 0.000 \| 0.800 \|
	\| Base 260M LoRA mini (ratio 1:21, N=4) \| 260M \| 0.019±0.003 \| 0.203±0.002 \| — \| 0.445±0.201 \| 0.600 \|
	\| VectraYX-Pro 3B \| 3.2B \| 0.341 \| 0.695 \| 0.686 \| 0.600 \| 0.800 \|
	\| VectraYX-Pro 7B \| 7B \| 0.335 \| 0.815 \| 0.686 \| 0.880 \| 0.800 \|
	\| GPT-4o (frontier reference) \| — \| 0.333 \| 0.110 \| 0.520 \| 0.615 \| 0.631 \|

	†B2 is a benchmark artifact in this revision (key mismatch in harness, fix queued).

	B5 inversion: Nano v7 (0.725±0.130) and Nano v2 (0.775±0.043) both exceed GPT-4o (0.631) on the 314-prompt held-out chat suite — the register-matched bootstrap corpus makes conversational Spanish the model's first language.

	---

	## Key Findings

	1. Loss-vs-register inversion. A higher-perplexity bootstrap corpus (OpenSubtitles-ES) yields better post-SFT chat behavior than a lower-perplexity alternative (mC4-ES). At the nano scale, the bootstrap corpus dictates the model's default response style; SFT cannot fully overwrite it.

	2. Tool-use is corpus-density-gated, not capacity-gated. The B4=0.000 floor in the mixed SFT (ratio 1:211) is a corpus-density artifact. Rebalancing to 1:21 (2,801 tool-use examples) shifts the first-token prior to `<\|tool_call\|>` and raises B4 to 0.230±0.052 at 42M — without retraining the backbone.

	---

	## Inference: llama.cpp / Ollama (GGUF)

	```bash
	# With Ollama
	ollama run hf.co/jsantillana/vectrayx-nano:vectrayx-nano-f16.gguf

	# With llama.cpp
	./llama-cli -m vectrayx-nano-f16.gguf \
	--chat-template llama3 \
	-p "<\|system\|>Eres VectraYX, asistente experto en ciberseguridad para LATAM.<\|end\|>" \
	-i
	```

	Runs at 6–10 tok/s on Raspberry Pi 4 and 60–100 tok/s on a laptop CPU.

	---

	## Inference: PyTorch

	```python
	from huggingface_hub import hf_hub_download
	import torch, json, sys

	sys.path.insert(0, ".") # needs training/transformer.py from vectrayx-paper-code

	ckpt = hf_hub_download("jsantillana/vectrayx-nano", "nano_sft_v7_s42.pt")
	tok = hf_hub_download("jsantillana/vectrayx-nano", "tokenizer/vectrayx_bpe.model")
	cfg = hf_hub_download("jsantillana/vectrayx-nano", "configs/nano.json")
	```

	Full inference script at [vectrayx-paper-code](https://huggingface.co/jsantillana/vectrayx-paper-code).

	---

	## Training Details

	\| Component \| Details \|
	\|---\|---\|
	\| Parameters \| 41.95M \|
	\| Architecture \| Transformer decoder, GQA (8q/2kv), QK-Norm, RMSNorm, SwiGLU, RoPE, z-loss \|
	\| Tokenizer \| BPE-16384, byte-fallback, 50/50 conv/tech balance \|
	\| Pre-training \| 170M tokens, 3-phase curriculum with 25% replay buffer \|
	\| SFT (v7) \| 13K OASST1-ES + 4K CVE Q&A + 2.8K tool-use (ratio 1:21) \|
	\| Hardware \| GCP L4 24GB (pre-training) + AWS g4dn.xlarge T4 16GB (multi-seed SFT) \|
	\| Cost \| ~$29 USD total (corpus + training) \|

	---

	## Citation

	```bibtex
	@misc{santillana2026vectrayx,
	title = {VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model
	with Curriculum Learning and Native Tool Use},
	author = {Santillana, Juan S.},
	year = {2026},
	eprint = {2605.13989},
	archivePrefix = {arXiv},
	primaryClass = {cs.CL},
	url = {https://arxiv.org/abs/2605.13989}
	}
	```