jsantillana committed 8 days ago

Commit f5f5bbf (verified) · 1 parent: c8590f5

update README: add v7 headline results, fix all numbers, add Zenodo v2 + arXiv badge, add GGUF usage

Files changed (1)

README.md +89 -32

README.md CHANGED

@@ -15,69 +15,126 @@ tags:
 - mcp
 - curriculum-learning
 - from-scratch
 ---
 # VectraYX-Nano
-VectraYX-Nano is a 42M-parameter Spanish cybersecurity language model trained from scratch with curriculum learning and native Model Context Protocol (MCP) tool use.
-- **Paper:** [VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use](https://huggingface.co/papers/2605.13989)
 - **Repository:** [vectrayx/vectrayx-nano-paper](https://github.com/vectrayx/vectrayx-nano-paper)
-## Key Results (VectraYX-Bench)
-| Model | Params | B1 KW | B2 F1 | B3 TM | B4 Tool | B5 |
 |---|---|---|---|---|---|---|
-| VectraYX-Nano v2 (N=4 seeds) | 42M | 0.228 ± 0.079 | 0.196 ± 0.005 | 0.029 ± 0.040 | 0.000 | 0.775 ± 0.050 |
-| **Nano + LoRA mini (N=4 seeds)** | 42M | 0.011 ± 0.004 | 0.201 ± 0.002 | 0.021 ± 0.012 | **0.145 ± 0.046** | 0.575 ± 0.043 |
 | VectraYX-Base 260M | 260M | 0.325 | 0.220 | 0.114 | 0.000 | 0.800 |
-| **Base + LoRA mini** | 260M | 0.025 | 0.200 | 0.000 | **0.580** | 0.600 |
 | VectraYX-Pro 3B | 3.2B | 0.341 | 0.695 | 0.686 | 0.600 | 0.800 |
 | VectraYX-Pro 7B | 7B | 0.335 | 0.815 | 0.686 | 0.880 | 0.800 |
-## Key Finding
-The B4=0.000 floor in mixed SFT is a **corpus-density artifact**, not a capacity gate.
-At ratio 1:21 (2,801 tool-use examples), Nano 42M achieves B4=0.145 ± 0.046 and
-Base 260M achieves B4=0.580.
-## Usage
-To use this model, please refer to the custom inference scripts provided in the official [GitHub repository](https://github.com/vectrayx/vectrayx-nano-paper).
 ```python
 from huggingface_hub import hf_hub_download
-import torch
-# Download checkpoint
-ckpt_path = hf_hub_download("vectrayx/vectrayx-nano", "nano_sft_v5.pt")
-tokenizer_path = hf_hub_download("vectrayx/vectrayx-nano", "tokenizer/vectrayx_bpe.model")
-config_path = hf_hub_download("vectrayx/vectrayx-nano", "configs/nano.json")
 ```
-## 📄 Paper & Technical Details
-Our research paper details the engineering constraints, the $25 USD data pipeline, the curriculum learning methodology, and empirical findings regarding Tool Use density at the nano-scale.
-**Title:** [VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use](https://arxiv.org/abs/2605.13989)
-**Authors:** Juan S. Santillana (Globant)
-**arXiv ID:** `2605.13989`
-### Key Discoveries from the Paper:
-- **Corpus Density vs. Model Size:** We empirically demonstrate that tool-selection failure in small models is a *corpus-density artifact*, not a capacity gate. Adjusting the tool-use to prose ratio from 1:211 to 1:21 allowed our 42M model to successfully use tools (B4 score improvement from 0.000 to 0.145).
-- **Loss-vs-Register Inversion:** We observed that models at the sub-Chinchilla scale cannot recover a conversational register if pre-trained exclusively on dense technical text, even when mathematical loss is low.
-- **Edge Inference:** The GGUF artifact is 81 MB (F16) and runs at sub-second Time-To-First-Token (TTFT) on commodity hardware.
-You can read the full preprint on [arXiv](https://arxiv.org/abs/2605.13989).
 ## Citation
 ```bibtex
-@inproceedings{santillana2026vectrayx,
   title     = {VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model
                with Curriculum Learning and Native Tool Use},
   author    = {Santillana, Juan S.},
-  booktitle = {Preprint},
-  year      = {2026}
 }
-```

 - mcp
 - curriculum-learning
 - from-scratch
+- arxiv:2605.13989
 ---
 # VectraYX-Nano
+VectraYX-Nano is a 42M-parameter Spanish cybersecurity language model trained **from scratch** with curriculum learning and native [Model Context Protocol (MCP)](https://modelcontextprotocol.io) tool use. It is, to our knowledge, the first published Spanish-native cybersecurity LLM with end-to-end MCP integration.
+[![arXiv](https://img.shields.io/badge/arXiv-2605.13989-b31b1b.svg)](https://arxiv.org/abs/2605.13989)
+[![Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.20122226.svg)](https://doi.org/10.5281/zenodo.20122226)
+- **Paper:** [VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use](https://arxiv.org/abs/2605.13989)
 - **Repository:** [vectrayx/vectrayx-nano-paper](https://github.com/vectrayx/vectrayx-nano-paper)
+- **arXiv DOI:** https://doi.org/10.48550/arXiv.2605.13989
+---
+## Released Model: VectraYX-Nano v7 (Headline)
+**VectraYX-Nano v7** is the released headline model. It uses the same 42M architecture and three-phase curriculum pre-training as the v2 bootstrap-ablation reference, with the SFT corpus rebalanced to a tool-use ratio of 1:21 (vs. 1:211 in v2). This single change raises B4 (tool-selection) from 0.000 to **0.230 ± 0.052** across N=4 seeds while retaining strong CVE recall (B1=0.332±0.005) and conversational quality (B5=0.725±0.130).
+Files in this repo:
+| File | Description |
+|---|---|
+| `nano_sft_v7_s42.pt` | **Nano v7 seed 42 — recommended for inference** |
+| `nano_sft_v5.pt` | Nano v2 (mixed SFT, bootstrap-ablation reference) |
+| `vectrayx-nano-f16.gguf` | **F16 GGUF — run with llama.cpp / Ollama** |
+| `lora/nano_lora_mini_s{42,7,13,23}.pt` | LoRA adapters (tool-use density study, ratio 1:21) |
+| `tokenizer/vectrayx_bpe.model` | BPE-16384 tokenizer |
+| `configs/nano.json` | Nano 42M architecture config |
+| `configs/base.json` | Base 260M architecture config |
+---
+## Key Results (VectraYX-Bench, N=4 seeds)
+| Model | Params | B1 KW | B2 F1† | B3 TM | B4 Tool | B5 Chat |
 |---|---|---|---|---|---|---|
+| **VectraYX-Nano v7** *(headline)* | 42M | **0.332±0.005** | — | — | **0.230±0.052** | 0.725±0.130 |
+| VectraYX-Nano v2 *(bootstrap ablation)* | 42M | 0.226±0.065 | 0.199±0.004 | 0.029±0.035 | 0.000 | **0.775±0.043** |
+| Nano LoRA mini (ratio 1:21, N=4) | 42M | 0.011±0.004 | 0.201±0.002 | 0.021±0.012 | 0.145±0.046 | 0.575±0.043 |
+| SmolLM2-135M + LoRA-32 | 135M | 0.334 | 0.225 | 0.143 | 0.160 | 0.800 |
 | VectraYX-Base 260M | 260M | 0.325 | 0.220 | 0.114 | 0.000 | 0.800 |
+| Base 260M LoRA mini (ratio 1:21, N=4) | 260M | 0.019±0.003 | 0.203±0.002 | — | 0.445±0.201 | 0.600 |
 | VectraYX-Pro 3B | 3.2B | 0.341 | 0.695 | 0.686 | 0.600 | 0.800 |
 | VectraYX-Pro 7B | 7B | 0.335 | 0.815 | 0.686 | 0.880 | 0.800 |
+| GPT-4o *(frontier reference)* | — | 0.333 | 0.110 | 0.520 | 0.615 | 0.631 |
+†B2 is a benchmark artifact in this revision (key mismatch in harness, fix queued).
+**B5 inversion:** Nano v7 (0.725±0.130) and Nano v2 (0.775±0.043) both **exceed GPT-4o (0.631)** on the 314-prompt held-out chat suite — the register-matched bootstrap corpus makes conversational Spanish the model's first language.
+---
+## Key Findings
+**1. Loss-vs-register inversion.** A higher-perplexity bootstrap corpus (OpenSubtitles-ES) yields *better* post-SFT chat behavior than a lower-perplexity alternative (mC4-ES). At the nano scale, the bootstrap corpus dictates the model's default response style; SFT cannot fully overwrite it.
+**2. Tool-use is corpus-density-gated, not capacity-gated.** The B4=0.000 floor in the mixed SFT (ratio 1:211) is a corpus-density artifact. Rebalancing to 1:21 (2,801 tool-use examples) shifts the first-token prior to `<|tool_call|>` and raises B4 to 0.230±0.052 at 42M — without retraining the backbone.
+---
+## Inference: llama.cpp / Ollama (GGUF)
+```bash
+# With Ollama
+ollama run hf.co/jsantillana/vectrayx-nano:vectrayx-nano-f16.gguf
+# With llama.cpp
+./llama-cli -m vectrayx-nano-f16.gguf \
+  --chat-template llama3 \
+  -p "<|system|>Eres VectraYX, asistente experto en ciberseguridad para LATAM.<|end|>" \
+  -i
+```
+Runs at 6–10 tok/s on Raspberry Pi 4 and 60–100 tok/s on a laptop CPU.
+---
+## Inference: PyTorch
 ```python
 from huggingface_hub import hf_hub_download
+import torch, json, sys
+sys.path.insert(0, ".")  # needs training/transformer.py from vectrayx-paper-code
+ckpt = hf_hub_download("jsantillana/vectrayx-nano", "nano_sft_v7_s42.pt")
+tok  = hf_hub_download("jsantillana/vectrayx-nano", "tokenizer/vectrayx_bpe.model")
+cfg  = hf_hub_download("jsantillana/vectrayx-nano", "configs/nano.json")
 ```
+Full inference script at [vectrayx-paper-code](https://huggingface.co/jsantillana/vectrayx-paper-code).
+---
+## Training Details
+| Component | Details |
+|---|---|
+| Parameters | 41.95M |
+| Architecture | Transformer decoder, GQA (8q/2kv), QK-Norm, RMSNorm, SwiGLU, RoPE, z-loss |
+| Tokenizer | BPE-16384, byte-fallback, 50/50 conv/tech balance |
+| Pre-training | 170M tokens, 3-phase curriculum with 25% replay buffer |
+| SFT (v7) | 13K OASST1-ES + 4K CVE Q&A + 2.8K tool-use (ratio 1:21) |
+| Hardware | GCP L4 24GB (pre-training) + AWS g4dn.xlarge T4 16GB (multi-seed SFT) |
+| Cost | ~$29 USD total (corpus + training) |
+---
 ## Citation
 ```bibtex
+@misc{santillana2026vectrayx,
   title     = {VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model
                with Curriculum Learning and Native Tool Use},
   author    = {Santillana, Juan S.},
+  year      = {2026},
+  eprint    = {2605.13989},
+  archivePrefix = {arXiv},
+  primaryClass  = {cs.CL},
+  url       = {https://arxiv.org/abs/2605.13989}
 }
+```
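
The Training Details table in the updated README lists the attention layout as GQA (8q/2kv), that is, 8 query heads sharing 2 key/value heads. A minimal sketch of what that sharing implies is shown here. It is not taken from the repository: the helper name `kv_head_for` is hypothetical, and the contiguous grouping (query heads 0 to 3 read KV head 0, heads 4 to 7 read KV head 1) is an assumption based on the common convention, not something the README confirms.

```python
# Hypothetical illustration of grouped-query attention (GQA) head sharing.
# Only the 8 query / 2 KV head counts come from the README; the contiguous
# grouping below is an assumed convention, not the author's confirmed layout.
N_Q_HEADS = 8
N_KV_HEADS = 2


def kv_head_for(q_head: int, n_q: int = N_Q_HEADS, n_kv: int = N_KV_HEADS) -> int:
    """Return the index of the KV head that a given query head reads from."""
    if n_q % n_kv != 0:
        raise ValueError("number of query heads must be a multiple of KV heads")
    group_size = n_q // n_kv  # query heads per shared KV head
    return q_head // group_size


# Map every query head to its shared KV head.
mapping = [kv_head_for(h) for h in range(N_Q_HEADS)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

# Compared with one KV head per query head, the K/V cache holds this many
# times fewer heads.
cache_reduction = N_Q_HEADS // N_KV_HEADS
print(cache_reduction)  # 4
```

Under these assumptions the key/value cache stores 2 heads instead of 8, which is one plausible reason a layout like this suits the edge-inference targets the README mentions.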