VectraYX-Nano v14 (Experimental)
⚠️ Experimental release. v14 is the first nano checkpoint that emits tool-call syntax non-trivially (B4=0.16 vs the v2/v4/v6/v10 floor of 0.000). It is trained on top of v10's Chinchilla-optimal pretrain (~894 M processed tokens) with an SFT mixture rebalanced toward curated tool corpus density. The B5 conversational gate is 0.70 (a slight regression from v10's 0.80) and B1 (CVE keyword recall) recovers to 0.337. For production use, prefer the v7 headline release at jsantillana/vectrayx-nano.
VectraYX-Nano v14
A 42M-parameter Spanish-first language model for cybersecurity, optimized for Latin America, with native tool-call output.
- Author website: https://jsantillana.com
| Property | Value |
|---|---|
| Params | 41.95 M |
| Architecture | Decoder-only Transformer · 8 layers · 8 heads (2 KV) · RoPE · SwiGLU · QK-Norm · tied embeddings |
| Context | 1,024 tokens |
| Tokenizer | SentencePiece BPE 16,384 vocab (special tokens for chat + cyber: <|user|>, <|assistant|>, <|cve|>, <|tool_call|>, <|/tool_call|>, etc.) |
| Languages | Spanish (primary), Portuguese, English (technical terms) |
| Pretrain tokens | ~894 M processed tokens (≈ 21 tok/param, Chinchilla-optimal); inherits v10 pretrain |
| SFT | v14 recipe: 6 epochs over the curated tool_sft_mini_v1.jsonl (2,801 ex) + sft_conversational.jsonl + oasst1_es.jsonl. Excludes the uncurated tooluse_dataset.jsonl (v1–v6 corpus) which had diluted v13. Tool-exposure-per-example ≈ 1.53× (vs v13's 0.38×). |
| Hardware | 1× NVIDIA A10G (SageMaker ml.g5.xlarge) · BF16 · ~30 min SFT-only on top of v10 phase-3 ckpt |
| License | Apache 2.0 |
Benchmarks
The B1–B5 evaluation suite tests Spanish cybersecurity knowledge and chat register at the nano scale (results in bench_v14.json in this repo).
| Benchmark | v14 | v10 (previous experimental) | v2 paper headline (N=4) | Notes |
|---|---|---|---|---|
| B1 CVE Q&A (keyword) | 0.337 | 0.307 | 0.226 ± 0.065 | Best nano result on B1 |
| B2 Classification (f1_macro) | 0.205 | 0.200 | 0.196 ± 0.014 | Capacity-bound at 42 M |
| B3 Commands (tool_match) | 0.029 | 0.000 | 0.029 ± 0.000 | Recovered to v2 baseline |
| B4 Tool-use | 0.160 | 0.000 | 0.230 ± 0.052 (v7) | First nano > 0 without LoRA; v7 with 4-seed mean reaches 0.23 |
| B5 Conversational gate | 0.700 | 0.800 | 0.775 ± 0.043 | Slight regression vs v10 (SFT mix favored tools) |
Single-seed (seed=42). For multi-seed B1–B5 with confidence intervals see the paper §8 Tables 7–8.
Quick start (HuggingFace transformers)
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM
import sentencepiece as spm
import torch

REPO_ID = "jsantillana/vectrayx-nano-experimental"

# Load model (custom code; requires trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

# Tokenizer is SentencePiece (no HF tokenizer wrapper yet).
# Assumes tokenizer.model sits at the root of the repo.
sp = spm.SentencePieceProcessor()
sp.load(hf_hub_download(repo_id=REPO_ID, filename="tokenizer.model"))

# Chat format expected by the model
prompt = "<|user|>¿Qué es un ataque de phishing?<|end|><|assistant|>"
ids = torch.tensor([sp.encode(prompt)])

# generate_simple is the model's custom generation helper
with torch.no_grad():
    out = model.generate_simple(ids, max_new_tokens=200, temperature=0.7, top_k=40)
print(sp.decode(out[0].tolist()))
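Because the context window is 1,024 tokens, the prompt and the generated tokens have to share that budget. The following is a minimal sketch of a pre-generation check. It assumes the custom generation helper does not trim over-long inputs for you, which this card does not confirm. `fit_prompt` and its left-truncation policy are hypothetical, introduced only for illustration.

```python
CONTEXT_WINDOW = 1024  # context length stated in the model card


def fit_prompt(token_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return token_ids trimmed so that prompt + generation fits the window.

    Keeps the most recent tokens (left truncation), so the trailing
    <|assistant|> marker of the chat format is preserved.
    """
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    budget = context_window - max_new_tokens
    if len(token_ids) > budget:
        return list(token_ids[-budget:])
    return list(token_ids)
```

With the quick-start values, `fit_prompt(sp.encode(prompt), max_new_tokens=200)` leaves room for at most 824 prompt tokens. Left truncation can cut into a system prompt, so a real application may prefer to drop whole earlier turns instead.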
Tool-call output format
v14 emits structured tool calls when the system prompt advertises tools. The wire format is:
<|tool_call|>{"name": "<tool_name>", "arguments": {<args>}}<|/tool_call|>
Example prompt:
SYSTEM = """Eres VectraYX-Nano. Tienes acceso a estas herramientas:
[
{"name": "search_cve", "description": "Look up a CVE by ID", "parameters": {"cve_id": "string"}},
{"name": "nmap_scan", "description": "Run nmap against a target", "parameters": {"target": "string", "ports": "string"}}
]
Cuando necesites una herramienta emite <|tool_call|>{...}<|/tool_call|>."""
prompt = f"<|system|>{SYSTEM}<|end|><|user|>Busca el CVE-2024-1234<|end|><|assistant|>"
Empirical B4 score: 0.16 — the model emits the bracketed format reliably, though argument selection is approximate at 42 M params (better at larger scales; see the Pro 3B / Analyst 7B paper rows).
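The wire format above can be pulled out of generated text with a small parser. This is a sketch rather than the project's own tooling: `extract_tool_calls` is a hypothetical helper, and it assumes the decoded output contains the `<|tool_call|>` and `<|/tool_call|>` markers as literal text, which depends on how the tokenizer decodes special tokens.

```python
import json
import re

# Non-greedy match of everything between the two markers.
TOOL_CALL_RE = re.compile(r"<\|tool_call\|>(.*?)<\|/tool_call\|>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of (name, arguments) tuples found in generated text.

    Payloads that are not JSON objects with a string "name" and a
    dict "arguments" are skipped rather than raising.
    """
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
            continue
        arguments = obj.get("arguments", {})
        if not isinstance(arguments, dict):
            continue
        calls.append((obj["name"], arguments))
    return calls
```

Skipping malformed payloads instead of raising is a deliberate choice here: at this model size some envelopes will carry invalid JSON, and the caller usually wants to fall back to plain-text handling.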
Quick start (Ollama / llama.cpp)
⚠️ GGUF / Ollama support is currently broken. VectraYX-Nano uses QK-Norm (per-head-dim RMSNorm applied before RoPE), which matches the Qwen3 architecture on paper, but llama.cpp's Qwen3 implementation has subtle differences (likely in build_qkv tensor layout or attention scale) that produce garbage output when loading our GGUF. Switching to arch=llama drops QK-Norm and degrades output to "mostly coherent then diverges". A clean fix requires either:
- Adding a vectrayx arch to llama.cpp upstream (~6–10 h C++ work + PR review), or
- Re-training v14 without QK-Norm so the model becomes natively arch=llama compatible.
Both options are tracked but out of scope for this experimental release. For now, use the HuggingFace transformers path above; PyTorch inference works correctly. Track the issue here if you want an update.
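To make the ordering concrete, here is a dependency-free sketch of QK-Norm followed by RoPE on a single head vector. It is an illustration under stated assumptions, not the model's actual code: the learned RMSNorm gain is omitted, the RoPE base of 10000 is assumed, and consecutive elements are rotated as pairs (other implementations pair the first half of the vector with the second half).

```python
import math


def rms_norm(vec, eps=1e-6):
    """RMSNorm over one head's feature vector (learned gain omitted)."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in vec]


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by position-dependent angles.

    Assumes an even head dimension and the interleaved pairing convention.
    """
    dim = len(vec)
    out = list(vec)
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * cos_t - y * sin_t
        out[i + 1] = x * sin_t + y * cos_t
    return out


def qk_norm_then_rope(head_vec, position):
    """Normalize per head first, then apply the rotary embedding."""
    return rope(rms_norm(head_vec), position)
```

Because the rotation preserves vector length, every query and key head leaves this step with a root-mean-square of about 1 regardless of position. A runtime that skips the normalization therefore sees differently scaled attention logits.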
Intended use
- Designed for: defensive security education, cyber-incident triage assistance, CVE summarization in Spanish, FAQ for SOC analysts in LATAM, embedded chat in DevSecOps tooling, tool-call dispatch in MCP-aware agents.
- Out of scope: factual Q&A about events post-2024, code generation beyond shell snippets, long-context reasoning (>1 k tokens), English chat.
Known limitations
- Tool-call arguments are approximate. v14's B4=0.16 means the model emits the <|tool_call|>...<|/tool_call|> envelope correctly, but argument content can be hallucinated and the wrong tool name can be picked. Treat outputs as suggestions, not authoritative dispatch. Validate against your tool registry before execution.
- Capacity-bound at 42 M params. B2 classification stays at the harness floor (0.20). For higher-fidelity tool use see the larger-tier checkpoints in the paper (Base 260M, Pro 3B, Analyst 7B).
- No safety RLHF — the model can be steered to produce harmful security-related content. Run behind a safety filter for production.
- Hallucinates LATAM institutional facts (DIVINDAT founding date, INDECOPI regulations, ANPD/LGPD article numbers, etc.). A LATAM-specific corpus was experimented with in v16 (full SFT — showed catastrophic forgetting) and v17 (LoRA — showed insufficient knowledge internalization at 3 K examples); neither is released. Robust LATAM factuality requires either a substantially larger LATAM corpus or training a larger base model with LATAM in pretrain (Base 260M v2 work in progress).
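The first limitation above recommends validating tool calls against a registry before execution. One possible shape for that check is sketched below. The registry layout, the `validate_tool_call` helper and the rule that every listed parameter is required are all assumptions made for illustration; the tool names mirror the example system prompt.

```python
# Hypothetical registry: tool name -> {parameter name: expected Python type}.
TOOL_REGISTRY = {
    "search_cve": {"cve_id": str},
    "nmap_scan": {"target": str, "ports": str},
}


def validate_tool_call(name, arguments, registry=TOOL_REGISTRY):
    """Return a list of problems; an empty list means the call looks dispatchable."""
    if name not in registry:
        return [f"unknown tool: {name!r}"]
    schema = registry[name]
    problems = []
    # Every parameter in the schema is treated as required (assumption).
    for key in schema:
        if key not in arguments:
            problems.append(f"missing argument: {key!r}")
    for key, value in arguments.items():
        if key not in schema:
            problems.append(f"unexpected argument: {key!r}")
        elif not isinstance(value, schema[key]):
            problems.append(f"wrong type for {key!r}")
    return problems
```

A call is dispatched only when the returned list is empty. Checks on argument values themselves (for example, restricting `target` to an allow-list of hosts) would sit on top of this structural validation.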
Training recipe
v14 = v10 pretrain checkpoint + clean SFT with curated tool corpus.
| Stage | Mix | Source | Purpose |
|---|---|---|---|
| v10 P1 | 100 % OpenSubtitles-ES | Helsinki-NLP/open_subtitles | Spanish chat register |
| v10 P2 | corpus_nano tech (NVD, Wiki-cyber, blogs, papers, malware, exploits) | corpus_nano.tar.gz | Cybersecurity domain |
| v10 P3 | glaive_fc_v2 + code_alpaca_bash + codefeedback_bash + exploitdb + github_repos | HuggingFace + corpus_nano | Function-calling + bash |
| v14 SFT | sft_conversational + oasst1_es + tool_sft_mini_v1 (curated, 2,801 ex) | local + curated | Tool-format + conv (6 ep) |
Pretrain budget: ~894 M processed tokens (≈ 21 tok/param @ 42 M = Chinchilla-optimal). v14 SFT runs ~30 min on top of v10's P3 checkpoint.
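The tokens-per-parameter figure follows directly from the two numbers in this card. The short check below reproduces it; the 20 tok/param reference value is the commonly quoted Chinchilla rule of thumb and is used here as an approximation, not as a figure from this project.

```python
PRETRAIN_TOKENS = 894e6   # ~894 M processed tokens (model card)
PARAMS = 41.95e6          # 41.95 M parameters (model card)
CHINCHILLA_RATIO = 20.0   # rule-of-thumb tokens per parameter (assumption)

tokens_per_param = PRETRAIN_TOKENS / PARAMS
reference_tokens = CHINCHILLA_RATIO * PARAMS

print(f"{tokens_per_param:.1f} tok/param")
print(f"{reference_tokens / 1e6:.0f} M tokens at {CHINCHILLA_RATIO:.0f} tok/param")
```

The pretrain budget therefore sits slightly above the rule-of-thumb reference point, consistent with the "≈ 21 tok/param" figure quoted above.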
Citation
@misc{santillana2026vectrayx,
author = {Santillana, Juan},
title = {VectraYX-Nano: a 42M-parameter Spanish-first cybersecurity language model with native tool use},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/jsantillana/vectrayx-nano-experimental},
}
Authors
Juan Santillana — DevOps engineer at Globant.
See also
- Paper (in preparation): VectraYX paper with full ablations, corpus details, Chinchilla analysis, B1–B5 multi-seed results.
- Headline release: jsantillana/vectrayx-nano — v2/v4/v5/v6/v7 multi-seed checkpoints + LoRA adapters.
- Code: github.com/vectrayx/vectrayx-paper (training scripts, eval suite, prep pipeline).