This repo contains quantized GGUF versions of "BSC-LT/salamandra-2b", ready to use with Ollama and from R.

Installation

To use them, first download this repo and change into its local snapshot directory:

$ hf download "jrosell/salamandra-2b-GGUF"
$ cd "$HOME/.cache/huggingface/hub/models--jrosell--salamandra-2b-GGUF/snapshots/$(cat ~/.cache/huggingface/hub/models--jrosell--salamandra-2b-GGUF/refs/main)"
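The long path above follows the Hugging Face hub cache layout: each repo is cached under models--&lt;org&gt;--&lt;name&gt;, and refs/main names the current snapshot. A small helper (hypothetical, not part of the hf CLI) can derive it from the repo id:

```shell
# Hypothetical helper: rebuild the local HF cache path from a repo id,
# using the hub layout "models--<org>--<name>/snapshots/<revision>".
hf_snapshot_dir() {
  local repo_dir="models--${1//\//--}"            # "a/b" -> "models--a--b"
  local base="$HOME/.cache/huggingface/hub/$repo_dir"
  echo "$base/snapshots/$(cat "$base/refs/main")"
}
```

For example: cd "$(hf_snapshot_dir jrosell/salamandra-2b-GGUF)".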

Usage

Use the Q8_0 version in Ollama:

$ echo "FROM ./models/salamandra-2b-Q8_0.gguf" > Modelfile.Q8_0
$ ollama create salamandra-2b:Q8_0 -f Modelfile.Q8_0
$ ollama run salamandra-2b:Q8_0

Use the Q4_K_M version in Ollama:

$ echo "FROM ./models/salamandra-2b-Q4_K_M.gguf" > Modelfile.Q4_K_M
$ ollama create salamandra-2b:Q4_K_M -f Modelfile.Q4_K_M
$ ollama run salamandra-2b:Q4_K_M
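The two Modelfile steps above differ only in the quantization suffix; as a sketch, the Modelfiles can be generated in one loop (assuming the corresponding .gguf files exist under ./models/):

```shell
# Write one Ollama Modelfile per quantization shipped in this repo.
for quant in Q8_0 Q4_K_M; do
  printf 'FROM ./models/salamandra-2b-%s.gguf\n' "$quant" > "Modelfile.$quant"
  # ollama create "salamandra-2b:$quant" -f "Modelfile.$quant"  # uncomment to register
done
```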

Use it from R via the ellmer package:

$ R --vanilla -e 'ellmer::chat_ollama(model = "salamandra-2b:Q4_K_M")$chat("Explica un acudit sobre informàtics.")'
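Outside of R, a running Ollama server also answers over its HTTP API (this assumes ollama serve is listening on the default port 11434):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "salamandra-2b:Q4_K_M",
  "prompt": "Explica un acudit sobre informàtics.",
  "stream": false
}'
```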

How I built this

$ mkdir -p salamandra-2b/models && cd salamandra-2b
$ hf download "BSC-LT/salamandra-2b"
$ git clone https://github.com/ggerganov/llama.cpp.git
$ uv init
$ uv add -r llama.cpp/requirements.txt
$ export MODEL_DIR="$HOME/.cache/huggingface/hub/models--BSC-LT--salamandra-2b/snapshots/$(cat ~/.cache/huggingface/hub/models--BSC-LT--salamandra-2b/refs/main)"
$ uv run llama.cpp/convert_hf_to_gguf.py "$MODEL_DIR" \
    --outfile models/salamandra-2b.gguf \
    --outtype auto
$ cd llama.cpp
$ cmake -B build
$ cmake --build build --config Release --target llama-quantize
$ ./build/bin/llama-quantize ../models/salamandra-2b.gguf ../models/salamandra-2b-Q4_K_M.gguf Q4_K_M
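A quick sanity check before shipping the outputs: every GGUF file starts with the 4-byte ASCII magic "GGUF", so a small helper (hypothetical name) catches truncated or misnamed files:

```shell
# Check the 4-byte GGUF magic at the start of a file.
is_gguf() {
  [ "$(head -c 4 "$1")" = "GGUF" ]
}
```

For example: is_gguf models/salamandra-2b-Q4_K_M.gguf && echo "looks like a GGUF file".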
Model details

Format: GGUF
Model size: 2B params
Architecture: llama