# harrier-oss-v1-0.6B GGUF
This repository contains GGUF exports and quantized variants of microsoft/harrier-oss-v1-0.6b, a multilingual text embedding model from Microsoft.
These files are intended for llama.cpp-compatible runtimes that support embedding models in GGUF format. The original upstream repository remains the source of truth for the native Transformers and Sentence Transformers checkpoints, training details, and canonical examples.
## Model Summary

- Base model: microsoft/harrier-oss-v1-0.6b
- Model type: multilingual text embedding model
- Parameters: 0.6B
- Embedding dimension: 1024
- Max context length: 32768 tokens
- Pooling: last-token pooling
- Normalization: L2 normalization
- Languages: 94 languages
- License: MIT
Harrier OSS v1 models are designed for retrieval, semantic similarity, clustering, classification, bitext mining, and reranking workloads. This 0.6B variant is the mid-sized model in the Harrier OSS v1 family.
## Available Files

| File | Quantization | Size | SHA256 | Notes |
|---|---|---|---|---|
| harrier-oss-v1-0.6B-BF16.gguf | BF16 | 1.12 GiB | 450258290f9ae0f79229dd400be810a24f5489df36adca8e567eac32b49fe04f | Highest-fidelity GGUF export |
| harrier-oss-v1-0.6B-Q8_0.gguf | Q8_0 | 609.83 MiB | f97092bd73f6814b8b1170ca855071bc468b5fa2cf61a6de7e1d2c4a8a6a50b0 | Larger integer quant for higher quality retention |
| harrier-oss-v1-0.6B-Q5_K_M.gguf | Q5_K_M | 423.83 MiB | e283c242a3cad5473f5559f2d1577eca9b23b5cccecd2c523483c68c5b8871df | Balanced size and quality |
| harrier-oss-v1-0.6B-Q4_K_M.gguf | Q4_K_M | 378.33 MiB | 90d684bf550ca2c50de9191f6d18c5d8b20d89dba78257478eb746e88c66ecb3 | Smaller general-purpose quant |
| harrier-oss-v1-0.6B-TQ1_0.gguf | TQ1_0 | 216.23 MiB | e734fb321a84065785794ac53050bafaafce19340b2f2b2c00bf441352350ae3 | Smallest ternary variant in this repo |
| harrier-oss-v1-0.6B-TQ2_0.gguf | TQ2_0 | 235.92 MiB | fc2cf8b19e58738ed0e0954fc983b4b6e47c4f1d864322d744b0799b64e2afbd | Ternary variant with slightly larger footprint |
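After downloading, you can verify a file's integrity by comparing its SHA256 digest against the table above. A minimal sketch in Python; the filename and expected digest in the comment are examples taken from the Q4_K_M row, substitute whichever file you downloaded:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large GGUF files need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example usage, with the digest from the table above:
# expected = "90d684bf550ca2c50de9191f6d18c5d8b20d89dba78257478eb746e88c66ecb3"
# assert file_sha256("harrier-oss-v1-0.6B-Q4_K_M.gguf") == expected
```

On most Linux systems, `sha256sum <file>` gives the same digest from the command line.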
## Benchmark Results
The table below reformats the local benchmark results that were already present in this repository. The values are preserved exactly and grouped by quantization for easier comparison on Hugging Face.
| Quantization | Bench Size | Params | Backend | Threads | pp512 tok/s | tg128 tok/s |
|---|---|---|---|---|---|---|
| TQ1_0 (1.69 bpw ternary) | 210.56 MiB | 596.05 M | CPU | 12 | 276.59 ± 20.85 | 65.16 ± 2.68 |
| TQ2_0 (2.06 bpw ternary) | 230.25 MiB | 596.05 M | CPU | 12 | 355.23 ± 8.14 | 65.63 ± 5.47 |
| Q8_0 | 604.15 MiB | 596.05 M | CPU | 12 | 302.93 ± 20.50 | 43.87 ± 3.60 |
| Q5_K_M | 418.15 MiB | 596.05 M | CPU | 12 | 252.18 ± 5.95 | 52.16 ± 2.56 |
| Q4_K_M | 372.65 MiB | 596.05 M | CPU | 12 | 340.34 ± 16.70 | 51.56 ± 3.47 |
Benchmark source: llama-bench build 6307ec07d (8604).
These are local throughput measurements, not embedding-quality scores. Expect different throughput on different CPUs, GPUs, thread counts, and llama.cpp revisions.
## Usage Notes
This repository packages an embedding model, not a chat or text-generation model.
- Use a llama.cpp-compatible runtime with embedding support.
- Queries should include a short task instruction, following the upstream training format.
- Documents and passages should usually be encoded without an instruction prefix.
- Match the source model behavior by using last-token pooling and normalized embeddings if your runtime exposes those controls.
Minimal llama.cpp server example:
```
llama-server -m harrier-oss-v1-0.6B-Q4_K_M.gguf --embedding
```
Example query text format:
```
Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: summit define
```
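As a sketch of how client code might apply this formatting, here is a small Python helper. The function name is illustrative, not part of any upstream API; it simply follows the `Instruct:`/`Query:` template shown above, with documents left unprefixed:

```python
def format_query(task: str, query: str) -> str:
    """Prefix a search query with its task instruction, per the upstream format."""
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
print(format_query(task, "summit define"))
# Documents/passages are encoded as-is, with no instruction prefix.
```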
For the original Transformers and Sentence Transformers examples, refer to the upstream model card.
## Source Model Details
According to the upstream Microsoft model card, Harrier OSS v1 uses a decoder-only architecture with last-token pooling and L2 normalization to produce dense text embeddings. The model family is designed for multilingual retrieval and related embedding tasks, and the 0.6B release reports an MTEB v2 score of 69.0.
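To illustrate what last-token pooling and L2 normalization mean in practice, here is a toy pure-Python sketch with made-up hidden states (the real model produces 1024-dimensional vectors; the tiny dimensions here are for readability only):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy "hidden states" for one sequence of 3 tokens, dim 4.
hidden_states = [
    [0.2, -1.0, 0.5, 0.3],   # token 0
    [1.1,  0.4, -0.2, 0.0],  # token 1
    [0.7, -0.3, 0.9, -1.2],  # token 2 (last token)
]

# Last-token pooling: the sequence embedding is the final token's hidden state.
pooled = hidden_states[-1]

# L2 normalization: unit-length output, so cosine similarity between two
# embeddings reduces to a plain dot product.
embedding = l2_normalize(pooled)
print(math.sqrt(sum(x * x for x in embedding)))  # ≈ 1.0
```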
## Notes
- This repo only contains GGUF artifacts derived from the upstream model.
- No benchmark rows were added for BF16 because the original local README.md contained no BF16 benchmark results.
- If you need the original checkpoint files, prompts, or training background, use the upstream Microsoft repository.
## Acknowledgements
- Original model: Microsoft Harrier OSS v1
- GGUF ecosystem and runtime support: llama.cpp