harrier-oss-v1-0.6B GGUF

This repository contains GGUF exports and quantized variants of microsoft/harrier-oss-v1-0.6b, a multilingual text embedding model from Microsoft.

These files are intended for llama.cpp-compatible runtimes that support embedding models in GGUF format. The original upstream repository remains the source of truth for the native Transformers and Sentence Transformers checkpoints, training details, and canonical examples.

Model Summary

  • Base model: microsoft/harrier-oss-v1-0.6b
  • Model type: multilingual text embedding model
  • Parameters: 0.6B
  • Embedding dimension: 1024
  • Max context length: 32768 tokens
  • Pooling: last-token pooling
  • Normalization: L2 normalization
  • Languages: 94 languages
  • License: MIT

Harrier OSS v1 models are designed for retrieval, semantic similarity, clustering, classification, bitext mining, and reranking workloads. This 0.6B variant is the mid-sized model in the Harrier OSS v1 family.

Available Files

| File | Quantization | Size | SHA256 | Notes |
| --- | --- | --- | --- | --- |
| harrier-oss-v1-0.6B-BF16.gguf | BF16 | 1.12 GiB | 450258290f9ae0f79229dd400be810a24f5489df36adca8e567eac32b49fe04f | Highest-fidelity GGUF export |
| harrier-oss-v1-0.6B-Q8_0.gguf | Q8_0 | 609.83 MiB | f97092bd73f6814b8b1170ca855071bc468b5fa2cf61a6de7e1d2c4a8a6a50b0 | Larger integer quant for higher quality retention |
| harrier-oss-v1-0.6B-Q5_K_M.gguf | Q5_K_M | 423.83 MiB | e283c242a3cad5473f5559f2d1577eca9b23b5cccecd2c523483c68c5b8871df | Balanced size and quality |
| harrier-oss-v1-0.6B-Q4_K_M.gguf | Q4_K_M | 378.33 MiB | 90d684bf550ca2c50de9191f6d18c5d8b20d89dba78257478eb746e88c66ecb3 | Smaller general-purpose quant |
| harrier-oss-v1-0.6B-TQ1_0.gguf | TQ1_0 | 216.23 MiB | e734fb321a84065785794ac53050bafaafce19340b2f2b2c00bf441352350ae3 | Smallest ternary variant in this repo |
| harrier-oss-v1-0.6B-TQ2_0.gguf | TQ2_0 | 235.92 MiB | fc2cf8b19e58738ed0e0954fc983b4b6e47c4f1d864322d744b0799b64e2afbd | Ternary variant with a slightly larger footprint |

Benchmark Results

The table below reformats the local benchmark results that were already present in this repository. The values are preserved exactly and grouped by quantization for easier comparison on Hugging Face.

| Quantization | Size | Params | Backend | Threads | pp512 tok/s | tg128 tok/s |
| --- | --- | --- | --- | --- | --- | --- |
| TQ1_0 (1.69 bpw ternary) | 210.56 MiB | 596.05 M | CPU | 12 | 276.59 ± 20.85 | 65.16 ± 2.68 |
| TQ2_0 (2.06 bpw ternary) | 230.25 MiB | 596.05 M | CPU | 12 | 355.23 ± 8.14 | 65.63 ± 5.47 |
| Q8_0 | 604.15 MiB | 596.05 M | CPU | 12 | 302.93 ± 20.50 | 43.87 ± 3.60 |
| Q5_K_M | 418.15 MiB | 596.05 M | CPU | 12 | 252.18 ± 5.95 | 52.16 ± 2.56 |
| Q4_K_M | 372.65 MiB | 596.05 M | CPU | 12 | 340.34 ± 16.70 | 51.56 ± 3.47 |

Benchmark source: llama-bench build 6307ec07d (8604).

These are local throughput measurements, not embedding-quality scores. Expect different throughput on different CPUs, GPUs, thread counts, and llama.cpp revisions.

Usage Notes

This repository packages an embedding model, not a chat or text-generation model.

  • Use a llama.cpp-compatible runtime with embedding support.
  • Queries should include a short task instruction, following the upstream training format.
  • Documents and passages should usually be encoded without an instruction prefix.
  • Match the source model behavior by using last-token pooling and normalized embeddings if your runtime exposes those controls.

Minimal llama.cpp server example:

llama-server -m harrier-oss-v1-0.6B-Q4_K_M.gguf --embedding
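Once the server is running, embeddings can be requested over HTTP. The sketch below builds a batch request for llama-server's OpenAI-compatible embeddings endpoint; the endpoint path, port, and payload shape are assumptions based on common llama.cpp server configurations — consult your llama.cpp version's server documentation if the request fails:

```python
import json
import urllib.request

# Assumed default host/port for llama-server; adjust to your setup.
URL = "http://127.0.0.1:8080/v1/embeddings"

def build_request(texts: list[str]) -> dict:
    """Build the JSON payload for a batch embedding request."""
    return {"input": texts}

def embed(texts: list[str]) -> list[list[float]]:
    """POST the texts to the server and return one vector per input."""
    data = json.dumps(build_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```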

Example query text format:

Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: summit define
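The asymmetric formatting above (instruction-prefixed queries, raw documents) can be sketched as two small helpers; the task string is the example instruction from this card:

```python
def format_query(task: str, query: str) -> str:
    """Prefix a query with its task instruction, per the upstream format."""
    return f"Instruct: {task}\nQuery: {query}"

def format_document(text: str) -> str:
    """Documents and passages are encoded as-is, without an instruction prefix."""
    return text

task = "Given a web search query, retrieve relevant passages that answer the query"
query_text = format_query(task, "summit define")
doc_text = format_document("A summit is the highest point of a hill or mountain.")
```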

For the original Transformers and Sentence Transformers examples, refer to the upstream model card.

Source Model Details

According to the upstream Microsoft model card, Harrier OSS v1 uses a decoder-only architecture with last-token pooling and L2 normalization to produce dense text embeddings. The model family is designed for multilingual retrieval and related embedding tasks, and the 0.6B release reports an MTEB v2 score of 69.0.
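The pooling and normalization steps described above can be illustrated with a toy sketch (pure Python, 4-dimensional vectors instead of the model's 1024; a real runtime would also need to account for any padding so that the last *non-padding* token is selected):

```python
import math

def last_token_pool(hidden_states: list[list[float]]) -> list[float]:
    """Last-token pooling: take the final token's hidden state as the embedding."""
    return hidden_states[-1]

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Toy 3-token sequence with 4-dim hidden states.
hidden = [[0.1, 0.2, 0.0, 0.3],
          [0.5, 0.1, 0.4, 0.2],
          [3.0, 4.0, 0.0, 0.0]]
embedding = l2_normalize(last_token_pool(hidden))
# embedding == [0.6, 0.8, 0.0, 0.0], a unit-length vector
```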

Notes

  • This repo only contains GGUF artifacts derived from the upstream model.
  • No benchmark rows were added for BF16 because there were no BF16 benchmark results in the original local README.md.
  • If you need the original checkpoint files, prompts, or training background, use the upstream Microsoft repository.

Acknowledgements

  • Original model: Microsoft Harrier OSS v1
  • GGUF ecosystem and runtime support: llama.cpp