harrier-oss-v1-0.6B GGUF

This repository contains GGUF exports and quantized variants of microsoft/harrier-oss-v1-0.6b, a multilingual text embedding model from Microsoft.

These files are intended for llama.cpp-compatible runtimes that support embedding models in GGUF format. The original upstream repository remains the source of truth for the native Transformers and Sentence Transformers checkpoints, training details, and canonical examples.

Model Summary

  • Base model: microsoft/harrier-oss-v1-0.6b
  • Model type: multilingual text embedding model
  • Parameters: 0.6B
  • Embedding dimension: 1024
  • Max context length: 32768 tokens
  • Pooling: last-token pooling
  • Normalization: L2 normalization
  • Languages: 94 languages
  • License: MIT

Harrier OSS v1 models are designed for retrieval, semantic similarity, clustering, classification, bitext mining, and reranking workloads. This 0.6B variant is the mid-sized model in the Harrier OSS v1 family.

Available Files

| File | Quantization | Size | SHA256 | Notes |
| --- | --- | --- | --- | --- |
| harrier-oss-v1-0.6B-BF16.gguf | BF16 | 1.12 GiB | 450258290f9ae0f79229dd400be810a24f5489df36adca8e567eac32b49fe04f | Highest-fidelity GGUF export |
| harrier-oss-v1-0.6B-Q8_0.gguf | Q8_0 | 609.83 MiB | f97092bd73f6814b8b1170ca855071bc468b5fa2cf61a6de7e1d2c4a8a6a50b0 | Larger integer quant for higher quality retention |
| harrier-oss-v1-0.6B-Q5_K_M.gguf | Q5_K_M | 423.83 MiB | e283c242a3cad5473f5559f2d1577eca9b23b5cccecd2c523483c68c5b8871df | Balanced size and quality |
| harrier-oss-v1-0.6B-Q4_K_M.gguf | Q4_K_M | 378.33 MiB | 90d684bf550ca2c50de9191f6d18c5d8b20d89dba78257478eb746e88c66ecb3 | Smaller general-purpose quant |
| harrier-oss-v1-0.6B-TQ1_0.gguf | TQ1_0 | 216.23 MiB | e734fb321a84065785794ac53050bafaafce19340b2f2b2c00bf441352350ae3 | Smallest ternary variant in this repo |
| harrier-oss-v1-0.6B-TQ2_0.gguf | TQ2_0 | 235.92 MiB | fc2cf8b19e58738ed0e0954fc983b4b6e47c4f1d864322d744b0799b64e2afbd | Ternary variant with a slightly larger footprint |

Benchmark Results

The table below reformats the local benchmark results that were already present in this repository. The values are preserved exactly and grouped by quantization for easier comparison on Hugging Face.

| Quantization | Size | Params | Backend | Threads | pp512 tok/s | tg128 tok/s |
| --- | --- | --- | --- | --- | --- | --- |
| TQ1_0 (1.69 bpw ternary) | 210.56 MiB | 596.05 M | CPU | 12 | 276.59 ± 20.85 | 65.16 ± 2.68 |
| TQ2_0 (2.06 bpw ternary) | 230.25 MiB | 596.05 M | CPU | 12 | 355.23 ± 8.14 | 65.63 ± 5.47 |
| Q8_0 | 604.15 MiB | 596.05 M | CPU | 12 | 302.93 ± 20.50 | 43.87 ± 3.60 |
| Q5_K_M | 418.15 MiB | 596.05 M | CPU | 12 | 252.18 ± 5.95 | 52.16 ± 2.56 |
| Q4_K_M | 372.65 MiB | 596.05 M | CPU | 12 | 340.34 ± 16.70 | 51.56 ± 3.47 |

Benchmark source: llama-bench build 6307ec07d (8604).

These are local throughput measurements, not embedding-quality scores. Expect different throughput on different CPUs, GPUs, thread counts, and llama.cpp revisions.

Usage Notes

This repository packages an embedding model, not a chat or text-generation model.

  • Use a llama.cpp-compatible runtime with embedding support.
  • Queries should include a short task instruction, following the upstream training format.
  • Documents and passages should usually be encoded without an instruction prefix.
  • Match the source model behavior by using last-token pooling and normalized embeddings if your runtime exposes those controls.

Minimal llama.cpp server example:

llama-server -m harrier-oss-v1-0.6B-Q4_K_M.gguf --embedding
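Once the server is running, embeddings can be requested over HTTP. The sketch below builds a batch request for llama-server's OpenAI-compatible embeddings endpoint; the endpoint path, port, and payload shape are assumptions based on common llama.cpp server configurations — consult your llama.cpp version's server documentation if the request fails:

```python
import json
import urllib.request

# Assumed default host/port for llama-server; adjust to your setup.
URL = "http://127.0.0.1:8080/v1/embeddings"

def build_request(texts: list[str]) -> dict:
    """Build the JSON payload for a batch embedding request."""
    return {"input": texts}

def embed(texts: list[str]) -> list[list[float]]:
    """POST the texts to the server and return one vector per input."""
    data = json.dumps(build_request(texts)).encode("utf-8")
    req = urllib.request.Request(
        URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```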

Example query text format:

Instruct: Given a web search query, retrieve relevant passages that answer the query
Query: summit define
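The asymmetric formatting above (instruction-prefixed queries, raw documents) can be sketched as two small helpers; the task string is the example instruction from this card:

```python
def format_query(task: str, query: str) -> str:
    """Prefix a query with its task instruction, per the upstream format."""
    return f"Instruct: {task}\nQuery: {query}"

def format_document(text: str) -> str:
    """Documents and passages are encoded as-is, without an instruction prefix."""
    return text

task = "Given a web search query, retrieve relevant passages that answer the query"
query_text = format_query(task, "summit define")
doc_text = format_document("A summit is the highest point of a hill or mountain.")
```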

For the original Transformers and Sentence Transformers examples, refer to the upstream model card.

Source Model Details

According to the upstream Microsoft model card, Harrier OSS v1 uses a decoder-only architecture with last-token pooling and L2 normalization to produce dense text embeddings. The model family is designed for multilingual retrieval and related embedding tasks, and the 0.6B release reports an MTEB v2 score of 69.0.
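The pooling and normalization steps described above can be illustrated with a toy sketch (pure Python, 4-dimensional vectors instead of the model's 1024; a real runtime would also need to account for any padding so that the last *non-padding* token is selected):

```python
import math

def last_token_pool(hidden_states: list[list[float]]) -> list[float]:
    """Last-token pooling: take the final token's hidden state as the embedding."""
    return hidden_states[-1]

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Toy 3-token sequence with 4-dim hidden states.
hidden = [[0.1, 0.2, 0.0, 0.3],
          [0.5, 0.1, 0.4, 0.2],
          [3.0, 4.0, 0.0, 0.0]]
embedding = l2_normalize(last_token_pool(hidden))
# embedding == [0.6, 0.8, 0.0, 0.0], a unit-length vector
```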

Notes

  • This repo only contains GGUF artifacts derived from the upstream model.
  • No benchmark rows were added for BF16 because there were no BF16 benchmark results in the original local README.md.
  • If you need the original checkpoint files, prompts, or training background, use the upstream Microsoft repository.

Acknowledgements

  • Original model: Microsoft Harrier OSS v1
  • GGUF ecosystem and runtime support: llama.cpp