About

Standard MLX quants of microsoft/harrier-oss-v1-27b. Converted using oMLX v0.3.5 on an M2 Ultra (192GB).

Harrier-OSS-v1-27B (oQ Quantized for MLX)

harrier-oss-v1 is a family of multilingual text embedding models developed by Microsoft. The models use decoder-only architectures with last-token pooling and L2 normalization to produce dense text embeddings. They can be applied to a wide range of tasks, including but not limited to retrieval, clustering, semantic similarity, classification, bitext mining, and reranking. The models achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of the release date.
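The pooling scheme described above can be sketched in a few lines. This is an illustrative toy example, not the model's actual inference code: it assumes you already have final-layer hidden states of shape `(seq_len, dim)` and shows only the last-token pooling and L2 normalization steps.

```python
import numpy as np

def embed_from_hidden_states(hidden_states: np.ndarray) -> np.ndarray:
    """Toy sketch: last-token pooling followed by L2 normalization.

    hidden_states: (seq_len, dim) final-layer activations for one sequence.
    Returns a unit-length embedding of shape (dim,).
    """
    pooled = hidden_states[-1]                 # take the last token's hidden state
    return pooled / np.linalg.norm(pooled)     # scale to unit L2 norm

# Toy input: 4 tokens, 8-dimensional hidden states
h = np.random.default_rng(0).normal(size=(4, 8))
e = embed_from_hidden_states(h)
print(np.linalg.norm(e))  # ~1.0
```

Because the output is unit-normalized, cosine similarity between two embeddings reduces to a plain dot product.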

| Model | Parameters | Embedding Dimension | Max Tokens | MTEB v2 Score |
|---|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 32,768 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 32,768 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 32,768 | 74.3 |

This repository contains oQ-quantized variants of the microsoft/harrier-oss-v1-27b multilingual embedding model. These weights are optimized specifically for Apple Silicon (M-series) hardware using the oMLX framework.

Quantization Details

These models were converted using oMLX v0.3.5 on an M2 Ultra (192GB).

| Variant | Target bpw | RAM Usage (Est.) | Recommended Use Case |
|---|---|---|---|
| oQ4 | ~4.5 | 18.2 GB | Maximum throughput / low VRAM overhead |
| oQ6 | ~6.7 | 21.1 GB | Balanced: near-lossless RAG retrieval |
| oQ8 | ~8.5 | 26.8 GB | Archive/audit-grade fidelity |

Usage (MLX / oMLX)

Prompting Requirements

CRITICAL: Per the original Microsoft implementation, an instruction must be prepended to each query for optimal performance:

  • Query Format: Instruct: {task_description}\nQuery: {query}
  • Document Format: No instruction required; use raw text.
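The two formats above can be wrapped in a small helper. The task description below is only a placeholder example; substitute a description of your own retrieval task.

```python
def format_query(task_description: str, query: str) -> str:
    """Apply the required instruction prefix to a query.

    Documents are embedded as raw text and need no formatting.
    """
    return f"Instruct: {task_description}\nQuery: {query}"

# Hypothetical task description for a retrieval use case
q = format_query(
    "Given a web search query, retrieve relevant passages",
    "what is MLX?",
)
print(q)
```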

Via oMLX Dashboard

  1. Open your oMLX Admin Console (localhost:8000/admin).
  2. Search for splats/harrier-oss-v1-27b-oQ8-MLX.
  3. Select the desired quantization folder (e.g., oQ8) from the model directory list.
  4. Once the folder is downloaded, the model is ready to serve as an embedding endpoint.

Via CLI

```shell
hf download splats/harrier-oss-v1-27b-oQ8-MLX --local-dir ./harrier-oss-v1-27b-oQ8
omlx serve --model ./harrier-oss-v1-27b-oQ8
```
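Once the endpoint returns embedding vectors, a typical downstream step is ranking documents by cosine similarity to the query embedding. The sketch below uses hand-written toy vectors in place of real endpoint responses (the oMLX request/response format is not shown here); since the model's embeddings are already L2-normalized, cosine similarity is just a dot product.

```python
import numpy as np

def cosine_sim(a, b) -> float:
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings returned by the endpoint
query_emb = [0.6, 0.8]
doc_embs = {"doc_a": [0.6, 0.8], "doc_b": [1.0, 0.0]}

# Rank documents by similarity to the query, most similar first
ranked = sorted(doc_embs, key=lambda d: cosine_sim(query_emb, doc_embs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```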