About

Standard MLX quants of microsoft/harrier-oss-v1-27b. Converted using oMLX v0.3.5 on an M2 Ultra (192GB).

Harrier-OSS-v1-27B (oQ Quantized for MLX)

harrier-oss-v1 is a family of multilingual text embedding models developed by Microsoft. The models use decoder-only architectures with last-token pooling and L2 normalization to produce dense text embeddings. They can be applied to a wide range of tasks, including but not limited to retrieval, clustering, semantic similarity, classification, bitext mining, and reranking. The models achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of the release date.
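The pooling scheme described above can be sketched in a few lines. This is an illustrative toy example, not the model's actual inference code: it assumes you already have final-layer hidden states of shape `(seq_len, dim)` and shows only the last-token pooling and L2 normalization steps.

```python
import numpy as np

def embed_from_hidden_states(hidden_states: np.ndarray) -> np.ndarray:
    """Toy sketch: last-token pooling followed by L2 normalization.

    hidden_states: (seq_len, dim) final-layer activations for one sequence.
    Returns a unit-length embedding of shape (dim,).
    """
    pooled = hidden_states[-1]                 # take the last token's hidden state
    return pooled / np.linalg.norm(pooled)     # scale to unit L2 norm

# Toy input: 4 tokens, 8-dimensional hidden states
h = np.random.default_rng(0).normal(size=(4, 8))
e = embed_from_hidden_states(h)
print(np.linalg.norm(e))  # ~1.0
```

Because the output is unit-normalized, cosine similarity between two embeddings reduces to a plain dot product.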

| Model | Parameters | Embedding Dimension | Max Tokens | MTEB v2 Score |
|---|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 32,768 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 32,768 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 32,768 | 74.3 |

This repository contains oQ-quantized variants of the microsoft/harrier-oss-v1-27b multilingual embedding model. These weights are optimized specifically for Apple Silicon (M-series) hardware using the oMLX framework.

Quantization Details

These models were converted using oMLX v0.3.5 on an M2 Ultra (192GB).

| Variant | Target bpw | RAM Usage (Est.) | Recommended Use Case |
|---|---|---|---|
| oQ4 | ~4.5 | 18.2 GB | Maximum throughput / low VRAM overhead |
| oQ6 | ~6.7 | 21.1 GB | Balanced: near-lossless RAG retrieval |
| oQ8 | ~8.5 | 26.8 GB | Archive/audit-grade fidelity |

Usage (MLX / oMLX)

Prompting Requirements

CRITICAL: Per the original Microsoft implementation, an instruction must be prepended to each query for optimal performance:

  • Query Format: Instruct: {task_description}\nQuery: {query}
  • Document Format: No instruction required; use raw text.
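The two formats above can be wrapped in a small helper. The task description below is only a placeholder example; substitute a description of your own retrieval task.

```python
def format_query(task_description: str, query: str) -> str:
    """Apply the required instruction prefix to a query.

    Documents are embedded as raw text and need no formatting.
    """
    return f"Instruct: {task_description}\nQuery: {query}"

# Hypothetical task description for a retrieval use case
q = format_query(
    "Given a web search query, retrieve relevant passages",
    "what is MLX?",
)
print(q)
```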

Via oMLX Dashboard

  1. Open your oMLX Admin Console (localhost:8000/admin).
  2. Search for splats/harrier-oss-v1-27b-oQ8-MLX.
  3. Select the desired quantization folder (e.g., oQ8) from the model directory list.
  4. Once the folder is downloaded, the model is ready to serve as an embedding endpoint.

Via CLI

```shell
hf download splats/harrier-oss-v1-27b-oQ8-MLX --local-dir ./harrier-oss-v1-27b-oQ8
omlx serve --model ./harrier-oss-v1-27b-oQ8
```
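Once the endpoint returns embedding vectors, a typical downstream step is ranking documents by cosine similarity to the query embedding. The sketch below uses hand-written toy vectors in place of real endpoint responses (the oMLX request/response format is not shown here); since the model's embeddings are already L2-normalized, cosine similarity is just a dot product.

```python
import numpy as np

def cosine_sim(a, b) -> float:
    """Cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings returned by the endpoint
query_emb = [0.6, 0.8]
doc_embs = {"doc_a": [0.6, 0.8], "doc_b": [1.0, 0.0]}

# Rank documents by similarity to the query, most similar first
ranked = sorted(doc_embs, key=lambda d: cosine_sim(query_emb, doc_embs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```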