# j1-micro-1.7B (MLX 4-bit Quantized)
MLX 4-bit quantized version of Haize Labs' j1-micro, a 1.7B judge/reward model that matches Claude-3-Opus and GPT-4o-mini on RewardBench (80.7%) despite being 100x smaller.
This repo contains the MLX 4-bit quantized weights for fast inference on Apple Silicon Macs, plus the original LoRA adapter for GPU inference via vLLM.
## What This Model Does
j1-micro is a pairwise preference judge: given two responses, it generates a structured rubric, reasons through it, and scores each response against that rubric. It was trained with GRPO (Group Relative Policy Optimization) plus SPCT (Self-Principled Critique Tuning) on Skywork-Reward-Preference-80K-v0.2.

The model invents its own evaluation criteria per query, then scores against them. This structured reasoning is how a 1.7B model outperforms judges over 100x its size.
## Performance
| Model | Params | RewardBench |
|---|---|---|
| Tulu-2-70b | 70B | 77.2% |
| Llama-3-70B-Instruct | 70B | 77.0% |
| Claude-3-Opus | 200B+ | 80.1% |
| GPT-4o-mini | ~8B (est., undisclosed) | 80.1% |
| j1-micro (LoRA, FP16) | 1.7B | 80.7% |
| j1-micro (MLX 4-bit) | 1.7B | 75.0% |
MLX 4-bit quantized performance on a 100-sample RewardBench subset:
- Accuracy: 75.0% (0% format error rate)
- Latency: ~3.0s avg, 2.9s p50, 3.8s p95 (M-series Mac)
- Memory: 2.0 GB peak
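RewardBench accuracy here is pairwise: a judgment counts as correct when the preferred response receives the strictly higher score. A minimal scorer along those lines (the function name and tie handling are assumptions, not taken from the original eval code):

```python
def pairwise_accuracy(judgments: list[tuple[float, float]]) -> float:
    """Fraction of pairs where the preferred (first) response scored strictly higher.

    Each judgment is (score_preferred, score_rejected); ties count as incorrect.
    """
    correct = sum(1 for pref, rej in judgments if pref > rej)
    return correct / len(judgments)

# Example: 3 of 4 judgments rank the preferred response higher.
print(pairwise_accuracy([(8, 3), (9, 4), (5, 5), (7, 2)]))  # -> 0.75
```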
## Files
```
mlx/                          # MLX 4-bit quantized (Apple Silicon)
  model.safetensors           # 968 MB
  config.json
  tokenizer.json
  tokenizer_config.json
  ...
lora/                         # LoRA adapter (GPU via vLLM/PEFT)
  adapter_model.safetensors   # 67 MB
  adapter_config.json
  tokenizer.json
  ...
```
## Quick Start (MLX on Mac)
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("rachittshah/j1-micro", model_config={"subfolder": "mlx"})

SYSTEM = """You are an expert XML wrangler. You must respond in the following format:
<specific_criteria>...</specific_criteria>
<analysis>...</analysis>
<scores>\\boxed{..., ...}</scores>

Please only respond in English."""

prompt = """You are a skilled little expert at scoring responses...

#### Conversation Context ####
What is the capital of France?

#### Responses to be Scored ####
[The Begin of Response A]
The capital of France is Paris, located in northern France along the Seine River.
[The End of Response A]
[The Begin of Response B]
France's capital is Lyon, a major city in southeastern France.
[The End of Response B]"""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": prompt},
]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=2048)
print(response)
```
## Quick Start (vLLM with LoRA)
```bash
# Download and serve with vLLM
vllm serve Qwen/Qwen3-1.7B \
  --enable-lora \
  --lora-modules j1-micro=rachittshah/j1-micro/lora
```

Or load the adapter with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(model, "rachittshah/j1-micro", subfolder="lora")
```
## Output Format
The model outputs structured XML:
```xml
<specific_criteria>
1. Factual accuracy (weight: 0.35) — correctness of stated facts
2. Specificity (weight: 0.25) — concrete details vs vague claims
3. Completeness (weight: 0.2) — coverage of the topic
4. Clarity (weight: 0.2) — clear, well-organized explanation
</specific_criteria>
<analysis>
Response A: Factual accuracy 9/10 — correctly identifies Paris...
Response B: Factual accuracy 2/10 — incorrectly states Lyon...
</analysis>
<scores>
\boxed{8, 3}
</scores>
```
## Training Details
- Base model: Qwen/Qwen3-1.7B (Apache 2.0)
- Method: GRPO + SPCT (Self-Principled Critique Tuning)
- Data: Skywork-Reward-Preference-80K-v0.2
- LoRA: rank=16, alpha=32, dropout=0.1, all attention + MLP projections
- Hardware: 1x A100 80GB, <24h training
- Cost: ~$25
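The LoRA hyperparameters above correspond to an `adapter_config.json` roughly like the following (a sketch; the exact `target_modules` list for Qwen3's attention and MLP projections is an assumption, not read from the shipped adapter):

```json
{
  "base_model_name_or_path": "Qwen/Qwen3-1.7B",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
  ],
  "task_type": "CAUSAL_LM"
}
```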
## Citation
Original model by Haize Labs:
```bibtex
@misc{j1micro2025,
  title  = {j1-micro and j1-nano: Tiny Generalist Reward Models via Inference-Time Rubric Proposal},
  author = {Haize Labs},
  url    = {https://github.com/haizelabs/j1-micro},
  month  = {May},
  year   = {2025}
}
```
## License
Apache 2.0 (both base model Qwen3-1.7B and LoRA adapter).