DeepSeek-R1-Distill-Qwen-1.5B — ONNX (INT4)
INT4-quantized ONNX export of DeepSeek-R1-Distill-Qwen-1.5B, a 1.5B-parameter reasoning model distilled from DeepSeek-R1. Optimized for CPU inference: weights are quantized to 4 bits with round-to-nearest (RTN) in blocks of 32.
Mirrored for use with inference4j, an inference-only AI library for Java.
Original Source
- Repository: DeepSeek / Microsoft
- License: MIT
Usage with inference4j
```java
try (TextGenerator gen = TextGenerator.builder()
        .modelSource(ModelSources.deepSeekR1_1_5B())
        .build()) {
    GenerationResult result = gen.generate("What is 2 + 2? Think step by step.");
    System.out.println(result.text());
}
```
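DeepSeek-R1 distilled models typically emit their chain of thought before the final answer, delimited by a `</think>` tag (the opening `<think>` may or may not appear in the generated text, depending on the prompt template). The following is a minimal sketch of separating the two parts of a generated string. `ReasoningSplitter` and `Parts` are hypothetical helper names, not part of inference4j, and the sketch assumes Java 16+ for records.

```java
import java.util.Objects;

/** Hypothetical helper: splits R1-style output into reasoning and final answer. */
final class ReasoningSplitter {
    private static final String OPEN = "<think>";
    private static final String CLOSE = "</think>";

    /** Holder for the two parts of a response. */
    record Parts(String reasoning, String answer) {}

    static Parts split(String text) {
        Objects.requireNonNull(text, "text");
        int close = text.indexOf(CLOSE);
        if (close < 0) {
            // No closing tag found: treat the whole text as the answer.
            return new Parts("", text.strip());
        }
        String reasoning = text.substring(0, close);
        int open = reasoning.indexOf(OPEN);
        if (open >= 0) {
            // Drop the opening tag (and anything before it) if it is present.
            reasoning = reasoning.substring(open + OPEN.length());
        }
        String answer = text.substring(close + CLOSE.length());
        return new Parts(reasoning.strip(), answer.strip());
    }

    public static void main(String[] args) {
        Parts p = split("<think>2 + 2 is 4.</think>\nThe answer is 4.");
        System.out.println("reasoning: " + p.reasoning());
        System.out.println("answer: " + p.answer());
    }
}
```

With the usage snippet above, this could be applied as `ReasoningSplitter.split(result.text())`. If generation stops before the closing tag is produced (for example, because of a token limit), the sketch returns the entire text as the answer, so callers may want to handle that case separately.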
Model Details
| Property | Value |
|---|---|
| Architecture | Qwen2 (1.5B parameters, 28 layers, 1536 hidden) |
| Task | Text generation / reasoning |
| Context length | 131072 tokens |
| Quantization | INT4 RTN (round-to-nearest), block size 32, accuracy level 4 |
| Original framework | PyTorch (transformers) |
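To illustrate what "RTN block-32" means in the Quantization row, here is a minimal sketch of round-to-nearest 4-bit quantization over blocks of 32 weights. It assumes a symmetric scheme with one floating-point scale per block and codes in [-8, 7]. The exporter used for this model may use a different convention (for example, zero points or unsigned codes), and `RtnInt4Sketch` is a hypothetical name used only for illustration.

```java
/** Illustrative round-to-nearest (RTN) 4-bit quantization over fixed-size blocks. */
final class RtnInt4Sketch {
    static final int BLOCK_SIZE = 32;

    /** One quantized block: 32 signed 4-bit codes (one per byte here) plus a scale. */
    record Block(byte[] codes, float scale) {}

    /** Quantizes BLOCK_SIZE weights starting at offset (assumed symmetric scheme). */
    static Block quantize(float[] weights, int offset) {
        float maxAbs = 0f;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            maxAbs = Math.max(maxAbs, Math.abs(weights[offset + i]));
        }
        // Map the largest magnitude in the block to code 7; avoid division by zero.
        float scale = maxAbs == 0f ? 1f : maxAbs / 7f;
        byte[] codes = new byte[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            int q = Math.round(weights[offset + i] / scale); // round to nearest
            codes[i] = (byte) Math.max(-8, Math.min(7, q));  // clamp to 4-bit range
        }
        return new Block(codes, scale);
    }

    /** Reconstructs approximate weights from codes and the block scale. */
    static float[] dequantize(Block block) {
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[i] = block.codes()[i] * block.scale();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] w = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            w[i] = (i - 16) * 0.1f;
        }
        Block b = quantize(w, 0);
        float[] restored = dequantize(b);
        System.out.println("scale = " + b.scale());
        System.out.println("w[3] = " + w[3] + ", restored = " + restored[3]);
    }
}
```

Under these assumptions, the reconstruction error of each weight is at most half the block scale. A real implementation would pack two 4-bit codes per byte, so a block of 32 weights takes 16 bytes of codes plus its scale; with a 32-bit scale that comes to 5 bits per weight.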
License
This model is licensed under the MIT License. The original model is by DeepSeek; the ONNX conversion is by Microsoft.