DeepSeek-R1-Distill-Qwen-1.5B — ONNX (INT4)

INT4-quantized ONNX export of DeepSeek-R1-Distill-Qwen-1.5B, a 1.5B-parameter reasoning model distilled from DeepSeek-R1. The weights use 4-bit round-to-nearest (RTN) quantization with a block size of 32, targeting CPU inference.
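To make "RTN block-32" concrete, here is a minimal sketch of block-wise round-to-nearest quantization. It is a simplified symmetric variant written for illustration only: the class name `RtnInt4Sketch` is hypothetical, values are kept one per byte instead of packed two per byte, and the actual export's storage layout and zero-point handling may differ.

```java
/** Illustrative block-wise round-to-nearest (RTN) 4-bit quantization; not the exporter's code. */
final class RtnInt4Sketch {
    // "block-32": every run of 32 consecutive weights shares one scale.
    static final int BLOCK_SIZE = 32;

    /**
     * Quantizes weights block by block into integers in [-7, 7], which fit in 4 bits.
     * Writes one quantized value per byte into {@code quantized} and returns one scale per block.
     */
    static float[] quantize(float[] weights, byte[] quantized) {
        int blocks = (weights.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        float[] scales = new float[blocks];
        for (int b = 0; b < blocks; b++) {
            int start = b * BLOCK_SIZE;
            int end = Math.min(start + BLOCK_SIZE, weights.length);

            // The largest magnitude in the block maps to the largest code (7).
            float maxAbs = 0f;
            for (int i = start; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
            }
            float scale = maxAbs == 0f ? 1f : maxAbs / 7f;
            scales[b] = scale;

            // Round each scaled weight to the nearest integer code.
            for (int i = start; i < end; i++) {
                int q = Math.round(weights[i] / scale);
                quantized[i] = (byte) Math.max(-7, Math.min(7, q));
            }
        }
        return scales;
    }

    /** Reconstructs approximate weights: code times the scale of its block. */
    static float[] dequantize(byte[] quantized, float[] scales) {
        float[] out = new float[quantized.length];
        for (int i = 0; i < quantized.length; i++) {
            out[i] = quantized[i] * scales[i / BLOCK_SIZE];
        }
        return out;
    }
}
```

Because each block has its own scale, a block of small weights is not forced onto the coarse grid needed by a block of large ones. In this sketch the reconstruction error of any weight is at most half the scale of its block.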

Mirrored for use with inference4j, an inference-only AI library for Java.

Original Source

Usage with inference4j

try (TextGenerator gen = TextGenerator.builder()
        .modelSource(ModelSources.deepSeekR1_1_5B())
        .build()) {
    GenerationResult result = gen.generate("What is 2 + 2? Think step by step.");
    System.out.println(result.text());
}
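DeepSeek-R1 distilled models typically emit their chain of thought between `<think>` and `</think>` tags before the final answer. The helper below is a minimal sketch for separating the two. It is not part of inference4j: the class `ReasoningSplitter` is hypothetical, and it assumes the generated text still contains the tags, which depends on how the library post-processes output.

```java
/** Hypothetical helper that splits R1-style output into reasoning and final answer. */
final class ReasoningSplitter {
    private static final String OPEN_TAG = "<think>";
    private static final String CLOSE_TAG = "</think>";

    /** Simple holder for the two parts of a response. */
    static final class Parts {
        final String reasoning;
        final String answer;

        Parts(String reasoning, String answer) {
            this.reasoning = reasoning;
            this.answer = answer;
        }
    }

    static Parts split(String text) {
        int close = text.indexOf(CLOSE_TAG);
        if (close < 0) {
            // No closing tag found: treat the whole text as the answer.
            return new Parts("", text.trim());
        }
        String reasoning = text.substring(0, close);
        // The opening tag may be missing if the prompt template already supplied it.
        int open = reasoning.indexOf(OPEN_TAG);
        if (open >= 0) {
            reasoning = reasoning.substring(open + OPEN_TAG.length());
        }
        String answer = text.substring(close + CLOSE_TAG.length());
        return new Parts(reasoning.trim(), answer.trim());
    }
}
```

With the usage example above, `ReasoningSplitter.split(result.text()).answer` would then give only the final answer, provided the tags are present in the returned text.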

Model Details

| Property | Value |
| --- | --- |
| Architecture | Qwen2 (1.5B parameters, 28 layers, hidden size 1536) |
| Task | Text generation / reasoning |
| Context length | 131,072 tokens |
| Quantization | INT4 RTN block-32 acc-level-4 |
| Original framework | PyTorch (transformers) |

License

This model is licensed under the MIT License. The original model is by DeepSeek; the ONNX conversion is by Microsoft.
