kread committed · Commit 6bd60a2 · verified · 1 Parent(s): 3d017ce

Add model card

  1. README.md +114 -0
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen3-Reranker-4B
+ base_model_relation: quantized
+ library_name: openvino
+ tags:
+ - openvino
+ - openvino-export
+ - text-ranking
+ - reranker
+ - qwen3
+ language:
+ - multilingual
+ pipeline_tag: text-ranking
+ ---
+
+ # Qwen3-Reranker-4B — OpenVINO IR (INT8 weight-only, asymmetric)
+
+ > **This is a redistribution.** For the model's intended use, instruction
+ > format, full evaluation (MTEB-R / CMTEB-R / MMTEB-R / MLDR / MTEB-Code /
+ > FollowIR), and citation, please see the upstream card:
+ > **[Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B)**.
+
+ OpenVINO IR conversion of
+ [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B),
+ weight-only quantized to **INT8 asymmetric** via NNCF. Intended for the
+ [OpenArc](https://github.com/SearchSavior/OpenArc) reranker engine and
+ `optimum-intel` pipelines targeting Intel CPUs / iGPUs / dGPUs / NPUs.
+
+ ## Files
+
+ - `openvino_model.{xml,bin}`: Qwen3 (4B) decoder, INT8 weights (~4.0 GB)
+ - `openvino_tokenizer.{xml,bin}` / `openvino_detokenizer.{xml,bin}`: OpenVINO Tokenizers IR
+ - `chat_template.jinja`, `generation_config.json`
+ - Standard HF tokenizer files: `tokenizer.json`, `tokenizer_config.json`,
+   `special_tokens_map.json`, `vocab.json`, `merges.txt`
+ - `LICENSE`, `NOTICE`: Apache-2.0 with attribution to the upstream Qwen Team.
+
+ ## Architecture
+
+ | Property | Value |
+ |---|---|
+ | Base model | Qwen3 ForCausalLM (Qwen3-4B-Base) |
+ | Hidden size | 2560 |
+ | Layers | 36 |
+ | Attention heads / KV heads | 32 / 8 |
+ | Max position | 40 960 |
+ | Vocabulary | 151 669 |
+ | Source dtype | bfloat16 |
+ | Quantization | NNCF INT8 weight-only, asymmetric |
+
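As a rough cross-check of the ~4.0 GB size listed under Files: weight-only INT8 stores about one byte per parameter, versus two bytes for the bfloat16 source. The sketch below assumes a nominal parameter count of 4e9 (read off the "4B" in the model name, not an exact count) and ignores the comparatively small overhead of quantization scales and zero points.

```python
def weight_bytes(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Assumption: nominal count taken from the "4B" in the model name.
NOMINAL_PARAMS = 4e9

int8_gb = weight_bytes(NOMINAL_PARAMS, 8) / 1e9
bf16_gb = weight_bytes(NOMINAL_PARAMS, 16) / 1e9
print(f"INT8: ~{int8_gb:.1f} GB, bfloat16: ~{bf16_gb:.1f} GB")
```

Under that assumption the INT8 weights come to about half the bfloat16 footprint, which is consistent with the file size above.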
+ ## Usage with OpenArc
+
+ ```bash
+ openarc add qwen3-4b-reranker \
+   --model-path /path/to/Qwen3-Reranker-4B-int8-ov \
+   --model-type rerank \
+   --engine optimum \
+   --device GPU
+
+ openarc serve
+ # POST /v1/rerank {"model": "qwen3-4b-reranker", "query": "...", "documents": [...]}
+ ```
+
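The `POST /v1/rerank` call noted in the comment above can be issued from Python with only the standard library. This is a minimal sketch, not an official client: the base URL `http://localhost:8000` is an assumption (use whatever address `openarc serve` reports), any authentication header your deployment requires is not shown, and the response is returned as parsed JSON without assuming its schema.

```python
import json
import urllib.request


def build_rerank_request(base_url, model, query, documents):
    """Build a POST request carrying the JSON body shown in the usage comment."""
    payload = {"model": model, "query": query, "documents": list(documents)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url, model, query, documents, timeout=60):
    """Send the request and return the decoded JSON body unchanged."""
    request = build_rerank_request(base_url, model, query, documents)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running server; the base URL is an assumption):
# result = rerank("http://localhost:8000", "qwen3-4b-reranker",
#                 "what is OpenVINO?", ["first document", "second document"])
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live server.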
+ ## Conversion notes
+
+ The standard CLI route currently fails silently for this model on
+ `optimum-intel @ HEAD`:
+
+ ```bash
+ optimum-cli export openvino --weight-format int8 \
+   --model Qwen/Qwen3-Reranker-4B ./out
+ # exits 0; openvino_model.xml is 0 bytes, openvino_model.bin is a ~13 MB stub
+ ```
+
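Because the CLI exits 0 even when it writes a stub, it can help to inspect the output directory before using it. The following is a small sketch of such a check; `check_ir_export` and its `min_bin_bytes` threshold are hypothetical helpers for illustration, not part of `optimum-intel`, and the 1 GB default is only a rough line between the ~13 MB stub and the ~4.0 GB file expected here.

```python
from pathlib import Path


def check_ir_export(out_dir, min_bin_bytes=1_000_000_000):
    """Return a list of problems found in an exported IR directory.

    min_bin_bytes is an illustrative threshold, not an optimum-intel setting.
    """
    out = Path(out_dir)
    xml_path = out / "openvino_model.xml"
    bin_path = out / "openvino_model.bin"
    problems = []

    if not xml_path.is_file():
        problems.append(f"{xml_path.name} is missing")
    elif xml_path.stat().st_size == 0:
        problems.append(f"{xml_path.name} is 0 bytes")

    if not bin_path.is_file():
        problems.append(f"{bin_path.name} is missing")
    elif bin_path.stat().st_size < min_bin_bytes:
        size = bin_path.stat().st_size
        problems.append(f"{bin_path.name} is only {size} bytes")

    return problems
```

An empty list means neither symptom described above was detected; it does not prove the model is otherwise valid.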
+ The Python API path produces a usable model:
+
+ ```python
+ from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
+
+ # 8-bit weight-only quantization; sym=False selects asymmetric mode.
+ quant = OVWeightQuantizationConfig(bits=8, sym=False)
+ m = OVModelForCausalLM.from_pretrained(
+     "Qwen/Qwen3-Reranker-4B",
+     export=True,  # convert the checkpoint to OpenVINO IR while loading
+     quantization_config=quant,
+     trust_remote_code=True,
+ )
+ m.save_pretrained("./out")
+ ```
+
+ The tokenizer and detokenizer IR were generated separately via
+ `openvino_tokenizers.convert_tokenizer(..., with_detokenizer=True)`.
+
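The tokenizer step can be sketched as follows. This is an assumption about how the call was wired up, not the exact script used for this repo: it loads the Hugging Face tokenizer with `transformers.AutoTokenizer`, converts it with `convert_tokenizer(..., with_detokenizer=True)`, and writes both models with `openvino.save_model`. The helper names `tokenizer_ir_paths` and `export_tokenizer_ir` are hypothetical; the output file names follow the Files list.

```python
from pathlib import Path


def tokenizer_ir_paths(out_dir):
    """Output locations matching the file names in the Files section."""
    out = Path(out_dir)
    return {
        "tokenizer": str(out / "openvino_tokenizer.xml"),
        "detokenizer": str(out / "openvino_detokenizer.xml"),
    }


def export_tokenizer_ir(model_id, out_dir):
    """Convert a Hugging Face tokenizer to OpenVINO tokenizer/detokenizer IR."""
    # Imported lazily so this module loads without the optional packages.
    import openvino as ov
    from openvino_tokenizers import convert_tokenizer
    from transformers import AutoTokenizer

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    hf_tokenizer = AutoTokenizer.from_pretrained(model_id)
    # with_detokenizer=True makes the call return a (tokenizer, detokenizer) pair.
    ov_tokenizer, ov_detokenizer = convert_tokenizer(
        hf_tokenizer, with_detokenizer=True
    )

    paths = tokenizer_ir_paths(out_dir)
    # save_model writes the .xml file and a matching .bin next to it.
    ov.save_model(ov_tokenizer, paths["tokenizer"])
    ov.save_model(ov_detokenizer, paths["detokenizer"])
    return paths


# Example (downloads the tokenizer; requires openvino-tokenizers installed):
# export_tokenizer_ir("Qwen/Qwen3-Reranker-4B", "./out")
```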
+ A more detailed walkthrough lives in
+ [OpenArc/docs/openvino_qwen3.md](https://github.com/SearchSavior/OpenArc).
+
+ ## License
+
+ Apache-2.0, inherited from
+ [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B).
+ See `LICENSE` and `NOTICE` in this repo.
+
+ ## Citation
+
+ From the upstream model card:
+
+ ```bibtex
+ @article{qwen3embedding,
+   title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
+   author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
+   journal={arXiv preprint arXiv:2506.05176},
+   year={2025}
+ }
+ ```