hsaest commited on
Commit
96f1666
·
verified ·
1 Parent(s): 16db004

README: add benchmark result table from project page

Browse files
Files changed (1) hide show
  1. README.md +17 -6
README.md CHANGED
@@ -9,7 +9,20 @@ tags:
9
 
10
  # QUEST-35B-SFT
11
 
12
- **MoE** **vanilla SFT** checkpoint for the QUEST 35B line.
 
 
 
 
 
 
 
 
 
 
 
 
 
13
 
14
  ## Quick start
15
 
@@ -19,14 +32,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
19
  model_id = "osunlp/QUEST-35B-SFT"
20
  tokenizer = AutoTokenizer.from_pretrained(model_id)
21
  model = AutoModelForCausalLM.from_pretrained(
22
- model_id,
23
- device_map="auto",
24
- torch_dtype="auto",
25
  )
26
  ```
27
 
28
- Use the chat template and `tokenizer.apply_chat_template(...)` when available. VRAM and dtype requirements depend on model size and MoE vs dense architecture; see `config.json` (`model_type`, `architectures`).
29
 
30
  ## License
31
 
32
- This model is released under the **Apache License 2.0** (`apache-2.0`).
 
9
 
10
  # QUEST-35B-SFT
11
 
12
+ QUEST **35B-class MoE** SFT-only checkpoint (Qwen3.5-35B-A3B base, `Qwen3_5MoeForConditionalGeneration`). Intermediate stage before mid-training and RL.
13
+
14
+ ## Benchmark results
15
+
16
+ | Benchmark | Metric | Score |
17
+ | --- | --- | ---: |
18
+ | BrowseComp | avg@3 | 45.1 |
19
+ | Mind2Web 2 | avg@3 | 26.5 |
20
+ | HLE | avg@3 | 39.49 |
21
+ | DeepResearch Bench | avg@3 | 36.35 |
22
+ | BrowseComp-Plus | avg@3 | 57.9 |
23
+ | WideSearch | Item F1 avg@4 | 61.1 |
24
+ | GAIA | avg@3 | 83.5 |
25
+ | LiveResearchBench | avg@3 | 64.69 |
26
 
27
  ## Quick start
28
 
 
32
  model_id = "osunlp/QUEST-35B-SFT"
33
  tokenizer = AutoTokenizer.from_pretrained(model_id)
34
  model = AutoModelForCausalLM.from_pretrained(
35
+ model_id, device_map="auto", torch_dtype="auto",
 
 
36
  )
37
  ```
38
 
39
+ Apply the model's chat template with `tokenizer.apply_chat_template(...)` before passing prompts.
40
 
41
  ## License
42
 
43
+ Released under the **Apache License 2.0**.