mingkaid commited on
Commit
b34bbcb
·
verified ·
1 Parent(s): 9a9d8bb

Add README.md

Browse files
Files changed (1) hide show
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ library_name: transformers
7
+ tags:
8
+ - agent
9
+ - reasoning
10
+ - tool-use
11
+ - simulative-planning
12
+ base_model: Qwen/Qwen3-8B
13
+ ---
14
+
15
+ # SR²AM-v0.1-8B
16
+
17
+ ![SR²AM Illustration](model.png)
18
+
19
+ We argue that efficient agentic reasoning benefits from decomposing deliberation into three interacting systems: **reactive execution** (System I) for fine-grained reasoning and direct action; **simulative reasoning** (System II) that predicts consequences of proposed actions through a world model; and **self-regulation** (System III) that decides *when* and *how deeply* to plan through a learned **configurator**.
20
+
21
+ **SR²AM** (Self-Regulated Simulative Reasoning Agentic LLM) is our instantiation: the configurator and simulative planner are realized as distinct stages within an LLM's chain-of-thought reasoning, with the LLM itself serving as the world model in language space.
22
+
23
+ SR²AM-v0.1-8B achieves an overall Pass@1 of **57.0** across 11 benchmarks spanning math, science, tabular analysis, and web information seeking — competitive with systems at 120–355B parameters.
24
+
25
+ More details: [project website](https://sr2am-agentic-llm.github.io/) | [paper](https://arxiv.org) | [GitHub](<GITHUB_URL_TBD>).
26
+
27
+ ## Key Features
28
+
29
+ - **System I + II + III decomposition**: a configurator (System III) decides per-turn whether to plan, continue an existing plan, or act directly; a simulative planner (System II) constructs plans grounded in predicted future states; reactive execution (System I) handles fine-grained reasoning and tool use.
30
+ - **SFT + RL training**: supervised learning on data encoding the self-regulated planning structure, followed by reinforcement learning (GRPO) for task success.
31
+ - **Agentic tool use**: web search (SerpAPI), web browsing with LLM summarization, and stateless Python code execution (SandboxFusion).
32
+ - **Compact and efficient**: 3,698 reasoning tokens per trajectory on average — fewer or comparable to other systems at the same scale while outperforming them in Pass@1.
33
+
34
+ ## Quick Start
35
+
36
+ See the [GitHub repository](<GITHUB_URL_TBD>) for setup and inference instructions.
37
+
38
+ ## Main Results
39
+
40
+ ![Pass@1 vs. parameter size and reasoning-token count](main-results.png)
41
+
42
+ SR²AM-v0.1-8B sits above the size-vs-accuracy trendline in (a). The full benchmark breakdown is in the [paper](https://arxiv.org).
43
+
44
+ ## Citation
45
+
46
+ ```bibtex
47
+ @inproceedings{sr2am2026,
48
+ title={Efficient Agentic Reasoning Through Self-Regulated Simulative Planning},
49
+ author={Deng, Mingkai and Hou, Jinyu and Sá Neves, Lara and Pimpalkhute, Varad and Killian, Taylor W. and Liu, Zhengzhong and Xing, Eric P.},
50
+ booktitle={Preprint},
51
+ year={2026}
52
+ }
53
+ ```
54
+
55
+ ## License
56
+
57
+ Released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).