Jincenzi/SocialR1-4B


Jincenzi committed 14 days ago

Commit 7628efa (verified) · 1 parent: 614074b

Upload README.md with huggingface_hub

Files changed (1)

README.md +4 -6


@@ -37,7 +37,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
 messages = [
-    {"role": "user", "content": "Your social reasoning question here"}
+    {"role": "user", "content": "You should first think about the reasoning process in the mind and then provide with the answer.The reasoning process and answer are enclosed within <think> </think> and <Answer> </Answer> tags, respectively."}
 ]
 text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer([text], return_tensors="pt").to(model.device)
@@ -49,7 +49,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 - **Base Model**: Qwen3-4B
 - **Training Method**: Group Relative Policy Optimization (GRPO)
-- **Training Steps**: 570
+- **Training Steps**: 600
 - **Hardware**: 8× NVIDIA A100 (80GB)
 - **Group Size**: 5
 - **KL Coefficient**: 0.04
@@ -69,9 +69,7 @@ SocialR1-4B is evaluated across three complementary settings:
 | Resource | Link |
 |----------|------|
 | Paper | [arXiv:2603.09249](https://arxiv.org/abs/2603.09249) |
-| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
-| SocialR1-Llama3.1-8B | [Jincenzi/SocialR1-Llama3.1-8B](https://huggingface.co/Jincenzi/SocialR1-Llama3.1-8B) |
-| ToMBench-Hard Dataset | [Jincenzi/ToMBench_Hard](https://huggingface.co/datasets/Jincenzi/ToMBench_Hard) |
+| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
 ## Citation
@@ -79,7 +77,7 @@ SocialR1-4B is evaluated across three complementary settings:
 @inproceedings{wu2026socialr1,
   title={Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning},
   author={Wu, Jincenzi and Lei, Yuxuan and Lian, Jianxun and Huang, Yitian and Zhou, Lexin and Li, Haotian and Yang, Deng and Xie, Xing and Meng, Helen},
-  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
+  booktitle={Arxiv},
   year={2026}
 }
 ```
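
The updated prompt asks the model to wrap its reasoning in `<think> </think>` and its final answer in `<Answer> </Answer>`. The README does not include a parser, so the following is only a minimal sketch of how such output might be split. The `ParsedResponse` type and `parse_response` helper are hypothetical names introduced here, not part of the repository. It assumes the tags survive decoding as plain text, and it takes the last match of each tag because the decoded string in the README snippet also contains the prompt, which mentions the same tags literally.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two tagged parts of a response."""
    reasoning: Optional[str]
    answer: Optional[str]


# Non-greedy, case-insensitive patterns; DOTALL lets the content span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL | re.IGNORECASE)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def parse_response(text: str) -> ParsedResponse:
    """Extract the last <think> and <Answer> spans from decoded text.

    Returns None for a part whose tags are missing.
    """
    thinks = _THINK_RE.findall(text)
    answers = _ANSWER_RE.findall(text)
    return ParsedResponse(
        reasoning=thinks[-1].strip() if thinks else None,
        answer=answers[-1].strip() if answers else None,
    )


if __name__ == "__main__":
    sample = "<think>She did not see the swap.</think>\n<Answer>B</Answer>"
    print(parse_response(sample))
```

If the model emits no tags at all, taking the last match would fall back to the empty tag pair quoted in the prompt, so it is probably safer to decode only the newly generated tokens (for example by slicing off the prompt length from `outputs[0]`) before parsing.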