Jincenzi committed on
Commit
7628efa
1 Parent(s): 614074b

Upload README.md with huggingface_hub

  1. README.md +4 -6
README.md CHANGED
@@ -37,7 +37,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
 
  messages = [
- {"role": "user", "content": "Your social reasoning question here"}
+ {"role": "user", "content": "You should first think about the reasoning process in the mind and then provide with the answer.The reasoning process and answer are enclosed within <think> </think> and <Answer> </Answer> tags, respectively."}
  ]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer([text], return_tensors="pt").to(model.device)
@@ -49,7 +49,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
  - **Base Model**: Qwen3-4B
  - **Training Method**: Group Relative Policy Optimization (GRPO)
- - **Training Steps**: 570
+ - **Training Steps**: 600
  - **Hardware**: 8× NVIDIA A100 (80GB)
  - **Group Size**: 5
  - **KL Coefficient**: 0.04
@@ -69,9 +69,7 @@ SocialR1-4B is evaluated across three complementary settings:
  | Resource | Link |
  |----------|------|
  | Paper | [arXiv:2603.09249](https://arxiv.org/abs/2603.09249) |
- | SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
- | SocialR1-Llama3.1-8B | [Jincenzi/SocialR1-Llama3.1-8B](https://huggingface.co/Jincenzi/SocialR1-Llama3.1-8B) |
- | ToMBench-Hard Dataset | [Jincenzi/ToMBench_Hard](https://huggingface.co/datasets/Jincenzi/ToMBench_Hard) |
+ | SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
 
  ## Citation
 
@@ -79,7 +77,7 @@ SocialR1-4B is evaluated across three complementary settings:
  @inproceedings{wu2026socialr1,
  title={Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning},
  author={Wu, Jincenzi and Lei, Yuxuan and Lian, Jianxun and Huang, Yitian and Zhou, Lexin and Li, Haotian and Yang, Deng and Xie, Xing and Meng, Helen},
- booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
+ booktitle={Arxiv},
  year={2026}
  }
  ```
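
The new prompt in this commit asks the model to wrap its reasoning in `<think> </think>` and its answer in `<Answer> </Answer>`. The following is a minimal sketch of how the decoded text might be split into those two parts, assuming the model actually follows that format. The helper `split_reasoning_and_answer` and the sample string are hypothetical and not part of the README. Because the README decodes `outputs[0]`, which also contains the prompt (and the prompt itself mentions both tag pairs), the sketch keeps the last match of each pair rather than the first.

```python
import re
from typing import Optional, Tuple

# Tag names come from the prompt in the diff; matching is case-insensitive
# because the prompt mixes "<think>" and "<Answer>".
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL | re.IGNORECASE)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def split_reasoning_and_answer(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from decoded text, or None for a missing part.

    Hypothetical helper: takes the LAST occurrence of each tag pair so that
    tags echoed from the prompt are ignored.
    """
    thinks = _THINK_RE.findall(text)
    answers = _ANSWER_RE.findall(text)
    reasoning = thinks[-1].strip() if thinks else None
    answer = answers[-1].strip() if answers else None
    return reasoning, answer


# Illustrative sample only; not real model output.
sample = (
    "The reasoning process and answer are enclosed within <think> </think> "
    "and <Answer> </Answer> tags, respectively.\n"
    "<think>Sally did not see the ball being moved.</think>\n"
    "<Answer>The basket</Answer>"
)
reasoning, answer = split_reasoning_and_answer(sample)
print(reasoning)
print(answer)
```

If either tag pair is missing, the corresponding value is `None`, so callers can fall back to the raw text instead of failing.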