Upload README.md with huggingface_hub
README.md
CHANGED
@@ -37,7 +37,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
 
 messages = [
-{"role": "user", "content": "
+{"role": "user", "content": "You should first think about the reasoning process in the mind and then provide with the answer.The reasoning process and answer are enclosed within <think> </think> and <Answer> </Answer> tags, respectively."}
 ]
 text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer([text], return_tensors="pt").to(model.device)
@@ -49,7 +49,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 - **Base Model**: Qwen3-4B
 - **Training Method**: Group Relative Policy Optimization (GRPO)
-- **Training Steps**:
+- **Training Steps**: 600
 - **Hardware**: 8× NVIDIA A100 (80GB)
 - **Group Size**: 5
 - **KL Coefficient**: 0.04
@@ -69,9 +69,7 @@ SocialR1-4B is evaluated across three complementary settings:
 | Resource | Link |
 |----------|------|
 | Paper | [arXiv:2603.09249](https://arxiv.org/abs/2603.09249) |
-| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
-| SocialR1-Llama3.1-8B | [Jincenzi/SocialR1-Llama3.1-8B](https://huggingface.co/Jincenzi/SocialR1-Llama3.1-8B) |
-| ToMBench-Hard Dataset | [Jincenzi/ToMBench_Hard](https://huggingface.co/datasets/Jincenzi/ToMBench_Hard) |
+| SocialR1-8B | [Jincenzi/SocialR1-8B](https://huggingface.co/Jincenzi/SocialR1-8B) |
 
 ## Citation
 
@@ -79,7 +77,7 @@ SocialR1-4B is evaluated across three complementary settings:
 @inproceedings{wu2026socialr1,
 title={Social-R1: Enhancing Social Reasoning in LLMs through Trajectory-Level Reinforcement Learning},
 author={Wu, Jincenzi and Lei, Yuxuan and Lian, Jianxun and Huang, Yitian and Zhou, Lexin and Li, Haotian and Yang, Deng and Xie, Xing and Meng, Helen},
-booktitle={
+booktitle={Arxiv},
 year={2026}
 }
 ```
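The prompt added in this commit asks the model to enclose its reasoning in `<think> </think>` and its answer in `<Answer> </Answer>`. A minimal sketch of how a caller might split a decoded response into those two parts is shown here. The helper name `parse_response` and the sample string are hypothetical and not part of the README, and the sketch assumes the tags survive decoding as plain text.

```python
import re


def parse_response(text):
    """Split a decoded response into reasoning and answer (hypothetical helper).

    Returns None for a part whose tags are missing, so callers can detect
    responses that did not follow the requested format.
    """
    # DOTALL lets the reasoning or the answer span multiple lines.
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.search(r"<Answer>(.*?)</Answer>", text, flags=re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }


# Illustrative string only; not real model output.
sample = "<think>Sally did not see the move.</think>\n<Answer>B</Answer>"
parsed = parse_response(sample)
print(parsed["answer"])
```

Returning `None` rather than raising keeps the sketch usable in batch evaluation, where a malformed response can simply be counted as unanswered.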