
Shengnan An

ShengnanAn

AI & ML interests

None yet

Recent Activity


upvoted a paper about 5 hours ago

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Paper • 2604.11778 • Published 1 day ago • 4

submitted a paper to Daily Papers about 5 hours ago

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Paper • 2604.11778 • Published 1 day ago • 4

published a dataset about 5 hours ago

meituan-longcat/General365_Public

Viewer • Updated about 5 hours ago • 720 • 2 • 1

updated a dataset about 5 hours ago

meituan-longcat/General365_Public

Viewer • Updated about 5 hours ago • 720 • 2 • 1

updated a dataset 2 months ago

meituan-longcat/AMO-Bench

Viewer • Updated Feb 5 • 50 • 2.8k • 30

New activity in meituan-longcat/AMO-Bench 2 months ago

[Bug Report] Problem 30: labeled minimum 3736 is not minimal — counterexample achieves 239089/64

❤️ 2

#8 opened 3 months ago by applesilicon

authored a paper 3 months ago

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 180

upvoted a paper 3 months ago

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 180

New activity in meituan-longcat/AMO-Bench 3 months ago

I want to know if there is any agent that can resolve 100%?

#7 opened 3 months ago by Willjoe

commented on a paper 4 months ago

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published Dec 11, 2025 • 47

New activity in deepseek-ai/DeepSeek-V3.2 4 months ago

Default system prompt can hinder thinking-mode performance

#31 opened 4 months ago by ShengnanAn

Should the thinking mode in DS v3.2 use the default system prompt?

#24 opened 4 months ago by ShengnanAn

New activity in moonshotai/Kimi-K2-Thinking 5 months ago

Awesome work! Do you want to try AMO-Bench, the most challenging MO-level benchmark?

#3 opened 5 months ago by ShengnanAn

authored a paper 5 months ago

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Paper • 2510.26768 • Published Oct 30, 2025 • 34

New activity in meituan-longcat/AMO-Bench 5 months ago

[Bug Report] Problem 35: Official solution misuses “positive integers”, final count should be 7656 (not 7657)

#4 opened 5 months ago by applesilicon

liked a dataset 5 months ago

meituan-longcat/UNO-Bench

Viewer • Updated Dec 4, 2025 • 3.73k • 1.43k • 21

New activity in meituan-longcat/AMO-Bench 5 months ago

Improve dataset card: Add task category, paper, project page, code, abstract, features, leaderboard, and sample usage

👍 1

#2 opened 5 months ago by nielsr

updated a dataset 5 months ago

meituan-longcat/UNO-Bench

Viewer • Updated Dec 4, 2025 • 3.73k • 1.43k • 21

liked a dataset 5 months ago

meituan-longcat/AMO-Bench

Viewer • Updated Feb 5 • 50 • 2.8k • 30

New activity in meituan-longcat/AMO-Bench 5 months ago

[Bug Report] Problem 26 seems identical to Berkeley Math Circle 2014–2015 Monthly Contest 3, Problem 4

#3 opened 5 months ago by applesilicon
