Tianyang Liu

tianyang

https://leolty.github.io/

AI & ML interests

None yet

Recent Activity

authored a paper about 15 hours ago

CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published 2 days ago • 28

upvoted a paper about 18 hours ago

CocoaBench: Evaluating Unified Digital Agents in the Wild

Paper • 2604.11201 • Published 2 days ago • 28

upvoted a paper 6 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 39

upvoted an article 6 months ago

Article

BigCodeArena: Judging code generations end to end with code executions

Oct 7, 2025

authored 2 papers 10 months ago

Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Paper • 2411.08733 • Published Nov 13, 2024 • 1

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Paper • 2506.14965 • Published Jun 17, 2025 • 50

upvoted a paper 10 months ago

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Paper • 2506.14965 • Published Jun 17, 2025 • 50

liked a dataset 10 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.74k • 46

authored a paper about 1 year ago

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs

Paper • 2502.19411 • Published Feb 26, 2025 • 2

upvoted a paper about 1 year ago

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs

Paper • 2502.19411 • Published Feb 26, 2025 • 2

liked a Space about 1 year ago

SWE Arena

🏢

SWE-Arena: Compare & Test Best AI Chatbots for Code

liked a Space over 1 year ago

Decentralized Arena Leaderboard

🥇

View and compare LLM evaluations across various domains

authored a paper over 1 year ago

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Paper • 2404.05221 • Published Apr 8, 2024 • 1

upvoted a paper over 1 year ago

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Paper • 2404.05221 • Published Apr 8, 2024 • 1

liked a Space about 2 years ago

AI Comic Factory

👩

11k

Create your own AI comic with a single prompt

New activity in tianyang/repobench_java_v1.1 about 2 years ago

Error when loading the dataset

#2 opened about 2 years ago by Bilibili

upvoted a collection about 2 years ago

💫 StarCoder2

Collection

StarCoder2 models and datasets! • 8 items • Updated Mar 1, 2024 • 91

New activity in bigcode/starcoder2-evaluation about 2 years ago

RepoBench

#1 opened about 2 years ago by tianyang

authored a paper about 2 years ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 156

upvoted a paper about 2 years ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 156
