BahaaGalal's Collections: LLM for Coding
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Paper • 2405.07990 • Published • 20
Large Language Models as Planning Domain Generators
Paper • 2405.06650 • Published • 13
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper • 2404.12753 • Published • 43
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 52
LLoCO: Learning Long Contexts Offline
Paper • 2404.07979 • Published • 22
CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper • 2404.05875 • Published • 18
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper • 2404.06209 • Published • 5
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 83
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 25
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 79
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 98
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 156
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 27
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Paper • 2402.15491 • Published • 15
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 84
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper • 2402.14261 • Published • 10
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper • 2402.09727 • Published • 38
MPIrigen: MPI Code Generation through Domain-Specific Language Models
Paper • 2402.09126 • Published • 14
Multi-line AI-assisted Code Authoring
Paper • 2402.04141 • Published • 10
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 43
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper • 2401.16467 • Published • 10
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper • 2401.03065 • Published • 11