MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published 3 days ago • 6
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training Paper • 2602.01511 • Published Feb 2 • 15
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment Paper • 2510.07743 • Published Oct 9, 2025 • 13
τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge Paper • 2603.04370 • Published 28 days ago • 3
Effective Strategies for Asynchronous Software Engineering Agents Paper • 2603.21489 • Published 10 days ago • 6
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol Paper • 2603.24943 • Published 7 days ago • 12
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 7 days ago • 46
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents Paper • 2603.24329 • Published 8 days ago • 24
EVA: Efficient Reinforcement Learning for End-to-End Video Agent Paper • 2603.22918 • Published 9 days ago • 42
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 7 days ago • 27
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search Paper • 2603.22341 • Published 12 days ago • 36
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience Paper • 2603.24533 • Published 7 days ago • 44
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 7 days ago • 92