Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention Paper • 2605.29548 • Published 3 days ago
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models Paper • 2605.30263 • Published 3 days ago
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 3 days ago
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control Paper • 2605.27891 • Published 4 days ago
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning Paper • 2605.30260 • Published 3 days ago
GEM: Generative Supervision Helps Embodied Intelligence Paper • 2605.28548 • Published 4 days ago
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 4 days ago
CubePart: An Open-Vocabulary Part-Controllable 3D Generator Paper • 2605.28763 • Published 4 days ago
Self-Improving Language Models with Bidirectional Evolutionary Search Paper • 2605.28814 • Published 4 days ago
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 4 days ago
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research Paper • 2605.26114 • Published 6 days ago
Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction Paper • 2605.26230 • Published 6 days ago
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration Paper • 2605.17423 • Published 14 days ago
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence Paper • 2605.26494 • Published 5 days ago
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 5 days ago
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 6 days ago