SpotEdit: Selective Region Editing in Diffusion Transformers Paper • 2512.22323 • Published 8 days ago • 36
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published 12 days ago • 29
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 30
SparseD: Sparse Attention for Diffusion Language Models Paper • 2509.24014 • Published Sep 28, 2025 • 30
PixelThink: Towards Efficient Chain-of-Pixel Reasoning Paper • 2505.23727 • Published May 29, 2025 • 5
HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published May 27, 2025 • 21
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression Paper • 2505.19602 • Published May 26, 2025 • 13
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24, 2025 • 26
VeriThinker: Learning to Verify Makes Reasoning Model Efficient Paper • 2505.17941 • Published May 23, 2025 • 25
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published May 22, 2025 • 35