UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11 • 37
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13 • 94
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13 • 10
Canvas-to-Image: Compositional Image Generation with Multimodal Controls Paper • 2511.21691 • Published 20 days ago • 33