VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 9 days ago • 58
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 10 days ago • 153
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning Paper • 2603.21289 • Published 14 days ago • 34
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 11 days ago • 48
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 13 days ago • 131
MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation Paper • 2603.21937 • Published 13 days ago • 7
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 13 days ago • 120
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 64
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model Paper • 2602.21818 • Published Feb 25 • 56
SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL Paper • 2601.09136 • Published Jan 14 • 39
NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control Paper • 2602.09070 • Published Feb 9 • 46