InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion • Paper • 2512.17504 • Published 13 days ago • 92
Spatia: Video Generation with Updatable Spatial Memory • Paper • 2512.15716 • Published 14 days ago • 28
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • Paper • 2512.20557 • Published 8 days ago • 48
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models • Paper • 2512.16561 • Published 14 days ago • 19
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling • Paper • 2512.14614 • Published 15 days ago • 66
Exploring MLLM-Diffusion Information Transfer with MetaCanvas • Paper • 2512.11464 • Published 20 days ago • 12
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties • Paper • 2512.11799 • Published 19 days ago • 29
EgoX: Egocentric Video Generation from a Single Exocentric Video • Paper • 2512.08269 • Published 23 days ago • 115
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing • Paper • 2512.06065 • Published 26 days ago • 28
LATTICE: Democratize High-Fidelity 3D Generation at Scale • Paper • 2512.03052 • Published Nov 24, 2025 • 10
SIMA 2: A Generalist Embodied Agent for Virtual Worlds • Paper • 2512.04797 • Published 28 days ago • 24
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation • Paper • 2512.05106 • Published 27 days ago • 15
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length • Paper • 2512.04677 • Published 28 days ago • 167
Video Generation Models Are Good Latent Reward Models • Paper • 2511.21541 • Published Nov 26, 2025 • 45
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization • Paper • 2511.15705 • Published Nov 19, 2025 • 93