Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation Paper • 2510.24821 • Published Oct 28, 2025 • 40
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published Oct 8, 2025 • 77
Ming-V2 Collection Ming is the multi-modal series of any-to-any models developed by Ant Ling team. • 14 items • Updated 1 day ago • 33
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published Mar 11, 2025 • 27
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24, 2025 • 52