Submitted by akhaliq · 57 · CogVLM2: Visual Language Models for Image and Video Understanding · 25 authors · 7.06k · 5
Submitted by akhaliq · 50 · WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · 16 authors · 1.27k · 4
Submitted by akhaliq · 32 · ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model · 8 authors · 2
Submitted by akhaliq · 28 · SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners · 7 authors · 353 · 2
Submitted by zhuzeyuan · 27 · Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems · 4 authors · 2
Submitted by akhaliq · 18 · CSGO: Content-Style Composition in Text-to-Image Generation · 8 authors · 385 · 7
Submitted by hallisky · 12 · StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements · 6 authors · 7 · 4
Submitted by necludov · 8 · Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold · 8 authors · 2