Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline Paper • 2603.05484 • Published 4 days ago • 4
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline Paper • 2603.05484 • Published 4 days ago • 4
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision Paper • 2506.06253 • Published Jun 6, 2025 • 9
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision Paper • 2506.06253 • Published Jun 6, 2025 • 9
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5, 2025 • 21
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5, 2025 • 21
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 67
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4, 2025 • 16
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20, 2025 • 9
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Paper • 2412.12075 • Published Dec 16, 2024 • 1
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation Paper • 2111.02394 • Published Nov 3, 2021 • 2
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 67