VLM dataset
  amaye15/NSFW • Viewer • Updated Aug 4, 2024 • 67.5k • 1.08k • 91
  wchai/AuroraCap-trainset • Preview • Updated Oct 13, 2024 • 1.37k • 9
  uclanlp/MRAG-Bench • Viewer • Updated Nov 5, 2024 • 1.35k • 712 • 10
VLM For OCR
  Qwen/Qwen-VL • Text Generation • Updated Jan 25, 2024 • 49.1k • 276
  google/pix2struct-large • Image-to-Text • 1B • Updated Sep 6, 2023 • 1.18k • 34
  zai-org/cogagent-chat-hf • Text Generation • 18B • Updated Dec 24, 2024 • 459 • 69
  openbmb/MiniCPM-Llama3-V-2_5 • Image-Text-to-Text • 9B • Updated Jan 15, 2025 • 68.5k • 1.41k
audio
  Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models • Paper • 2311.07919 • Published Nov 14, 2023 • 9
  Stable Audio Open • Paper • 2407.14358 • Published Jul 19, 2024 • 26
  OpenMOSS-Team/AnyGPT-chat • Text Generation • Updated Jun 5, 2024 • 13 • 19
  FBK-MT/mosel • Viewer • Updated Oct 7, 2025 • 2.2M • 3.24k • 89
MiniCPM-V
  openbmb/MiniCPM-Llama3-V-2_5 • Image-Text-to-Text • 9B • Updated Jan 15, 2025 • 68.5k • 1.41k
  openbmb/MiniCPM-Llama3-V-2_5-int4 • Visual Question Answering • 9B • Updated Feb 27, 2025 • 907 • 78
  openbmb/MiniCPM-Llama3-V-2_5-gguf • Updated Feb 27, 2025 • 2.65k • 216
  openbmb/MiniCPM-V-2 • Visual Question Answering • 3B • Updated Jan 15, 2025 • 9.85k • 495
Dataset For OCR
  TencentARC/Plot2Code • Viewer • Updated Aug 17, 2024 • 368 • 1.61k • 34
  AIML-TUDA/TEdBench_plusplus • Viewer • Updated Jan 23, 2025 • 139 • 92 • 16
  hezarai/parsynth-ocr-200k • Viewer • Updated May 7, 2024 • 200k • 460 • 21
  anubhavmaity/notMNIST • Viewer • Updated Dec 21, 2023 • 18.7k • 75 • 1