apple's Collections

FastVLM

updated Mar 2

Efficient Vision Encoding for Vision Language Models

Upvotes: 113

  • FastVLM: Efficient Vision Encoding for Vision Language Models

    Paper • 2412.13303 • Published Dec 17, 2024 • 77

  • FastVLM WebGPU

    Space • Featured • 446

    Real-time video captioning powered by FastVLM

  • apple/FastVLM-0.5B

    Text Generation • 0.8B • Updated Sep 3, 2025 • 20.4k • 393

  • apple/FastVLM-1.5B

    Text Generation • 2B • Updated Sep 3, 2025 • 3.02k • 80

  • apple/FastVLM-7B

    Text Generation • 8B • Updated Sep 3, 2025 • 1.41k • 270

  • apple/FastVLM-0.5B-fp16

    0.6B • Updated Sep 3, 2025 • 339 • 27

    Note: MLX checkpoint


  • apple/FastVLM-1.5B-int8

    0.5B • Updated Sep 3, 2025 • 167 • 20

    Note: MLX checkpoint


  • apple/FastVLM-7B-int4

    1B • Updated Sep 3, 2025 • 78 • 31

    Note: MLX checkpoint
