LuMatic's Collections
Polskie Modele
VLM Vision Models
STT. Multimodal
Image & Video Generation
LLMs
Code Models
TTS
Embedding RAG
Function Calling
WebGPU

STT. Multimodal

updated Feb 28

  • Llama3.1 S V0.2 Checkpoint 2024 08 20

    Running on Zero • Agents • Featured • 115

    Chat with Llama3.1 using spoken audio or synthesize speech


  • Menlo/instruction-speech-encodec-v1.5

    Viewer • Updated Aug 19, 2024 • 332k • 1.45k • 7

  • Menlo/llama3-s-v0.1

    Text Generation • 8B • Updated Jul 23, 2024 • 7 • 18

  • FunAudioLLM/SenseVoiceSmall

    Automatic Speech Recognition • Updated 13 days ago • 6.57k • 417

  • microsoft/Phi-4-multimodal-instruct

    Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 529k • 1.6k

  • nvidia/parakeet-tdt-0.6b-v3

    Automatic Speech Recognition • 0.6B • Updated 18 days ago • 99.4k • 909

  • michaljunczyk/pl-asr-bigos

    Updated Jan 8, 2024 • 12 • 4

  • nvidia/parakeet-tdt-0.6b-v2

    Automatic Speech Recognition • Updated Apr 13 • 365k • 1.49k

  • MediaTek-Research/Breeze-ASR-25

    Automatic Speech Recognition • 2B • Updated Jul 8, 2025 • 9.18k • 125

  • marksverdhei/Qwen3-Voice-Embedding-12Hz-0.6B-onnx

    Feature Extraction • Updated Feb 23 • 21