One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications
Abstract
A universal speech enhancement model offers configurable control over algorithmic and computational latency through parallel convolutions and early-exit mechanisms.
Different real-time speech applications impose distinct latency budgets, often requiring separately trained enhancement models for each scenario. In this paper, we propose a one-for-all, real-time universal speech enhancement model that provides explicit control over both algorithmic and computational latency. Algorithmic latency is flexibly adjusted via configurable look-ahead frames. To avoid learning inefficiency caused by varying padding configurations, we introduce parallel convolutional layers corresponding to different look-ahead settings. Computational latency is controlled through an early-exit mechanism, enabling inference at different network depths. To narrow the performance gap between specialized and flexible models, we propose a two-stage training strategy with a shared-to-multiple decoder transition. Overall, the proposed framework enables a single model to be deployed across diverse latency budgets without retraining separate models.
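The two latency controls described in the abstract can be illustrated with a small, dependency-free sketch. This is not the paper's implementation: the function names, the kernel values, the 10 ms frame hop, and the choice to place look-ahead only in the first layer are all assumptions made for illustration. The sketch shows how a look-ahead setting maps to asymmetric padding, how one set of convolution weights per look-ahead setting can be selected (the "parallel" layers), and how an early exit limits the number of blocks that run. The two-stage training strategy and the decoders are not modeled.

```python
from typing import Dict, List, Sequence, Tuple


def asymmetric_padding(kernel_size: int, look_ahead: int) -> Tuple[int, int]:
    """Return (left, right) zero-padding for a convolution over time that
    may see `look_ahead` future frames and preserves the frame count."""
    if not 0 <= look_ahead <= kernel_size - 1:
        raise ValueError("look_ahead must be in [0, kernel_size - 1]")
    return kernel_size - 1 - look_ahead, look_ahead


def conv1d(frames: Sequence[float], kernel: Sequence[float],
           look_ahead: int) -> List[float]:
    """Cross-correlate `frames` with `kernel` using look-ahead-dependent padding."""
    left, right = asymmetric_padding(len(kernel), look_ahead)
    padded = [0.0] * left + list(frames) + [0.0] * right
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


# Hypothetical parallel branches: separate weights for each look-ahead setting,
# so each branch always sees the same padding layout.
PARALLEL_KERNELS: Dict[int, List[float]] = {
    0: [0.25, 0.25, 0.5],  # fully causal
    1: [0.25, 0.5, 0.25],  # one future frame
    2: [0.5, 0.25, 0.25],  # two future frames
}

# Hypothetical stack of causal blocks; an early exit runs only a prefix of it.
CAUSAL_BLOCK_KERNELS: List[List[float]] = [
    [0.2, 0.3, 0.5],
    [0.1, 0.4, 0.5],
    [0.3, 0.3, 0.4],
]


def enhance(frames: Sequence[float], look_ahead: int, exit_depth: int) -> List[float]:
    """Run the look-ahead input layer, then the first `exit_depth` causal blocks."""
    if not 1 <= exit_depth <= len(CAUSAL_BLOCK_KERNELS):
        raise ValueError("exit_depth out of range")
    x = conv1d(frames, PARALLEL_KERNELS[look_ahead], look_ahead)
    for kernel in CAUSAL_BLOCK_KERNELS[:exit_depth]:
        x = conv1d(x, kernel, look_ahead=0)
    return x


def look_ahead_latency_ms(look_ahead: int, hop_ms: float = 10.0) -> float:
    """Latency contributed by look-ahead frames alone (hop size is assumed)."""
    return look_ahead * hop_ms
```

In this sketch, calling `enhance(frames, look_ahead=1, exit_depth=2)` lets each output frame depend on at most one future input frame and runs two of the three blocks. A smaller `exit_depth` lowers the compute per frame, while a smaller `look_ahead` lowers the algorithmic delay.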
Community
model: https://huggingface.co/nvidia/Real-time_RE-USE
HF Space interactive demo: https://huggingface.co/spaces/nvidia/Real-time_RE-USE
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Latency-Configurable Streaming Speech Enhancement via Asymmetric Temporal Padding (2026)
- Online Predictive Coding for Dual-Mode Self-Supervised Speech Model (2026)
- HALO: Half-Frame-Rate Adaptive Learnable Operator for Lightweight STFT-Based Speech Enhancement (2026)
- Real-time Speech Restoration using Data Prediction Mean Flows (2026)
- FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation (2026)
- MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion (2026)
- G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching (2026)
Get this paper in your agent:
hf papers read 2606.25621
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash