[OSDEnhancer] Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion (arXiv 2026)

Authors: Shuoyan Wei¹, Feng Li²,*, Chen Zhou¹, Runmin Cong³, Yao Zhao¹, Huihui Bai¹

¹Beijing Jiaotong University, ²Hefei University of Technology, ³Shandong University

*Corresponding author

This repository contains the reference code for the paper "Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion".

In this paper, we propose OSDEnhancer, the first framework to achieve real-world space-time video super-resolution (STVSR) with one-step diffusion. Given a low-resolution, low-frame-rate video as input, OSDEnhancer generates a high-resolution, high-frame-rate video.

OSDEnhancer begins with a linear initialization to establish essential spatiotemporal structures and adapt the model for one-step reconstruction. It then applies a divide-and-conquer strategy, introducing the temporal coherence (TC) and texture enrichment (TE) LoRAs that progressively specialize in inter-frame dynamics modeling and fine-grained texture recovery, respectively, while collaborating during inference for enhanced overall performance. A bidirectional VAE decoder employs deformable recurrent blocks to leverage the multi-scale structure of the vanilla VAE, enhancing latent-to-pixel reconstruction through joint multi-scale deformable aggregation and inter-frame feature propagation.

🔈News

📌 [Jun 2026] We made an important checkpoint fix on Hugging Face to correct the VAE config and Transformer LoRA key names. To reproduce the results reported in our paper, please use the latest checkpoint files 👉
✅ [May 2026] The inference code and pretrained checkpoints are now available 👉
✅ [Jan 2026] The arXiv version of our paper has been released 👉

📚 Installation

git clone https://github.com/W-Shuoyan/OSDEnhancer.git
cd OSDEnhancer
conda create -n OSDEnhancer python=3.10
conda activate OSDEnhancer
pip install torch==2.8.0+cu128 torchvision==0.23.0+cu128 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

🚀 Usage

Pretrained Checkpoints

Important Note: On June 25, 2026, we updated the Hugging Face checkpoint files to correct the VAE config and Transformer LoRA key names. To reproduce the results reported in our paper, please use the latest checkpoint files from Hugging Face.

The pretrained checkpoint is available below.

Model Name	Base Model	Download Link 🔗
OSDEnhancer-v1.0	CogVideoX1.5-5B	🤗 Hugging Face

By default, the inference script automatically loads the checkpoint from Hugging Face. For local checkpoint loading, the checkpoint directory should be organized as follows:

ckpt/
├── transformer/
│   ├── config.json
│   ├── diffusion_pytorch_model-00001-of-00002.safetensors
│   ├── diffusion_pytorch_model-00002-of-00002.safetensors
│   └── diffusion_pytorch_model.safetensors.index.json
├── vae/
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── scheduler/
│   └── scheduler_config.json
└── prompt_embeddings/
    └── empty.safetensors
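
Before pointing the inference script at a local checkpoint, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, not part of the released code: the helper name missing_checkpoint_files and the constant EXPECTED_FILES are hypothetical, and the file list is simply copied from the tree shown above.

```python
from pathlib import Path

# Relative paths copied from the directory layout above.
EXPECTED_FILES = [
    "transformer/config.json",
    "transformer/diffusion_pytorch_model-00001-of-00002.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00002.safetensors",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    "vae/config.json",
    "vae/diffusion_pytorch_model.safetensors",
    "scheduler/scheduler_config.json",
    "prompt_embeddings/empty.safetensors",
]


def missing_checkpoint_files(ckpt_dir):
    """Return the expected relative paths that are absent under ckpt_dir.

    Hypothetical helper for illustration; it only checks that each file
    exists and does not validate the file contents.
    """
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ckpt" is the directory name used in the layout above.
    missing = missing_checkpoint_files("ckpt")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Checkpoint layout looks complete.")
```

An empty result only means that the files are present. It does not tell you whether they are the corrected checkpoint files mentioned in the note above.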

Inference

Run OSDEnhancer on an input video:

python inference.py \
  --input demo/input.mp4 \
  --output demo/output.mp4 \
  --spatial_scale 4 \
  --temporal_scale 2

For stable inference, we recommend a GPU with at least 80 GB of VRAM, together with --spatial_scale 4 and --temporal_scale 2. To use a local checkpoint, specify --ckpt_path. For long videos or high-resolution inputs, enable chunk-based inference by additionally setting --chunk_length and --overlap, where --chunk_length must be of the form 8N+1 (for example, 9, 17, or 25).
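
To make the 8N+1 constraint and the role of the overlap concrete, here is a small sketch that validates a chunk length and lists the frame ranges a chunked run might cover. It is an illustration under stated assumptions, not the logic of inference.py: the function names are hypothetical, N is assumed to be a positive integer, consecutive chunks are assumed to share exactly `overlap` input frames, and the last chunk is simply allowed to be shorter.

```python
def is_valid_chunk_length(chunk_length):
    """True if chunk_length equals 8N+1 for some positive integer N (assumed N >= 1)."""
    return chunk_length >= 9 and (chunk_length - 1) % 8 == 0


def plan_chunks(num_frames, chunk_length, overlap):
    """Return (start, end) input-frame ranges, end exclusive.

    Hypothetical chunking scheme for illustration: each chunk starts
    chunk_length - overlap frames after the previous one, and the final
    chunk is truncated at the end of the video.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not is_valid_chunk_length(chunk_length):
        raise ValueError("chunk_length must be of the form 8N+1")
    if not 0 <= overlap < chunk_length:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_length")

    stride = chunk_length - overlap  # at least 1, so the loop always advances
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_length, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # A 40-frame input with 17-frame chunks that share 4 frames.
    print(plan_chunks(40, 17, 4))
```

With these assumptions, a 40-frame input, a chunk length of 17, and an overlap of 4 gives the ranges (0, 17), (13, 30), and (26, 40). How the actual script pads or blends overlapping chunks is not described here, so treat the plan as a way to reason about argument choices only.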

📧 Contact

If you run into any problems, please feel free to contact us by email: shuoyan.wei@bjtu.edu.cn

💡 Cite

If you find this work useful for your research, please consider citing our paper 😊

@article{wei2026osdenhancer,
  title={Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion},
  author={Wei, Shuoyan and Li, Feng and Zhou, Chen and Cong, Runmin and Zhao, Yao and Bai, Huihui},
  journal={arXiv preprint arXiv:2601.20308},
  year={2026}
}

📕 License & Acknowledgement

This project is released under the Apache License 2.0. OSDEnhancer is built upon CogVideoX. We also sincerely thank the authors of DOVE, EvEnhancer, and RealBasicVSR for their excellent open-source implementations, which provided valuable references for this project.

