
[OSDEnhancer] Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion (arXiv 2026)

Authors: Shuoyan Wei¹, Feng Li²,*, Chen Zhou¹, Runmin Cong³, Yao Zhao¹, Huihui Bai¹

¹Beijing Jiaotong University, ²Hefei University of Technology, ³Shandong University

*Corresponding Author


This repository contains the reference code for the paper "Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion".



In this paper, we propose OSDEnhancer, the first framework to achieve real-world space-time video super-resolution (STVSR) with one-step diffusion. Given a low-resolution, low-frame-rate video as input, OSDEnhancer generates a high-resolution, high-frame-rate video.

OSDEnhancer begins with a linear initialization that establishes essential spatiotemporal structures and adapts the model for one-step reconstruction. It then applies a divide-and-conquer strategy, introducing temporal coherence (TC) and texture enrichment (TE) LoRAs that progressively specialize in inter-frame dynamics modeling and fine-grained texture recovery, respectively, and that work together during inference to improve overall performance. A bidirectional VAE decoder employs deformable recurrent blocks to leverage the multi-scale structure of the vanilla VAE, enhancing latent-to-pixel reconstruction through joint multi-scale deformable aggregation and inter-frame feature propagation.

🔈 News

  • 📌 [Jun 2026] We made an important checkpoint fix on Hugging Face to correct the VAE config and Transformer LoRA key names. To reproduce the results reported in our paper, please use the latest checkpoint files 👉 Hugging Face
  • ✅ [May 2026] The inference code and pretrained checkpoints are now available 👉 GitHub
  • ✅ [Jan 2026] The arXiv version of our paper has been released 👉 arXiv

📚 Installation

git clone https://github.com/W-Shuoyan/OSDEnhancer.git
cd OSDEnhancer
conda create -n OSDEnhancer python=3.10
conda activate OSDEnhancer
pip install torch==2.8.0+cu128 torchvision==0.23.0+cu128 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

🚀 Usage

Pretrained Checkpoints

Important Note: On June 25, 2026, we updated the Hugging Face checkpoint files to correct the VAE config and Transformer LoRA key names. To reproduce the results reported in our paper, please use the latest checkpoint files from Hugging Face.

The pretrained checkpoint is available below.

| Model Name | Base Model | Download Link 🔗 |
| --- | --- | --- |
| OSDEnhancer-v1.0 | CogVideoX1.5-5B | 🤗 Hugging Face |

By default, the inference script automatically loads the checkpoint from Hugging Face. For local checkpoint loading, the checkpoint directory should be organized as follows:

ckpt/
├── transformer/
│   ├── config.json
│   ├── diffusion_pytorch_model-00001-of-00002.safetensors
│   ├── diffusion_pytorch_model-00002-of-00002.safetensors
│   └── diffusion_pytorch_model.safetensors.index.json
├── vae/
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── scheduler/
│   └── scheduler_config.json
└── prompt_embeddings/
    └── empty.safetensors
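
Before pointing the inference script at a local checkpoint, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name missing_checkpoint_files and the constant EXPECTED_FILES are hypothetical, and the file list is simply copied from the tree shown above.

```python
from pathlib import Path

# Relative paths copied from the directory layout listed above.
EXPECTED_FILES = [
    "transformer/config.json",
    "transformer/diffusion_pytorch_model-00001-of-00002.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00002.safetensors",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    "vae/config.json",
    "vae/diffusion_pytorch_model.safetensors",
    "scheduler/scheduler_config.json",
    "prompt_embeddings/empty.safetensors",
]


def missing_checkpoint_files(ckpt_dir):
    """Return the expected relative paths that are absent under ckpt_dir.

    Hypothetical helper: it only checks that each file exists, not that
    its contents are valid or up to date.
    """
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("ckpt")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Checkpoint layout looks complete.")
```

An empty result only means the files are present. It does not confirm that they are the corrected files from the June 2026 checkpoint update.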

Inference

Run OSDEnhancer on an input video:

python inference.py \
  --input demo/input.mp4 \
  --output demo/output.mp4 \
  --spatial_scale 4 \
  --temporal_scale 2

For stable inference, we recommend a GPU with at least 80 GB of VRAM, together with the settings --spatial_scale 4 and --temporal_scale 2. To use a local checkpoint, specify --ckpt_path. For long videos or high-resolution inputs, enable chunk-based inference by additionally setting --chunk_length and --overlap, where the value of --chunk_length must be of the form 8N+1 (for example, 9, 17, or 25).
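
The 8N+1 constraint on --chunk_length can be checked before launching a long run. The sketch below is an illustration under stated assumptions, not the logic of inference.py: it assumes N is a positive integer, that chunks are sliding windows over the input frames, and that consecutive chunks share exactly --overlap frames. The function names is_valid_chunk_length and plan_chunks are hypothetical.

```python
def is_valid_chunk_length(chunk_length):
    """True if chunk_length equals 8N+1 for some integer N >= 1 (assumed range)."""
    return chunk_length >= 9 and (chunk_length - 1) % 8 == 0


def plan_chunks(num_frames, chunk_length, overlap):
    """Return (start, end) frame index pairs, end exclusive, covering all frames.

    Assumption: each chunk starts (chunk_length - overlap) frames after the
    previous one, so neighbouring chunks share `overlap` frames.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not is_valid_chunk_length(chunk_length):
        raise ValueError("chunk_length must be of the form 8N+1, e.g. 9, 17, 25")
    if not 0 <= overlap < chunk_length:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_length")

    stride = chunk_length - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_length, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # 41 input frames, 17-frame chunks, 1 shared frame between neighbours.
    print(plan_chunks(41, 17, 1))  # [(0, 17), (16, 33), (32, 41)]
```

In this sketch the final chunk can be shorter than --chunk_length and need not itself be of the form 8N+1. How the actual script pads or trims the last chunk is not described here, so treat the plan as a rough guide to how many chunks a video will produce.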

📧 Contact

If you run into any problems, please feel free to contact us by email: shuoyan.wei@bjtu.edu.cn

💡 Cite

If you find this work useful for your research, please consider citing our paper 😊

@article{wei2026osdenhancer,
  title={Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion},
  author={Wei, Shuoyan and Li, Feng and Zhou, Chen and Cong, Runmin and Zhao, Yao and Bai, Huihui},
  journal={arXiv preprint arXiv:2601.20308},
  year={2026}
}

📕 License & Acknowledgement

This project is released under the Apache License 2.0. OSDEnhancer is built upon CogVideoX. We also sincerely thank the authors of DOVE, EvEnhancer, and RealBasicVSR for their excellent open-source implementations, which provided valuable references for this project.
