Papers
arxiv:2604.18780

Streaming Structured Inference with Flash-SemiCRF

Published on Apr 20 · Submitted by Ben Johnson on Apr 23
Abstract

Semi-Markov Conditional Random Fields are enhanced through efficient memory management techniques that enable exact inference on long sequences and large label sets by using on-the-fly computation and streaming algorithms.

AI-generated summary

Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncertainty estimates at their boundaries. However, existing implementations must materialize a large edge potential tensor whose size grows with sequence length, maximum segment length, and label count, becoming prohibitive for speech-scale state spaces and intractable at genomic scales where sequences can exceed 100,000 positions. This memory bottleneck has limited the adoption of exact segment-level inference for long sequences and large label sets. We identify that the core inefficiency is materializing edge potentials that can instead be evaluated on-the-fly from a compact prefix-sum array, and make several improvements. First, replacing the stored edge tensor with prefix-sum lookup reduces the memory footprint by a factor proportional to the product of segment length and label count. Second, a streaming forward-backward pass with checkpoint-boundary normalization keeps working memory sublinear in sequence length while preserving exact gradients. Third, zero-centered cumulative scores control numerical drift and induce an adaptive duration prior under label imbalance. We integrate these ideas into Flash-SemiCRF, a fused Triton kernel that enables exact semi-CRF inference on previously intractable problem sizes. Available at https://github.com/biobenkj/flash-semicrf.
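The prefix-sum replacement for the stored edge tensor can be illustrated in a few lines of NumPy. This is a toy sketch of the idea described in the abstract, not the paper's Triton kernel; the array names (`emissions`, `S`, `segment_score`) are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, C = 12, 4, 3                   # sequence length, max segment length, labels
emissions = rng.normal(size=(T, C))  # per-position, per-label scores

# Naive approach: materialize the O(T*K*C) segment-score tensor.
naive = np.full((T, K, C), -np.inf)
for t in range(T):
    for d in range(1, K + 1):
        if t + d <= T:
            naive[t, d - 1] = emissions[t:t + d].sum(axis=0)

# Prefix-sum approach: a single O(T*C) cumulative array; any segment
# score is recovered in O(1) as the difference of two rows.
S = np.vstack([np.zeros((1, C)), np.cumsum(emissions, axis=0)])

def segment_score(t, d):
    """Score of a segment covering positions [t, t+d)."""
    return S[t + d] - S[t]

# Both routes agree, so the dense tensor never needs to exist.
assert np.allclose(naive[2, 2], segment_score(2, 3))
```

The memory saving matches the abstract's claim: the cumulative array is smaller than the dense tensor by a factor of roughly K times the per-edge label fan-out, since only O(TC) values are stored.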

Community

Paper author · Paper submitter

Flash-SemiCRF makes exact semi-Markov CRF inference tractable at genomic scale (T > 10⁵) by avoiding the O(TKC²) edge tensor - where T is sequence length, K is maximum segment duration, and C is the number of labels - and taking a streaming approach. Three ideas drive the implementation:

(1) prefix-sum decomposition enables O(1)-per-edge evaluation from an O(TC) cumulative score array, so the full edge tensor is never stored;
(2) ring-buffer streaming keeps forward-backward working memory at O(KC), independent of sequence length; and
(3) sublinear gradient checkpointing at √(TK) intervals preserves exact gradients.

A globally-centered emission baseline simultaneously controls numerical drift and induces an adaptive, data-dependent duration prior that regularizes segmentation under label imbalance. The system is implemented as a fused Triton kernel. On TIMIT phoneme segmentation, Flash-SemiCRF achieves a 25x training speedup and a 178x inference speedup over pytorch-struct's linear scan, the only existing exact backend capable of running K=30 without OOM. We further prove that banded and block-sparse matrix formats cannot exploit the bounded segment length K in tree-structured formulations: fill-in under composition drives intermediates to near-dense within O(log(T/K)) tree levels.
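The ring-buffer idea (2) can be sketched with a minimal forward recursion in NumPy. This is our simplified illustration, not the released kernel: it assumes uniform label transitions, omits the duration prior and checkpointed backward pass, and invents the names `ring` and `segment score` for exposition. The key property it demonstrates is that only K+1 forward vectors are ever resident, i.e. O(KC) working memory regardless of T:

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log-sum-exp over a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

rng = np.random.default_rng(1)
T, K, C = 50, 5, 3
emissions = rng.normal(size=(T, C))
S = np.vstack([np.zeros((1, C)), np.cumsum(emissions, axis=0)])  # prefix sums

# Ring buffer: only the last K+1 forward vectors are kept.
# Slot t % (K+1) holds log alpha at boundary t.
ring = np.full((K + 1, C), -np.inf)
ring[0] = 0.0  # log alpha_0: the empty prefix

for t in range(1, T + 1):
    alpha_t = np.full(C, -np.inf)
    for c in range(C):
        terms = np.empty(min(K, t))
        for i, d in enumerate(range(1, min(K, t) + 1)):
            seg = S[t, c] - S[t - d, c]      # O(1) segment score via prefix sums
            prev = ring[(t - d) % (K + 1)]   # log alpha_{t-d}, still in the buffer
            terms[i] = logsumexp(prev) + seg # marginalize previous labels
        alpha_t[c] = logsumexp(terms)
    ring[t % (K + 1)] = alpha_t              # overwrite the oldest slot

log_Z = logsumexp(ring[T % (K + 1)])         # log partition function
```

Because every segment ending at boundary t starts at some t-d with d ≤ K, the slots read at step t are exactly the K most recent ones, so overwriting slot t % (K+1) never clobbers a value that is still needed.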



Get this paper in your agent:

hf papers read 2604.18780
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
