The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations Paper • 2601.14944 • Published Apr 20 • 5
Article DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models lightonai • Apr 21 • 38
Boosting Visual Instruction Tuning with Self-Supervised Guidance Paper • 2604.12966 • Published Apr 14 • 11
Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval Paper • 2603.25011 • Published Mar 26 • 1
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 27
Article ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models? lightonai • Feb 19 • 21
reproducing-cross-encoders Collection A set of cross-encoders trained from various backbones and losses for equal comparison • 55 items • Updated Mar 5 • 4
ToMMeR -- Efficient Entity Mention Detection from Large Language Models Paper • 2510.19410 • Published Oct 22, 2025 • 4
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling lightonai • Feb 12 • 56
Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published Jul 15, 2025 • 32
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published Sep 30, 2025 • 550
DIP: Unsupervised Dense In-Context Post-training of Visual Representations Paper • 2506.18463 • Published Jun 23, 2025 • 21