ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models Paper • 2510.16928 • Published Oct 19, 2025 • 4
Genomic Next-Token Predictors are In-Context Learners Paper • 2511.12797 • Published Nov 16, 2025 • 7
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains Paper • 2507.07229 • Published Jul 9, 2025 • 11
World-in-World: World Models in a Closed-Loop World Paper • 2510.18135 • Published Oct 20, 2025 • 76
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification Paper • 2505.18452 • Published May 24, 2025 • 4
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning Paper • 2509.22621 • Published Sep 26, 2025 • 8
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks Paper • 2509.25671 • Published Sep 30, 2025 • 6
mmBERT: A Modern Multilingual Encoder with Annealed Language Learning Paper • 2509.06888 • Published Sep 8, 2025 • 12
Jailbreak Distillation: Renewable Safety Benchmarking Paper • 2505.22037 • Published May 28, 2025 • 1
The Trickle-down Impact of Reward (In-)consistency on RLHF Paper • 2309.16155 • Published Sep 28, 2023 • 1
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2, 2025 • 24