Developing Safe and Responsible Large Language Models -- A Comprehensive Framework Paper • 2404.01399 • Published Apr 1, 2024 • 1
DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs Paper • 2503.15793 • Published Mar 20, 2025
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models Paper • 2503.01781 • Published Mar 3, 2025 • 2
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs Paper • 2509.08031 • Published Sep 9, 2025 • 21
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents Paper • 2605.13841 • Published 8 days ago • 61
SN-Col/numina-math-9sources-25each-modified-problems-o1-mod-2-only_SNOW Viewer • Updated Feb 27, 2025 • 225 • 14
SN-Col/numina-math-9sources-25each-modified-problems-o1-mod-2-only Viewer • Updated Feb 27, 2025 • 225 • 24
SN-Col/servicenow-r1-numina-math-deepseek-r1-6-selected-suffixes_SNOW Viewer • Updated Feb 21, 2025 • 1.2k • 9 • 1
SN-Col/numina-math-9sources-25each-modified-problems-o1-responses-with-original-responses-final_SNOW Viewer • Updated Feb 21, 2025 • 480 • 11
BERTology Meets Biology: Interpreting Attention in Protein Language Models Paper • 2006.15222 • Published Jun 26, 2020
GeDi: Generative Discriminator Guided Sequence Generation Paper • 2009.06367 • Published Sep 14, 2020
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 14
Explain Yourself! Leveraging Language Models for Commonsense Reasoning Paper • 1906.02361 • Published Jun 6, 2019