EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge Paper • 2601.09142 • Published 6 days ago • 8 • 3
DramaBench: A Six-Dimensional Evaluation Framework for Drama Script Continuation Paper • 2512.19012 • Published 29 days ago • 16 • 4