Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation Paper • 2602.16990 • Published 19 days ago • 11
Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments Paper • 2601.07606 • Published Jan 12
RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics Paper • 2510.24736 • Published Oct 14, 2025 • 1
Dispersion Loss Counteracts Embedding Condensation and Improves Generalization in Small Language Models Paper • 2602.00217 • Published Jan 30 • 1
CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models Paper • 2510.15962 • Published Oct 11, 2025
Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection Paper • 2511.12410 • Published Nov 16, 2025
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs Paper • 2510.08886 • Published Oct 10, 2025 • 20
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents Paper • 2510.11695 • Published Oct 13, 2025 • 3
FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation Paper • 2511.14998 • Published Nov 19, 2025
Ebisu: Benchmarking Large Language Models in Japanese Finance Paper • 2602.01479 • Published Feb 1 • 17
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published Jan 6 • 16
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection Paper • 2601.04160 • Published Jan 7 • 4
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection Paper • 2601.05403 • Published Jan 8 • 10
MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment Paper • 2512.09636 • Published Dec 10, 2025 • 26