Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems Paper • 2602.08847 • Published 3 days ago • 15
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published 4 days ago • 24
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published 4 days ago • 24
view post Post 3389 You can now run Kimi K2.5 locally! 🔥We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit.Get >40 tok/s on 242GB or 622GB VRAM/RAM for near full precision.GGUF: unsloth/Kimi-K2.5-GGUFGuide: https://unsloth.ai/docs/models/kimi-k2.5 See translation 7 replies · 🚀 16 16 😎 3 3 🔥 2 2 👀 1 1 + Reply
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 224
SWE-RM: Execution-free Feedback For Software Engineering Agents Paper • 2512.21919 • Published Dec 26, 2025 • 10
SWE-RM: Execution-free Feedback For Software Engineering Agents Paper • 2512.21919 • Published Dec 26, 2025 • 10
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 296
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published Nov 11, 2025 • 108
TiDAR: Think in Diffusion, Talk in Autoregression Paper • 2511.08923 • Published Nov 12, 2025 • 127
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 223
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 46
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 46
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24, 2025 • 101
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published Oct 17, 2025 • 148
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards Paper • 2509.24981 • Published Sep 29, 2025 • 29
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 190