OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning Paper • 2606.26790 • Published 3 days ago • 40
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles Paper • 2605.22177 • Published May 21 • 21
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning Paper • 2605.00380 • Published May 1 • 7