Abstract
HeavySkill internalizes complex reasoning as an intrinsic model skill rather than delegating it to external orchestration: a two-stage pipeline of parallel reasoning followed by summarization outperforms Best-of-N baselines and can be scaled further via reinforcement learning.
Recent advances in agentic harnesses, i.e., orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit within an orchestration harness but also as an inner skill, internalized in the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
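For readers unfamiliar with the two baselines the abstract compares against, the gap between them is easy to state in code. The sketch below is illustrative (the function names and the scorer are not from the paper): BoN must commit to one sample chosen by a scorer, while Pass@N credits success if any sample is correct, making it an oracle upper bound.

```python
def best_of_n(samples, score, is_correct):
    # BoN: a scorer (e.g., a reward model) commits to ONE sample;
    # success only if that single pick happens to be correct.
    return is_correct(max(samples, key=score))

def pass_at_n(samples, is_correct):
    # Pass@N: oracle metric; succeeds if ANY of the N samples is correct.
    # The abstract's claim is that synthesis can approach this bound.
    return any(is_correct(s) for s in samples)
```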
Community
HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
HeavySkill is a test-time scaling technique that decomposes complex reasoning into two stages (a minimal sketch follows the list):
- Parallel Reasoning — Generate K independent reasoning trajectories concurrently
- Sequential Deliberation — Synthesize trajectories through critical analysis into a superior final answer
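As a rough illustration of this pipeline (not the paper's actual implementation: the `llm` callable, the prompt templates, and the default `k=8` are all assumptions), here is a minimal Python sketch:

```python
import concurrent.futures

def heavy_skill(llm, question, k=8):
    """Two-stage heavy thinking: K parallel trajectories, then one synthesis pass.

    `llm(prompt) -> str` stands in for any chat-completion call; the prompts
    below are illustrative, not the paper's exact templates.
    """
    reason_prompt = f"Think step by step, then answer:\n{question}"

    # Stage 1: parallel reasoning -- sample K independent trajectories.
    # (With temperature > 0 on the backend, each call yields a distinct path.)
    with concurrent.futures.ThreadPoolExecutor(max_workers=k) as pool:
        trajectories = list(pool.map(lambda _: llm(reason_prompt), range(k)))

    # Stage 2: sequential deliberation -- critique and synthesize all K at once.
    numbered = "\n\n".join(
        f"[Trajectory {i + 1}]\n{t}" for i, t in enumerate(trajectories)
    )
    synth_prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate reasoning trajectories:\n{numbered}\n\n"
        "Critically compare the trajectories, reconcile any disagreements, "
        "and produce a single final answer."
    )
    return llm(synth_prompt)
```

Unlike BoN, stage 2 never has to commit to a single trajectory; it can splice correct steps from several of them, which is presumably why the abstract reports synthesis approaching Pass@N.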
the most interesting move here is treating heavy thinking as an internal two-stage skill—parallel reasoning followed by summarization—that travels with the model, not just the harness. i’m curious about the memory cache and deliberation loop: when you serialize many trajectories into the cache, does the final synthesis risk information interference during revisitation? the arxivlens breakdown helped me parse where the bottlenecks live and what the internal skill is actually buying you, especially in terms of transferability across harnesses (https://arxivlens.com/PaperView/Details/heavyskill-heavy-thinking-as-the-inner-skill-in-agentic-harness-8685-925845c1). if you push rlvr to grow both breadth and depth, i’d want to see how compute scales and whether there’s a sweet spot where extra trajectories stop paying off.
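One way to probe the commenter's last question empirically is a sweep over K. The snippet below is hypothetical (it reuses the `heavy_skill` sketch above; `llm` and `eval_set`, a list of (question, gold_answer) pairs, are assumed to exist, and exact-match grading stands in for whatever verifier the paper actually uses):

```python
# Hypothetical K-sweep: accuracy vs. number of parallel trajectories,
# to look for the point where extra trajectories stop paying off.
for k in (1, 2, 4, 8, 16, 32):
    correct = sum(heavy_skill(llm, q, k=k).strip() == gold for q, gold in eval_set)
    acc = correct / len(eval_set)
    # Stage-1 cost grows linearly in K; stage-2 context length grows with K too.
    print(f"K={k:>2}  accuracy={acc:.3f}  stage-1 calls={k * len(eval_set)}")
```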
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AgentV-RL: Scaling Reward Modeling with Agentic Verifier (2026)
- LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning (2026)
- XSkill: Continual Learning from Experience and Skills in Multimodal Agents (2026)
- Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search (2026)
- RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution (2026)
- Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents (2026)
- From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Get this paper in your agent: `hf papers read 2605.02396`
Don't have the latest CLI? `curl -LsSf https://hf.co/cli/install.sh | bash`