arxiv:2605.02396

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Published on May 4 · Submitted by WangJianing on May 6
AI-generated summary

HeavySkill presents a framework where complex reasoning is internalized as an intrinsic model skill rather than delegated to external orchestration, demonstrating superior performance through parallel reasoning and summarization stages that can be further enhanced via reinforcement learning.

Abstract

Recent advances in agentic harnesses, i.e., orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success on complex reasoning tasks. However, the mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized within the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be scaled further via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
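The abstract's two baselines can be made concrete with a toy sketch. This is not the paper's code; the data and the noisy scorer below are hypothetical, chosen only to show why Best-of-N can miss a correct sample that Pass@N would count as solved:

```python
import random

random.seed(0)

def pass_at_n(correct_flags):
    """Pass@N: the task counts as solved if ANY of the N samples is correct."""
    return any(correct_flags)

def best_of_n(samples, scorer):
    """Best-of-N: commit to the single sample a (possibly noisy) scorer ranks highest."""
    return max(samples, key=scorer)

# Hypothetical toy data: (answer, is_correct) pairs for N=4 sampled solutions.
samples = [("a1", False), ("a2", True), ("a3", False), ("a4", True)]

# A noisy scorer: correct answers score higher on average, but with Gaussian noise,
# so Best-of-N can still select an incorrect sample.
noisy_score = lambda s: (1.0 if s[1] else 0.0) + random.gauss(0, 0.8)

print("Pass@N solved:", pass_at_n([c for _, c in samples]))
print("Best-of-N pick:", best_of_n(samples, noisy_score))
```

The gap between these two numbers is the headroom the paper targets: the claim is that the internalized parallel-reasoning-plus-summarization skill closes much of the distance from BoN toward the Pass@N ceiling.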

Community

Paper submitter

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

HeavySkill is a test-time scaling technique that decomposes complex reasoning into two stages:

  • Parallel Reasoning — Generate K independent reasoning trajectories concurrently
  • Sequential Deliberation — Synthesize trajectories through critical analysis into a superior final answer
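The two stages above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate_trajectory` and `summarize` are hypothetical stand-ins for model calls, and a real system would query an LLM for both stages.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_trajectory(question, seed):
    # Stand-in for one independent reasoning rollout (a real system calls an LLM here).
    return f"[trajectory {seed}] reasoning about: {question}"

def summarize(question, trajectories):
    # Stage 2 (sequential deliberation): all K trajectories are placed in one
    # context and synthesized into a single final answer.
    context = "\n".join(trajectories)
    return (f"Final answer to '{question}' after reviewing "
            f"{len(trajectories)} trajectories ({len(context)} chars of context).")

def heavy_skill(question, k=4):
    # Stage 1 (parallel reasoning): launch K independent trajectories concurrently.
    with ThreadPoolExecutor(max_workers=k) as pool:
        trajectories = list(pool.map(
            lambda seed: generate_trajectory(question, seed), range(k)))
    return summarize(question, trajectories)

print(heavy_skill("Is 97 prime?", k=4))
```

Note the design tension this makes visible: stage 1 scales out cheaply, but stage 2 must fit all K trajectories into one context, which is where the interference question raised below comes in.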

The most interesting move here is treating heavy thinking as an internal two-stage skill, parallel reasoning followed by summarization, that travels with the model rather than with the harness. I'm curious about the memory cache and deliberation loop: when many trajectories are serialized into the cache, does the final synthesis risk information interference during revisitation? The arxivlens breakdown helped me parse where the bottlenecks live and what the internal skill is actually buying you, especially in terms of transferability across harnesses (https://arxivlens.com/PaperView/Details/heavyskill-heavy-thinking-as-the-inner-skill-in-agentic-harness-8685-925845c1). If RLVR is pushed to grow both breadth and depth, I'd want to see how compute scales and whether there is a sweet spot where extra trajectories stop paying off.



Get this paper in your agent:

hf papers read 2605.02396
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


Collections including this paper 2