LegendaryDawn/self-debate-exp-Qwen2.3-3B-balance-diff_sol2048-n8-bs256-long8-DAPO-step200 3B • Updated 11 days ago • 10
LegendaryDawn/self-debate-exp-Qwen2.3-7B-grpo-diff_sol2048-n8-bs256-long8-DAPO-step200 8B • Updated 12 days ago • 5
LegendaryDawn/self-debate-baseline-DAPO-Qwen2.5-7B-n8-bs256-long8-step200 8B • Updated 12 days ago • 6
LegendaryDawn/mbpo-iclr-Qwen2_vl_7b_instruct-R_multiyn1-lr5e-7-beta01-Rmmseed-GPT4o-adv-multiyn-12k 8B • Updated 14 days ago • 11
LegendaryDawn/mbpo-iclr-Qwen2_5_vl_7b_instruct-R_multiyn1-beta02-lr2e-7-mixed-10-8-64-12k 8B • Updated 14 days ago • 9
LegendaryDawn/self-debate-exp-Qwen2.3-3B-grpo-diff_sol2048-n8-bs256-long8-DAPO-step200 3B • Updated 16 days ago • 38
LegendaryDawn/self-debate-baseline-DAPO-Qwen2.5-3B-Instruct-n8-bs256-long8-step200 3B • Updated 22 days ago • 14
LegendaryDawn/self-debate-exp-Qwen2.5-3B-diff_sol2048-overall_debate_grpo_loss-n8-bs256-long8-DAPO-step200 3B • Updated 22 days ago • 11
LegendaryDawn/self-debate-baseline-dapo-Qwen2.5-3b-n8-bs256-long8-step200 3B • Updated 22 days ago • 37
Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models Paper • 2511.04800 • Published Nov 6 • 1
LegendaryDawn/erpo-iclr-rebuttal-llama3.2-3B-Instruct-baseline-dapo-step180-step180 4B • Updated Nov 21 • 3