Chenlu123/shampoo_npg_tr_scale_delta20_lam1e-12_warmup_1_graftTrue_qwen2_5_math_1_5b Updated 10 days ago
AgentSPEX: An Agent SPecification and EXecution Language Paper • 2604.13346 • Published 20 days ago • 162
Chenlu123/grpo_warmup_graftTrue_qwen2_5_math_1_5b_guru_n16_bz64_mini_bz64_global_step_80 Updated 26 days ago
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL Paper • 2603.19470 • Published Mar 19 • 3
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step460 2B • Updated Mar 20 • 3
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step440 2B • Updated Mar 20 • 2
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step420 2B • Updated Mar 20 • 2