rlvr-weak-supervision

Models from "When Can LLMs Learn to Reason with Weak Supervision?": Llama-3.2-3B with continual pre-training and Thinking SFT.

pavelslab-nyu/Llama-3.2-3B-ThinkSFT • 3B • Updated Apr 20 • 3
pavelslab-nyu/Llama-3.2-3B-CPT-Math-ThinkSFT • 3B • Updated Apr 20 • 8
pavelslab-nyu/Llama-3.2-3B-CPT-Math • 3B • Updated Apr 20 • 5