RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (arXiv:2412.17256)
Deliberation in Latent Space via Differentiable Cache Augmentation (arXiv:2412.17747)
Outcome-Refining Process Supervision for Code Generation (arXiv:2412.15118)
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models (arXiv:2501.03262)
Evolving Deeper LLM Thinking (arXiv:2501.09891)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
Kimi k1.5: Scaling Reinforcement Learning with LLMs (arXiv:2501.12599)
Towards General-Purpose Model-Free Reinforcement Learning (arXiv:2501.16142)
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703)