Controlling Multimodal LLMs via Reward-guided Decoding Paper • 2508.11616 • Published Aug 15, 2025
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks Paper • 2210.12786 • Published Oct 23, 2022
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory Paper • 2307.10768 • Published Jul 20, 2023
Learning to Learn: How to Continuously Teach Humans and Machines Paper • 2211.15470 • Published Nov 28, 2022
The Promise of RL for Autoregressive Image Editing Paper • 2508.01119 • Published Aug 1, 2025
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning Paper • 2505.20046 • Published May 26, 2025
sikarwarank/imged_rl_grpo_no_reasoning_ckpt_11600_sftnocomplex_rlcomplex__kl3e_4__lr1e_6__100K_backup_v1_n Updated May 14, 2025