Methods
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page
Multi-document Understanding
Paper
•
2411.04952
•
Published
•
29
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion
Models
Paper
•
2411.05005
•
Published
•
13
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for
Evaluating Foundation Models
Paper
•
2411.04075
•
Published
•
16
Self-Consistency Preference Optimization
Paper
•
2411.04109
•
Published
•
19
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge
in RAG Systems
Paper
•
2411.02959
•
Published
•
71
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
•
2501.07301
•
Published
•
99
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
54
Evaluating Sample Utility for Data Selection by Mimicking Model Weights
Paper
•
2501.06708
•
Published
•
5
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
•
2501.08313
•
Published
•
300
3DIS-FLUX: simple and efficient multi-instance generation with DiT
rendering
Paper
•
2501.05131
•
Published
•
37
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token
Marks
Paper
•
2501.08326
•
Published
•
33
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Paper
•
2501.08292
•
Published
•
17
FastKV: KV Cache Compression for Fast Long-Context Processing with
Token-Selective Propagation
Paper
•
2502.01068
•
Published
•
18
Improving Transformer World Models for Data-Efficient RL
Paper
•
2502.01591
•
Published
•
9
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
•
2501.19324
•
Published
•
39
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing
Guardrail Moderation
Paper
•
2501.17433
•
Published
•
10
AnimeGamer: Infinite Anime Life Simulation with Next Game State
Prediction
Paper
•
2504.01014
•
Published
•
70
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in
One Step
Paper
•
2504.01956
•
Published
•
41
Towards Physically Plausible Video Generation via VLM Planning
Paper
•
2503.23368
•
Published
•
40
VisualCloze: A Universal Image Generation Framework via Visual
In-Context Learning
Paper
•
2504.07960
•
Published
•
50
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
•
2504.08736
•
Published
•
46
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
•
2504.11536
•
Published
•
63
AlayaDB: The Data Foundation for Efficient and Effective Long-context
LLM Inference
Paper
•
2504.10326
•
Published
•
25
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion
Transformers
Paper
•
2504.10483
•
Published
•
21
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
Paper
•
2504.09566
•
Published
•
11
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for
Language Model Pre-training
Paper
•
2504.13161
•
Published
•
93
Generate, but Verify: Reducing Hallucination in Vision-Language Models
with Retrospective Resampling
Paper
•
2504.13169
•
Published
•
39
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
•
2504.12395
•
Published
•
16
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU
Inference via Dynamic-Length Float
Paper
•
2504.11651
•
Published
•
31
Complex-Edit: CoT-Like Instruction Generation for
Complexity-Controllable Image Editing Benchmark
Paper
•
2504.13143
•
Published
•
7
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through
Task Tokenization
Paper
•
2503.19901
•
Published
•
41
Self-Supervised Learning of Motion Concepts by Optimizing
Counterfactuals
Paper
•
2503.19953
•
Published
•
3
Reinforcement Learning for Reasoning in Large Language Models with One
Training Example
Paper
•
2504.20571
•
Published
•
98
ReasonIR: Training Retrievers for Reasoning Tasks
Paper
•
2504.20595
•
Published
•
53
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement
Fine-Tuning
Paper
•
2505.03318
•
Published
•
92
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Paper
•
2505.19147
•
Published
•
144
ARM: Adaptive Reasoning Model
Paper
•
2505.20258
•
Published
•
45
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
•
2505.19914
•
Published
•
45
Paper
•
2505.19752
•
Published
•
17
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning
in LLMs
Paper
•
2506.15211
•
Published
•
38
SwarmAgentic: Towards Fully Automated Agentic System Generation via
Swarm Intelligence
Paper
•
2506.15672
•
Published
•
15
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal
Large Language Models
Paper
•
2506.14824
•
Published
•
7
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement
Learning
Paper
•
2506.18841
•
Published
•
56
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
•
2506.18896
•
Published
•
29
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
•
2506.18254
•
Published
•
31
TC-Light: Temporally Consistent Relighting for Dynamic Long Videos
Paper
•
2506.18904
•
Published
•
10
FaithfulSAE: Towards Capturing Faithful Features with Sparse
Autoencoders without External Dataset Dependencies
Paper
•
2506.17673
•
Published
•
7
MMSearch-R1: Incentivizing LMMs to Search
Paper
•
2506.20670
•
Published
•
64
FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient
Multi-turn Image Editing
Paper
•
2506.20911
•
Published
•
41
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Paper
•
2506.21506
•
Published
•
51
Scaling RL to Long Videos
Paper
•
2507.07966
•
Published
•
159
A Survey of Context Engineering for Large Language Models
Paper
•
2507.13334
•
Published
•
259
LongCodeZip: Compress Long Context for Code Language Models
Paper
•
2510.00446
•
Published
•
106
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
•
2510.02297
•
Published
•
42