Abstract
PaperBanana is an agentic framework that automates the creation of publication-ready academic illustrations using advanced vision-language models and image generation techniques.
Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow. To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready academic illustrations. Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique. To rigorously evaluate our framework, we introduce PaperBananaBench, comprising 292 test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse research domains and illustration styles. Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, readability, and aesthetics. We further show that our method effectively extends to the generation of high-quality statistical plots. Collectively, PaperBanana paves the way for the automated generation of publication-ready illustrations.
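The pipeline described above (reference retrieval, content/style planning, rendering, and iterative self-critique) can be sketched as a simple agent loop. This is a minimal illustrative sketch, not the paper's implementation: all agent functions (`retrieve_references`, `plan_illustration`, `render`, `critique`) are hypothetical stubs standing in for VLM and image-model calls.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    content: str  # what the diagram should depict
    style: str    # target visual style

def retrieve_references(topic: str) -> list[str]:
    # Stub retriever: a real agent would search prior publications
    # for visually similar methodology diagrams.
    return [f"{topic}: reference figure {i}" for i in range(2)]

def plan_illustration(topic: str, refs: list[str]) -> Plan:
    # Stub planner: a real agent would use a VLM to draft the layout.
    return Plan(content=f"diagram of {topic}", style="flat, two-column")

def render(plan: Plan, feedback: str = "") -> str:
    # Stub renderer standing in for an image-generation model call;
    # returns a string tag in place of an actual image.
    return f"image[{plan.content} | {plan.style} | fix:{feedback}]"

def critique(image: str) -> tuple[float, str]:
    # Stub self-critique: score improves once feedback has been applied.
    if "fix:add axis labels" in image:
        return 0.9, ""
    return 0.4, "add axis labels"

def generate(topic: str, max_rounds: int = 3, threshold: float = 0.8) -> str:
    # Orchestrate the agents: retrieve -> plan -> render -> refine.
    refs = retrieve_references(topic)
    plan = plan_illustration(topic, refs)
    image = render(plan)
    for _ in range(max_rounds):
        score, feedback = critique(image)
        if score >= threshold:
            break
        image = render(plan, feedback)
    return image
```

In this toy loop the critic's feedback is folded back into the next render call until the quality score clears a threshold, mirroring the self-critique refinement the abstract describes.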
Community
PaperBanana automates publication-ready AI research illustrations via an agentic framework using VLMs and image models, orchestrating reference retrieval, planning, rendering, and self-critique with a benchmarking suite.
This is excellent. I never considered science illustrations as a use case for image gen models, but it makes total sense, and I can see this applying to technical blogging as well.
Interestingly, I had to design a similar pipeline for illustrating games. We're a game studio trying to play "research lab" to push our frontiers, and the need to create structured illustrations at scale, with precision, seems to be a shared objective here.
We're just learning how to write up our results in a more "scientific" way besides "comments.md", and this is a helpful piece of the puzzle.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility (2026)
- APEX: Academic Poster Editing Agentic Expert (2026)
- SciFig: Towards Automating Scientific Figure Generation (2026)
- SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics (2026)
- ProImage-Bench: Rubric-Based Evaluation for Professional Image Generation (2025)
- ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement (2025)
- Unified Thinker: A General Reasoning Modular Core for Image Generation (2026)