Graphical Abstracts (GAs) play a crucial role in visually conveying the key findings of scientific papers. While recent research has increasingly incorporated visual materials such as Figure 1 as de facto GAs, their potential to enhance scientific communication remains largely unexplored. Moreover, designing effective GAs requires advanced visualization skills, creating a barrier to their widespread adoption. To tackle these challenges, we introduce SciGA-145k, a large-scale dataset comprising approximately 145,000 scientific papers and 1.14 million figures, explicitly designed for supporting GA selection and recommendation as well as facilitating research in automated GA generation. As a preliminary step toward GA design support, we define two tasks: 1) Intra-GA recommendation, which identifies figures within a given paper that are well-suited to serve as GAs, and 2) Inter-GA recommendation, which retrieves GAs from other papers to inspire the creation of new GAs. We provide reasonable baseline models for these tasks. Furthermore, we propose Confidence Adjusted top-1 ground truth Ratio (CAR), a novel recommendation metric that offers a fine-grained analysis of model behavior. CAR addresses limitations in traditional ranking-based metrics by considering cases where multiple figures within a paper, beyond the explicitly labeled GA, may also serve as GAs. By unifying these tasks and metrics, our SciGA-145k establishes a foundation for advancing visual scientific communication while contributing to the development of AI for Science.
Example GAs and their annotations in our SciGA-145k. Our dataset includes three types of GAs: Original (newly created), Reused (directly copied from paper figures), and Modified (combining/altering existing figures). SciGA-145k uniquely offers full-text content with comprehensive figure support and explicit GA/teaser annotations, all intended to facilitate GA creation, recommendation, and future automated generation.
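To support the design of GAs, we define two recommendation tasks in SciGA-145k:
1) Intra-GA recommendation, which identifies figures within a given paper that are well-suited to serve as GAs.
2) Inter-GA recommendation, which retrieves GAs from other papers to inspire the creation of new GAs.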
We benchmark several methods, including caption-aware retrieval using CLIP and Long-CLIP, and find that incorporating figure captions alongside visual features significantly boosts accuracy and consistency.
These tasks provide a foundation for developing tools that automate or assist GA creation, promoting broader adoption and better visual communication in academic publishing.
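The caption-aware retrieval mentioned above can be illustrated with a minimal sketch. It is not the benchmarked implementation: it assumes that a query embedding (for example, of the paper's abstract) and per-figure image and caption embeddings have already been computed by a CLIP-style encoder, and it fuses the two similarities with a hypothetical `caption_weight` parameter. The function names and the weighted-sum fusion are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_figures(query_emb, figures, caption_weight=0.5):
    """Rank candidate figures by a fused image/caption similarity.

    figures: list of (figure_id, image_embedding, caption_embedding).
    caption_weight: hypothetical fusion weight in [0, 1]; 0 ignores captions.
    Returns (figure_id, score) pairs, best first.
    """
    scored = []
    for fig_id, img_emb, cap_emb in figures:
        visual = cosine(query_emb, img_emb)
        textual = cosine(query_emb, cap_emb)
        score = (1 - caption_weight) * visual + caption_weight * textual
        scored.append((fig_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D embeddings, invented purely for illustration.
figures = [
    ("fig1", [1.0, 0.0], [0.0, 1.0]),  # image matches the query, caption does not
    ("fig2", [0.6, 0.8], [1.0, 0.0]),  # image matches partly, caption matches fully
]
query = [1.0, 0.0]
visual_only = rank_figures(query, figures, caption_weight=0.0)
caption_aware = rank_figures(query, figures, caption_weight=0.5)
```

In this toy setup the visual-only ranking puts `fig1` first, while the caption-aware ranking puts `fig2` first, which shows how caption evidence can change which figure is recommended.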
Examples of Intra-GA Recommendation results demonstrating the intuition behind CAR@k scores. The yellow-highlighted figures are the ground-truth (GT) GAs. Left: High CAR@k indicates the model confidently recommends the correct GA. Center: Medium CAR@k represents cases where multiple candidates are similarly plausible, resulting in lower confidence. Right: Low CAR@k reflects high model confidence but incorrect recommendations, highlighting mismatches between the model’s confidence and actual relevance.
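The three regimes in the caption above can be mimicked with a toy calculation. The sketch below is not the CAR@k formula, which is not stated on this page. As a stand-in, it takes the softmax probability that a model assigns to the ground-truth figure, which is assumed here only because it reproduces the same high, medium, and low ordering. The scores and the `gt_confidence` helper are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gt_confidence(scores, gt_index):
    """Stand-in score: probability mass the model places on the GT figure.

    This is an illustrative proxy, not the paper's CAR@k definition.
    """
    return softmax(scores)[gt_index]


# Invented scores for three figures; the GT is always index 0.
confident_correct = gt_confidence([5.0, 0.0, 0.0], gt_index=0)  # GT clearly on top
ambiguous = gt_confidence([1.0, 1.0, 1.0], gt_index=0)          # all equally plausible
confident_wrong = gt_confidence([0.0, 5.0, 0.0], gt_index=0)    # another figure on top
```

Under these assumptions the proxy is highest in the confident-and-correct case, intermediate when candidates are tied, and lowest when the model is confident in a non-GT figure.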
@article{kawada2025sciga,
title={SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers},
author={Takuro Kawada and Shunsuke Kitada and Sota Nemoto and Hitoshi Iyatomi},
journal={arXiv preprint arXiv:2507.xxxxx},
year={2025}
}