Enhancing PyBibX: Extensions for Data Acquisition, Snowballing, Venue Ranking and Multivocal Literature Analysis
Abstract:
This work extends the capabilities of PyBibX, a Python library for bibliometric and scientometric analysis, by integrating several advanced features aimed at supporting rigorous and reproducible literature review workflows. The contribution enhances the data acquisition layer by adding automated coverage checks and search functionalities across multiple scientific sources, including Scopus, ACM Digital Library, IEEE Xplore, GitHub, and Zenodo, thus enabling both academic and grey literature collection within a unified interface.
Furthermore, the project introduces full support for backward and forward snowballing. Using Scopus as the primary provider, the system retrieves citing and cited documents, handles API-specific constraints, combines Scopus Abstract Retrieval with fallbacks to CrossRef when necessary, and computes citation counts for each processed article. These functionalities allow researchers to expand evidence bases systematically and minimize manual effort in citation chasing.
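As a rough illustration of the snowballing logic described above, the following minimal sketch runs one backward and one forward step over a toy in-memory citation graph. It is not the PyBibX implementation: the dictionaries `PRIMARY_REFS` and `FALLBACK_REFS` merely stand in for a primary provider and a fallback provider, the document IDs are made up, and all function names are hypothetical. No network calls are made.

```python
# Toy citation data: each key cites the works in its list. All IDs are made up.
PRIMARY_REFS = {"P1": ["P2", "P3"], "P4": ["P1"]}  # stands in for the primary provider
FALLBACK_REFS = {"P5": ["P1", "P3"]}               # stands in for the fallback provider


def get_references(doc_id):
    """Backward step: cited works, trying the primary source before the fallback."""
    if doc_id in PRIMARY_REFS:
        return PRIMARY_REFS[doc_id]
    return FALLBACK_REFS.get(doc_id, [])


def get_citing(doc_id):
    """Forward step: known works whose reference list contains doc_id."""
    known_ids = set(PRIMARY_REFS) | set(FALLBACK_REFS)
    return sorted(d for d in known_ids if doc_id in get_references(d))


def snowball(seeds):
    """Run one backward and one forward snowballing step for each seed."""
    backward, forward, counts = set(), set(), {}
    for seed in seeds:
        backward.update(get_references(seed))
        citing = get_citing(seed)
        forward.update(citing)
        counts[seed] = len(citing)  # citation count within the toy graph
    return {
        "backward": sorted(backward),
        "forward": sorted(forward),
        "citation_counts": counts,
    }


result = snowball(["P1"])
```

In this sketch the citation count is simply the number of citing documents found in the toy graph; a real provider would typically report its own count.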
An additional contribution is the integration of reputable venue-ranking sources (CORE and SCImago), enabling automated assessment of publication venues by name and year, with fuzzy matching and multi-year coverage checks.
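To make the idea of fuzzy venue lookup by name and year concrete, here is a minimal sketch using Python's standard `difflib.SequenceMatcher`. It is an assumption-laden illustration, not the library's actual matching code: the `RANKINGS` table holds invented venues and ranks, the `0.85` threshold is arbitrary, and the rule for years missing from the table (use the latest covered year not after the query, else the earliest) is one possible choice among several.

```python
from difflib import SequenceMatcher

# Hypothetical ranking table: (normalized venue name, year) -> rank.
# Venues and ranks are invented placeholders, not real CORE or SCImago data.
RANKINGS = {
    ("example conference on software analysis", 2021): "A",
    ("example conference on software analysis", 2023): "A*",
    ("sample journal of bibliometrics", 2022): "Q1",
}


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def lookup_rank(name, year, threshold=0.85):
    """Return (matched venue, year used, rank), or None if nothing is close enough."""
    query = normalize(name)
    best_name, best_score = None, 0.0
    for venue in {v for v, _ in RANKINGS}:
        score = SequenceMatcher(None, query, venue).ratio()
        if score > best_score:
            best_name, best_score = venue, score
    if best_name is None or best_score < threshold:
        return None
    # Multi-year coverage (assumed rule): exact year if present, otherwise the
    # latest covered year not after the query, otherwise the earliest covered year.
    years = sorted(y for v, y in RANKINGS if v == best_name)
    if year in years:
        chosen = year
    else:
        earlier = [y for y in years if y <= year]
        chosen = earlier[-1] if earlier else years[0]
    return best_name, chosen, RANKINGS[(best_name, chosen)]


match = lookup_rank("Example Conf. on Software Analysis", 2022)
```

Here the abbreviated query still resolves to the full venue name, and because 2022 is not covered in the toy table the lookup falls back to the 2021 entry.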
Finally, the work evaluates the visualization layer of the reference PyBibX repository and proposes improvements for more effective data exploration and interpretability.
Overall, the extended framework strengthens PyBibX as a practical tool for conducting structured, multivocal literature reviews, supporting both traditional bibliometric pipelines and exploratory analyses.