Academics | The Hong Kong University of Science and Technology

Cleaned Pathology Image-Caption Dataset (all)
This pathology image-text dataset was collected and organized from both public and non-public sources. The public sources include social media platforms, forums, medical papers, pathology tutorial videos, and textbooks. The non-public sources include pathology reports and teaching notes from specialized partner institutions and hospitals. The image-text pairs have undergone basic cleaning and curation: descriptions outside the domain of pathology and textual annotations with low readability have been removed. After cleaning, the dataset captures more precise multimodal pathology knowledge and image-text correspondences.