Academics | The Hong Kong University of Science and Technology

Pathology Image-Caption Evaluation
At present, some publicly available pathology image-text pair datasets, such as Quilt, are collected by crawling social networks. However, the Quilt dataset has several problems: non-pathological images are mixed in with the pathological ones, some text contains non-pathological descriptions, and some images are paired with the wrong text. This noisy data harms the training of multimodal models and degrades their performance on downstream tasks. Data filtering can remove the samples with a high noise level, and models trained on the resulting clean subsets achieve better performance on downstream tasks.
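
The text above does not say how the filtering is done. As one possible illustration, the minimal sketch below keeps only the image-caption pairs whose embeddings are sufficiently similar. It assumes each image and caption has already been embedded into a shared vector space by some pretrained image-text encoder (not specified in the source). The function names `cosine_similarity` and `filter_pairs` and the default threshold of 0.3 are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    threshold: float = 0.3,  # hypothetical cutoff; would need tuning on real data
) -> List[int]:
    """Return the indices of image-caption pairs scoring at or above the threshold."""
    if len(image_embs) != len(text_embs):
        raise ValueError("need exactly one caption embedding per image embedding")
    return [
        i
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
        if cosine_similarity(img, txt) >= threshold
    ]


# Toy 2-D embeddings: pair 0 matches, pair 1 is unrelated, pair 2 partly matches.
toy_image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_text_embs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
kept = filter_pairs(toy_image_embs, toy_text_embs, threshold=0.5)
print(kept)
```

A similarity threshold of this kind mainly targets mismatched image-text pairs. Removing non-pathological images or captions would likely need an additional check, such as a classifier, which is outside this sketch.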