Product

At present, some publicly available pathology image-text pair datasets, such as Quilt, are collected by crawling social networks. However, the Quilt dataset has several problems: non-pathological images mixed in with the pictures, non-pathological descriptions in the text, and incorrect correspondence between images and text. Such noisy data adversely affects the training of multimodal models and degrades their performance on downstream tasks. Data filtering removes the samples with a high noise level, and models trained on the resulting clean subsets perform better on downstream tasks.
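As a rough illustration of what such filtering could look like, the sketch below keeps only the image-text pairs whose embeddings are sufficiently similar. It is not the product's actual method, which the description does not specify: the function names `cosine_similarity` and `filter_pairs`, the use of precomputed embeddings, and the threshold value are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record: (pair_id, image_embedding, text_embedding).
# The embeddings are assumed to come from some pretrained image-text encoder.
Pair = Tuple[str, Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float = 0.5) -> List[str]:
    """Return the ids of pairs whose image and text embeddings agree.

    The threshold is an illustrative assumption; in practice it would be
    tuned on a small hand-labelled sample.
    """
    return [
        pair_id
        for pair_id, image_emb, text_emb in pairs
        if cosine_similarity(image_emb, text_emb) >= threshold
    ]


if __name__ == "__main__":
    toy_pairs: List[Pair] = [
        ("a", [1.0, 0.0], [1.0, 0.0]),  # well matched
        ("b", [1.0, 0.0], [0.0, 1.0]),  # mismatched image and text
        ("c", [1.0, 1.0], [1.0, 0.0]),  # partially matched
    ]
    print(filter_pairs(toy_pairs, threshold=0.5))
```

A similarity threshold of this kind targets mismatched image-text pairs; screening out non-pathological images or descriptions would likely need an additional classifier or rule, which this sketch does not cover.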

Academics | The Hong Kong University of Science and Technology

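The Hong Kong University of Science and Technology (Guangzhou) - Anbiping (安必平)

Joint Laboratory Center for Medical Data Intelligence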