The production of digital pathology slides involves multiple critical steps, and a quality issue at any of them can cause defects such as defocused images and tissue overlap. These abnormal regions lose pathological structural information and significantly compromise the accuracy and reliability of clinical diagnosis. There is therefore an urgent need for a fast, efficient algorithmic framework that precisely identifies and filters problematic regions, together with an investigation of how such low-quality image patches interfere with the training of intelligent pathological analysis models and a quantification of their impact.
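As a rough illustration of what a fast patch-level defocus check could look like, the sketch below scores a grayscale patch by the variance of its Laplacian response, a common generic focus measure. This is not the project's framework: the function names, the 4-neighbour kernel, and the threshold of 100.0 are all assumptions chosen for the example, and a real pipeline would tune the threshold on labelled patches.

```python
from statistics import pvariance


def laplacian_variance(patch):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `patch` is a 2D list of grayscale values and must be at least 3x3.
    Low variance means few sharp edges, which suggests a defocused patch.
    """
    height, width = len(patch), len(patch[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (patch[y - 1][x] + patch[y + 1][x]
                   + patch[y][x - 1] + patch[y][x + 1]
                   - 4 * patch[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_defocused(patch, threshold=100.0):
    # The threshold is a hypothetical value, not taken from the project.
    return laplacian_variance(patch) < threshold


# Toy inputs: a checkerboard (many sharp edges) and a flat gray patch (none).
sharp_patch = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat_patch = [[128 for _ in range(8)] for _ in range(8)]

print(is_defocused(sharp_patch), is_defocused(flat_patch))
```

A score like this only addresses defocus. Tissue overlap and other defects would need separate detectors.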
Existing open-source pathology image-text paired datasets (e.g., Quilt-1m) are constructed by extracting frames from YouTube videos. Although initial filtering strategies have been applied, significant noise (e.g., non-pathological images) remains. Training classifiers of different architectures on datasets of varying scales reveals substantial performance disparities among models. Experimental results further indicate that fine-tuning large models on an optimized dataset (filtered to exclude non-pathological data) significantly enhances their performance on downstream tasks.
This project explores the development of vector retrieval technology, tracing the transition from traditional CPU-based indexing methods to modern GPU-accelerated solutions. It highlights classic algorithms in Approximate Nearest Neighbor Search (ANNS) and their applications in large-scale data processing, focusing in particular on their role in fast semantic search and real-time decision support in the medical field.
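To make the idea of approximate search concrete, here is a minimal CPU-only sketch of an inverted-file (IVF) style index, one classic ANNS approach: vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets instead of the whole collection. The data, the hand-picked centroids, and all function names are assumptions for illustration. Production systems typically learn the centroids with k-means and run the distance computations on optimized CPU or GPU kernels.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def search_ivf(query, vectors, centroids, lists, k=1, nprobe=1):
    """Approximate search: scan only the nprobe buckets closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


def search_exact(query, vectors, k=1):
    """Brute-force baseline that scans every vector."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]


# Toy 2D data; the centroids are hand-picked rather than learned.
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.5)]
centroids = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
lists = build_ivf(vectors, centroids)

query = (5.1, 5.0)
print(search_ivf(query, vectors, centroids, lists, k=1, nprobe=1))
print(search_exact(query, vectors, k=1))
```

Raising `nprobe` trades speed for recall. With `nprobe` equal to the number of centroids, every bucket is scanned and the result matches the exact search.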
Data filtering removes samples with a high noise level.
This project is focused on constructing a comprehensive benchmark to evaluate the performance of multimodal large language models (MLLMs) in breast cancer tasks.
The "Scaling Law for Pathology" project aims to explore the efficacy of scaling laws within the specialized domain of pathology.
The high cost of labeling in the medical field has led to a shortage of annotated whole slide imaging (WSI) data, which limits the performance of many downstream tasks, such as the training and application of pathology CLIP models.
KB-enhanced Pathology CLIP addresses the variability in performance of pathology foundation models across different branches of pathology.
Our research is focused on constructing a comprehensive benchmark to evaluate the performance
Acquiring high-quality training data is critical for developing highly accurate and robust machine learning models, particularly as foundation models emerge.
This project mines and analyzes private pathology data and integrates the rich information in the field of pathology by constructing a refined pathology knowledge graph.