Lung Cancer Biomarker Detection: AI Pathology Model
Real-Time EGFR Prediction from Whole Slide Images Using Deep Learning
Abstract
Accurate and timely determination of EGFR (Epidermal Growth Factor Receptor) mutation status is crucial for guiding treatment decisions in Non-Small Cell Lung Cancer (NSCLC). Current molecular testing methods, while accurate, can have significant turnaround times. Here, we present EAGLE, a deep learning model capable of predicting EGFR mutation status directly from whole slide images (WSIs) of hematoxylin and eosin (H&E) stained tissue. EAGLE achieves high accuracy, comparable to standard molecular testing, and enables real-time prediction, considerably accelerating the clinical workflow. We demonstrate the successful implementation of EAGLE within a clinical pipeline at Memorial Sloan Kettering Cancer Center (MSKCC), showcasing its potential for immediate impact on patient care.
1. Introduction
Non-Small Cell Lung Cancer (NSCLC) is the leading cause of cancer-related mortality worldwide. Targeted therapies, particularly those directed against Epidermal Growth Factor Receptor (EGFR) mutations, have dramatically improved outcomes for a significant subset of NSCLC patients.1 However, the effectiveness of these therapies hinges on accurate and timely identification of EGFR mutations. Current standard-of-care testing relies on molecular assays such as polymerase chain reaction (PCR) or next-generation sequencing (NGS), which, while highly accurate, typically require several days to weeks for results.2 This delay can postpone the initiation of targeted therapy, potentially affecting patient prognosis.
The wealth of morphological information contained within whole slide images (WSIs) of H&E stained tissue presents an opportunity to develop computational methods for rapid, predictive biomarker assessment. Deep learning, particularly convolutional neural networks (CNNs), has shown remarkable success in analyzing medical images and extracting clinically relevant features.3,4 Here, we introduce EAGLE (EGFR assessment via Gradient-guided Learning Engine), a deep learning model designed for real-time EGFR mutation prediction directly from WSIs, integrated into a clinical pipeline for accelerated biomarker assessment.
2. Results
2.1. EAGLE Model Architecture and Training
EAGLE is a deep learning model built upon a transformer-based architecture, optimized for analyzing high-resolution WSI data. To address the computational challenges associated with processing large images, we implemented a parallelized encoding strategy. The encoding process is distributed across 23 NVIDIA GPUs, each processing 96 tissue patches, effectively dividing the GPU memory burden. The encoded patches are then aggregated on a separate GPU using Gradient-guided Model Aggregation (GMA) to compute the final classification loss. During backpropagation, gradients are distributed back to each encoding process for synchronized updates. We used 16-bit floating-point precision during patch encoding to enable larger batch sizes and accelerate training.
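The patch distribution described above can be illustrated with a minimal sketch. It only shows the arithmetic of splitting one slide's patches across 23 encoder processes at 96 patches each (2,208 patches in total); the function name `shard_patches` and the choice to raise an error when a slide exceeds that capacity are assumptions made for illustration, not details of the EAGLE code.

```python
from typing import List


def shard_patches(
    patch_ids: List[int],
    n_encoders: int = 23,   # encoder GPUs, as stated in the text
    per_encoder: int = 96,  # patches per encoder GPU, as stated in the text
) -> List[List[int]]:
    """Split patch ids into contiguous shards, one shard per encoder process.

    Hypothetical helper: how slides with more patches than the total
    capacity are handled is not described in the text, so this sketch
    simply rejects them.
    """
    capacity = n_encoders * per_encoder
    if len(patch_ids) > capacity:
        raise ValueError(f"{len(patch_ids)} patches exceed capacity {capacity}")
    return [
        patch_ids[start:start + per_encoder]
        for start in range(0, len(patch_ids), per_encoder)
    ]


shards = shard_patches(list(range(2208)))
print(len(shards), len(shards[0]))  # 23 96
```

In an actual distributed setup, each shard would be encoded on its own GPU and the resulting feature vectors gathered on the aggregation GPU; that communication step is omitted here.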
The model was trained on 24 NVIDIA H100-80GB GPUs for 20 epochs, completing in approximately 9.28 hours. At inference, EAGLE runs efficiently on a single NVIDIA RTX 3090 GPU with 24 GB of memory. The median processing time per slide during inference is 68 seconds, demonstrating its suitability for real-time clinical application. Deployment on lower-capacity hardware is possible, albeit with a trade-off between memory consumption and inference speed.
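One common way to realize such a memory-versus-speed trade-off is to encode patches in smaller chunks, so that fewer patches are resident on the GPU at once at the cost of more encoder calls. The sketch below shows that general technique under stated assumptions: `encode_in_chunks`, `n_encoder_calls`, and the toy encoder are hypothetical and do not come from the EAGLE implementation.

```python
from typing import Callable, List, Sequence


def encode_in_chunks(
    patches: Sequence[float],
    encode: Callable[[Sequence[float]], List[float]],
    chunk_size: int,
) -> List[float]:
    """Encode patches chunk by chunk so only chunk_size patches are live at once.

    Smaller chunks lower peak memory but require more encoder calls.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    features: List[float] = []
    for start in range(0, len(patches), chunk_size):
        features.extend(encode(patches[start:start + chunk_size]))
    return features


def n_encoder_calls(n_patches: int, chunk_size: int) -> int:
    """Number of encoder calls needed (ceiling division)."""
    return -(-n_patches // chunk_size)


# Toy stand-in for a patch encoder: doubles each value.
toy_encode = lambda chunk: [2.0 * p for p in chunk]
print(encode_in_chunks([1.0, 2.0, 3.0], toy_encode, chunk_size=2))  # [2.0, 4.0, 6.0]
print(n_encoder_calls(2208, 96), n_encoder_calls(2208, 32))  # 23 69
```

The result is independent of the chunk size as long as the encoder treats patches independently; only the number of calls, and hence the run time, changes.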
2.2. Clinical Pipeline Implementation and Performance
We integrated EAGLE into a real-time clinical pipeline at MSKCC, designed to identify and process WSIs from primary LUAD (Lung Adenocarcinoma) specimens for EGFR prediction (Figure 3). MSKCC processes 90-110 NSCLC cases requiring EGFR testing each month. The pipeline uses two automated “watcher” applications running on an hourly cadence: one to identify newly scanned slides and another to identify lung cancer cases sent for molecular analysis. When a slide is matched to a relevant case, it is automatically transferred to the GPU compute infrastructure for immediate EAGLE inference. When multiple slides are available for a case, the first scanned WSI is used.
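The matching step can be sketched as follows. This is a simplified illustration, assuming each watcher yields plain records; the `ScannedSlide` type, its field names, and `select_slides_for_inference` are hypothetical and are not taken from the MSKCC pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class ScannedSlide:
    """Hypothetical record emitted by the slide watcher."""
    slide_id: str
    case_id: str
    scanned_at: datetime


def select_slides_for_inference(
    slides: Iterable[ScannedSlide],
    molecular_case_ids: Set[str],
) -> Dict[str, ScannedSlide]:
    """Keep, per case sent for molecular analysis, the first scanned slide."""
    selected: Dict[str, ScannedSlide] = {}
    for slide in slides:
        if slide.case_id not in molecular_case_ids:
            continue  # case was not sent for molecular testing
        current = selected.get(slide.case_id)
        if current is None or slide.scanned_at < current.scanned_at:
            selected[slide.case_id] = slide
    return selected


slides = [
    ScannedSlide("S2", "CASE-A", datetime(2024, 1, 1, 11, 0)),
    ScannedSlide("S1", "CASE-A", datetime(2024, 1, 1, 9, 0)),
    ScannedSlide("S3", "CASE-B", datetime(2024, 1, 1, 10, 0)),
]
chosen = select_slides_for_inference(slides, {"CASE-A"})
print(chosen["CASE-A"].slide_id)  # S1
```

Running such a function once per hourly watcher cycle would give the behavior described above: only cases flagged for molecular analysis are processed, and each contributes a single slide.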
During a silent trial, we collected data on EAGLE predictions, rapid test results, and MSK-IMPACT (MSK’s comprehensive genomic profiling platform) results. Timestamps for four key events (rapid test accessioning, EAGLE prediction generation, rapid test result availability, and MSK-IMPACT result availability) were recorded to assess the performance of the EAGLE-assisted screening pipeline compared to the standard rapid test workflow. This allowed for a direct comparison of turnaround times and of the potential for accelerated clinical decision-making.
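A turnaround-time comparison of this kind could be computed from the recorded timestamps roughly as shown below. The dictionary keys and the two example cases are invented purely for illustration; they are not data from the silent trial, and the choice of the median as the summary statistic is an assumption.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600.0


def median_turnaround_hours(
    cases: List[Dict[str, datetime]], start_key: str, end_key: str
) -> float:
    """Median elapsed hours from start_key to end_key across cases."""
    return median(hours_between(c[start_key], c[end_key]) for c in cases)


# Illustrative timestamps only (hypothetical, not trial data).
cases = [
    {
        "accessioned": datetime(2024, 1, 1, 8, 0),
        "eagle_prediction": datetime(2024, 1, 1, 9, 0),
        "rapid_result": datetime(2024, 1, 3, 8, 0),
    },
    {
        "accessioned": datetime(2024, 1, 2, 8, 0),
        "eagle_prediction": datetime(2024, 1, 2, 11, 0),
        "rapid_result": datetime(2024, 1, 4, 20, 0),
    },
]
print(median_turnaround_hours(cases, "accessioned", "eagle_prediction"))  # 2.0
print(median_turnaround_hours(cases, "accessioned", "rapid_result"))      # 54.0
```

The same helper applies to any pair of recorded events, for example from accessioning to MSK-IMPACT result availability.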
2.3. Software and Reporting Summary
The EAGLE model was developed using PyTorch (v.2.1.1+cu121), and the associated software pipelines were built with Python (v.3.8
