Deep Learning extracts Histopathological Patterns and accurately discriminates 28 Cancer and 14 Normal Tissue Types: Pan-cancer Computational Histopathology Analysis
Reporter: Aviva Lev-Ari, PhD, RN
3.5.1.1 Deep Learning extracts Histopathological Patterns and accurately discriminates 28 Cancer and 14 Normal Tissue Types: Pan-cancer Computational Histopathology Analysis, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 3: AI in Medicine
Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis
Yu Fu1, Alexander W Jung1, Ramon Viñas Torne1, Santiago Gonzalez1,2, Harald Vöhringer1, Mercedes Jimenez-Linan3, Luiza Moore3,4, and Moritz Gerstung#1,5 # to whom correspondence should be addressed 1) European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK. 2) Current affiliation: Institute for Research in Biomedicine (IRB Barcelona), Parc Científic de Barcelona, Barcelona, Spain. 3) Department of Pathology, Addenbrooke’s Hospital, Cambridge, UK. 4) Wellcome Sanger Institute, Hinxton, UK 5) European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.
Correspondence:
Dr Moritz Gerstung European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) Hinxton, CB10 1SA UK. Tel: +44 (0) 1223 494636 E-mail: moritz.gerstung@ebi.ac.uk
Abstract
Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis
Here we use deep transfer learning to quantify histopathological patterns across 17,396 H&E stained histopathology image slides from 28 cancer types and correlate these with underlying genomic and transcriptomic data. Pan-cancer computational histopathology (PC-CHiP) classifies the tissue origin across organ sites and provides highly accurate, spatially resolved tumor and normal distinction within a given slide. The learned computational histopathological features correlate with a large range of recurrent genetic aberrations, including whole genome duplications (WGDs), arm-level copy number gains and losses, focal amplifications and deletions as well as driver gene mutations within a range of cancer types. WGDs can be predicted in 25/27 cancer types (mean AUC=0.79) including those that were not part of model training. Similarly, we observe associations with 25% of mRNA transcript levels, which enables to learn and localise histopathological patterns of molecularly defined cell types on each slide. Lastly, we find that computational histopathology provides prognostic information augmenting histopathological subtyping and grading in the majority of cancers assessed, which pinpoints prognostically relevant areas such as necrosis or infiltrating lymphocytes on each tumour section. Taken together, these findings highlight the large potential of PC-CHiP to discover new molecular and prognostic associations, which can augment diagnostic workflows and lay out a rationale for integrating molecular and histopathological data.
https://www.biorxiv.org/content/10.1101/813543v1
Key points
● Pan-cancer computational histopathology analysis with deep learning extracts histopathological patterns and accurately discriminates 28 cancer and 14 normal tissue types
● Computational histopathology predicts whole genome duplications, focal amplifications and deletions, as well as driver gene mutations
● Wide-spread correlations with gene expression indicative of immune infiltration and proliferation
● Prognostic information augments conventional grading and histopathology subtyping in the majority of cancers
Discussion
Here we presented PC-CHiP, a pan-cancer transfer learning approach to extract computational histopathological features across 42 cancer and normal tissue types and their genomic, molecular and prognostic associations. Histopathological features, originally derived to classify different tissues, contained rich histologic and morphological signals predictive of a range of genomic and transcriptomic changes as well as survival. This shows that computer vision not only has the capacity to highly accurately reproduce predefined tissue labels, but also that this quantifies diverse histological patterns, which are predictive of a broad range of genomic and molecular traits, which were not part of the original training task. As the predictions are exclusively based on standard H&E-stained tissue sections, our analysis highlights the high potential of computational histopathology to digitally augment existing histopathological workflows. The strongest genomic associations were found for whole genome duplications, which can in part be explained by nuclear enlargement and increased nuclear intensities, but seemingly also stems from tumour grade and other histomorphological patterns contained in the high-dimensional computational histopathological features. Further, we observed associations with a range of chromosomal gains and losses, focal deletions and amplifications as well as driver gene mutations across a number of cancer types. These data demonstrate that genomic alterations change the morphology of cancer cells, as in the case of WGD, but possibly also that certain aberrations preferentially occur in distinct cell types, reflected by the tumor histology. Whatever is the cause or consequence in this equation, these associations lay out a route towards genomically defined histopathology subtypes, which will enhance and refine conventional assessment. Further, a broad range of transcriptomic correlations was observed reflecting both immune cell infiltration and cell proliferation that leads to higher tumor densities. These examples illustrated the remarkable property that machine learning does not only establish novel molecular associations from pre-computed histopathological feature sets but also allows the localisation of these traits within a larger image. While this exemplifies the power of a large scale data analysis to detect and localise recurrent patterns, it is probably not superior to spatially annotated training data. Yet such data can, by definition, only be generated for associations which are known beforehand. This appears straightforward, albeit laborious, for existing histopathology classifications, but more challenging for molecular readouts. Yet novel spatial transcriptomic44,45 and sequencing technologies46 bring within reach spatially matched molecular and histopathological data, which would serve as a gold standard in combining imaging and molecular patterns. Across cancer types, computational histopathological features showed a good level of prognostic relevance, substantially improving prognostic accuracy over conventional grading and histopathological subtyping in the majority of cancers. It is this very remarkable that such predictive It is made available under a CC-BY-NC 4.0 International license. (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. bioRxiv preprint first posted online Oct. 25, 2019; doi: http://dx.doi.org/10.1101/813543. The copyright holder for this preprint signals can be learned in a fully automated fashion. Still, at least at the current resolution, the improvement over a full molecular and clinical workup was relatively small. This might be a consequence of the far-ranging relations between histopathology and molecular phenotypes described here, implying that histopathology is a reflection of the underlying molecular alterations rather than an independent trait. Yet it probably also highlights the challenges of unambiguously quantifying histopathological signals in – and combining signals from – individual areas, which requires very large training datasets for each tumour entity. From a methodological point of view, the prediction of molecular traits can clearly be improved. In this analysis, we adopted – for the reason of simplicity and to avoid overfitting – a transfer learning approach in which an existing deep convolutional neural network, developed for classification of everyday objects, was fine tuned to predict cancer and normal tissue types. The implicit imaging feature representation was then used to predict molecular traits and outcomes. Instead of employing this two-step procedure, which risks missing patterns irrelevant for the initial classification task, one might directly employ either training on the molecular trait of interest, or ideally multi-objective learning. Further improvement may also be related to the choice of the CNN architecture. Everyday images have no defined scale due to a variable z-dimension; therefore, the algorithms need to be able to detect the same object at different sizes. This clearly is not the case for histopathology slides, in which one pixel corresponds to a defined physical size at a given magnification. Therefore, possibly less complex CNN architectures may be sufficient for quantitative histopathology analyses, and also show better generalisation. Here, in our proof-of-concept analysis, we observed a considerable dependence of the feature representation on known and possibly unknown properties of our training data, including the image compression algorithm and its parameters. Some of these issues could be overcome by amending and retraining the network to isolate the effect of confounding factors and additional data augmentation. Still, given the flexibility of deep learning algorithms and the associated risk of overfitting, one should generally be cautious about the generalisation properties and critically assess whether a new image is appropriately represented. Looking forward, our analyses revealed the enormous potential of using computer vision alongside molecular profiling. While the eye of a trained human may still constitute the gold standard for recognising clinically relevant histopathological patterns, computers have the capacity to augment this process by sifting through millions of images to retrieve similar patterns and establish associations with known and novel traits. As our analysis showed this helps to detect histopathology patterns associated with a range of genomic alterations, transcriptional signatures and prognosis – and highlight areas indicative of these traits on each given slide. It is therefore not too difficult to foresee how this may be utilised in a computationally augmented histopathology workflow enabling more precise and faster diagnosis and prognosis. Further, the ability to quantify a rich set of histopathology patterns lays out a path to define integrated histopathology and molecular cancer subtypes, as recently demonstrated for colorectal cancers47 .
Lastly, our analyses provide It is made available under a CC-BY-NC 4.0 International license. (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
bioRxiv preprint first posted online Oct. 25, 2019; doi: http://dx.doi.org/10.1101/813543.
The copyright holder for this preprint proof-of-concept for these principles and we expect them to be greatly refined in the future based on larger training corpora and further algorithmic refinements.
SOURCE
https://www.biorxiv.org/content/biorxiv/early/2019/10/25/813543.full.pdf
Other related articles published in this Open Access Online Scientific Journal include the following:
CancerBase.org – The Global HUB for Diagnoses, Genomes, Pathology Images: A Real-time Diagnosis and Therapy Mapping Service for Cancer Patients – Anonymized Medical Records accessible to anyone on Earth
Reporter: Aviva Lev-Ari, PhD, RN
631 articles had in their Title the keyword “Pathology”
Leave a Reply