Posts Tagged ‘cancer progression’

Machine Learning (ML) in cancer prognosis prediction helps researchers identify multiple known as well as candidate cancer driver genes

Curator and Reporter: Dr. Premalata Pati, Ph.D., Postdoc

Seeing “through” the cancer with the power of data analysis — possible with the help of artificial intelligence. Credit: MPI f. Molecular Genetics/ Ella Maru Studio
Image Source: https://medicalxpress.com/news/2021-04-sum-mutations-cancer-genes-machine.html

Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. Early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) and artificial intelligence (AI) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions and to develop new prediction algorithms.

In the majority of human cancers, heritable loss of gene function through cell division may be mediated as often by epigenetic as by genetic abnormalities. Epigenetic modification occurs through a process of interrelated changes in CpG island methylation and histone modifications. Candidate gene approaches of cell cycle, growth regulatory and apoptotic genes have shown epigenetic modification associated with loss of cognate proteins in sporadic pituitary tumors.

On 11th November 2020, researchers from the University of California, Irvine, reported findings that advance the understanding of epigenetic mechanisms in tumorigenesis and reveal a previously undetected repertoire of cancer driver genes. The study was published in “Science Advances”.

Using a new prediction algorithm called DORGE (Discovery of Oncogenes and tumor suppressor genes using Genetic and Epigenetic features), which integrates the most comprehensive collection of genetic and epigenetic data, the researchers were able to identify novel tumor suppressor genes (TSGs) and oncogenes (OGs), particularly those with rare mutations.

The senior author Wei Li, Ph.D., the Grace B. Bell chair and professor of bioinformatics in the Department of Biological Chemistry at the UCI School of Medicine said

Existing bioinformatics algorithms do not sufficiently leverage epigenetic features to predict cancer driver genes, even though epigenetic alterations are known to be associated with cancer driver genes.

The Study

The study demonstrated that the cancer driver genes predicted by DORGE included both known cancer driver genes and novel driver genes not reported in the current literature. In addition, the researchers found that the novel dual-functional genes, which DORGE predicted as both TSGs and OGs, are highly enriched at hubs in protein-protein interaction (PPI) and drug/compound-gene networks.
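To make the idea of a network "hub" concrete, here is a minimal sketch that counts interaction partners per gene in a toy network and reports the most connected ones. The gene labels, the interaction pairs and the `min_degree` cutoff are hypothetical placeholders, and degree counting is only one simple way to define a hub; it is not taken from the DORGE study.

```python
from collections import defaultdict

# Hypothetical interaction pairs; the labels are placeholders, not real data.
INTERACTIONS = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]


def build_network(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def find_hubs(network, min_degree):
    """Return genes with at least min_degree partners, highest degree first."""
    degrees = {gene: len(partners) for gene, partners in network.items()}
    hubs = [gene for gene, degree in degrees.items() if degree >= min_degree]
    return sorted(hubs, key=lambda gene: (-degrees[gene], gene))


network = build_network(INTERACTIONS)
hubs = find_hubs(network, min_degree=3)
print(hubs)
```

In this toy network only `GENE_A` reaches three partners, so it is the only gene reported as a hub.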

Prof. Li explained that the DORGE algorithm successfully leveraged public data to discover the genetic and epigenetic alterations that play significant roles in cancer driver gene dysregulation, and that it could be instrumental in improving cancer prevention, diagnosis and treatment efforts in the future.

Another machine learning algorithm for identifying cancer genes has been developed by a team of researchers at the Max Planck Institute for Molecular Genetics (MPIMG) in Berlin and the Institute of Computational Biology of Helmholtz Zentrum München. The team combined a wide variety of data, analyzed it with “Artificial Intelligence” and identified numerous cancer genes. They named the algorithm EMOGI (Explainable Multi-Omics Graph Integration). EMOGI can predict which genes cause cancer, even if their DNA sequence is not changed. This opens up new perspectives for targeted cancer therapy in personalized medicine and the development of biomarkers. The research was published in Nature Machine Intelligence on 12th April 2021.

In cancer, cells get out of control. They proliferate and push their way into tissues, destroying organs and thereby impairing essential vital functions. This unrestricted growth is usually induced by an accumulation of DNA changes in cancer genes—i.e. mutations in these genes that govern the development of the cell. But some cancers have only very few mutated genes, which means that other causes lead to the disease in these cases.

The Study

Overlap of EMOGI’s positive predictions with known cancer genes (KCGs) and candidate cancer genes
Image Source: https://static-content.springer.com/esm/art%3A10.1038%2Fs42256-021-00325-y/MediaObjects/42256_2021_325_MOESM1_ESM.pdf

The aims of the study are presented under four main headings:

  • Additional targets for personalized medicine
  • Better results by combination
  • In search of hints for further studies
  • Suitable for other types of diseases as well

The team, headed by Annalisa Marsico, used the algorithm to identify 165 previously unknown cancer genes. The sequences of these genes are not necessarily altered; apparently, dysregulation of these genes alone can lead to cancer. All of the newly identified genes interact closely with well-known cancer genes and are essential for the survival of tumor cells in cell culture experiments. EMOGI can also explain the relationships in the cell’s machinery that make a gene a cancer gene. The software integrates tens of thousands of data sets generated from patient samples. In addition to sequence data with mutations, these contain information about DNA methylation, the activity of individual genes and the interactions of proteins within cellular pathways. In these data, a deep-learning algorithm detects the patterns and molecular principles that lead to the development of cancer.

Marsico says

Ideally, we obtain a complete picture of all cancer genes at some point, which can have a different impact on cancer progression for different patients

Unlike traditional cancer treatments such as chemotherapy, personalized treatments are tailored to the exact type of tumor. “The goal is to choose the best treatment for each patient, the most effective treatment with the fewest side effects. In addition, molecular properties can be used to identify cancers that are already in the early stages.”

Roman Schulte-Sasse, a doctoral student on Marsico’s team and the first author of the publication says

To date, most studies have focused on pathogenic changes in sequence, or cell blueprints. At the same time, it has recently become clear that epigenetic perturbation or dysregulated gene activity can also lead to cancer.

This is why the researchers merged sequence data that reflect failures in the blueprint with information that represents events in cells. First, the scientists confirmed that mutations, or the multiplication of genomic segments, were the leading cause of cancer. Then, in a second step, they identified gene candidates that are less directly related to the genes that cause cancer.

Clues for future directions

The researchers’ new program adds a considerable number of new entries to the list of suspected cancer genes, which has grown to between 700 and 1,000 in recent years. It was only through a combination of bioinformatics analysis and the newest Artificial Intelligence (AI) methods that the researchers were able to track down the hidden genes.

Schulte-Sasse says, “The interactions of proteins and genes can be mapped as a mathematical network, known as a graph.” He illustrated this with the example of a railroad network: each station corresponds to a protein or gene, and each interaction between them is a train connection. With the help of deep learning, the very algorithms that have helped artificial intelligence make a breakthrough in recent years, the researchers were able to discover even those train connections that had previously gone unnoticed. Schulte-Sasse had the computer analyze tens of thousands of different network maps from 16 different cancer types, each containing between 12,000 and 19,000 data points.
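The railroad picture can be sketched in a few lines of code. The example below stores a toy network as a graph and runs one round of neighbor averaging, which is the basic idea behind a graph convolution. It is an illustration under stated assumptions, not EMOGI's implementation: the gene labels, the three feature columns and all numbers are hypothetical, and a real graph convolutional network would add learned weights and nonlinear layers on top of this step.

```python
# Toy network: each key is a gene, each value lists its interaction partners.
# All labels and numbers are hypothetical placeholders.
GRAPH = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A"],
    "GENE_C": ["GENE_A"],
}

# One feature vector per gene:
# [mutation score, methylation score, expression score].
FEATURES = {
    "GENE_A": [0.0, 0.0, 0.0],
    "GENE_B": [1.0, 0.0, 1.0],
    "GENE_C": [0.0, 1.0, 1.0],
}


def aggregate(graph, features):
    """Average each gene's features with those of its neighbors (one round)."""
    updated = {}
    for gene, partners in graph.items():
        group = [features[gene]] + [features[p] for p in partners]
        updated[gene] = [sum(column) / len(group) for column in zip(*group)]
    return updated


smoothed = aggregate(GRAPH, FEATURES)
print(smoothed["GENE_A"])
```

Here `GENE_A` starts with no alterations of its own but ends up with non-zero scores because its neighbors are altered. This mirrors, in a very simplified way, how a gene with an unchanged sequence can still stand out through its connections.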

Many more interesting details are hidden in the data. Patterns that depend on the particular cancer and tissue were seen. The researchers also saw this as evidence that tumors are triggered by different molecular mechanisms in different organs.

Marsico explains

The EMOGI program is not limited to cancer, the researchers emphasize. In theory, it can be used to integrate diverse sets of biological data and find patterns there. “It could be useful to apply our algorithm for similarly complex diseases for which multifaceted data are collected and where genes play an important role. An example might be complex metabolic diseases such as diabetes.”

Main Source

New prediction algorithm identifies previously undetected cancer driver genes


Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms


Other Related Articles published in this Open Access Online Scientific Journal include the following:

AI System Used to Detect Lung Cancer

Reporter: Irina Robu, PhD


Deep Learning extracts Histopathological Patterns and accurately discriminates 28 Cancer and 14 Normal Tissue Types: Pan-cancer Computational Histopathology Analysis

Reporter: Aviva Lev-Ari, PhD, RN


Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis

Curator & Reporter: Aviva Lev-Ari, PhD, RN


Cancer detection and therapeutics

Curator: Larry H. Bernstein, MD, FCAP


Free Bio-IT World Webinar: Machine Learning to Detect Cancer Variants

Reporter: Stephen J. Williams, PhD


Artificial Intelligence: Genomics & Cancer


Premalata Pati, PhD, PostDoc in Biological Sciences, Medical Text Analysis with Machine Learning



Author, reporter: Tilda Barliya PhD

Breast cancer is the second most common cancer worldwide after lung cancer, the fifth most common cause of cancer death, and the leading cause of cancer death in women. The global burden of breast cancer exceeds all other cancers and the incidence rates of breast cancer are increasing (1,2).

The heterogeneity of breast cancers makes them both a fascinating and challenging solid tumor to diagnose and treat. Here is a great review of the molecular pathology of breast cancer progression (3).

“The molecular pathology of breast cancer progression” by Alessandro Bombonati and Dennis C Sgroi.

Breast cancer is the most frequent carcinoma in females and the second most common cause of cancer-related mortality in women. Approximately 54 000 and 207 000 new cases of in situ and invasive breast carcinoma, respectively, were expected to be diagnosed in the USA in 2010. Overall, breast cancer incidence rates have levelled off since 1990, with a decrease of 3.5%/year from 2001 to 2004. Most notably, during this same time period, breast cancer mortality rates have declined 24%, with the largest impact among young women and women with estrogen receptor (ER)-positive disease.

The decline in breast cancer mortality has been attributed to the combination of early detection with screening programmes and the advent of more efficacious adjuvant therapy. Advances in the understanding of breast cancer progression have aided in the discovery of novel pathway-specific targeted therapeutics, and the emergence of such effective therapeutics is currently driving the need for molecular-based, ‘patient-tailored’ treatment planning.

Proposed models of human breast cancer progression


Epidemiological and morphological observations led to the formulation of several linear models of breast cancer initiation, transformation and progression (Figure 1).

The ductal and lobular subtypes constitute the majority of all breast cancers worldwide, with the ductal subtype accounting for 40–75% of all diagnosed cases.

The classic model of breast cancer progression of the ductal type proposes that neoplastic evolution initiates in normal epithelium (normal), progresses to flat epithelial atypia (FEA), advances to atypical ductal hyperplasia (ADH), evolves to ductal carcinoma in situ (DCIS) and culminates as invasive ductal carcinoma (IDC).

The model of lobular neoplasia proposes a multi-step progression from normal epithelium to atypical lobular hyperplasia, lobular carcinoma in situ (LCIS) and invasive lobular carcinoma (ILC).

The cell of origin of breast cancer: the clonal and stem cell hypotheses

The two leading models accounting for breast carcinogenesis are the sporadic clonal evolution model and the cancer stem cell (cSC) model. According to the sporadic clonal evolution hypothesis, any breast epithelial cell can be the target of random mutations. The cells with advantageous genetic and epigenetic alterations are selected over time to contribute to tumour progression. The alternative cSC model postulates that only stem and progenitor cells (representing a small fraction of the tumor cells within the cancer) can initiate and maintain tumor progression (Figure 2).

Normal breast stem cells (nBSCs) are long-lived, tissue-resident cells capable of self-renewal activity and multi-lineage differentiation that can recapitulate the breast tubulolobular architecture that is composed of luminal and myoepithelial cells.

As normal breast stem cells are long-time tissue residents, it has been proposed that such cells are candidates for accumulating genetic and epigenetic modifications. It has been further proposed that such molecular alterations result in deregulation of normal self-renewal, leading to the development of a cancer stem cell (cSC).

It is believed that the cSC undergoes asymmetrical division, maintaining the stem cell population while at the same time differentiating into committed progenitor(s) cells that give rise to the different breast cancer subtypes.

A second scenario, as it relates to breast cancer development, is one in which the cancer-initiating cells are derived from committed progenitor cells that spawn different breast cancer subtypes. Both scenarios are highly supported.

Molecular analysis of the different stages of breast cancer progression


Genomic and transcriptomic data in combination with morphological and immunohistochemical data stratify the majority of breast cancers into a “low-grade-like” molecular pathway and a “high-grade-like” molecular pathway (Figure 3). The low-grade-like pathway (left-hand side) is characterized by recurrent chromosomal loss of 16q, gains of 1q, a low-grade-like gene expression signature, and the expression of estrogen and progesterone receptors (ER+ and PR+). The progression (vertical arrows) along this pathway (green rectangles) culminates with the formation of low- and intermediate-grade invasive ductal carcinomas (LG IDC and IG IDC) and invasive lobular carcinomas, including both the classic (ILC) and the pleomorphic variant (pILC). The tumors arising from the low-grade pathway are classified as luminal, consisting of a continuum of gene expression frequently associated with the absence (luminal A) or presence (luminal B) of HER2 expression. The vast majority of ILCs and pILCs and their precursors cluster together within the luminal subtype. The high-grade-like molecular pathway (right-hand side) is characterized by recurrent gain of 11q13 (+11q13), loss of 13q (13q−), expression of a high-grade-like gene expression signature, amplification of 17q12 (17q12AMP), and lack of estrogen and progesterone receptor expression (ER− and PR−). The progression along this pathway (red rectangles) includes intermediate- and high-grade ductal carcinomas that are stratified as HER2 or basal-like, depending on the expression/amplification of HER2. The molecular apocrine subtype, characterized by the lack of ER expression and presence of AR expression, arises from the high-grade pathway. The model also depicts intra-pathway tumor grade progression (horizontal arrows).

Although the genomic and transcriptomic data presented in this review support the divergent model of breast cancer progression, the clinical experience indicates that tumors within each pathway are still fairly heterogeneous with respect to clinical outcome suggesting that even this advanced molecular progression scheme is oversimplified.

The future application of massively parallel sequencing technologies to the preinvasive stages of breast cancer will assist in assessing intratumoral heterogeneity during the transition from preinvasive to invasive breast cancer, and may assist in identifying early tumor initiating genetic events.


Over the past decade, the integration of numerous genomic and transcriptomic analyses of the various stages of breast cancer has generated multiple novel insights into the complex process of breast cancer progression.

  • First, human breast cancer appears to progress along two distinct molecular genetic pathways that strongly associate with tumor grade.
  • Second, in the epithelial and non-epithelial components of the tumor microenvironment, the greatest molecular alterations (at the gene expression level) occur prior to local invasion.
  • Third, in the epithelial compartment, no major additional gene expression changes occur between the preinvasive and invasive stages of breast cancer.
  • Fourth, the non-epithelial compartment of the tumor micromilieu undergoes dramatic epigenetic and gene expression alterations during the transition from preinvasive to invasive disease.

Despite these significant advances, we have only begun to scratch the surface of this multifaceted biological process. With the advent of additional novel high-throughput genetic, epigenetic and proteomic technologies, it is anticipated that the next decade of breast cancer research will gain a comparable appreciation for the complexity of breast cancer progression. It is with great hope that knowledge gained from such studies will provide for more effective strategies to not only treat, but also prevent breast cancer.


1. http://www.nature.com/nrclinonc/journal/v7/n12/pdf/nrclinonc.2010.192.pdf

2. Jemal, A. et al. CA Cancer J. Clin. 60, 277–300; 2010

3. Alessandro Bombonati and Dennis C Sgroi. The molecular pathology of breast cancer progression. J Pathol 2011; 223: 307–317.



4. Rodney C. Richie and John O. Swanson. Breast Cancer: A Review of the Literature. J Insur Med 2003;35:85–101.

