Machine Learning (ML) in cancer prognosis prediction helps researchers identify multiple known as well as candidate cancer driver genes
Curator and Reporter: Dr. Premalata Pati, Ph.D., Postdoc

Image Source: https://medicalxpress.com/news/2021-04-sum-mutations-cancer-genes-machine.html
Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams in the biomedical and bioinformatics fields to study the application of machine learning (ML) and Artificial Intelligence (AI) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions with the help of new prediction algorithms.
In the majority of human cancers, heritable loss of gene function through cell division may be mediated as often by epigenetic as by genetic abnormalities. Epigenetic modification occurs through a process of interrelated changes in CpG island methylation and histone modifications. Candidate gene approaches of cell cycle, growth regulatory and apoptotic genes have shown epigenetic modification associated with loss of cognate proteins in sporadic pituitary tumors.
On 11th November 2020, researchers from the University of California, Irvine, reported findings that advance the understanding of epigenetic mechanisms in tumorigenesis and reveal a previously undetected repertoire of cancer driver genes. The study was published in “Science Advances”.
The researchers were able to identify novel tumor suppressor genes (TSGs) and oncogenes (OGs), particularly those with rare mutations, by using a new prediction algorithm called DORGE (Discovery of Oncogenes and tumor suppressor genes using Genetic and Epigenetic features), which integrates the most comprehensive collection of genetic and epigenetic data.
The senior author, Wei Li, Ph.D., the Grace B. Bell chair and professor of bioinformatics in the Department of Biological Chemistry at the UCI School of Medicine, said:
“Existing bioinformatics algorithms do not sufficiently leverage epigenetic features to predict cancer driver genes, even though epigenetic alterations are known to be associated with cancer driver genes.”
The Study
The study demonstrated that the cancer driver genes predicted by DORGE included both known cancer driver genes and novel driver genes not reported in the current literature. In addition, the researchers found that the novel dual-functional genes, which DORGE predicted as both TSGs and OGs, are highly enriched at hubs in protein-protein interaction (PPI) and drug/compound-gene networks.
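To make the idea of a network "hub" concrete, here is a minimal sketch that ranks genes by their number of interaction partners (their degree) and reports the best-connected ones. The gene names, the edge list and the `top_fraction` cutoff are all invented for illustration; this is not how the DORGE authors defined hubs or ran their enrichment analysis.

```python
from collections import defaultdict

# Toy, made-up interaction list; gene names are placeholders, not real data.
edges = [
    ("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
    ("geneA", "geneE"), ("geneB", "geneC"), ("geneD", "geneF"),
    ("geneF", "geneG"),
]

def degree_by_gene(edge_list):
    """Count distinct interaction partners per gene in an undirected network."""
    neighbors = defaultdict(set)
    for u, v in edge_list:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {gene: len(partners) for gene, partners in neighbors.items()}

def find_hubs(degrees, top_fraction=0.2):
    """Return the genes in the top `top_fraction` by degree (at least one)."""
    # Sort by descending degree, breaking ties alphabetically for stable output.
    ranked = sorted(degrees, key=lambda g: (-degrees[g], g))
    n_hubs = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_hubs]

degrees = degree_by_gene(edges)
hubs = find_hubs(degrees)
print(hubs)
```

In this toy network only geneA, with four partners, qualifies as a hub. An enrichment test would then ask whether predicted dual-functional genes land in such a hub set more often than expected by chance.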
Image Source: https://advances.sciencemag.org/content/suppl/2020/11/09/6.46.eaba6784.DC1/aba6784_SM.pdf
Prof. Li explained that the DORGE algorithm successfully leveraged public data to discover the genetic and epigenetic alterations that play significant roles in cancer driver gene dysregulation, and that it could be instrumental in improving cancer prevention, diagnosis and treatment efforts in the future.
Another new machine-learning algorithm for identifying cancer genes has been developed by a team of researchers at the Max Planck Institute for Molecular Genetics (MPIMG) in Berlin and the Institute of Computational Biology of Helmholtz Zentrum München. The team combined a wide variety of data, analyzed it with “Artificial Intelligence” and identified numerous cancer genes. They named the algorithm EMOGI (Explainable Multi-Omics Graph Integration). EMOGI can predict which genes cause cancer, even if their DNA sequence is not changed. This opens up new perspectives for targeted cancer therapy in personalized medicine and the development of biomarkers. The research was published in Nature Machine Intelligence on 12th April 2021.
In cancer, cells get out of control. They proliferate and push their way into tissues, destroying organs and thereby impairing essential vital functions. This unrestricted growth is usually induced by an accumulation of DNA changes in cancer genes—i.e. mutations in these genes that govern the development of the cell. But some cancers have only very few mutated genes, which means that other causes lead to the disease in these cases.
The Study

Image Source: https://static-content.springer.com/esm/art%3A10.1038%2Fs42256-021-00325-y/MediaObjects/42256_2021_325_MOESM1_ESM.pdf
The aims of the study are presented under four main headings:
- Additional targets for personalized medicine
- Better results by combination
- In search of hints for further studies
- Suitable for other types of diseases as well
The team, headed by Annalisa Marsico, used the algorithm to identify 165 previously unknown cancer genes. The sequences of these genes are not necessarily altered; apparently, dysregulation of these genes alone can already lead to cancer. All of the newly identified genes interact closely with well-known cancer genes and have been shown to be essential for the survival of tumor cells in cell culture experiments. EMOGI can also explain the relationships in the cell’s machinery that make a gene a cancer gene. The software integrates tens of thousands of data sets generated from patient samples. In addition to sequence data with mutations, these contain information about DNA methylation, the activity of individual genes and the interactions of proteins within cellular pathways. In these data, a deep-learning algorithm detects the patterns and molecular principles that lead to the development of cancer.
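As a rough illustration of what "integrating" several data types per gene can look like, the sketch below rescales three hypothetical data layers (mutation rate, methylation change, expression change) to a common range and stacks them into one feature vector per gene. All gene names, numbers and function names are assumptions made up for this example; the actual preprocessing used for EMOGI is described in the publication and is not reproduced here.

```python
# Hypothetical per-gene measurements; all numbers are invented for illustration.
mutation_rate = {"geneA": 0.30, "geneB": 0.00, "geneC": 0.10}
methylation_change = {"geneA": 0.05, "geneB": 0.60, "geneC": 0.20}
expression_change = {"geneA": 1.0, "geneB": 4.0, "geneC": 2.0}

def min_max_scale(values):
    """Rescale a {gene: value} layer to [0, 1] so layers are comparable."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # A constant layer carries no signal, so map it to 0.0 everywhere.
    return {g: (v - lo) / span if span else 0.0 for g, v in values.items()}

def build_feature_matrix(layers):
    """Stack scaled layers into one feature vector per gene.

    Only genes present in every layer are kept.
    """
    scaled = [min_max_scale(layer) for layer in layers]
    genes = sorted(set.intersection(*(set(layer) for layer in layers)))
    return {g: [layer[g] for layer in scaled] for g in genes}

features = build_feature_matrix(
    [mutation_rate, methylation_change, expression_change]
)
for gene, vector in features.items():
    print(gene, [round(x, 2) for x in vector])
```

In this toy data geneB carries no mutations at all, yet it stands out in the methylation and expression layers. A sequence-only analysis would miss it, which is the kind of candidate a multi-omics approach is meant to surface.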
Marsico says:
“Ideally, we obtain a complete picture of all cancer genes at some point, which can have a different impact on cancer progression for different patients.”
Unlike traditional cancer treatments such as chemotherapy, personalized treatments are tailored to the exact type of tumor. “The goal is to choose the best treatment for each patient, that is, the most effective treatment with the fewest side effects. In addition, molecular properties can be used to identify cancers at an early stage.”
Roman Schulte-Sasse, a doctoral student on Marsico’s team and the first author of the publication, says:
“To date, most studies have focused on pathogenic changes in sequence, or cell blueprints. At the same time, it has recently become clear that epigenetic perturbation or dysregulated gene activity can also lead to cancer.”
This is why the researchers merged sequence data, which reflects failures in the blueprint, with information that represents events in cells. First, the scientists confirmed that mutations, or the multiplication of genomic segments, were the leading cause of cancer. Then, in a second step, they identified gene candidates that are less directly connected to the genes that cause cancer.
Clues for future directions
The researchers’ new program adds a considerable number of new entries to the list of suspected cancer genes, which has grown to between 700 and 1,000 in recent years. It was only through a combination of bioinformatics analysis and the newest Artificial Intelligence (AI) methods that the researchers were able to track down the hidden genes.
Schulte-Sasse says, “The interactions of proteins and genes can be mapped as a mathematical network, known as a graph.” He illustrated this with the example of a railroad network: each station corresponds to a protein or gene, and each interaction between them is a train connection. With the help of deep learning, the very algorithms that have helped artificial intelligence make a breakthrough in recent years, the researchers were able to discover even those train connections that had previously gone unnoticed. Schulte-Sasse had the computer analyze tens of thousands of different network maps from 16 different cancer types, each containing between 12,000 and 19,000 data points.
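The railroad picture can be turned into a few lines of code. The sketch below stores a tiny, made-up gene network as an adjacency map and applies a simplified graph-convolution step, in which every gene averages its own feature values with those of its direct neighbors. This is only the neighborhood-averaging idea behind graph convolutional networks, written under simplifying assumptions: a real model also applies learned weights and a non-linearity, and none of the names or numbers here come from EMOGI itself.

```python
def graph_conv_step(adjacency, features):
    """Average each node's features with its neighbors' (self included).

    Simplified propagation step: no learned weights, no non-linearity.
    """
    updated = {}
    for node, neighbors in adjacency.items():
        group = [node] + sorted(neighbors)
        n_features = len(features[node])
        updated[node] = [
            sum(features[member][k] for member in group) / len(group)
            for k in range(n_features)
        ]
    return updated

# Hypothetical three-gene "railroad line": geneA - geneB - geneC.
adjacency = {
    "geneA": {"geneB"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneB"},
}
# One made-up feature per gene, e.g. a mutation signal present only in geneA.
signal = {"geneA": [1.0], "geneB": [0.0], "geneC": [0.0]}

after_one = graph_conv_step(adjacency, signal)
after_two = graph_conv_step(adjacency, after_one)
print(after_one)
print(after_two)
```

After one step the signal from geneA has reached its direct neighbor geneB. After two steps it also reaches geneC, which has no direct connection to geneA. Stacking such steps is how a graph-based model can flag genes that are unremarkable on their own but sit close to known cancer genes in the network.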
Many more interesting details are hidden in the data. The researchers saw patterns that depend on the particular cancer and tissue, and took this as evidence that tumors are triggered by different molecular mechanisms in different organs.
The EMOGI program is not limited to cancer, the researchers emphasize. In theory, it can be used to integrate diverse sets of biological data and find patterns there. Marsico explains:
“It could be useful to apply our algorithm for similarly complex diseases for which multifaceted data are collected and where genes play an important role. An example might be complex metabolic diseases such as diabetes.”
Main Source
New prediction algorithm identifies previously undetected cancer driver genes
https://advances.sciencemag.org/content/6/46/eaba6784
Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms
https://www.nature.com/articles/s42256-021-00325-y#citeas
Other Related Articles published in this Open Access Online Scientific Journal include the following:
AI System Used to Detect Lung Cancer
Reporter: Irina Robu, PhD
https://pharmaceuticalintelligence.com/2019/06/28/ai-system-used-to-detect-lung-cancer/
Deep Learning extracts Histopathological Patterns and accurately discriminates 28 Cancer and 14 Normal Tissue Types: Pan-cancer Computational Histopathology Analysis
Reporter: Aviva Lev-Ari, PhD, RN
Evolution of the Human Cell Genome Biology Field of Gene Expression, Gene Regulation, Gene Regulatory Networks and Application of Machine Learning Algorithms in Large-Scale Biological Data Analysis
Curator & Reporter: Aviva Lev-Ari, PhD, RN
Cancer detection and therapeutics
Curator: Larry H. Bernstein, MD, FCAP
https://pharmaceuticalintelligence.com/2016/05/02/cancer-detection-and-therapeutics/
Free Bio-IT World Webinar: Machine Learning to Detect Cancer Variants
Reporter: Stephen J. Williams, PhD
Artificial Intelligence: Genomics & Cancer
https://pharmaceuticalintelligence.com/ai-in-genomics-cancer/
Premalata Pati, PhD, PostDoc in Biological Sciences, Medical Text Analysis with Machine Learning