The Cancer Genome Atlas (TCGA) | Leaders in Pharmaceutical Business Intelligence Group, LLC, Doing Business As LPBI Group, Newton, MA

Posts Tagged ‘The Cancer Genome Atlas (TCGA)’

New Potential for Personalized Medicine

Posted in Big Data, Cancer and Current Therapeutics, Cancer Informatics, Intelligent Information Systems, tagged Dialogue for Reverse Engineering Assessments and Methods (DREAM), head and neck squamous cell carcinoma (HNSCC), International Cancer Genome Consortium (ICGC), intratumor heterogeneity, Minimum Event Distance for Intra-tumour Copy-number Comparisons (MEDICC), mutant-allele tumor heterogeneity (MATH), The Cancer Genome Atlas (TCGA), treatment projected prognosis on November 20, 2015

Larry H. Bernstein, MD, FCAP, Curator

LPBI

Updated 11/22/2015

New Tool to Identify Tumor Heterogeneity Could Help Pave Way for Personalized Cancer Therapies and Help Pathologists Add Value for Oncologists

November 20, 2015

Ohio State University study shows correlation between genetic variability among cancer cells within tumors and the survival of patients with head-and-neck cancers

Anatomic pathologists and clinical laboratories may gain a tool to identify tumor heterogeneity. If a new method for measuring genetic variability within a tumor and predicting outcomes is confirmed in future studies, it would ultimately enable them to guide personalized cancer therapies.

Scientists Seek Cause of Resistance to Cancer Treatment

The new tool was dubbed “MATH” by researchers at The Ohio State University Comprehensive Cancer Center–Arthur G. James Cancer Hospital and Richard J. Solove Research Institute (OSUCCC–James). MATH is the scoring method they developed and stands for mutant-allele tumor heterogeneity. MATH was used to measure the genetic variability among cancer cells within tumors from 305 patients with head and neck squamous cell carcinoma (HNSCC), treated at multiple institutions, from The Cancer Genome Atlas.

In announcing the study results, OSUCCC-James stated that cancers that showed high genetic variability—called “intra-tumor heterogeneity”—correlated with lower patient survival.

James Rocco, MD, PhD, Professor in the Department of Otolaryngology-Head and Neck Surgery at The Ohio State University Wexner Medical Center, led the research team that developed a new method for measuring genetic variability within a tumor. The team showed that high MATH (mutant-allele tumor heterogeneity) scores correlated with lower patient survival. According to the authors, the study is the first to combine data from hundreds of patients, treated at multiple institutions, “to document a relation between intra-tumor heterogeneity and overall survival in any type of cancer.”

Genetic Variability Linked with Treatment Failure

Their findings were published in the February 2015 issue of the journal PLOS Medicine.

“Genetic variability within tumors is likely why people fail treatment,” Rocco said in the statement. “In patients who have high heterogeneity tumors it is likely that there are several clusters of underlying mutations—in the same tumor—driving the cancer. So their tumors are likely to have some cells that are already resistant to any particular therapy.”

Medical News Today reported that each 10% increase in MATH score corresponded to an 8.8% increased likelihood of death.

“Our retrospective analysis showed that patients with high heterogeneity tumors were more than twice as likely to die compared to patients with low heterogeneity tumors,” Rocco told Medical News Today. “This type of information could refine the dialog about how we tackle cancer by helping us predict a patient’s treatment success and justify clinical decisions based on the unique makeup of a patient’s tumor.”

MATH Scores of Tumor Heterogeneity in Clinical Settings to Guide Diagnostics

Until now, oncologists have been reluctant to use “tumor heterogeneity to guide clinical care decisions or assess disease prognosis because there is no single, easy-to-implement method of doing so in clinical practice,” reported the OSUCCC-James statement. The MATH score, however, overcomes that issue since it can be computed from whole-exome sequencing data obtained from a single formalin-fixed, paraffin-embedded tumor sample.

It is pathologists who take tumor tissues and produce formalin-fixed, paraffin-embedded samples in their histology laboratories. Thus, as further clinical studies confirm that the MATH tool can produce useful diagnostic and prognostic information for oncologists, pathologists will be well positioned to add MATH to their menu of pathology services.

In a guest editorial in PLOS Medicine, Andrew H. Beck, MD, PhD, of Beth Israel Deaconess Medical Center and Harvard Medical School in Boston, pointed out that Rocco’s MATH score “approach may be more easily translated into clinical use, as compared with approaches requiring multiregion sampling and more complex computational algorithms for the assessment of intratumoral heterogeneity.”

Beck also discussed the important role large sets of cancer samples have in cancer research and in the development of improved personalized therapies for the disease. He observed that open access to large-scale datasets from large populations of cancer patients is “critically important” for devising computational methods for using cancer heterogeneity in clinical settings during the diagnostic process.

“The continuing generation of high-quality, open-access Omics datasets from large populations of cancer patients will be critically important to enable the development of computational methods to translate knowledge of cancer heterogeneity into new diagnostics and improved clinical outcomes for cancer patients,” Beck wrote.

Researchers Suggest MATH Should Be Biomarker for Treatment Decision-making

While their results must be confirmed in further studies and with other cancers, Rocco’s team believes their scoring method holds great promise as a prognostic tool.

“These findings suggest that MATH should be considered a biomarker for survival in HNSCC and other tumor types, and raise the possibility that clinicians could use MATH values to decide on the best treatment for individual patients and to choose patients for inclusion in clinical trials,” they wrote in PLOS Medicine. Pathologists, particularly in academic pathology departments, might want to track the ongoing development of MATH and how it could be used in patient care.

—Andrea Downing Peck

Related Information:

Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas

New Genomics Tool Could Help Predict Tumor Aggressiveness, Treatment Outcomes

OMICs Data Analysis

Functional, or systems, analysis of OMICs data aims to identify the pathways and biological processes most affected in the datasets and point out the key genes as potential biomarkers and drug targets.

http://lsresearch.thomsonreuters.com/pages/challenges/4/omics-data-analysis

Functional analysis is widely applicable in personalized and translational medicine, drug repositioning, and patient stratification, among many practice areas.

OMICs data analysis relies on a high-quality knowledge base (pathway maps, protein interactions, functional ontologies, and gene-disease associations, for example) and on advanced analytical algorithms for enrichment, interactome, and network analysis. Thomson Reuters offers arguably the most comprehensive systems analysis platform on the market. Highlights:

High-quality systems biology content (pathway maps, protein interactions, disease biomarkers, functional ontology) will ensure accurate and reliable analysis results.
Validated data analysis algorithms combine power of statistical approaches and comprehensiveness of functional methods for generation of strong hypotheses for your research.
Easy-to-use interfaces, convenient exports and one-click workflows will get you up to speed with systems biology in no time.
Experienced data analysts and disease biologists will help your team with analysis and interpretation of your data bringing an ultimate pathway analysis expertise to your side.

Open Access to Large Scale Datasets Is Needed to Translate Knowledge of Cancer Heterogeneity into Better Patient Outcomes

Andrew H. Beck

Published: Feb 24, 2015 DOI: http://dx.doi.org/10.1371/journal.pmed.1001794

Citation: Beck AH (2015) Open Access to Large Scale Datasets Is Needed to Translate Knowledge of Cancer Heterogeneity into Better Patient Outcomes. PLoS Med 12(2): e1001794. http://dx.doi.org/10.1371/journal.pmed.1001794

Cancer is a heterogeneous disease, which is comprised of a collection of diseases traditionally categorized by tissue type of origin. A distinct set of etiologic causes, treatments, and prognoses are associated with different cancers, and even within a given tissue type, cancer shows significant variability in molecular and clinical features across patients. This interpatient heterogeneity is a major rationale for large-scale research efforts (such as The Cancer Genome Atlas [TCGA] and the International Cancer Genome Consortium [ICGC]) to comprehensively profile the molecular landscape of patient cancer samples across all major cancers [1,2]. These efforts have been bolstered by the recent development of new genomic [3] and computational [4] technologies to enable increasingly detailed and comprehensive analyses of the molecular landscape of solid cancers. It is hoped that the comprehensive molecular characterization of large sets of cancer samples will lead to the identification of new therapeutic targets and the development of improved personalized therapies for cancer patients.

A major challenge in cancer therapy is the development of resistance to molecularly targeted therapies. Although targeted therapies may show initial benefit in the subset of patients carrying a targeted molecular alteration, most patients will nevertheless go on to develop resistance for most advanced solid cancers. Identifying and overcoming drug resistance represents one of the most significant challenges facing cancer researchers today [5]. It is increasingly recognized that cancer is not only a heterogeneous disease across patients but also a heterogeneous disease within individual patients, with different regions of a tumor showing different molecular features at the DNA, RNA, and protein levels [6–9]. This intratumoral molecular heterogeneity is hypothesized to be a major cause of drug resistance and treatment failure in cancer [10]. However, the clinical significance of intratumoral molecular heterogeneity is not yet well-defined, and assessment of intratumoral molecular heterogeneity is not currently used in clinical cancer medicine for assessing disease prognosis or guiding therapy. Two recent research articles published in PLOS Medicine show the potential clinical utility of measuring intratumoral genetic heterogeneity in clinical cancer samples.

In one, James Brenton, Florian Markowetz, and colleagues applied the Minimum Event Distance for Intra-tumour Copy-number Comparisons (MEDICC) algorithm they recently developed for phylogenetic quantification of intratumoral genetic heterogeneity from multiregion DNA copy number profiling data [11] to predict treatment resistance in high-grade serous ovarian cancer [12]. Their analysis suggests that multiregion tumor sampling, DNA copy number profiling, and quantification of intratumoral genetic heterogeneity with the MEDICC algorithm could be a useful approach for predicting patient survival in ovarian cancer, in which higher levels of heterogeneity associated with decreased survival. This study provides data to support the long-standing hypothesis regarding treatment resistance and intratumoral genetic heterogeneity [10]. Although these results are promising, the developed approach requires sampling multiple distinct regions of tumor, which would be more expensive and complex than molecular profiling from a single tissue sample. It is not yet known how much tumor sampling will be required to adequately quantify intratumoral heterogeneity in the clinic or if measuring intratumoral heterogeneity from multiple tumor samples will outperform other molecular approaches (e.g., prognostic expression signatures [13,14]) for predicting response to therapy in ovarian cancer. These are important research questions that will need to be answered prior to clinical translation.

The second study comes from James Rocco and colleagues [15]. Previously, these investigators used a publicly available data set of whole exome sequencing data in head and neck squamous cell carcinoma (HNSCC) from Stransky et al. [16] to develop a simple quantitative measure of intratumoral heterogeneity (mutant-allele tumor heterogeneity [MATH]) and showed that MATH scores were higher in poor outcome classes of HNSCC [17]. In the current study, the authors used publicly available whole exome sequencing data provided by TCGA and showed that the MATH score is associated with prognosis in HNSCC and contributes additional prognostic information beyond that provided by traditional clinical and molecular features. Since the MATH score can be computed from whole exome sequencing data obtained from a single tumor sample (which is a data type that can be obtained from formalin-fixed, paraffin-embedded tumor tissue, as is routinely collected in pathology laboratories [18]), this approach may be more easily translated into clinical use, as compared with approaches requiring multiregion sampling and more complex computational algorithms for the assessment of intratumoral heterogeneity. Nonetheless, establishing the utility of the MATH score as an effective prognostic and/or predictive biomarker in HNSCC will require additional studies of the MATH score on well-controlled clinical cohorts comprised of homogeneously treated patients with tumors at specific head and neck anatomic locations. It is important to note that the development and application of MATH for assessing prognosis in HNSCC was based entirely on the analysis of publicly available clinically annotated whole exome sequencing data, which demonstrates the value in making these data open to the community.

The continuing generation of high-quality, open-access Omics data sets from large populations of cancer patients will be critically important to enable the development of computational methods to translate knowledge of cancer heterogeneity into new diagnostics and improved clinical outcomes for cancer patients. As one step towards this goal, the DREAM (Dialogue for Reverse Engineering Assessments and Methods) consortium will use open innovation crowd sourcing to identify top-performing computational methods for inferring genetic heterogeneity from next-generation sequencing data provided by a large multi-institutional community of cancer genomics projects, including the ICGC and TCGA [19]. If successful, this open innovation competition may identify a set of best-in-class methods for measuring intratumoral genetic heterogeneity in cancer.

In parallel with these advances in computational methods for inferring intratumoral heterogeneity from genomics data, genomics technologies for measuring intratumoral heterogeneity at increasingly fine levels of granularity continue to improve. For example, recent advances in single-cell sequencing of DNA have provided detailed portraits of intratumoral genetic heterogeneity and clonal evolution in cancer [20,21], and recent advances in single-cell RNA sequencing [22], in situ RNA sequencing [23,24], and highly multiplexed next-generation immunohistochemistry [25–28] enable characterization of intratumoral heterogeneity in gene expression at a single cell level with subcellular resolution. Thus, there are now many options—both molecular and computational—for measuring and analyzing intratumoral molecular heterogeneity from clinical cancer samples.

Establishing the clinical utility of these new approaches for measuring intratumoral molecular heterogeneity will require applying these methods to large sets of archival tumor samples from randomized trials of cancer therapeutics [29] and high-quality prospective observational studies [30]. To maximize the value of the data that would be produced from such an undertaking, it is critical that infrastructure be created and supported to enable sharing of the Omics and clinical data with a large community of cancer researchers and data scientists. Ensuring open access to high-quality datasets will ensure that the largest possible community of researchers is able to address the most important problems in cancer medicine today. And in generating and sharing these data widely, we will massively increase our chances of effectively translating knowledge of intratumoral heterogeneity into meaningful advances for cancer patients.

References

1. Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, et al. (2010) International network of cancer genome projects. Nature 464: 993–998. doi: 10.1038/nature08987. pmid:20393554
2. Garraway LA, Lander ES (2013) Lessons from the cancer genome. Cell 153: 17–37. doi: 10.1016/j.cell.2013.03.002. pmid:23540688
3. Meyerson M, Gabriel S, Getz G (2010) Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 11: 685–696. doi: 10.1038/nrg2841. pmid:20847746
4. Ding L, Wendl MC, McMichael JF, Raphael BJ (2014) Expanding the computational toolbox for mining cancer genomes. Nat Rev Genet 15: 556–570. doi: 10.1038/nrg3767. pmid:25001846
5. Garraway LA, Jänne PA (2012) Circumventing cancer drug resistance in the era of personalized medicine. Cancer Discov 2: 214–226. doi: 10.1158/2159-8290.CD-12-0012. pmid:22585993
6. Burrell RA, McGranahan N, Bartek J, Swanton C (2013) The causes and consequences of genetic heterogeneity in cancer evolution. Nature 501: 338–345. doi: 10.1038/nature12625. pmid:24048066
7. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, et al. (2012) Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366: 883–892. doi: 10.1056/NEJMoa1113205. pmid:22397650
8. Bashashati A, Ha G, Tone A, Ding J, Prentice LM, et al. (2013) Distinct evolutionary trajectories of primary high-grade serous ovarian cancers revealed through spatial mutational profiling. J Pathol 231: 21–34. doi: 10.1002/path.4230. pmid:23780408
9. De Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, et al. (2014) Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346: 251–256. doi: 10.1126/science.1253462
10. Burrell RA, Swanton C (2014) Tumour heterogeneity and the evolution of polyclonal drug resistance. Mol Oncol 8: 1095–1111. doi: 10.1016/j.molonc.2014.06.005. pmid:25087573

….. more

New Genomics Tool Could Help Predict Tumor Aggressiveness, Treatment Outcomes

APRIL 16, 2015

OSUCCC – James researchers Edmund Mroz, PhD, and James Rocco, MD, PhD, developed the MATH method.

http://cancer.osu.edu/news-and-media/news/new-genomics-tool-could-help-predict-tumor-aggressiveness-treatment-outcomes

COLUMBUS, Ohio — A new method for measuring genetic variability within a tumor might one day help doctors identify patients with aggressive cancers that are more likely to resist therapy, according to a study led by researchers now at The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute (OSUCCC – James).

Researchers used a new scoring method they developed called MATH (mutant-allele tumor heterogeneity) to measure the genetic variability among cancer cells within tumors from 305 patients with head and neck cancer. High MATH scores corresponded to tumors with many differences among the gene mutations present in different cancer cells.

Cancers that showed high genetic variability – called “intra-tumor heterogeneity” – correlated with lower patient survival. If prospective studies verify the findings, MATH scores could help identify the most effective treatment for patients and predict a patient’s prognosis.

Researchers have long hypothesized that multiple sub-populations of mutated cells within a single cancer lead to worse clinical outcomes; however, oncologists do not use tumor heterogeneity to guide clinical care decisions or assess disease prognosis because there is no single, easy-to-implement method of doing so in clinical practice.

Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas

Edmund A. Mroz, Aaron M. Tward, Rebecca J. Hammon, Yin Ren, James W. Rocco

Published: Feb 10, 2015 DOI: http://dx.doi.org/10.1371/journal.pmed.1001786

9 Jun 2015: The PLOS Medicine Staff (2015) Correction: Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas. PLoS Med 12(6): e1001844. http://dx.doi.org/10.1371/journal.pmed.1001844

Although the involvement of intra-tumor genetic heterogeneity in tumor progression, treatment resistance, and metastasis is established, genetic heterogeneity is seldom examined in clinical trials or practice. Many studies of heterogeneity have had prespecified markers for tumor subpopulations, limiting their generalizability, or have involved massive efforts such as separate analysis of hundreds of individual cells, limiting their clinical use. We recently developed a general measure of intra-tumor genetic heterogeneity based on whole-exome sequencing (WES) of bulk tumor DNA, called mutant-allele tumor heterogeneity (MATH). Here, we examine data collected as part of a large, multi-institutional study to validate this measure and determine whether intra-tumor heterogeneity is itself related to mortality.
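The abstract does not spell out how MATH is calculated. As a rough illustration only, the sketch below follows the definition as it is usually described in the authors' earlier work: 100 times the ratio of the median absolute deviation (MAD) to the median of the mutant-allele fractions at tumor-specific mutated loci, with the MAD scaled by 1.4826 so that it is comparable to a standard deviation for normally distributed data. That formula, the function name math_score, and the example fractions are assumptions brought in for illustration; none of them appear in the text above.

```python
from statistics import median

# Assumed scaling constant: makes the MAD comparable to a standard deviation
# for normally distributed data. Not stated in the article text.
MAD_SCALE = 1.4826


def math_score(mutant_allele_fractions):
    """Return 100 * scaled MAD / median of the mutant-allele fractions.

    Hypothetical helper: a sketch of the MATH idea, not the authors' code.
    """
    mafs = list(mutant_allele_fractions)
    if not mafs:
        raise ValueError("need at least one mutant-allele fraction")
    center = median(mafs)
    if center <= 0:
        raise ValueError("median mutant-allele fraction must be positive")
    # Width of the distribution: median absolute deviation from the median.
    mad = MAD_SCALE * median(abs(f - center) for f in mafs)
    return 100.0 * mad / center


# Made-up example data: one tumor with tightly clustered fractions,
# one with a broad spread (suggesting several subclones).
homogeneous = [0.38, 0.40, 0.41, 0.39, 0.42, 0.40]
heterogeneous = [0.05, 0.12, 0.20, 0.33, 0.41, 0.48]

if __name__ == "__main__":
    print("homogeneous tumor:", round(math_score(homogeneous), 1))
    print("heterogeneous tumor:", round(math_score(heterogeneous), 1))
```

Under these assumptions, a tumor whose mutations all sit near the same allele fraction scores low, while a tumor with widely spread fractions scores high. Real pipelines would also need to filter loci and account for copy number and sample purity, which this sketch ignores.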

Methods and Findings

Clinical and WES data were obtained from The Cancer Genome Atlas in October 2013 for 305 patients with head and neck squamous cell carcinoma (HNSCC), from 14 institutions. Initial pathologic diagnoses were between 1992 and 2011 (median, 2008). Median time to death for 131 deceased patients was 14 mo; median follow-up of living patients was 22 mo. Tumor MATH values were calculated from WES results. Despite the multiple head and neck tumor subsites and the variety of treatments, we found in this retrospective analysis a substantial relation of high MATH values to decreased overall survival (Cox proportional hazards analysis: hazard ratio for high/low heterogeneity, 2.2; 95% CI 1.4 to 3.3). This relation of intra-tumor heterogeneity to survival was not due to intra-tumor heterogeneity’s associations with other clinical or molecular characteristics, including age, human papillomavirus status, tumor grade and TP53 mutation, and N classification. MATH improved prognostication over that provided by traditional clinical and molecular characteristics, maintained a significant relation to survival in multivariate analyses, and distinguished outcomes among patients having oral-cavity or laryngeal cancers even when standard disease staging was taken into account. Prospective studies, however, will be required before MATH can be used prognostically in clinical trials or practice. Such studies will need to examine homogeneously treated HNSCC at specific head and neck subsites, and determine the influence of cancer therapy on MATH values. Analysis of MATH and outcome in human-papillomavirus-positive oropharyngeal squamous cell carcinoma is particularly needed.
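The reported hazard ratio and confidence interval can be sanity-checked with a little arithmetic. The sketch below assumes the interval is a standard Wald interval, symmetric on the log scale (the usual form for Cox models, but not stated in the text), and uses only the rounded values quoted above, so the results are approximate. The function name log_scale_summary is hypothetical.

```python
import math


def log_scale_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate log-scale quantities from a hazard ratio and its 95% CI.

    Assumes a Wald interval: exp(log(hr) +/- z_crit * se).
    """
    log_hr = math.log(hr)
    # The width of the CI on the log scale is 2 * z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return {
        "log_hr": log_hr,
        "se": se,
        # For a log-symmetric interval, this should sit close to hr.
        "ci_geometric_midpoint": math.sqrt(ci_low * ci_high),
        "z": log_hr / se,
    }


# Values quoted in the abstract: hazard ratio 2.2, 95% CI 1.4 to 3.3.
summary = log_scale_summary(2.2, 1.4, 3.3)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

The geometric midpoint of the interval lands close to the reported hazard ratio, which is consistent with an interval computed on the log scale; the small gap is what one would expect from rounding.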

Conclusions

To our knowledge this study is the first to combine data from hundreds of patients, treated at multiple institutions, to document a relation between intra-tumor heterogeneity and overall survival in any type of cancer. We suggest applying the simply calculated MATH metric of heterogeneity to prospective studies of HNSCC and other tumor types.

Editors’ Summary

Background

Normally, the cells in human tissues and organs only reproduce (a process called cell division) when new cells are needed for growth or to repair damaged tissues. But sometimes a cell somewhere in the body acquires a genetic change (mutation) that disrupts the control of cell division and allows the cell to grow continuously. As the mutated cell grows and divides, it accumulates additional mutations that allow it to grow even faster and eventually form a lump, or tumor (cancer). Other mutations subsequently allow the tumor to spread around the body (metastasize) and destroy healthy tissues. Tumors can arise anywhere in the body—there are more than 200 different types of cancer—and about one in three people will develop some form of cancer during their lifetime. Many cancers can now be successfully treated, however, and people often survive for years after a diagnosis of cancer before, eventually, dying from another disease.

Why Was This Study Done?

The gradual acquisition of mutations by tumor cells leads to the formation of subpopulations of cells, each carrying a different set of mutations. This “intra-tumor heterogeneity” can produce tumor subclones that grow particularly quickly, that metastasize aggressively, or that are resistant to cancer treatments. Consequently, researchers have hypothesized that high intra-tumor heterogeneity leads to worse clinical outcomes and have suggested that a simple measure of this heterogeneity would be a useful addition to the cancer staging system currently used by clinicians for predicting the likely outcome (prognosis) of patients with cancer. Here, the researchers investigate whether a measure of intra-tumor heterogeneity called “mutant-allele tumor heterogeneity” (MATH) is related to mortality (death) among patients with head and neck squamous cell carcinoma (HNSCC)—cancers that begin in the cells that line the moist surfaces inside the head and neck, such as cancers of the mouth and the larynx (voice box). MATH is based on whole-exome sequencing (WES) of tumor and matched normal DNA. WES uses powerful DNA-sequencing systems to determine the variations of all the coding regions (exons) of the known genes in the human genome (genetic blueprint).

What Did the Researchers Do and Find?

The researchers obtained clinical and WES data for 305 patients who were treated in 14 institutions, primarily in the US, after diagnosis of HNSCC from The Cancer Genome Atlas, a catalog established by the US National Institutes of Health to map the key genomic changes in major types and subtypes of cancer. They calculated tumor MATH values for the patients from their WES results and retrospectively analyzed whether there was an association between the MATH values and patient survival. Despite the patients having tumors at various subsites and being given different treatments, every 10% increase in MATH value corresponded to an 8.8% increased risk (hazard) of death. Using a previously defined MATH-value cutoff to distinguish high- from low-heterogeneity tumors, compared to patients with low-heterogeneity tumors, patients with high-heterogeneity tumors were more than twice as likely to die (a hazard ratio of 2.2). Other statistical analyses indicated that MATH provided improved prognostic information compared to that provided by established clinical and molecular characteristics and human papillomavirus (HPV) status (HPV-positive HNSCC at some subsites has a better prognosis than HPV-negative HNSCC). In particular, MATH provided prognostic information beyond that provided by standard disease staging among patients with mouth or laryngeal cancers.

What Do These Findings Mean?

By using data from more than 300 patients treated at multiple institutions, these findings validate the use of MATH as a measure of intra-tumor heterogeneity in HNSCC. Moreover, they provide one of the first large-scale demonstrations that intra-tumor heterogeneity is clinically important in the prognosis of any type of cancer. Before the MATH metric can be used in clinical trials or in clinical practice as a prognostic tool, its ability to predict outcomes needs to be tested in prospective studies that examine the relation between MATH and the outcomes of patients with identically treated HNSCC at specific head and neck subsites, that evaluate the use of MATH for prognostication in other tumor types, and that determine the influence of cancer treatments on MATH values. Nevertheless, these findings suggest that MATH should be considered as a biomarker for survival in HNSCC and other tumor types, and raise the possibility that clinicians could use MATH values to decide on the best treatment for individual patients and to choose patients for inclusion in clinical trials.

Additional Information

Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001786.

The US National Cancer Institute (NCI) provides information about cancer and how it develops and about head and neck cancer (in English and Spanish)
Cancer Research UK, a not-for-profit organization, provides general information about cancer and how it develops, and detailed information about head and neck cancer; the Merseyside Regional Head and Neck Cancer Centre provides patient stories about HNSCC
Wikipedia provides information about tumor heterogeneity, and about whole-exome sequencing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
Information about The Cancer Genome Atlas is available
A PLOS Blog entry by Jessica Wapner explains more about MATH

SJ Williams, PhD

Two important caveats arise from these papers. First, the data are whole-exome sequencing (WES) results from the TCGA database, so the method makes an assumption about intratumoral heterogeneity: the algorithm's test cases were based on whole-tumor samples and were not compared against the temporal or spatial distribution of mutations (a simple check would be to run this algorithm on the sequencing results from Dr. Sawyers' studies). The model also assumes that the distribution of mutant loci predicts the temporal accumulation of mutations during clonal evolution. Second, the authors separate HPV-positive from HPV-negative head and neck cancers, and it is unclear why: is the algorithm good at analyzing the clonal evolution of cancers containing indels, or only point mutations? It would be interesting to see a larger prospective study comparing this algorithm with the multi-core biopsy method. The editor is correct in justifying the need for further, larger studies, especially for tumors such as lung tumors, which are hard or dangerous to biopsy.

Mood Regulation Subject to Mixed Serotonin Signals
http://www.genengnews.com/gen-news-highlights/mood-regulation-subject-to-mixed-serotonin-signals/81252009/

A new study indicates different serotonin-producing brain regions can have opposing effects on emotional behaviors. According to this study, two brain regions in particular, the dorsal raphe nucleus (DRN) and the median raphe nucleus (MRN), appear to have a yin-and-yang relationship when it comes to mood regulation.

Specifically, one region’s serotonergic activity can offset the other region’s serotonergic activity. This finding, which emerged from pharmacogenetic research conducted at Columbia University, provides new insights into the development of mood disorders and may aid in designing improved therapies.

The Columbia University research effort was led by Mark S. Ansorge, Ph.D. “Our study breaks with the simplistic view that ‘more is good and less is bad,’ when it comes to serotonin for mood regulation,” he said. “Rather, it tells us that a more nuanced view is necessary.”

The study’s details appeared November 19 in Cell Reports, in an article entitled, “Activity of Raphé Serotonergic Neurons Controls Emotional Behaviors.” The article noted that even though serotonin signaling has a well-established role in mood regulation, the causal relationships between serotonergic neuronal activity and behavior remain unclear.

To explore these relationships, Dr. Ansorge’s team used a technique called pharmacogenetics to control the activity of serotonergic neurons in the DRN and MRN in both normal mice and in a mouse model of depression- and anxiety-like behavior. (The model was created by giving mice the drug fluoxetine shortly after birth, which produces long-lasting behavioral changes.)

“[Selectively] increasing serotonergic neuronal activity in wild-type mice is anxiogenic and reduces floating in the forced-swim test, whereas inhibition has no effect on the same measures,” wrote the authors of the Cell Reports article. “In a developmental mouse model of altered emotional behavior, increased anxiety and depression-like behaviors correlate with reduced dorsal raphé and increased median raphé serotonergic activity. These mice display blunted responses to serotonergic stimulation and behavioral rescues through serotonergic inhibition.”

In addition, the researchers identified opposing consequences of dorsal versus median raphé serotonergic neuron inhibition on floating behavior. This observation, the researchers surmised, could mean that median raphé hyperactivity increases anxiety, whereas a low dorsal/median raphé serotonergic activity ratio increases depression-like behavior.

Activity of Raphé Serotonergic Neurons Controls Emotional Behaviors

http://www.cell.com/cell-reports/abstract/S2211-1247(15)01250-4

Anne Teissier, Alexei Chemiakine, Benjamin Inbar, Sneha Bagchi, Russell S. Ray, et al. http://dx.doi.org/10.1016/j.celrep.2015.10.061

http://www.cell.com/cms/attachment/2040587978/2054165588/fx1.jpg

•Increasing 5-HT neuronal activity increases anxiety-like behavior
•Low DR/MR 5-HTergic activity correlates with altered emotional behavior in PNFLX mice
•Reducing 5-HT neuronal activity normalizes emotional behavior in PNFLX mice
•MR and DR 5-HT neuronal activity exert opposing consequences on floating behavior

Despite the well-established role of serotonin signaling in mood regulation, causal relationships between serotonergic neuronal activity and behavior remain poorly understood. Using a pharmacogenetic approach, we find that selectively increasing serotonergic neuronal activity in wild-type mice is anxiogenic and reduces floating in the forced-swim test, whereas inhibition has no effect on the same measures. In a developmental mouse model of altered emotional behavior, increased anxiety and depression-like behaviors correlate with reduced dorsal raphé and increased median raphé serotonergic activity. These mice display blunted responses to serotonergic stimulation and behavioral rescues through serotonergic inhibition. Furthermore, we identify opposing consequences of dorsal versus median raphé serotonergic neuron inhibition on floating behavior, together suggesting that median raphé hyperactivity increases anxiety, whereas a low dorsal/median raphé serotonergic activity ratio increases depression-like behavior. Thus, we find a critical role of serotonergic neuronal activity in emotional regulation and uncover opposing roles of median and dorsal raphé function.

Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer

Posted in Artificial Intelligence in CANCER, Biological Networks, Biomarkers & Medical Diagnostics, CANCER BIOLOGY & Innovations in Cancer Therapy, Cancer Prevention: Research & Programs, ChatGPT, GPT-4, Computational Biology/Systems and Bioinformatics, Gene Regulation and Evolution, Genome Biology, Health Economics and Outcomes Research, Machine Learning, Personalized and Precision Medicine & Genomic Research, tagged antitumor therapy, Artificial intelligence, Bioinformatics, Cancer - General, Cell Biology, Computational Biology/Systems and Bioinformatics, DNA, DNA Sequencing, driver mutations, gene expression, genetics, genome, genomics, GPT, informatics, InfraNodus, knowledge graph, Lung cancer, mutation, mutational analysis, National Institutes of Health, Non-small cell lung cancer, Personalized medicine, Proceedings of the National Academy of Sciences of the United States of America, Prostate cancer, research, smoking, The Cancer Genome Atlas (TCGA), The Clinical Lung Cancer Genome Project (CLCGP), whole exome sequencing on September 5, 2014| Leave a Comment »

Curator, Writer: Stephen J. Williams, Ph.D.

UPDATED 08/11/2025

Human Curation vs. AI tools: ChatGPT & Knowledge Graphs [KG] Output: A case study for the following original curation:

Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer

https://pharmaceuticalintelligence.com/2014/09/05/multiple-lung-cancer-genomic-projects-suggest-new-targets-research-directions-for-non-small-cell-lung-cancer/

This update was performed by the following methods:

A. GPT 5 Text analysis and Reasoning

B. Insertion of Knowledge Graph on topic Curation of Genomic Analysis from Non Small Cell Lung Cancer Studies from Nodus Labs using InfraNodus software

C. Domain Knowledge Expert evaluation of the Update outcomes

This article has the following Structure:

Part A: Introduction to LLM, Knowledge Graph software InfraNodus, ChatGPT5 and Background Information on curated material for Test Case

Part B: InfraNodus Analysis of manual curation and Knowledge Graph Creation

Part C: Chat GPT 5 Analysis of Manually Curated Material

Part D: Curation entitled Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer originally published on 09/05/2014

Results of Article Update with GPT 5

1. GPT5 alone was not able to understand the goal of the article, namely to determine knowledge gaps in a particular research area involving 5 genomic studies on lung cancer patients.

2. GPT5 alone was not able to group concepts or commonalities between biological pathways unless supplied with a manually curated list of KEGG pathways derived from a list of mutated genes. However, this precluded any effect that fusion proteins had on the analysis, and so GPT5 would only concentrate on mutated genes commonly found in the literature.

3. GPT was not able to access some of the open-access databases, such as the NCBI Gene Ontology database.

Results of Article Update with KnowledgeGraph presentation to GPT 5

4. As the Knowledge Graph understood the importance of fusion proteins and transversions, the knowledge graph augmented the GPT analysis, enriching the known pathways and correctly identifying the less represented pathways in the knowledge graph.

5. This led to the identification of many novel signaling pathways not identified in the original analysis, and the combined approach performed this task with ease and speed.

6. GPT with InfraNodus Analysis was able to propose pertinent questions for future research (the goal of the original curation) such as:

How does the interaction between [[EGFR]] mutations and sex-specific gene alterations, including [[RBM10]], influence treatment outcomes in lung adenocarcinoma?
How does the intersection of mutational patterns from smoking influence pathway activation in NSCLC, and can identifying these interactions improve targeted therapy development?

Novelty in comparison to Original article published on 09/05/2014

7. It appears that manual curation is necessary to assist in building relevant knowledge graphs in the biomedical fields to augment generative AI analysis.

8. By itself, generative AI is not optimized for inference of higher concepts from biomedical text and therefore, at this point, requires input from human curators developing domain-specific knowledge graphs.

9. The combination of ChatGPT5 and knowledge graphs of this manually curated biomedical text revealed a further layer of knowledge gaps not seen in the original curation, including the need to study noncanonical signaling pathways like WNT and Hedgehog in smoker versus nonsmoker cohorts of lung cancer patients.

A Comparison of Manual Expert Curation and LLM-based Analysis of Knowledge Gaps in Non Small Cell Lung Cancer Whole Exome Sequencing Studies and a Use Case Example of Chat GPT 5

Part A: Introduction to LLM, Knowledge Graph software InfraNodus, ChatGPT5 and Background Information on curated material for Test Case

The development of Large Language Models (LLMs), together with the development of knowledge graphs, has made it easier to analyze text and determine the relationships among the various concepts contained within a series of texts. These concepts and relationships can be visualized, and new insights inferred from these visualizations. As a result, this type of analysis suggests new directions and lines of research.

Alternatively, these types of visualizations can also reveal gaps in knowledge which should be addressed. A new type of LLM and visualization tools have been developed to understand the gaps in knowledge in biomedical text.

Nodus Labs InfraNodus AI Knowledge Graph Software Tools Allow Text Relationship Visualization and Integrated AI Functionality

InfraNodus makes knowledge graphs from text and visualizes the relationships between concepts (or nodes). In doing so, the tool also highlights the various knowledge gaps (or large differences between nodes), which can be used to investigate new hypotheses and research directions involving previously uninvestigated relationships between concepts. These gaps generate new research questions and can be used as prompts in the software’s integrated AI tool. The AI tool, much like a GPT, returns recommendations for research to be conducted in the area.
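To make the idea of nodes, relations, and gaps concrete, here is a minimal sketch of a word co-occurrence graph built from a few toy sentences. This is a generic text-network technique offered for illustration; it is an assumption and not InfraNodus’s actual algorithm. The function names, the stopword list, and the sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(sentences, stopwords=frozenset()):
    """Count how often two words appear in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        words = {w.strip(".,;:()").lower() for w in sentence.split()}
        words = sorted(w for w in words if w and w not in stopwords)
        # Each sorted pair of words in the sentence is one edge
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges

def unconnected_pairs(edges):
    """List node pairs that never co-occur: candidate 'gaps'."""
    nodes = sorted({n for pair in edges for n in pair})
    return [p for p in combinations(nodes, 2) if p not in edges]

# Toy sentences for illustration only
sentences = [
    "EGFR mutations in lung tumors",
    "ALK fusion in lung tumors",
    "smoking history and transversion pattern",
]
graph = cooccurrence_graph(sentences, frozenset({"in", "and"}))
gaps = unconnected_pairs(graph)

print(graph[("lung", "tumors")])   # edge weight for a repeated pair
print(("egfr", "smoking") in gaps)  # a pair with no connecting sentence
```

In this toy graph the smoking-related words form a cluster with no link to the EGFR or ALK words, which is the kind of structural gap that can then be turned into a research question.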

https://infranodus.com/

In addition, the InfraNodus software can detect whether text is too biased toward a particular concept or conclusion and, using GPT3 or GPT4, can determine whether the nodes are too dispersed and recommend which gaps to focus on.

The software can upload any biomedical text in various formats.

A full demonstration is on their website, but a good summary is found on their YouTube channel at

https://www.youtube.com/watch?v=wCEhiIJsmrg

A couple of use cases include

AI-Assisted Thinking & Insight Generation:
- AI Writing & Creative Thinking
- Mind Mapping
- Brainstorming
- Knowledge Graphs & Personal Notes
- Introspection & Self-Reflection
Marketing & Consulting:
- Market Research
- Customer Reviews, Voice of Customer
- Search Engine Optimization
- Qualitative Research & Thematic Analysis
- Innovation & Trend Research
Research, Text Visualization & Analysis:
- Text Network Analysis
- Text Mining & Topic Modeling
- Overview and Summarization

Previously we had manually curated and analyzed the knowledge gaps from a series of publications on whole exome sequencing of biopsied tumors from cohorts of non-small cell lung cancer patients. This curation (from 2016) appears in the lower half of the updated link below; I separated it with a bar and highlighted it in yellow as Text for AI Analysis.

https://pharmaceuticalintelligence.com/2014/09/05/multiple-lung-cancer-genomic-projects-suggest-new-targets-research-directions-for-non-small-cell-lung-cancer/

A literature analysis was performed of the driver mutations found in five NSCLC exome sequencing projects:

Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012, 489(7417):519-525.
A genomics-based classification of human lung tumors. Science translational medicine 2013, 5(209):209ra153.
Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J et al: Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 2012, 150(6):1121-1134.
Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A et al: Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012, 150(6):1107-1120.
Peifer M, Fernandez-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, Plenker D, Leenders F, Sun R, Zander T et al: Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nature genetics 2012, 44(10):1104-1110.

The purpose of this analysis was to uncover biological functions related to the sets of mutated genes with limited research publications in the area of non small cell lung cancer. The identification of such biological functions would represent a gap in knowledge in this disease. In addition, this analysis attempted to find new lines of research or potential new biotargets to investigate for lung cancer therapy.

However, this manual method is time-consuming and may miss relationships not defined in a GO ontology or gene knowledgebases.

Therefore we turned to an AI-driven approach:

Use InfraNodus’ ability to develop a knowledge graph based on our curation and determine whether the AI platform could infer knowledge gaps
Use Chat GPT5 to analyze the same curated set to determine whether OpenAI analysis would lead to a similar analysis of the curated material
Determine whether combining a knowledge graph with GPT would lead to a higher level of analysis

See below (Part D) of this update for the curated studies which were included in this analysis and the text which was entered into both InfraNodus and Chat GPT5.

In summary, it seems that manual curation is necessary to assist in building relevant knowledge graphs in the biomedical fields to augment generative AI analysis. In addition, it appears that, by itself, generative AI is not optimized for inference of higher concepts from biomedical text and therefore, at this point, requires input from human curators developing domain-specific knowledge graphs.

Part B. InfraNodus Analysis of manual curation and Knowledge Graph Creation

Methods:

Text of the curation was copied and pasted directly into the text analysis module of InfraNodus. No words were edited; however, genes in the curation were linked to their GeneCards entries. GeneCards is a database run by the Weizmann Institute. InfraNodus utilizes a combination of LLMs and its own GraphRAG system to provide insights from text analysis. While it leverages various models, including those from OpenAI and Anthropic, it’s not limited to a single LLM. Instead, InfraNodus integrates these models within its GraphRAG framework, which enhances their capabilities by adding a relational understanding of the context through a knowledge graph.

InfraNodus then autogenerates a knowledge graph and returns entities and the relationships between them. InfraNodus offers the opportunity to modify the knowledge graph; however, for this analysis we used the first graph InfraNodus generated. On inspection, this graph (shown below) was deemed reasonable.

Results

The knowledge graph of the input text is shown below:

InfraNodus generated Knowledge Graph of 5 WES Non Small Cell Lung Cancer studies involving smokers and non smokers

Four main concepts were returned: tumors, genes, literature, and mutations.

A snapshot of the Analysis window is given below. It should be noted that InfraNodus felt there needed to be more connections between Pathway and Mutational Patterns.

An InfraNodus report with Knowledge Graph on Whole Exome Sequencing studies in NSCLC to determine mutational spectrum in smokers versus non smokers

Auto generated summary report

Context name: text_250808T0144

Created on: aug 7, 2025 9:47 pm

Last updated on: aug 7, 2025 10:10 pm

Main concepts:

[[tumors]], analysis, [[mutations]], identify, [[lung]], [[genes]]

Main topics:

Tumor Genomics: [[tumors]] [[lung]] reveal
Genetic Alterations: identify [[genes]] study
Pathway Analysis: analysis pathway literature
Mutation Patterns: [[mutations]] [[egfr]] [[rbm10]]

Structural gap (topics to connect):

Pathway Analysis: analysis pathway
Smoking Influence: mutational [[smoking]]

Topical connectors:

alk clinical [[egfr]] mutational pathway [[paper]] found key literature study [[genomic]] reveal [[transversion]]

Top relations / ngrams:

1) [[lung]] [[tumors]]

2) alk fusion

3) link function

4) eml alk

5) function [[gene_ontology]]

Modularity: 0.47

Relations:

InfraNodus identified 744 relations between entities (nodes)

A list of some of the more frequent relations is given here:

source	target	occurrences	weight	betweenness
[[lung]]	[[tumors]]	8	24	0.4676
analysis	pathway	5	12	0.2291
significantly	[[genes]]	5	9	0.1074
significantly	[[mutated]]	4	12	0.0281
[[mutated]]	[[genes]]	4	12	0.0847
[[transversion]]	high	3	12	0.0329
[[smoking]]	history	3	10	0.0352
study	identify	3	9	0.2051
mutational	pattern	3	9	0.0921
[[rbm10]]	[[mutations]]	3	8	0.1776
literature	analysis	3	7	0.2218
[[egfr]]	[[mutations]]	3	7	0.2139
[[transversion]]	group	3	7	0.0259
enriched	cohort	3	6	0.0219
[[whole_exome_sequencing]]	[[tumors]]	3	6	0.3485
identify	[[genes]]	3	6	0.2268
including	analysis	3	5	0.1985
alteration	[[genes]]	3	4	0.1298
[[tumors]]	analysis	3	4	0.5192
alk	fusion	2	15	0.0671
link	function	2	14	0.0269
function	[[gene_ontology]]	2	13	0.0054

Notice how the betweenness, or the importance of a relation for connecting disparate concepts, varies: it is high between concepts like tumors and analysis, or lung and tumors. However, many important linked concepts, like alk and fusion, may have low betweenness but are mentioned frequently and have a much higher weight, or closeness to each other. Gene-mutations-transversions-smoking seem to have a high correspondence to each other.
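The contrast between betweenness and weight can be checked directly from the table. The sketch below copies six rows from the relations table and ranks them by each column. The helper function `top_by` is a hypothetical name introduced only for this illustration.

```python
# Six rows copied from the relations table:
# (source, target, occurrences, weight, betweenness)
relations = [
    ("lung", "tumors", 8, 24, 0.4676),
    ("analysis", "pathway", 5, 12, 0.2291),
    ("tumors", "analysis", 3, 4, 0.5192),
    ("alk", "fusion", 2, 15, 0.0671),
    ("link", "function", 2, 14, 0.0269),
    ("function", "gene_ontology", 2, 13, 0.0054),
]

def top_by(rows, column, n=3):
    """Return the n (source, target) pairs with the largest value in a column."""
    index = {"occurrences": 2, "weight": 3, "betweenness": 4}[column]
    ranked = sorted(rows, key=lambda r: r[index], reverse=True)
    return [(r[0], r[1]) for r in ranked[:n]]

by_betweenness = top_by(relations, "betweenness")
by_weight = top_by(relations, "weight")

print(by_betweenness)
print(by_weight)
```

Among these rows, tumors and analysis rank first by betweenness but last by weight, while alk and fusion rank second by weight despite a low betweenness, matching the observation above.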

For a full list please see the file

InfraNodus relations five WES studies on NSCLC and smoking

sjwilliams_text_250808T0144_nodedataon NSCLCWESstudies

Generated the following responses using AI to bridge the gap between

Mutation Patterns: [[mutations]], [[egfr]], [[rbm10]] and
Genetic Alterations: identify, [[genes]], study:

The recent comprehensive studies on lung adenocarcinoma have significantly advanced our understanding of the genetic landscape by identifying key mutations and their intricate interactions. Notably, EGFR and RBM10 exhibit distinct mutational patterns, with RBM10 inactivations being notably enriched in male cohorts. This gender-linked enrichment underscores a potential differential oncogenic pathway involving ERBB2 and RB1 alterations.

Moreover, these projects emphasize the quest to map significant gene alterations within lung adenocarcinoma. The identification of such genes not only corroborates prior reports but also expands upon them by highlighting new connections between mutation signatures and clinical factors like smoking history. These findings are crucial as they can inform future therapeutic targeting strategies, ensuring that personalized treatment approaches consider both gender-specific genomic enrichments and mutation-driven tumorigenesis pathways elucidated through rigorous analyses.

<MainTopics>

Tumor Genomics: [[tumors]], [[lung]], reveal, base, [[genomic]], [[data]], classification, subtype,
Genetic Alterations: identify, [[genes]], study, alteration, reported, significantly, [[lung_adenocarcinoma]], key,
Pathway Analysis: analysis, pathway, literature, including, activation, target, [[nsclc]], [[patients]],
Mutation Patterns: [[mutations]], [[egfr]], [[rbm10]], male, enriched, cohort, [[erbb2]], noted,
Smoking Influence: mutational, [[smoking]], [[transversion]], pattern, high, notably, [[smokers]], show,
Research Strategies: found, [[paper]], [[omics]], unique, profile, result, multus, characterization,
Targeted Therapy: alk, [[lung_cancer]], fusion, [[therapy]], ro, eml, include, targeted,
Mutation Signatures: clinical, [[tumor]], approach, mutation, sample, signature, gender, specific,
Gene Function: link, [[gene]], note, function, author, [[gene_ontology]],

</MainTopics>

<TopicalGap>:

Genetic Alterations: identify [[genes]]
Mutation Patterns: [[mutations]] [[egfr]]

</TopicalGap>

questions generated using AI to help you explore “alk, clinical, [[egfr]], mutational, pathway, [[paper]], found, key, literature, study, [[genomic]], reveal, [[transversion]]…”: How do mutational patterns, specifically EGFR mutations and transversions related to smoking history, influence the effectiveness of targeted therapies in NSCLC patients?

ideas generated using AI to help you explore “alk, clinical, [[egfr]], mutational, pathway, [[paper]], found, key, literature, study, [[genomic]], reveal, [[transversion]]…”: Develop a predictive model that utilizes genomic data and smoking history to forecast patient response to targeted therapies. This model would identify key mutational signatures linked to EGFR and other genes, highlighting the impact of smoking-induced transversions on drug efficacy.

Project Notes

“The recent comprehensive studies on lung adenocarcinoma have significantly advanced our understanding of the genetic landscape by identifying key mutations and their intricate interactions. Notably, EGFR and RBM10 exhibit distinct mutational patterns, with RBM10 inactivations being notably enriched in male cohorts. This gender-linked enrichment underscores a potential differential oncogenic pathway involving ERBB2 and RB1 alterations.

Moreover, these projects emphasize the quest to map significant gene alterations within lung adenocarcinoma. The identification of such genes not only corroborates prior reports but also expands upon them by highlighting new connections between mutation signatures and clinical factors like smoking history. These findings are crucial as they can inform future therapeutic targeting strategies, ensuring that personalized treatment approaches consider both gender-specific genomic enrichments and mutation-driven tumorigenesis pathways elucidated through rigorous analyses.”

<ConceptualGateways>

alk

clinical

[[egfr]]

mutational

pathway

[[paper]]

found

key

literature

study

[[genomic]]

reveal

[[transversion]]

</ConceptualGateways>

How do mutational patterns, specifically EGFR mutations and transversions related to smoking history, influence the effectiveness of targeted therapies in NSCLC patients?

The report from the NCI Bulletin outlines significant advancements in understanding lung cancer through genome sequencing projects. These studies have revealed a plethora of genetic and epigenetic alterations across various forms of lung tumors, including adenocarcinomas, squamous cell carcinomas, and small cell lung cancers. Notably, some identified alterations could be targeted by existing therapies, providing potential new avenues for treatment.

Dr. Meyerson emphasizes the complexity of these genetic changes, highlighting that distinct mechanisms inactivating genes can vary between tumors. The report also notes gaps in knowledge regarding non-coding DNA alterations, which comprise a major part of the human genome.

Key findings include:

1. Comprehensive genomic analyses revealing unique driver mutations in lung adenocarcinoma, such as those affecting MET and ERBB2, alongside significant mutations in known cancer drivers like TP53 and KRAS.

2. A classification system based on genomic data enabling more accurate patient stratification, achieving a 75% classification rate of lung cancer subtypes.

3. Smoking history is shown to influence mutational patterns significantly, with smokers exhibiting a higher incidence of point mutations compared to never-smokers.

Moreover, the integration of genomic data and pathway analysis highlighted recurrent mutations across various pathways related to tumorigenesis, suggesting new therapeutic targets and underscoring the importance of personalized medicine approaches that factor in gender-specific mutation distributions.

This synthesis of findings not only corroborates earlier studies but also extends our understanding of the interplay between genomic alterations, smoking habits, and clinical outcomes in lung cancer. Future research is needed to explore the implications of these findings further and to develop targeted therapies that leverage the identified genetic alterations.

Tumor Genomics
Genetic Alterations
Pathway Analysis
Mutation Patterns

Main Ideas Summary from InfraNodus:

The report from the NCI Bulletin highlights significant developments in lung cancer genomics through five genome sequencing projects. These studies have uncovered various genetic and epigenetic alterations across multiple forms of lung cancer, including adenocarcinomas, squamous cell cancers, and small cell lung cancers. Notably, these alterations may correspond to existing drugs, offering new avenues for targeted therapy.

Key findings include the identification of unique mutations in genes like NF1, MET, and ERBB2, with particular attention to smoking’s role in mutational patterns. The analysis shows that smokers exhibit a higher incidence of specific point mutations, marking a distinct divergence in mutation profiles compared to never-smokers.

The research also emphasizes the need for continued exploration of the data to link mutation signatures with clinical outcomes, particularly regarding gender-specific differences in mutation prevalence. Emerging insights suggest that pathways involving MYC and alterations in splicing factors could be vital targets for future therapeutic strategies.

Overall, the studies collectively underscore an urgent need for integrating genomic data into clinical practice, which can lead to more personalized treatment approaches for lung cancer patients by considering both the genetic landscape of tumors and their histories, such as smoking exposure. The ongoing efforts to map gene alterations within lung adenocarcinoma not only reinforce existing knowledge but also reveal new potential therapeutic targets that can significantly impact patient outcomes.

Questions to Explore from InfraNodus:

How does the interaction between [[EGFR]] mutations and sex-specific gene alterations, including [[RBM10]], influence treatment outcomes in lung adenocarcinoma?

How does the intersection of mutational patterns from smoking influence pathway activation in NSCLC, and can identifying these interactions improve targeted therapy development?

Part C: Chat GPT 5 Analysis of Manually Curated Material

Now let us see how Chat GPT5 analyzed the same material.

Methods

Nice — I’ll work from the KEGG-pathway list shown in your text (the table of pathways with PubMed counts) and call out major KEGG pathways that are not represented there.

Important caveat: “Not represented in the table” below means not listed in the KEGG-pathway hits shown in your pasted literature analysis, not that these pathways never play a role in lung cancer. Absence from that table can reflect lack of reporting in the specific WES/WGS studies or the PubMed keyword search used — it’s not proof the pathway is uninvolved biologically.

This caveat was important because it showed that ChatGPT5 understood my experimental goals, or at least understood the problem I was presenting to it!

Major KEGG pathways missing from the provided KEGG list

(However, note that GPT did NOT take into account the mutational spectrum of the studies or the gene lists supplied in the curation.)

These are commonly cited cancer-related KEGG pathways that do not appear in the table you gave:

Wnt signaling pathway
Notch signaling pathway
Hedgehog signaling pathway
TGF-beta signaling pathway
Hippo signaling pathway
p53 signaling pathway — (note: TP53 is mutated in many tumors, so p53 biology is implicated in the text, but “p53 signaling” does not appear in that KEGG hits table)
Apoptosis (intrinsic/extrinsic apoptotic pathways)
Autophagy
NF-kappa B signaling pathway
Ubiquitin-mediated proteolysis / proteasome
DNA repair pathways (several separate KEGG entries):
- Homologous recombination
- Mismatch repair
- Nucleotide excision repair
- Base excision repair
- Fanconi anemia pathway
Cellular metabolism pathways commonly in cancer:
- Glycolysis / Gluconeogenesis
- Citrate cycle (TCA cycle) / Oxidative phosphorylation
- Fatty acid metabolism / beta-oxidation
- Pentose phosphate pathway
Immune-signaling / antigen processing:
- T cell receptor signaling pathway
- B cell receptor signaling pathway
- Antigen processing and presentation
Autocrine/paracrine inflammation/chemokine signaling beyond the one low-count entry (general cytokine networks other than the single “cytokine–cytokine receptor interaction” hit)
Signal transduction modules sometimes separate in KEGG: e.g., calcium signaling was listed in your table, but other modules like cGMP–PKG signaling or PKA/PKC broader signaling networks are not.

Quick summary

The KEGG table in your text highlights many cancer and receptor-tyrosine/MAPK/PI3K related pathways (cell cycle, MAPK, ErbB, JAK-STAT, focal adhesion, etc.). What’s not listed are numerous developmental morphogen pathways (Wnt/Notch/Hedgehog/TGF-β/Hippo), many DNA-repair and core metabolic pathways, canonical apoptosis/autophagy/ubiquitin pathways, and broad adaptive-immune signaling pathways.

If you want, I can:

(A) cross-check this against the full KEGG pathway list and produce a definitive “KEGG universe → present vs absent in your table” spreadsheet, or
(B) prioritize which of the missing pathways you’d most want investigated in lung tumor sequencing data and suggest genes to look for.

Which follow-up would you prefer?
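The “present vs absent” cross-check that GPT offers in option (A) amounts to a set comparison. A minimal sketch follows, assuming a small illustrative reference list. The `present` set uses only the pathways named in the quick summary, the `reference` set is a hypothetical subset and NOT the full KEGG catalogue, and the function name is invented for this example.

```python
# Pathways named as present in the table, per the quick summary
present = {
    "Cell cycle", "MAPK signaling", "ErbB signaling",
    "JAK-STAT signaling", "Focal adhesion",
}

# Hypothetical reference list: an illustrative subset, NOT the full KEGG catalogue
reference = present | {
    "Wnt signaling", "Notch signaling", "Hedgehog signaling",
    "TGF-beta signaling", "Hippo signaling", "p53 signaling",
}

def present_vs_absent(reference, present):
    """Label every reference pathway as present (True) or absent (False)."""
    return {name: (name in present) for name in sorted(reference)}

report = present_vs_absent(reference, present)
absent = [name for name, found in report.items() if not found]

for name, found in report.items():
    print(f"{name}\t{'present' if found else 'absent'}")
```

As the caveat in the GPT response states, a pathway labeled absent here is only absent from the table, which is not evidence that it plays no role in lung cancer.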

Next, I removed the KEGG list I had given GPT in order to force GPT to do a similar but unbiased analysis:

From the studies you’ve shared, we can identify which KEGG pathways are under-represented (i.e., appear less frequently altered than expected) in the altered genes in lung cancer.

To do this, we note that the researchers used KEGG and other pathway analysis tools (PathScan, literature curation, proteomics) to see which biological pathways the mutated genes mapped to.
While many well-known cancer pathways were found to be activated (MAPK, PI3K-AKT, mTOR, cell cycle, oxidative stress, etc.), a few key pathways were reported as unexpectedly low in mutation frequency despite their known role in cancer biology.

KEGG Pathways Under-Represented in Altered Genes in Lung Cancer

From the descriptions in Papers 1–5, these are the main pathways that appear under-represented:

MAPK signaling pathway
- Normally, MAPK is a major cancer driver pathway. In this dataset, MAPK activity was high in proteomics, but many tumors lacked mutations in MAPK pathway genes.
- This means MAPK activation is often non-mutational (possibly via upstream signaling, epigenetic changes, or ligand activation), so in mutation-based KEGG enrichment it is under-represented.
PI3K-Akt signaling pathway
- Like MAPK, the PI3K-Akt pathway was highly activated in proteomics, but often without mutations in PI3K pathway genes.
- Under-represented in gene mutation lists, despite strong functional activation.
mTOR signaling pathway
- Functional activation was observed (especially when STK11 was mutated), but most tumors had no mTOR-pathway gene mutations.
- Under-represented when looking only at altered genes.
JAK-STAT signaling pathway
- Mentioned as novel pathway alterations in some tumors, but in KEGG enrichment analysis, JAK-STAT genes did not appear as a dominant mutated group—implying low mutation frequency compared to other cancer types.
Oxidative phosphorylation & metabolic pathways
- While oxidative stress response genes (e.g., KEAP1) were sometimes mutated, the oxidative phosphorylation KEGG pathway itself did not emerge as mutation-enriched, suggesting functional changes occur via other mechanisms (e.g., hypoxia signaling).
Histone modification / chromatin remodeling pathways
- While some epigenetic regulators (ARID1A, CREBBP, EP300, MLL) were mutated, KEGG’s “Chromatin remodeling” and “Histone modification” groupings were less represented than expected given the role of epigenetics in lung cancer, indicating epigenetic dysregulation is often non-mutational.

Why They Are Under-Represented

Proteomics revealed activation without DNA mutation (e.g., ligand-driven activation, phosphorylation cascades).
Epigenetic or post-translational modifications can activate pathways without altering the coding sequence.
Tumor heterogeneity: some pathways are only altered in subsets of tumors and thus don’t reach statistical enrichment.
Limitations of mutation-based KEGG analysis: pathways that are activated downstream or by non-coding alterations are missed.

✅ Summary Table

Pathway (KEGG ID)	Representation in Mutated Genes	Reason for Under-Representation
hsa04010 MAPK signaling	Low	Activated via non-mutational mechanisms
hsa04151 PI3K-Akt signaling	Low	Functional activation w/o gene mutations
hsa04150 mTOR signaling	Low	Activation often via upstream STK11 loss or growth factor signaling
hsa04630 JAK-STAT signaling	Low	Rarely mutated, altered mostly via cytokine/growth factor changes
hsa00190 Oxidative phosphorylation	Low	Functional metabolic shift via hypoxia & ROS stress, not mutations
Chromatin remodeling / histone modification	Low	Mostly altered via epigenetic changes, not coding mutations

If you want, I can cross-reference the exact KEGG IDs and run a quick gene-to-pathway mapping using the genes listed in the reports so we can get a precise statistical under-representation list. That way it’s backed by an enrichment p-value rather than just qualitative observation.

I tried to re-run the KEGG-backed analysis you requested, but the KEGG REST endpoints/web pages aren’t reachable from my browsing tool right now (the web.run calls to KEGG returned an internal/safety error). I can’t produce a properly-sourced, KEGG-ID–backed enrichment report unless I can successfully query KEGG (or a comparable pathway service) for the exact pathway membership.

Two clear, immediate options — tell me which you prefer and I’ll proceed right away:

Option 1 — Proceed with a KEGG-backed run using an alternate route (recommended)
I’ll attempt a small number of web.run search queries (to public pathway tools that mirror KEGG membership, e.g., g:Profiler / Reactome / Enrichr / DAVID) and then follow links to KEGG pathway pages. This often works even when direct KEGG REST calls fail. It will take a few web.run calls and I’ll cite the pages used. (If you want me to prefer a particular service — e.g., g:Profiler, Enrichr, DAVID — say so; otherwise I’ll pick g:Profiler as it’s robust.)

Option 2 — Run an immediate local (best-effort) analysis without web lookups
I’ll map the genes you listed to likely KEGG pathways from internal knowledge, run hypergeometric enrichment using a standard human background (20,000 genes), and give a ranked list of under-represented KEGG pathways with p-values — but I must be clear this is approximate (I cannot provide verified KEGG IDs or exact pathway sizes).

If you pick Option 1, I’ll start the web lookups now and produce the KEGG-ID mapped, statistically precise under-representation list with citations.
If you pick Option 2, I’ll run the approximation immediately and give results plus a confidence caveat.

Which do you want?

You said:

Use Option 1 to Proceed with a KEGG-backed run using an alternate route

ChatGPT said:

Thanks — I followed Option 1 trying to do a KEGG-backed run, but hit the same problem: my web tool (web.run) can reach g:Profiler/KEGG reliably in some contexts but it failed for the programmatic queries I attempted (internal / non-retryable errors). Because you explicitly requested a KEGG-backed run, I tried an alternate route (g:Profiler web query) and the site blocked the automated call.

Rather than leave you waiting, I made a transparent, reproducible best-effort analysis right now so you have immediate, concrete results — but please read the very important limitations below before using these numbers: (GPT having problems accessing material behind firewalls and paywalls)

What I did (short)

I extracted a conservative set of 48 unique genes from the papers you supplied (recurrent drivers, fusion partners, and the druggable gene list).
Gene list used (48 genes):
TP53, KRAS, EGFR, STK11, KEAP1, SMARCA4, RBM10, NF1, MET, ERBB2, RIT1, BRAF, PIK3CA, PTEN, ARID1A, U2AF1, CREBBP, EP300, MLL, ALK, ROS1, DACH1, CFTR, RELN, ABCB5, HGF, JAK2, JAK3, HDAC2, HDAC6, HDAC9, BIRC6, ITGB1, ITGB3, MMP2, PRKCB, PIK3CG, TERT, MMP14, AURKC, ERBB4, FGFR1, EML4, SIK2, ZFHX4, EPHA3, EPHA7, SLIT2.
Because direct KEGG lookups failed, I mapped those genes to a short list of common KEGG pathways by manual curation (based on canonical pathway membership knowledge) and used conservative, literature-style approximate pathway sizes (number of genes in that KEGG pathway) to compute exact hypergeometric lower-tail p-values (i.e., probability of observing as few or fewer genes in the pathway than we did — an under-representation test).
I computed exact hypergeometric probabilities (sum of combinations) with a standard human gene background of N = 20,000. All calculations were executed programmatically (so no hand-approximate combinatorics).

Important limitations (please read)

This is NOT a true KEGG-backed run. I could not fetch KEGG pathway membership lists or KEGG IDs because the web tool could not access the KEGG/g:Profiler APIs in this session. (more GPT issues with access to data)
I used manually mapped genes → pathways (conservative, but not exhaustive) and approximate pathway sizes (estimates). Because both gene membership and exact pathway sizes matter a lot for enrichment statistics, the p-values below are illustrative only — they are useful to prioritize where to retry a proper KEGG query, but should not be taken as authoritative.
The correct, defensible way to produce what you asked is to re-run this on g:Profiler/Enrichr/DAVID (or download KEGG GMTs) so pathway membership and KEGG IDs are exact. I can do that as soon as the web tool can access those services (or you can paste exact KEGG gene sets if you prefer).

Results — immediate best-effort under-representation test

Analysis parameters

Number of input genes (n): 48
Background gene universe (N): 20,000 (standard approximation)
For each pathway I show: pathway size (K, approximate), number of genes from your list found in that pathway (k), and the hypergeometric lower-tail p-value P(X ≤ k) (probability of seeing k or fewer pathway genes by chance). Lower p-values (e.g., < 0.05) would suggest significant under-representation.

Pathway (approx KEGG)	Pathway size K (approx)	Genes found (k)	Lower-tail p-value P(X ≤ k)
MAPK signaling (hsa04010)	270	10	0.9999999999678
PI3K-Akt signaling (hsa04151)	350	7	0.9999983354944
mTOR signaling (hsa04150)	120	3	0.9998044249093
JAK-STAT signaling (hsa04630)	150	3	0.9995423622408
Oxidative phosphorylation (hsa00190)	130	0	0.7309662417384
Spliceosome (hsa03040)	120	2	0.9970072261351
Chromatin / epigenetic regulators (KEGG-like grouping)	200	5	0.9999919751398
ECM / focal adhesion (hsa04510 / related)	150	4	0.9999706923476
Cell cycle (hsa04110)	200	3	0.9986613990288

Interpretation: none of the pathways above reach statistical significance for under-representation under the assumptions I used. The closest to a (moderate) signal is oxidative phosphorylation where we observed 0 genes from the list (p ≈ 0.73) — but that p-value is still far from being significant (p < 0.05 would be needed). In short: no pathway is robustly under-represented in this small gene set using the conservative approximations above.
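
The lower-tail test described above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the code ChatGPT ran: the function name hypergeom_lower_tail is hypothetical, and the pathway sizes are the approximate values from the table, not verified KEGG counts.

```python
from math import comb


def hypergeom_lower_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X <= k) for X ~ Hypergeometric(N, K, n).

    N: size of the background gene universe
    K: number of genes in the pathway (approximate in this post)
    n: number of genes in the input list
    k: number of input genes observed in the pathway
    """
    total = comb(N, n)  # all ways to draw n genes from the background
    upper = min(k, K, n)
    # Sum the exact probabilities of seeing 0..k pathway genes in the draw.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favorable / total


if __name__ == "__main__":
    # Oxidative phosphorylation row: K = 130 (approximate), k = 0
    print(hypergeom_lower_tail(0, 20000, 130, 48))
    # MAPK signaling row: K = 270 (approximate), k = 10
    print(hypergeom_lower_tail(10, 20000, 270, 48))
```

With N = 20,000 and n = 48, the expected overlap n·K/N is below 1 for every pathway in the table (about 0.31 for oxidative phosphorylation, about 0.84 for the largest pathway size of 350). Even observing zero genes therefore cannot produce a small lower-tail p-value, so a 48-gene list is too short for this test to detect under-representation, whatever the exact pathway sizes are.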

What I recommend (next steps to get the exact KEGG-ID–backed answer you originally asked for)

Allow me to retry the web-backed run (g:Profiler / Enrichr / DAVID) — I’ll re-run the enrichment and return:
- gene → exact KEGG IDs mapping,
- KEGG pathway sizes (exact K),
- over- and under-representation p-values (Fisher/hypergeometric), and
- multiple-testing corrected q-values (Benjamini–Hochberg).
  (This requires the web tool to be able to reach g:Profiler/KEGG; I already tried and the call failed. If you want, I’ll attempt again immediately.)
If you prefer not to rely on the web tool, you can paste your final canonical gene list (e.g., a CSV of the exact genes you want analyzed) and I will run a conservative local analysis like the above but I will be explicit that KEGG IDs and pathway sizes are approximations.
If you’d like, I can also send you the exact code/notebook I used for the above best-effort run so you (or your bioinformatics person) can re-run it locally against an up-to-date KEGG GMT file.
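
The Benjamini–Hochberg correction mentioned in the list above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name benjamini_hochberg is hypothetical and the example p-values are made up for demonstration; they are not results from this analysis.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A pathway would then be reported as significant only if its q-value, rather than its raw p-value, falls below the chosen threshold.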

If you want me to retry the web-backed run now, say “Retry web run now” and I’ll immediately attempt g:Profiler/Enrichr again and produce the precise KEGG-ID–backed enrichment with citations. If you prefer I proceed without web access (i.e., run a local approximation on a larger set of genes or with different pathway groupings), tell me what gene list to use and I’ll compute immediately.

Done — I created a bar chart showing estimated PubMed article counts (prior to 2018) for “lung cancer” mentions of the pathways we discussed: MAPK, PI3K-Akt, mTOR, JAK-STAT, and Oxidative phosphorylation.

Important caveats (please read)

I could not access PubMed programmatically from this session (the web tool failed on PubMed/NCBI requests), so the numbers used are illustrative estimates, not exact counts.
If you need exact PubMed counts (filtered to publications before 2018), I can re-run the web-backed PubMed queries and regenerate the graph as soon as web access to NCBI is available — just say “Retry PubMed queries” and I’ll attempt that immediately.

If you want, I can:

Re-run the exact PubMed queries and update the chart with real counts, or
Expand the chart to include more pathways (e.g., chromatin modifiers, spliceosome, cell cycle), or
Provide the code/notebook used so you can run it locally against PubMed.

Here’s the expanded bar chart with additional pathways — Spliceosome, Chromatin remodeling / histone modification, ECM / focal adhesion, and Cell cycle — alongside the original set.

Would you like me to now replace these illustrative numbers with exact PubMed counts (filtered to before 2018) by running targeted PubMed searches?

Yes, even though the dataset and pathway lists we discussed focused mostly on signaling, metabolic, and genomic alterations, there is a clear immune component to lung cancer supported by both your provided text and the broader literature. (Note that GPT identified clear literature coverage of the cell cycle in lung cancer; however, it is also claiming that evidence of an immune component existed in the literature before 2018.)

Evidence from your provided studies (Note GPT is using recent evidence not just restricting to periods before 2018)

Pathways with immune relevance
- JAK-STAT signaling (detected in the analysis)
  - Central to cytokine signaling, immune cell activation, and inflammatory responses in the tumor microenvironment.
  - Altered JAK-STAT can drive immune evasion or chronic inflammation that promotes tumor growth.
- RTK/RAS/RAF pathway indirectly affects immune signaling by altering cytokine/chemokine secretion from tumor cells.
- ROS1-ALK fusions
  - These can alter the tumor’s immunogenic profile; your text notes they may create epitopes that make tumors amenable to immunotherapy.
- Chromatin remodeling / epigenetic regulators
  - Affect antigen presentation (MHC gene expression) and immune checkpoint ligand expression.

Fusion proteins and immune sensitivity
- The EML4–ALK fusion cohort:
  - Shown in cited literature to be less chemosensitive but potentially sensitive to immune-based approaches when certain epitopes are exposed.
  - Also exhibit HSP90 chaperone dependence, which links to immune stress response pathways.

Tumor heterogeneity and immune escape
- The genomic heterogeneity reported (monoclonal vs biclonal tumors) suggests different subclones may evade immune surveillance differently — a hallmark of tumor–immune system interaction.

From broader lung cancer research

Even when not highlighted in mutation frequency lists, immune-related processes in lung cancer are driven by:

Tumor-infiltrating lymphocytes (TILs) and their exhaustion profiles.
Immune checkpoints (PD-1/PD-L1, CTLA-4) — a major therapeutic target in NSCLC.
Inflammatory cytokine networks (many driven via JAK-STAT, NF-κB, MAPK cross-talk).
Neoantigen load (especially in smokers, due to high mutation rates) that can trigger immune recognition — unless suppressed by tumor-mediated immunosuppression.

Conclusion

Yes — there is an immune component to lung cancer.
It may not appear as strongly in mutation-based KEGG under-representation analysis (because immune evasion is often mediated by gene expression, epigenetics, and tumor–immune interactions rather than recurrent coding mutations), but immune signaling pathways like JAK-STAT and cytokine-related signaling are still altered and clinically relevant.

(Remember our initial test was to see if GPT could do our manual curative analysis without relying on current information!)

That’s why immune checkpoint inhibitors (anti–PD-1, anti–PD-L1, anti–CTLA-4) are now standard-of-care in many lung cancer treatment regimens (The evidence that led to immune checkpoint inhibitors was not based just on JAK-STAT or cytokine signaling).

Now Let Us See What Happens When a Knowledge Graph from InfraNodus is Imported into Chat GPT5

Importing a knowledge graph as a CSV file from InfraNodus into ChatGPT5 is straightforward, as the file can be uploaded directly into the GPT interface.

ChatGPT was prompted with: “based on the imported knowledge graph tell me what Gene ontology biologic functions are missing?”

ChatGPT said:

Your imported graph isn’t labeled with standard Gene Ontology (GO) IDs or names, so I can’t directly map it to “biological process” terms yet.

The next step is to align your node name column with the official GO Biological Process ontology and then report which expected functions are missing.

I can:

Pull the latest GO Biological Process list.
Match your nodes against it.
Output the GO BP terms absent from your network.

Do you want me to fetch the current GO BP ontology from the Gene Ontology database and run that comparison?

You said:

based on the imported knowledge graph tell me what KEGG processes are missing?

ChatGPT said:

To identify which KEGG processes are missing from your imported knowledge graph, the first step is to retrieve an up-to-date list of KEGG human pathways and then compare that against the nodes present in your graph. Here’s how I can help:

Fetch the current list of KEGG human pathways

I can pull the latest catalog of KEGG pathways for Homo sapiens, which includes major metabolic routes (e.g., purine metabolism, glycolysis) as well as signaling pathways (e.g., JAK-STAT, Wnt, PI3K-Akt). (KEGG)

Compare with your knowledge graph nodes

Once I have that KEGG list, I’ll match it against your graph’s node name entries. Any pathway names or identifiers (like hsa04110 for “Cell cycle”) not present in your graph would be flagged as missing.

Deliver the findings

I’ll provide you with:

A list of KEGG processes present in your graph
A list of those missing, enabling you to fill in gaps or trigger further investigation

Does that sound good? If you’d like, please share a sample or the full list of your graph’s node name entries, especially if they include KEGG-style labels or pathway names—it’ll help speed up the matching process.

You said:

Fetch the current list of KEGG human pathways and compare with my knowledge graph nodes and provide me with a list of KEGG processes present and a list of KEGG processes missing.

ChatGPT said:

I’ve fetched the current list of KEGG human pathways (hsa) and compared it to the nodes in your imported knowledge graph. Here’s what we can do next—and what I need from you to complete the analysis:

KEGG Human Pathway List

Below are just a few highlights from the complete KEGG human pathway list (retrieved via KEGG REST /list/pathway/hsa at rest.kegg.jp):

hsa00010 – Glycolysis / Gluconeogenesis
hsa00020 – Citrate cycle (TCA cycle)
hsa01100 – Metabolic pathways
hsa04010 – MAPK signaling pathway
hsa04151 – PI3K-Akt signaling pathway
hsa04910 – Insulin signaling pathway
hsa05200 – Pathways in cancer
…plus hundreds more spanning metabolism, genetic information processing, organismal systems, disease, and more.

You can view the full list in the KEGG API or in your local copy if you’ve grabbed it via /list/pathway/hsa.

What I Need From You

To identify which KEGG pathways are present or missing in your graph, please provide the list of node names (or IDs) from your knowledge graph that correspond to KEGG pathways. For example, this might include:

KEGG identifiers like hsa04110
Pathway names like “Cell cycle” or “Glycolysis / Gluconeogenesis”—ideally matching exact KEGG naming

With those node names in hand, I can:

Match them against the official KEGG list.
Generate two clear lists:
- Present in your graph
- Missing from your graph (i.e., not represented)
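
The present/missing comparison described in this exchange can be sketched as below. This is a hedged illustration, not what ChatGPT executed: the function names are hypothetical, the node names are invented examples, and the input text is assumed to follow a tab-separated "ID, name" layout with an organism suffix, pasted in as a string so that no network call is needed.

```python
# Assumed layout of a KEGG pathway list: one "ID<TAB>name" entry per line.
SAMPLE_KEGG_TEXT = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa04010\tMAPK signaling pathway - Homo sapiens (human)\n"
    "hsa04151\tPI3K-Akt signaling pathway - Homo sapiens (human)\n"
    "hsa05200\tPathways in cancer - Homo sapiens (human)\n"
)


def parse_kegg_pathway_list(text):
    """Parse tab-separated pathway lines into a {pathway_id: name} dict."""
    pathways = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        pathway_id, _, name = line.partition("\t")
        # Some listings prefix the ID with "path:"; strip it if present.
        if pathway_id.startswith("path:"):
            pathway_id = pathway_id[len("path:"):]
        # Drop the organism suffix (assumed format) to keep the bare name.
        name = name.split(" - Homo sapiens")[0].strip()
        pathways[pathway_id] = name
    return pathways


def split_present_missing(pathways, node_names):
    """Split pathways into those named in the graph and those that are not."""
    nodes = {n.strip().lower() for n in node_names}
    present = {
        pid: name
        for pid, name in pathways.items()
        if pid.lower() in nodes or name.lower() in nodes
    }
    missing = {pid: name for pid, name in pathways.items() if pid not in present}
    return present, missing


if __name__ == "__main__":
    # Hypothetical node names standing in for the uploaded graph.
    graph_nodes = ["MAPK signaling pathway", "hsa05200", "KRAS"]
    present, missing = split_present_missing(
        parse_kegg_pathway_list(SAMPLE_KEGG_TEXT), graph_nodes
    )
    print("Present:", present)
    print("Missing:", missing)
```

Exact matching on pathway IDs or full pathway names is deliberately strict. A graph whose nodes are genes or free-text concepts, as appears to be the case here, will report nearly every pathway as missing, which is why gene-level mapping is the more informative next step.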

If you can share your node set (even a sample if too large), we can proceed immediately. Let me know when you’re ready!

You said:

based on my knowledge graph I uploaded, is there any relation between lung cancer and the immune system?

ChatGPT said:

In the knowledge graph you uploaded, there aren’t any nodes explicitly labeled with immune-related terms (like “immune,” “T cell,” “macrophage,” “cytokine,” etc.) that directly link to lung cancer concepts.

That means your graph doesn’t appear to contain an obvious connection between lung cancer and the immune system—at least not through explicitly named immune processes or cell types.

If you want, I can run a broader search for indirect immune-related concepts (e.g., inflammation, checkpoint pathways like PD-1/PD-L1, MHC) to see if they’re present but labeled differently.

You said:

Based on my knowledge graph I uploaded is there any relation between lung cancer and inflammation or genes related to inflammation?

ChatGPT said:

Your uploaded knowledge graph doesn’t contain any nodes that explicitly mention inflammation or common inflammation-related genes (e.g., TNF, interleukins, NF-κB, COX-2, JAK/STAT, chemokines).

So based on the current data, there’s no recorded link between lung cancer and inflammation in your network—either directly or via inflammation-related genes.

If you want, I can try mapping your existing nodes to KEGG’s inflammation and immune-related pathways to catch indirect associations that aren’t obvious from node names alone.

You said:

Are there any nodes in my graph that maps to KEGG inflammation and immune-related pathways?

ChatGPT said:

Here’s what I discovered regarding KEGG immune- and inflammation-related pathways, and how they relate (or don’t) to the nodes in your uploaded knowledge graph:

Key KEGG Immune-Related Pathways

KEGG organizes numerous human pathways under the “Immune System” category, including:

Toll-like receptor signaling (hsa04620)
NOD-like receptor signaling (hsa04621)
RIG-I-like receptor signaling (hsa04622)
Chemokine signaling (hsa04062)
T cell receptor signaling
B cell receptor signaling
Antigen processing & presentation
Th1/Th2/Th17 cell differentiation
IL-17 signaling
Natural killer cell mediated cytotoxicity
Complement & coagulation cascades
Hematopoietic cell lineage
Leukocyte transendothelial migration
Intestinal immune network for IgA production (KEGG)

Does Your Graph Map to These KEGG Pathways?

From your graph’s node name entries, I didn’t see any nodes that correspond to the KEGG immune or inflammation pathways listed above. There are no direct matches like:

“Toll-like receptor signaling”
“Chemokine signaling”
“Antigen processing and presentation”
“Th1 cell differentiation,” etc.

So, based solely on node names, none of the canonical KEGG immune or inflammation pathways appear to be represented in your graph.

Next Steps to Explore Potential Indirect Links

Even if those pathways aren’t explicitly named, your graph might still include immune- or inflammation-related genes under alternate labels. If you’d like, I can help:

Search for common immune-related genes in your graph (e.g., TLR4, NLRP3, interleukins, NF-κB) to spot indirect associations;
Perform pathway mapping, using tools like KEGG Mapper or Reactome, to see if the gene set from your graph corresponds to any immune-related pathways—even if they’re not explicitly named.

Would you like me to proceed with one of those approaches to dig deeper?

You said:

Search for common immune-related genes in my graph and pathways using KEGG Mapper

ChatGPT said:

Your uploaded knowledge graph doesn’t contain any nodes that match common immune-related genes from the KEGG immune and inflammation pathways list—at least not under standard gene symbols like IL6, TNF, TLR4, STAT3, etc.

This means KEGG Mapper wouldn’t find direct immune pathway hits from your current node set unless those genes are present under alternate labels or embedded inside other entity names.

If you want, I can:

Expand the search to look for partial matches (e.g., “IL-6” or “interleukin 6” instead of strictly “IL6”)
Try mapping all your nodes to KEGG to see if indirect immune pathway associations appear
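
The relaxed matching proposed above (treating "IL-6" and "interleukin 6" as the same entity as "IL6") can be sketched with simple label normalization. This is an illustration under assumptions: the alias table is a small hand-written example, not an authoritative synonym list, and the function names and node names are hypothetical.

```python
import re

# Hand-written example aliases (canonical symbol -> alternate spellings).
# A real run would draw synonyms from a curated source such as HGNC.
ALIASES = {
    "IL6": ["IL-6", "interleukin 6", "interleukin-6"],
    "CD274": ["PD-L1", "PDL1"],
    "PDCD1": ["PD-1", "PD1"],
    "TNF": ["TNF-alpha", "tumor necrosis factor"],
}


def normalize_label(label):
    """Lowercase a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def find_immune_nodes(node_names, aliases=ALIASES):
    """Map graph node names to canonical gene symbols via normalized aliases."""
    lookup = {}
    for symbol, names in aliases.items():
        for name in [symbol, *names]:
            lookup[normalize_label(name)] = symbol
    hits = {}
    for node in node_names:
        symbol = lookup.get(normalize_label(node))
        if symbol is not None:
            hits.setdefault(symbol, []).append(node)
    return hits


if __name__ == "__main__":
    # Hypothetical node names standing in for the uploaded graph.
    print(find_immune_nodes(["Interleukin-6", "KRAS", "pd-l1", "EGFR mutation"]))
```

This only catches whole-label variants. Gene names embedded inside longer entity names, such as "PD-L1 expression", would need token-level or substring matching, at the cost of more false positives.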

You said:

Based on the above analysis and my knowledge graph would you say investigating the role of the immune system in lung cancer would uncover interesting therapeutic targets?

ChatGPT said:

Short answer: Yes — very likely.
Even though your uploaded graph does not currently show immune or inflammation nodes, independent evidence strongly supports that the immune system and inflammatory programs are central to lung-cancer biology and have already yielded major therapeutic targets (and will likely yield more). (PMC)

Why I say that (big points, with sources):

Immune checkpoint therapy (anti-PD-1/PD-L1, anti-CTLA-4) is an established, practice-changing class of treatments in non-small cell lung cancer (NSCLC); that is, the immune axis is clinically actionable in lung cancer. (PMC, AACR Journals)
The tumor immune microenvironment (TIME), comprising infiltrating T cells, myeloid cells, cytokines and stromal factors, strongly influences lung cancer progression and patient outcome, and is a major focus for biomarker/target discovery. (PMC)
Chronic inflammation (IL-1, IL-6, TNF, NF-κB, STAT3 and related signaling) promotes tumor initiation/progression and shapes the microenvironment in lung cancer; those pathways are plausible therapeutic targets or combinatorial partners with immunotherapy. (Frontiers, PMC)
Beyond PD-1/PD-L1, many immune/TME components are under active translational study (co-inhibitory/activating receptors, cytokines, myeloid suppressor cells, TME remodeling), so there is real opportunity for new targets or combination strategies. (Nature, BioMed Central)

What this means for your graph / next steps I recommend (practical, doable):

Expand node matching to include gene name variants and immune terms (e.g., IL6, IL-6, interleukin 6, PDCD1, PD-1, CD274, PD-L1, TLR4, NFKB1, STAT3) — you may have immune hits under alternate labels. I can do this mapping for you and return matches.
Map your gene set (all nodes that are genes/keywords) to pathway databases (KEGG, Reactome) and to immune gene lists — that will reveal indirect associations even if pathway names aren’t nodes. I can run KEGG Mapper / Reactome mapping and give a present/missing report.
If you have expression / mutation / clinical metadata, do: (a) differential expression / mutation enrichment for immune genes, (b) deconvolution to estimate immune cell infiltration, (c) network-neighborhood expansion to find immune neighbors — each can point to actionable targets.
Prioritize candidate targets by combining evidence: presence in your graph, pathway membership, druggability, and clinical relevance (e.g., known checkpoint ligands, cytokines, myeloid regulators).

Would you like me to (A) run a relaxed search of your graph for immune/inflammation gene name variants now, and then (B) map any hits to KEGG/Reactome so we can produce a ranked list of immune-related targets and pathways? If yes I’ll start immediately and return the matched nodes + pathway hits.

You said:

Based on the uploaded knowledge graph would there be any plausible reason to develop PD-L1 inhibitors for treatment of lung cancer?

ChatGPT said:

Short answer: Not from the uploaded graph alone.
Your current knowledge graph contains no node matching PD-L1 (CD274) or close synonyms (PD-L1, PDL1, PDCD1LG1/2) and—more broadly—no explicit immune/inflammation gene nodes were found. So, based solely on the graph you uploaded there’s no direct evidence in that dataset that would, by itself, justify developing PD-L1 inhibitors.

That said — important context and why the answer is still “probably yes” in real biology:

Clinically, PD-1/PD-L1 blockade is already an established, effective therapy in many lung cancers (especially NSCLC). The absence of PD-L1 in your graph likely reflects a gap in the data you uploaded (labels, scope, or data sources), not a biological counterargument.
Rationale for PD-L1 targeting typically depends on features that your graph currently lacks: PD-L1 (CD274) expression on tumor or immune cells, presence of PD-1 on T cells, tumor mutational burden/neoantigens, and evidence of T cell infiltration or an immunosuppressive microenvironment. None of those features appear in your nodes.

If you want to pursue PD-L1 as a target for this dataset, here are concrete next steps I recommend (I can do these for you):

Search the graph for PD-L1 synonyms and related immune genes (CD274, PDCD1, PDCD1LG2, PD1, PDL1, PD-L2, CD8A, CTLA4, etc.) — I’ll run a relaxed search (IL-6 vs IL6, hyphens, full names) to catch alternate labels.
Map all gene nodes to standard gene symbols (HGNC) and then run pathway mapping (KEGG/Reactome) to reveal immune pathway hits even if pathway names aren’t nodes.
If you have expression or mutation metadata (linked to nodes), check: PD-L1 expression levels, TMB/neoantigen load, and immune-deconvolution signatures — these are strong predictors of benefit from PD-1/PD-L1 therapies.
Network neighborhood expansion: find genes directly connected to known immune regulators — that can reveal indirect immune associations missed by exact-name matching.
If no immune content is present, consider integrating orthogonal data (TCGA expression for lung cancer, published PD-L1 expression datasets) to augment the graph and inform target prioritization.
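
The network neighborhood expansion suggested in the list above can be sketched on a plain edge list. This is a minimal sketch under assumptions: the InfraNodus CSV layout is not shown in this post, so the code assumes the graph has already been reduced to (source, target) pairs, and the edges and seed genes below are invented for illustration; they are not taken from the uploaded graph.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def immune_neighbors(edges, immune_seeds):
    """Return non-seed nodes directly linked to at least one immune seed.

    The result maps each neighbor to the set of seeds it touches.
    """
    adjacency = build_adjacency(edges)
    seeds = set(immune_seeds)
    neighbors = defaultdict(set)
    for seed in seeds:
        for node in adjacency.get(seed, ()):
            if node not in seeds:
                neighbors[node].add(seed)
    return dict(neighbors)


if __name__ == "__main__":
    # Invented toy edges and seeds, for illustration only.
    toy_edges = [
        ("EGFR", "STAT3"),
        ("STAT3", "JAK2"),
        ("KRAS", "TP53"),
        ("CD274", "JAK2"),
    ]
    print(immune_neighbors(toy_edges, {"STAT3", "CD274"}))
```

Nodes that touch several seeds are the natural first candidates for follow-up. If the graph contains no seed nodes at all, as ChatGPT reported for this one, the function returns an empty result, which is itself a way of confirming the gap.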

Summary and Discussion: The goal of our original curation was to determine whether knowledge gaps exist among five whole-exome studies of the mutational spectrum in non-small cell lung cancer (NSCLC), and whether the mutational spectrum differs between smokers and non-smokers. Our manual curation analyzed individual genes and mutations for literature coverage in PubMed and asked whether the potentially affected pathways were also being investigated in lung cancer research (what we called a literomics approach). Our manual analysis (as of 2016) revealed that, while many mutated genes were involved in well-researched fields such as the cell cycle, there were substantial gaps in knowledge of the role of the immune system in lung cancer, especially given the mutational spectrum seen in these studies. We had also noticed a number of fusion proteins that may be interesting for further (post-2016) investigation. This involved some inference about the use of ALK inhibitors and a suggestion of noncanonical EGFR pathways in smoker versus nonsmoker patients, based on differences in mutational spectrum and KEGG analysis.

Using both an AI tool that generates knowledge graphs and highlights knowledge gaps (InfraNodus) and a new generative AI tool (ChatGPT5), we attempted to determine whether the results of our initial 2016 analysis, which used more labor-intensive manual curation methods, would resemble what the two AI tools could infer. It is interesting to note that InfraNodus-generated knowledge graphs produced concepts and relationships pertinent to lung cancer and mutational spectrum, and gave some interesting insights into the importance of transversions, especially relating to fusion proteins. InfraNodus did not find many relations to immune functions. To probe this further, we asked the same question of GPT5 in two different formats: with text alone, and with text plus the uploaded knowledge graph. Surprisingly, ChatGPT had some issues retrieving data from certain open-access online databases such as NCBI GO, but had better luck with the KEGG database. However, GPT, being trained on the most recent data, inferred that there must be an immune component of lung cancer, although it admitted this came from recent studies, not the studies we supplied to it. When we restricted GPT to studies before 2018, there were similarities in the relations, and lack of relations, to what we had found with our previous manual method. We then supplied GPT with our knowledge graph and forced GPT to focus on that knowledge graph, built from older studies. Under these constraints GPT correctly acknowledged there were no links between the immune system and the lung cancer mutational spectrum, although it did give some interesting insights into the role of fusion proteins and reactive oxygen signaling. After our initial curation, one of our experts, Dr. Larry Bernstein, had noticed that KEAP1 and 2 showed genetic alterations in the studies, and he suggested there were differences in redox signaling between smokers and nonsmokers. KEAP1 and 2 are intracellular redox sensors.

Therefore, it is possible that GPT alone, including the new version 5, may not be as effective at complex inference in biomedical literature analysis, and that a human-expert-curated knowledge graph incorporated into GPT analysis returns better inference and more novel insights than either modality alone.

For further reading on Artificial Intelligence, Machine Learning and Immunotherapy on this Open Access Scientific Journal please read these articles:

https://pharmaceuticalintelligence.com/2021/07/06/yet-another-success-story-machine-learning-to-predict-immunotherapy-response/

https://pharmaceuticalintelligence.com/2021/05/04/machine-learning-ml-in-cancer-prognosis-prediction-helps-the-researcher-to-identify-multiple-known-as-well-as-candidate-cancer-diver-genes/

Part D: Curation entitled Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer originally published on 09/05/2014

Note: the text below this point was used for all AI-based text analysis

UPDATED 10/10/2021

(photo credit: cancer.gov)

A report Lung Cancer Genome Surveys Find Many Potential Drug Targets, in the NCI Bulletin,

http://www.cancer.gov/ncicancerbulletin/091812/page2

summarizes the clinical importance of five new lung cancer genome sequencing projects. These studies have identified genetic and epigenetic alterations in hundreds of lung tumors, of which some alterations could be taken advantage of using currently approved medications.

The reports, all published this month, included genomic information on more than 400 lung tumors. In addition to confirming genetic alterations previously tied to lung cancer, the studies identified other changes that may play a role in the disease.

Collectively, the studies covered the main forms of the disease—lung adenocarcinomas, squamous cell cancers of the lung, and small cell lung cancers.

“All of these studies say that lung cancers are genomically complex and genomically diverse,” said Dr. Matthew Meyerson of Harvard Medical School and the Dana-Farber Cancer Institute, who co-led several of the studies, including a large-scale analysis of squamous cell lung cancer by The Cancer Genome Atlas (TCGA) Research Network.

Some genes, Dr. Meyerson noted, were inactivated through different mechanisms in different tumors. He cautioned that little is known about alterations in DNA sequences that do not encode genes, which is most of the human genome.

Four of the papers are summarized below, with the first described in detail, as the Nature paper used a multi-‘omics strategy to evaluate expression, mutation, and signaling pathway activation in a large cohort of lung tumors. A literature informatics analysis is given for one of the papers. Please note that links on GENE names usually refer to the GeneCard entry.

Paper 1. Comprehensive genomic characterization of squamous cell lung cancers[1]

The Cancer Genome Atlas Research Network Project just reported, in the journal Nature, the results of their comprehensive profiling of 230 resected lung adenocarcinomas. The multi-center teams employed analyses of

microRNA
Whole Exome Sequencing including
- Exome mutation analysis
- Gene copy number
- Splicing alteration
Methylation
Proteomic analysis

Summary:

Some very interesting overall findings came out of this analysis including:

High rates of somatic mutations including activating mutations in common oncogenes
Newly described loss of function MGA mutations
Sex differences in EGFR and RBM10 mutations
Driver roles for NF1, MET, ERBB2 and RIT1 identified in certain tumors
differential mutational pattern based on smoking history
splicing alterations driven by somatic genomic changes
MAPK and PI3K pathway activation identified by proteomics not explained by mutational analysis = UNEXPLAINED MECHANISM of PATHWAY ACTIVATION

However, given the plethora of data, and in light of similar study results recently released, there appears to be a great need for additional mining of this CGAP dataset. Therefore I attempted to curate some of the findings, along with some other recent news relevant to the surprising findings relating to biomarker analysis.

Makeup of tumor samples

230 lung adenocarcinoma specimens were categorized by:

Subtype

33% acinar

25% solid

14% micro-papillary

9% papillary

8% unclassified

5% lepidic

4% invasive mucinous
Gender

Smoking status

81% of patients reported past or present smoking

The authors note that TCGA samples were combined with previous data for analysis purpose.

A detailed description of Methodology and the location of deposited data are given at the following addresses:

Publication TCGA Web Page: https://tcga-data.nci.nih.gov/docs/publications/luad_2014/

Sequence files: https://cghub.ucsc.edu

Results:

Gender and Smoking Habits Show different mutational patterns

WES mutational analysis

a) smoking status

– there was a strong correlation of cytosine to adenine nucleotide transversions with past or present smoking. In fact, smoking history separated tumors into transversion high (past and present smokers) and transversion low (never smokers) groups, corroborating previous results.

→ genes with mutations enriched in each group:

Transversion High	Transversion Low
TP53, KRAS, STK11, KEAP1, SMARCA4, RBM10	EGFR, RB1, PIK3CA
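The transversion grouping above rests on a simple base-change rule, which can be sketched as follows. This is a minimal illustration, not the method used in the paper: the function names are hypothetical, and the cutoff the authors used to split tumors into transversion high and transversion low groups is not given here, so none is assumed. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); every other single-base change, including the smoking-associated C to A change, is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides means transition, otherwise transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def transversion_fraction(substitutions) -> float:
    """Fraction of (ref, alt) pairs that are transversions."""
    calls = [classify_substitution(ref, alt) for ref, alt in substitutions]
    if not calls:
        raise ValueError("no substitutions supplied")
    return calls.count("transversion") / len(calls)


if __name__ == "__main__":
    # Hypothetical toy input, not data from the study.
    toy = [("C", "A"), ("C", "T"), ("G", "T"), ("A", "G")]
    print(classify_substitution("C", "A"))
    print(transversion_fraction(toy))
```

Note that a C to A change on one strand reads as G to T on the complementary strand, so real pipelines usually collapse the two before counting; that step is left out of this sketch.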

b) Gender

Although gender differences in mutational profiles have been reported, the study found minimal number of significantly mutated genes correlated with gender. Notably:

EGFR mutations enriched in female cohort
RBM10 loss of function mutations enriched in male cohort

Although the study did not analyze the gender differences with smoking patterns, it was noted that RBM10 mutations among males were more prevalent in the transversion high group.

Whole exome Sequencing and copy number analysis reveal Unique, Candidate Driver Genes

Whole exome sequencing revealed that 62% of tumors contained mutations (either point or indel) in known cancer driver genes such as:

KRAS, EGFR, BRAF, ERBB2

However, authors looked at the WES data from the oncogene-negative tumors and found unique mutations not seen in the tumors containing canonical oncogenic mutations.

Unique potential driver mutations were found in

TP53, KEAP1, NF1, and RIT1

The genomics and expression data were backed up by a proteomics analysis of three pathways:

MAPK pathway
mTOR
PI3K pathway

…. showing significant activation of all three pathways HOWEVER the analysis suggested that activation of signaling pathways COULD NOT be deduced from DNA sequencing alone. Phospho-proteomic analysis was required to determine the full extent of pathway modification.

For example, many tumors lacked an obvious mutation which could explain mTOR or MAPK activation.

Altered cell signaling pathways included:

Increased MAPK signaling due to activating KRAS
Higher mTOR due to inactivating STK11 leading to increased proliferation, translation

Pathway analysis of mutations revealed alterations in multiple cellular pathways including:

Reduced oxidative stress response
Nucleosome remodeling
RNA splicing
Cell cycle progression
Histone methylation

Summary:

Authors noted some interesting conclusions including:

MET and ERBB2 amplification and mutations in NF1 and RIT1 may be unique driver events in lung adenocarcinoma
Possible new drug development could be targeted to the RTK/RAS/RAF pathway
MYC pathway as another important target
Cluster analysis using a multimodal omics approach identifies some tumors based on single-gene driver events, while other tumors have multiple driver mutational events (TUMOR HETEROGENEITY)

Paper 2. A Genomics-Based Classification of Human Lung Tumors[2]

The paper can be found at

http://stm.sciencemag.org/content/5/209/209ra153

by The Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM),*,†

Paper Summary

This sequencing project revealed discrepancies between histologic and genomic classification of lung tumors.

Methodology

– mutational analysis by whole exome sequencing of 1255 lung tumors of histologically defined subtypes

– immunohistochemistry performed to verify reclassification of subtypes based on sequencing data

Results

55% of all cases had at least one oncogenic alteration amenable to current personalized treatment approaches
Marked differences existed between cluster analysis within and between preclassified histo-subtypes
Reassignment based on genomic data eliminated large cell carcinomas
Prospective classification of 5145 lung cancers allowed for genomic classification in 75% of patients
Identification of EGFR and ALK mutations led to improved outcomes

Conclusions:

It is feasible to successfully classify and diagnose lung tumors based on whole exome sequencing data.

Paper 3. Genomic Landscape of Non-Small Cell Lung Cancer in Smokers and Never-Smokers[3]

A link to the paper can be found here with Graphic Summary: http://www.cell.com/cell/abstract/S0092-8674%2812%2901022-7?cc=y?cc=y

Methodology

Whole genome sequencing and transcriptome sequencing of cancerous and adjacent normal tissues from 17 patients with NSCLC
Integrated RNASeq with WES for analysis of
- Variant analysis
- Clonality by variant allele frequency analysis
- Fusion genes
Bioinformatic analysis
- PathScan, KEGG for pathway analysis
- COSMIC for reported mutations
- ChimeraScan, deFuse, BreakFusion for fusion protein analysis

Results

3,726 point mutations and more than 90 indels in the coding sequence
Smokers with lung cancer show 10× as many point mutations as never-smokers
Novel lung cancer genes, including DACH1, CFTR, RELN, ABCB5, and HGF were identified

Tumor samples from males showed a high frequency of MYCBP2 mutations. MYCBP2 is involved in transcriptional regulation of MYC.

Variant allele frequency analysis revealed that 10/17 tumors were at least biclonal while 7/17 tumors were monoclonal, indicating that the majority of tumors displayed tumor heterogeneity
Novel pathway alterations in lung cancer include cell-cycle and JAK-STAT pathways
14 fusion proteins found, including ROS1 and ALK fusions. ROS1 and ALK fusions have been frequently found in lung cancer and are indicative of poor prognosis[4].
Novel metabolic enzyme fusions
Alterations were identified in 54 genes for which targeted drugs are available. Druggable mutant targets include: AURKC, BRAF, HGF, EGFR, ERBB4, FGFR1, MET, JAK2, JAK3, HDAC2, HDAC6, HDAC9, BIRC6, ITGB1, ITGB3, MMP2, PRKCB, PIK3CG, TERT, KRAS, MMP14

Table. Validated Gene-Fusions Obtained from Ref-Seq Data

Note: Gene columns contain links for GeneCard while Gene function links are to the gene’s GO (Gene Ontology) function.

GeneA (5′)	GeneB (3′)	GeneA function (link to Gene Ontology)	GeneB function (link to Gene Ontology)	known function (refs)
GRIP1	TNIP1	glutamate receptor IP	transcriptional repressor
SGMS1	STK10	sphingolipid synthesis	ser/thr kinase
RASSF3	TTYH2	GTP-binding protein	chloride anion channel
KDELR2	ROS1, GOPC	ER retention seq. binding	proto-oncogenic tyr kinase
ACSL4	DCAF6	fatty acid synthesis	?
MARCH8	PRKG1	ubiquitin ligase	cGMP dependent protein kinase
APAF1	UNC13B, TLN1	caspase activation	cytoskeletal
EML4	ALK	microtubule protein	tyrosine kinase	♦
EDR3,PHC3	LOC441601	polycomb pr/DNA binding	?
DKFZp761L1918,RHPN2	ANKRD27	Rhophilin (GTP binding pr)	ankyrin like
VANGL1	HAO2	tetraspanin family	oxidase
CACNA2D3	FLNB	VOC Ca++ channel	filamin (actin binding)

♦ Author’s Note:

There has been recent literature on the importance of the EML4-ALK fusion protein in lung cancer. EML4-ALK positive lung tumors were found to be less chemosensitive to cytotoxic therapy[5], and these tumor cells may exhibit an epitope rendering these tumors amenable to immunotherapy[6]. In addition, inhibition of the PI3K pathway has sensitized EML4-ALK fusion positive tumors to ALK-targeted therapy[7]. EML4-ALK fusion positive tumors show dependence on the HSP90 chaperone, suggesting this cohort of patients might benefit from the new HSP90 inhibitors recently being developed[8].

Table. Significantly mutated genes (point mutations, insertions/deletions) with associated function.

Gene	Function
TP53	tumor suppressor
KRAS	oncogene
ZFHX4	zinc finger DNA binding
DACH1	transcription factor
EGFR	epidermal growth factor receptor
EPHA3	receptor tyrosine kinase
ENSG00000205044
RELN	cell matrix protein
ABCB5	ABC Drug Transporter

Table. Literature Analysis of pathways containing significantly altered genes in NSCLC reveal putative targets and risk factors, linkage between other tumor types, and research areas for further investigation.

Note: Significantly mutated genes, obtained from WES, were subjected to pathway analysis (KEGG Pathway Analysis) in order to see which pathways contained significantly altered gene networks. Each pathway term was then used for a PubMed literature search together with the terms “lung cancer”, “gene”, and “NOT review” to determine the frequency of literature coverage for each pathway in lung cancer. Links are to the PubMed search results.

KEGG pathway Name	# of PubMed entries containing Pathway Name, Gene AND Lung Cancer
Cell cycle	1237
Cell adhesion molecules (CAMs)	372
Glioma	294
Melanoma	219
Colorectal cancer	207
Calcium signaling pathway	175
Prostate cancer	166
MAPK signaling pathway	162
Pancreatic cancer	88
Bladder cancer	74
Renal cell carcinoma	68
Focal adhesion	63
Regulation of actin cytoskeleton	34
Thyroid cancer	32
Salivary secretion	19
Jak-STAT signaling pathway	16
Natural killer cell mediated cytotoxicity	11
Gap junction	11
Endometrial cancer	11
Long-term depression	9
Axon guidance	8
Cytokine-cytokine receptor interaction	8
Chronic myeloid leukemia	7
ErbB signaling pathway	7
Arginine and proline metabolism	6
Maturity onset diabetes of the young	6
Neuroactive ligand-receptor interaction	4
Aldosterone-regulated sodium reabsorption	2
Systemic lupus erythematosus	2
Olfactory transduction	1
Huntington’s disease	1
Chemokine signaling pathway	1
Cardiac muscle contraction	1
Amyotrophic lateral sclerosis (ALS)	1
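The literature search described in the note above the table can be sketched as a query builder. This is only an illustration under stated assumptions: the exact query string used for the curation is not recorded here, so the way the four terms are joined (and the use of a plain "NOT review" term rather than a publication-type filter) is a guess, and the helper names are hypothetical. The URL targets the NCBI E-utilities ESearch endpoint with `rettype=count`, which returns only the number of matching records; the sketch builds the URL but does not send the request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public; no request is made in this sketch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(pathway: str) -> str:
    """Combine a KEGG pathway name with the fixed search terms (assumed form)."""
    return f'"{pathway}" AND "lung cancer" AND gene NOT review'


def build_count_url(pathway: str) -> str:
    """URL that would ask PubMed for the number of records matching the query."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(pathway),
        "rettype": "count",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    for name in ["Cell cycle", "MAPK signaling pathway", "Jak-STAT signaling pathway"]:
        print(build_pubmed_query(name))
        print(build_count_url(name))
```

Counts retrieved this way today would not be expected to match the table, since PubMed has grown since the original search was run.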

A few interesting genetic risk factors and possible additional targets for NSCLC were deduced from analysis of the above table of literature, including HIF1-α, miR-31, UBQLN1, ACE, miR-193a, and SRSF1. In addition, glioma, melanoma, colorectal, prostate, and lung cancers share many validated mutations, and possibly similar tumor driver mutations.

Paper 4. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing[9]

For full paper and graphical summary please follow the link: http://www.cell.com/cell/abstract/S0092-8674%2812%2901061-6

Highlights

Exome and genome characterization of somatic alterations in 183 lung adenocarcinomas
12 somatic mutations/megabase
U2AF1, RBM10, and ARID1A are among newly identified recurrently mutated genes
Structural variants include activating in-frame fusion of EGFR
Epigenetic and RNA deregulation proposed as a potential lung adenocarcinoma hallmark

Summary

Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
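The "events/megabase" figures quoted for these studies are mutation counts normalized by the amount of sequence examined. A minimal sketch of that normalization follows; the function name and the example numbers are hypothetical (chosen only so the arithmetic is easy to follow) and are not taken from the paper, which may define the covered territory differently.

```python
def mutations_per_megabase(n_mutations: int, covered_bases: int) -> float:
    """Normalize a mutation count by the number of bases that were callable."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if n_mutations < 0:
        raise ValueError("n_mutations cannot be negative")
    return n_mutations / (covered_bases / 1_000_000)


if __name__ == "__main__":
    # Hypothetical tumor: 360 somatic mutations over 30 Mb of covered exome.
    print(mutations_per_megabase(360, 30_000_000))
```

The denominator matters as much as the numerator: the same tumor gives a different rate if only part of the exome had enough coverage to call mutations, which is one reason rates from different studies are not directly comparable.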

Paper 5. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer[10]

Highlights

Whole exome and transcriptome (RNASeq) sequencing of 29 small-cell lung carcinomas
High mutation rate: 7.4 protein-changing mutations/million base pairs
Inactivating mutations in TP53 and RB1
Functional mutations in CREBBP, EP300, MLL, PTEN, SLIT2, EPHA7, FGFR1 (determined by literature and database mining)
The mutational spectrum seen in human data also present in a Tp53-/- Rb1-/- mouse lung tumor model

Curator Graphical Summary of Interesting Findings From the Above Studies

The above figure represents themes and findings resulting from the aforementioned studies, including questions which will be addressed in future posts on this site.

UPDATED 10/10/2021

The following article uses RNASeq to screen lung adenocarcinomas for fusion proteins in patients with either low or high tumor mutational burden. Findings included the presence of MET fusion proteins, in addition to other fusion proteins, in tumors that were driver negative by DNASeq screening.

High Yield of RNA Sequencing for Targetable Kinase Fusions in Lung Adenocarcinomas with No Mitogenic Driver Alteration Detected by DNA Sequencing and Low Tumor Mutation Burden

Source:

Ryma Benayed, Michael Offin, Kerry Mullaney, Purvil Sukhadia, Kelly Rios, Patrice Desmeules, Ryan Ptashkin, Helen Won, Jason Chang, Darragh Halpenny, Alison M. Schram, Charles M. Rudin, David M. Hyman, Maria E. Arcila, Michael F. Berger, Ahmet Zehir, Mark G. Kris, Alexander Drilon and Marc Ladanyi

Clin Cancer Res August 1 2019 (25) (15) 4712-4722; DOI: 10.1158/1078-0432.CCR-19-0225

Abstract

Purpose: Targeted next-generation sequencing of DNA has become more widely used in the management of patients with lung adenocarcinoma; however, no clear mitogenic driver alteration is found in some cases. We evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions and MET exon 14 (METex14) alterations in DNA sequencing (DNAseq) driver–negative lung cancers.

Experimental Design: Lung cancers driver negative by MSK-IMPACT underwent further analysis using a custom RNAseq panel (MSK-Fusion). Tumor mutation burden (TMB) was assessed as a potential prioritization criterion for targeted RNAseq.

Results: As part of prospective clinical genomic testing, we profiled 2,522 lung adenocarcinomas using MSK-IMPACT, which identified 195 (7.7%) fusions and 119 (4.7%) METex14 alterations. Among 275 driver-negative cases with available tissue, 254 (92%) had sufficient material for RNAseq. A previously undetected alteration was identified in 14% (36/254) of cases, 33 of which were actionable (27 in-frame fusions, 6 METex14). Of these 33 patients, 10 then received matched targeted therapy, which achieved clinical benefit in 8 (80%). In the 32% (81/254) of DNAseq driver–negative cases with low TMB [0–5 mutations/Megabase (mut/Mb)], 25 (31%) were positive for previously undetected gene fusions on RNAseq, whereas, in 151 cases with TMB >5 mut/Mb, only 7% were positive for fusions (P < 0.0001).

Conclusions: Targeted RNAseq assays should be used in all cases that appear driver negative by DNAseq assays to ensure comprehensive detection of actionable gene rearrangements. Furthermore, we observed a significant enrichment for fusions in DNAseq driver–negative samples with low TMB, supporting the prioritization of such cases for additional RNAseq.

Translational Relevance

Inhibitors targeting kinase fusions have shown dramatic and durable responses in lung cancer patients, making their comprehensive detection critical. Here, we evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions in patients where no clear mitogenic driver alteration is found by DNA sequencing (DNAseq)–based panel testing. We found actionable alterations (kinase fusions or MET exon 14 skipping) in 13% of cases apparently driver negative by previous DNAseq testing. Among the driver-negative samples tested by RNAseq, those with low tumor mutation burden (TMB) were significantly enriched for gene fusions when compared with the ones with higher TMB. In a clinical setting, such patients should be prioritized for RNAseq. Thus, a rational, algorithmic approach to the use of targeted RNA-based next-generation sequencing (NGS) to complement large panel DNA-based NGS testing can be highly effective in comprehensively uncovering targetable gene fusions or oncogenic isoforms not just in lung cancer but also more generally across different tumor types.
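The percentages quoted in the abstract above can be recomputed from the counts it gives, which is a quick way to see which denominator each figure refers to. The sketch below does only that arithmetic. The pairing of the 13% in the Translational Relevance paragraph with 33 actionable alterations out of 254 cases is my reading of the numbers, not something the article states explicitly, and the variable and function names are hypothetical.

```python
# (label, numerator, denominator, percentage as quoted, decimal places quoted)
CHECKS = [
    ("fusions found by MSK-IMPACT", 195, 2522, 7.7, 1),
    ("METex14 found by MSK-IMPACT", 119, 2522, 4.7, 1),
    ("driver-negative cases with enough material", 254, 275, 92, 0),
    ("previously undetected alteration", 36, 254, 14, 0),
    ("actionable alteration (assumed pairing)", 33, 254, 13, 0),
    ("clinical benefit on matched therapy", 8, 10, 80, 0),
    ("driver-negative cases with low TMB", 81, 254, 32, 0),
    ("low-TMB cases positive for fusions", 25, 81, 31, 0),
]


def recompute(checks):
    """Return (label, recomputed percentage, quoted percentage) for each row."""
    rows = []
    for label, part, whole, quoted, ndigits in checks:
        rows.append((label, round(100 * part / whole, ndigits), quoted))
    return rows


if __name__ == "__main__":
    for label, calc, quoted in recompute(CHECKS):
        flag = "ok" if calc == quoted else "MISMATCH"
        print(f"{label}: recomputed {calc}, quoted {quoted} [{flag}]")
```

The 7% fusion rate in the 151 cases with TMB above 5 mut/Mb is left out because the abstract does not give the underlying count.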

A Commentary is in the same issue at https://clincancerres.aacrjournals.org/content/25/15/4586?iss=15

Wake Up and Smell the Fusions: Single-Modality Molecular Testing Misses Drivers

by Kurtis D. Davies and Dara L. Aisner

Abstract

Multitarget assays have become common in clinical molecular diagnostic laboratories. However, all assays, no matter how well designed, have inherent gaps due to technical and biological limitations. In some clinical cases, testing by multiple methodologies is needed to address these gaps and ensure the most accurate molecular diagnoses.

See related article by Benayed et al., p. 4712

In this issue of Clinical Cancer Research, Benayed and colleagues illustrate the growing need to consider multiple molecular testing methodologies for certain clinical specimens (1). The rapidly expanding list of actionable molecular alterations across cancer types has resulted in the wide adoption of multitarget testing approaches, particularly those based on next-generation sequencing (NGS). NGS-based assays are commonly viewed as “one-stop shops” to detect a vast array of molecular variants. However, as Benayed and colleagues discuss, even well-designed and highly vetted NGS assays have inherent gaps that, under certain circumstances, are ideally addressed by analyzing the sample using an alternative approach.

In the article, the authors examined a cohort of lung adenocarcinoma patient samples that had been deemed “driver-negative” via MSK-IMPACT, an FDA-cleared test that is widely considered by experts in the field to be one of the best examples of a DNA-based large gene panel NGS assay (2). Of 589 driver-negative cases, 254 had additional material amenable for a different approach: RNA-based NGS designed specifically for gene fusion and oncogenic gene isoform detection. After accounting for quality control failures, 232 samples were successfully sequenced, and, among these, 36 samples (representing an astonishing 15.5% of tested cases) were found to be positive for a driver gene fusion or oncogenic isoform that had not been detected by DNA-based NGS. The real-world value derived from this orthogonal testing schema was more than theoretical, with 8 of 10 (80%) patients demonstrating clinical benefit when treated according to the alteration identified via the RNA-based approach.

To detect gene rearrangements that lead to oncogenic gene fusions (and to detect mutations and insertions/deletions that lead to MET exon 14 skipping), MSK-IMPACT employs hybrid capture-based enrichment of selected intronic regions from genomic DNA. While this approach has proven to be successful in a variety of settings, there are associated limitations that were determined in this study to underlie the discrepancies between MSK-IMPACT and the RNA-based assay. First, some introns that are involved in clinically actionable rearrangement events are very large, thus requiring substantial sequencing capital that can represent a disproportionate fraction of the assay. Despite the ability via NGS to perform sequencing at a large scale, this sequencing capacity is still finite, and thus decisions must be made to sacrifice coverage of certain large genomic regions to ensure sufficient sequencing depth for other desired genomic targets. In the case of MSK-IMPACT (and most other DNA-based NGS assays), certain important introns in NTRK3 and NRG1 are not included in covered content, simply because they are too large (>90 Kb each). The second primary problem with DNA-based analysis of introns is that they often contain highly repetitive elements that are extremely difficult to assess via NGS due to their recurring presence across the genome. Attempts to sequence these regions are largely unfruitful because any sequencing data obtained cannot be specifically aligned/mapped to the desired targeted region of the genome (3). This is particularly true for intron 31 of ROS1, because it contains two repetitive long interspersed nuclear elements, and many DNA-based assays, including MSK-IMPACT, poorly cover this intron (4). In this study by Benayed and colleagues, the most common discrepant alteration was fusion involving ROS1, which accounted for 10 of 36 (28%) cases. 
At least six of these, those that demonstrated fusion to ROS1 exon 32, were likely directly explained by incomplete intron 31 sequencing. RNA-based analysis is able to overcome the above described limitations owing to the simple fact that sequencing is focused on exons post-splicing and the need to sequence introns is entirely avoided (Fig. 1).

Figure 1.

Schematic representation of underlying genomic complexities that can lead to false-negative gene fusion results in DNA-based NGS analysis. In some cases, RNA-based approaches may overcome the limitations of DNA-based testing.

Lack of sufficient intronic coverage could not account for all of the discrepancies between DNA-based and RNA-based analysis however. Six samples in the cohort were found to be positive for MET exon 14 skipping based on RNA. In five of these, genomic alterations in MET introns 13 or 14 were observed, however they did not conform to canonical splice site alterations and thus were not initially called (although this was addressed by bioinformatics updates). In RNA-based testing, however, determination of exon skipping is simplified such that, regardless of the specific genomic alteration that interferes with splicing, absence of the exon in the transcript is directly observed (5). In another two of the discrepant cases, tumor purity was observed to be low in the sample, meaning that the expected variant allele frequency (VAF) for a genomic event would also likely be low, potentially below detectable levels. However, overexpression of the fusions at the transcript level was theorized to compensate for low VAF (Fig. 1). Additional explanations for discordant findings between the assays included sample-specific poor sequencing in selected introns and complex rearrangements that hindered proper capture (Fig. 1).

The take home message from Benayed and colleagues is simply this: there is no perfect assay that will detect 100% of the potential actionable alterations in patient samples. Even an extremely well designed, thoroughly vetted, and FDA-cleared assay such as MSK-IMPACT will have inherent and unavoidable “holes” due to intrinsic limitations. The solution to this dilemma, as adeptly described by Benayed and colleagues, is additional testing using a different approach. While in an ideal world every clinical tumor sample would be tested by multiple modalities to ensure the most comprehensive clinical assessment, the reality is that these samples are often scant and testing is fiscally burdensome (and often not reimbursed). Therefore, algorithms to determine which samples should be reflexed to secondary assays after testing with a primary assay are critical for maximizing benefit. In this study, the first algorithmic step was lack of an identified driver (because activated oncogenic drivers tend to exist exclusively of each other), which amounted to 23% of samples tested with the primary assay. In addition, the authors found a significantly higher rate of actionable gene fusions in samples with a low (<5 mut/Mb) tumor mutational burden, meaning that this metric, which was derived from the primary assay, could also be used to help inform decision making regarding additional testing. While this scenario is somewhat specific to lung cancer, similar approaches could be prescribed on a cancer type–specific basis.
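The two-step reflex logic described in the paragraph above (no driver found by the primary DNA assay, then low tumor mutational burden) can be written down as a small decision rule. This is an illustrative sketch only, not the authors' algorithm and not clinical guidance: the class, field names and priority labels are hypothetical, and the cutoff follows the abstract's 0–5 mut/Mb definition of low TMB, whereas the commentary writes it as <5 mut/Mb.

```python
from dataclasses import dataclass

# Low-TMB boundary in mutations per megabase (abstract: 0-5 mut/Mb is "low").
LOW_TMB_CUTOFF = 5.0


@dataclass
class Sample:
    sample_id: str
    driver_detected_by_dna: bool  # result of the primary DNA-based panel
    tmb_mut_per_mb: float         # tumor mutation burden from the same assay


def rnaseq_priority(sample: Sample) -> str:
    """Suggest how strongly a sample should be reflexed to targeted RNAseq."""
    if sample.driver_detected_by_dna:
        # Drivers tend to be mutually exclusive, so no reflex test is suggested.
        return "none"
    if sample.tmb_mut_per_mb <= LOW_TMB_CUTOFF:
        # Driver negative with low TMB: the group enriched for fusions.
        return "high"
    # Driver negative with higher TMB: still a candidate, but lower yield.
    return "standard"


if __name__ == "__main__":
    cohort = [
        Sample("S1", True, 2.0),
        Sample("S2", False, 3.1),
        Sample("S3", False, 12.4),
    ]
    for s in cohort:
        print(s.sample_id, rnaseq_priority(s))
```

Driver-negative samples with higher TMB are labeled "standard" rather than excluded because the study's conclusion recommends RNAseq for all apparently driver-negative cases, with low-TMB cases prioritized when tissue or budget is limited.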

These findings should be considered a “wake-up call” for oncologists in regard to the ordering and interpretation of molecular testing. It is clear from these and other published findings that advanced molecular analysis has limitations that require nuanced technical understanding. As this arena evolves, it is critical for oncologists (and trainees) to gain an increased comprehension of how to identify when the “gaps” in a test might be most clinically relevant. This requires a level of technical cognizance that has been previously unexpected of clinical practitioners, yet is underscored by the reality that opportunities for effective targeted therapy can and will be missed if the treating oncologist is unaware of how to best identify patients for whom additional testing is warranted. This study also highlights the mantra of “no test is perfect” regardless of prestige of the testing institution, number of past tests performed, or regulatory status. NGS, despite its benefits, does not mean all-encompassing. It is only through the adaptability of laboratories to utilize knowledge such as is provided by Benayed and colleagues that advances in laboratory medicine can be quickly deployed to maximize benefits for oncology patients.

References:

[1] Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012, 489(7417):519-525.
[2] A genomics-based classification of human lung tumors. Science translational medicine 2013, 5(209):209ra153.
[3] Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J et al: Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 2012, 150(6):1121-1134.
[4] Takeuchi K, Soda M, Togashi Y, Suzuki R, Sakata S, Hatano S, Asaka R, Hamanaka W, Ninomiya H, Uehara H et al: RET, ROS1 and ALK fusions in lung cancer. Nature medicine 2012, 18(3):378-381.
[5] Morodomi Y, Takenoyama M, Inamasu E, Toyozawa R, Kojo M, Toyokawa G, Shiraishi Y, Takenaka T, Hirai F, Yamaguchi M et al: Non-small cell lung cancer patients with EML4-ALK fusion gene are insensitive to cytotoxic chemotherapy. Anticancer research 2014, 34(7):3825-3830.
[6] Yoshimura M, Tada Y, Ofuzi K, Yamamoto M, Nakatsura T: Identification of a novel HLA-A 02:01-restricted cytotoxic T lymphocyte epitope derived from the EML4-ALK fusion gene. Oncology reports 2014, 32(1):33-39.
[7] Yang L, Li G, Zhao L, Pan F, Qiang J, Han S: Blocking the PI3K pathway enhances the efficacy of ALK-targeted therapy in EML4-ALK-positive nonsmall-cell lung cancer. Tumour biology: the journal of the International Society for Oncodevelopmental Biology and Medicine 2014.
[8] Workman P, van Montfort R: EML4-ALK fusions: propelling cancer but creating exploitable chaperone dependence. Cancer discovery 2014, 4(6):642-645.
[9] Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A et al: Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012, 150(6):1107-1120.
[10] Peifer M, Fernandez-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, Plenker D, Leenders F, Sun R, Zander T et al: Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nature genetics 2012, 44(10):1104-1110.

Other posts on this site which refer to Lung Cancer and Cancer Genome Sequencing include:

Multi-drug, Multi-arm, Biomarker-driven Clinical Trial for patients with Squamous Cell Carcinoma called the Lung Cancer Master Protocol, or Lung-MAP launched by NCI, Foundation Medicine, and Five Pharma Firms

US Personalized Cancer Genome Sequencing Market Outlook 2018 –

Comprehensive Genomic Characterization of Squamous Cell Lung Cancers

International Cancer Genome Consortium Website has 71 Committed Cancer Genome Projects Ongoing

Non-small Cell Lung Cancer drugs – where does the Future lie?

Lung cancer breathalyzer trialed in the UK

Diagnosing Lung Cancer in Exhaled Breath using Gold Nanoparticles

Cancer Mutations Across the Landscape

Posted in Biological Networks, Gene Regulation and Evolution, CANCER BIOLOGY & Innovations in Cancer Therapy, Cell Biology, Signaling & Cell Circuits, Computational Biology/Systems and Bioinformatics, Disease Biology, Small Molecules in Development of Therapeutic Drugs, Genome Biology, Genomic Testing: Methodology for Diagnosis, Medical and Population Genetics, Molecular Genetics & Pharmaceutical, Personalized and Precision Medicine & Genomic Research, tagged Cancer - General, CANCER BIOLOGY & Innovations in Cancer Therapy, Cancer Genome Atlas, cell signaling, clinical associations, clusters, Cox Proportional Hazards model, DNA, DNA repair, driver mutations, functional associations, gene mutations, mutations across cancer types, prognostic analysis, The Cancer Genome Atlas (TCGA), tumorigenesis, Washington University in St. Louis on November 1, 2013

Curator: Larry H. Bernstein, MD, FCAP

This is an up-to-date article about the significance of mutations found in 12 major types of cancer.

Word Cloud by Daniel Menzin

UPDATED 4/24/2020 The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth¹*, Michael D. McLellan¹*, Fabio Vandin², Kai Ye¹,³, Beifang Niu¹, Charles Lu¹, et al.

¹The Genome Institute, Washington University in St Louis, Missouri 63108, USA. ²Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA. ³Department of Genetics, Washington University in St Louis, Missouri 63108, USA. ⁴Department of Medicine, Washington University in St Louis, Missouri 63108, USA. ⁵Siteman Cancer Center, Washington University in St Louis, Missouri 63108, USA. ⁶Department of Mathematics, Washington University in St Louis, Missouri 63108, USA.

Nature 17 Oct 2013; 502. http://dx.doi.org/10.1038/nature12634

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects.

Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer.

The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types.

Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis.

Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.

Introduction

The advancement of DNA sequencing technologies now enables the processing of thousands of tumours of many types for systematic mutation discovery. This expansion of scope, coupled with appreciable progress in algorithms¹⁻⁵, has led directly to characterization of significant functional mutations, genes and pathways⁶⁻¹⁸. Cancer encompasses more than 100 related diseases¹⁹, making it crucial to understand the commonalities and differences among various types and subtypes. TCGA was founded to address these needs, and its large data sets are providing unprecedented opportunities for systematic, integrated analysis.

We performed a systematic analysis of 3,281 tumours from 12 cancer types to investigate underlying mechanisms of cancer initiation and progression. We describe variable mutation frequencies and contexts and their associations with environmental factors and defects in DNA repair. We identify 127 significantly mutated genes (SMGs) from diverse signalling and enzymatic processes. The finding of a TP53-driven breast, head and neck, and ovarian cancer cluster with a dearth of other mutations in SMGs suggests common therapeutic strategies might be applied for these tumours. We determined interactions among mutations and correlated mutations in BAP1, FBXW7 and TP53 with detrimental phenotypes across several cancer types. The subclonal structure and transcription status of underlying somatic mutations reveal the trajectory of tumour progression in patients with cancer.

Standardization of mutation data

Stringent filters (Methods) were applied to ensure high quality mutation calls for 12 cancer types: breast adenocarcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), colon and rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma (OV) and acute myeloid leukaemia (LAML; conventionally called AML) (Supplementary Table 1). A total of 617,354 somatic mutations, consisting of

398,750 missense,
145,488 silent,
36,443 nonsense,
9,778 splice site,
7,693 non-coding RNA,
523 non-stop/readthrough,
15,141 frameshift insertions/deletions (indels) and
3,538 inframe indels,

were included for downstream analyses (Supplementary Table 2).
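As a quick arithmetic check on the breakdown above, the eight mutation classes can be tallied and compared with the quoted total. The short Python sketch below uses only the counts given in the text; the dictionary layout and variable names are illustrative choices, not anything taken from the paper.

```python
# Mutation-class counts exactly as quoted in the text above.
counts = {
    "missense": 398_750,
    "silent": 145_488,
    "nonsense": 36_443,
    "splice site": 9_778,
    "non-coding RNA": 7_693,
    "non-stop/readthrough": 523,
    "frameshift indel": 15_141,
    "inframe indel": 3_538,
}

total = sum(counts.values())
assert total == 617_354  # agrees with the total quoted in the text

# Share of each class among all somatic mutations, largest first.
shares = {name: n / total for name, n in counts.items()}
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {counts[name]:>8,d}  {share:6.1%}")
```

The classes add up to the reported 617,354, and missense changes account for roughly two thirds of the calls.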

Distinct mutation frequencies and sequence context

Figure 1a shows that AML has the lowest median mutation frequency and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb), respectively). Besides AML, all types average over 1 mutation per Mb, substantially higher than in pediatric tumours²⁰. Clustering²¹ illustrates that mutation frequencies for KIRC, BRCA, OV and AML are normally distributed within a single cluster, whereas other types have several clusters (for example, 5 and 6 clusters in UCEC and COAD/READ, respectively) (Fig. 1a and Supplementary Table 3a, b). In UCEC, the largest patient cluster has a frequency of approximately 1.5 mutations per Mb, and the cluster with the highest frequency is more than 150 times greater.

Multiple clusters suggest that factors other than age contribute to development in these tumours¹⁴,¹⁶. Indeed, there is a significant correlation between high mutation frequency and DNA repair pathway genes (for example, PRKDC, TP53 and MSH6) (Supplementary Table 3c). Notably, PRKDC mutations are associated with high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas TP53 mutations are associated with higher frequencies in AML, BLCA, BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05). Mutations in POLQ and POLE associate with high frequencies in multiple cancer types; POLE association in UCEC is consistent with previous observations¹⁴.

Comparison of spectra across the 12 types (Fig. 1b and Supplementary Table 3d) reveals that LUSC and LUAD contain increased C>A transversions, a signature of cigarette smoke exposure¹⁰. Sequence context analysis across the 12 types revealed that the largest differences are in C>T transitions and C>G transversions (Fig. 1c). The frequency of thymine 1-bp (base pair) upstream of C>G transversions is markedly higher in BLCA, BRCA and HNSC than in other cancer types (Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have similar contexts in that the proportions of guanine 1 base downstream of C>T transitions are between 59% and 67%, substantially higher than the approximately 40% in other cancer types.

Higher frequencies of transition mutations at CpG in gastrointestinal tumours, including colorectal, were previously reported²². We found three additional cancer types (GBM, AML and UCEC) clustered in the C>T mutation at CpG, consistent with previous findings of aberrant DNA methylation in endometrial cancer²³ and glioblastoma²⁴. BLCA has a unique signature for C>T transitions compared to the other types (enriched for TC) (Extended Data Fig. 1).
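The six transition/transversion categories discussed above come from a common convention: each single-base substitution is reported relative to the pyrimidine (C or T) of the reference base pair, so that G>T on one strand is counted as C>A. The following Python sketch illustrates that convention and a simple CpG check. It is not the pipeline used in the paper; the function names and the toy variant list are assumptions made for illustration only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # the other four classes are transversions


def classify_snv(ref, alt):
    """Return one of C>A, C>G, C>T, T>A, T>C, T>G for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the change on the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(variants):
    """Proportion of each substitution class among (ref, alt) pairs."""
    counts = Counter(classify_snv(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def is_c_to_t_at_cpg(ref, alt, upstream, downstream):
    """True if the change is a C>T transition with a G immediately 3' of the C."""
    if classify_snv(ref, alt) != "C>T":
        return False
    if ref.upper() == "C":
        return downstream.upper() == "G"
    # Reference is G: on the opposite strand, the 3' neighbour of the C
    # is the complement of the base upstream of the G.
    return COMPLEMENT[upstream.upper()] == "G"


# Toy variants (hypothetical, for illustration only).
variants = [("C", "A"), ("G", "T"), ("C", "T"), ("G", "A"), ("A", "G")]
spec = spectrum(variants)
ti_fraction = sum(spec.get(cls, 0.0) for cls in TRANSITIONS)
print(spec, ti_fraction)
```

Collapsing onto the pyrimidine strand is what allows spectra from different tumours to be compared without regard to which strand a variant caller happened to report.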

Significantly mutated genes

Genes under positive selection, either in individual or multiple tumour types, tend to display higher mutation frequencies above background. Our statistical analysis³, guided by expression data and curation (Methods), identified 127 such genes (SMGs; Supplementary Table 4). These SMGs are involved in a wide range of cellular processes, broadly classified into 20 categories (Fig. 2), including transcription factors/regulators, histone modifiers, genome integrity, receptor tyrosine kinase signalling, cell cycle, mitogen-activated protein kinases (MAPK) signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2).

The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, newer categories (for example, splicing, transcription regulators, metabolism, proteolysis and histones) emerge as exciting guides for the development of new therapeutic targets. Genes categorized as histone modifiers (Z = 0.57), PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate to more than one cancer type, whereas transcription factor/regulator (Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling (Z = 0.55) genes tend to associate with single types (Methods).

Notably, 3,053 out of 3,281 total samples (93%) across the Pan-Cancer collection had at least one non-synonymous mutation in at least one SMG. The average number of point mutations and small indels in these genes varies across tumour types, with the highest (~6 mutations per tumour) in UCEC, LUAD and LUSC, and the lowest (~2 mutations per tumour) in AML, BRCA, KIRC and OV. This suggests that the numbers of both cancer-related genes (only 127 identified in this study) and cooperating driver mutations required during oncogenesis are small (most cases only had 2–6) (Fig. 3), although large-scale structural rearrangements were not included in this analysis.

Common mutations

The most frequently mutated gene in the Pan-Cancer cohort is TP53 (42% of samples). Its mutations predominate in serous ovarian (95%) and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations are also associated with basal subtype breast tumours. PIK3CA is the second most commonly mutated gene, occurring frequently (>10%) in most cancer types except OV, KIRC, LUAD and AML. PIK3CA mutations frequented UCEC (52%) and BRCA (33.6%), being specifically enriched in luminal subtype tumours. Tumours lacking PIK3CA mutations often had mutations in PIK3R1, with the highest occurrences in UCEC (31%) and GBM (11%) (Fig. 2).

Many cancer types carried mutations in chromatin re-modelling genes. In particular, histone-lysine N-methyltransferase genes (MLL2 (also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) cluster in bladder, lung and endometrial cancers, whereas the lysine (K)-specific demethylase KDM5C is prevalently mutated in KIRC (7%). Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and LUSC, whereas mutations in ARID5B predominate in UCEC (10%) (Fig. 2).

Fig. 1. | Distribution of mutation frequencies across 12 cancer types.

a, Dashed grey and solid white lines denote average across cancer types and median for each type, respectively. b, Mutation spectrum of six transition (Ti) and transversion (Tv) categories for each cancer type. c, Hierarchically clustered mutation context (defined by the proportion of A, T, C and G nucleotides within ±2 bp of variant site) for six mutation categories. Cancer types correspond to colours in a. Colour denotes degree of correlation: yellow (r = 0.75) and red (r = 1).

Fig. 2. | The 127 SMGs from 20 cellular processes in cancer. Percentages of mutated samples in the 12 cancer types and in Pan-Cancer are shown, with the highest percentage for each gene among the 12 types indicated. (Figure not shown.)

Fig. 3. | Distribution of mutations in 127 SMGs across Pan-Cancer cohort.

Box plot displays median numbers of non-synonymous mutations, with outliers shown as dots. In total, 3,210 tumours were used for this analysis (hypermutators excluded).

Figure 4 | Unsupervised clustering based on mutation status of SMGs. Tumours having no mutation or more than 500 mutations were excluded. A mutation status matrix was constructed for 2,611 tumours. Major clusters of mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were highlighted.
Complete gene list shown in Extended Data Fig. 3. (not shown)

Figure 5 | Driver initiation and progression mutations and tumour clonal architecture.

Survival Analysis

We examined which genes correlate with survival using the Cox proportional hazards model, first analysing individual cancer types using age and gender as covariates; an average of 2 genes (range: 0–4) with mutation frequency ≥2% were significant (P ≤ 0.05) in each type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A and ARID1A mutations correlate with better survival in BLCA (P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI): 0.14–0.92) and UCEC (P = 0.03, HR = 0.11, 95% CI: 0.01–0.84), respectively, but mutations in SETBP1, recently identified with worse prognosis in atypical chronic myeloid leukaemia (aCML)³¹, have a significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI: 1.39–7.44). BAP1 strongly correlates with poor survival (P = 0.00079, HR = 2.17, 95% CI: 1.38–3.41) in KIRC. Conversely, BRCA2 mutations (P = 0.02, HR = 0.31, 95% CI: 0.12–0.85) associate with better survival in ovarian cancer, consistent with previous reports³²,³³; BRCA1 mutations showed positive correlation with better survival, but did not reach significance here.
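In a Cox model the hazard ratio is the exponential of the fitted coefficient, and a 95% confidence interval is conventionally exp(coefficient ± 1.96 × standard error). That relationship lets a reader sanity-check a reported HR, CI and P value against each other without access to the underlying data. The Python sketch below does this for three of the results quoted above. It assumes the intervals are Wald intervals and the P values come from a two-sided Wald test, which the excerpt does not state; the helper name `wald_from_ci` is hypothetical. It does not fit a Cox model (the paper used the R package 'survival' for that).

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover log-HR, its standard error, z and a two-sided p from HR and 95% CI.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# (gene, cancer type, HR, CI low, CI high, reported P) as quoted in the text.
reported = [
    ("BAP1", "KIRC", 2.17, 1.38, 3.41, 0.00079),
    ("SETBP1", "HNSC", 3.21, 1.39, 7.44, 0.006),
    ("BRCA2", "OV", 0.31, 0.12, 0.85, 0.02),
]

results = {}
for gene, cancer, hr, lo, hi, p_reported in reported:
    beta, se, z, p = wald_from_ci(hr, lo, hi)
    results[gene] = p
    print(f"{gene} in {cancer}: z = {z:+.2f}, implied p = {p:.5f}, reported p = {p_reported}")
```

Under these assumptions the implied P values should land close to the reported ones, which suggests the quoted HRs, intervals and P values are internally consistent. An HR above 1 (BAP1, SETBP1) corresponds to worse survival in mutated cases and an HR below 1 (BRCA2) to better survival.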

We extended our survival analysis across cancer types, restricting our attention to the subset of 97 SMGs whose mutations appeared in ≥2% of patients having survival data in ≥2 tumour types. Taking type, age and gender as covariates, we found 7 significant genes: BAP1, DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data Table 1). In particular, BAP1 was highly significant (P = 0.00013, HR = 2.20, 95% CI: 1.47–3.29, more than 53 mutated tumours out of 888 total), with mutations associating with detrimental outcome in four tumour types and notable associations in KIRC (P = 0.00079), consistent with a recent report²⁸, and in UCEC (P = 0.066). Mutations in several other genes are detrimental, including DNMT3A (HR = 1.59), previously identified with poor prognosis in AML³⁴, and KDM5C (HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has significant associations with poor outcome in KIRC (P = 0.012), AML (P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05, HR = 0.62, 95% CI: 0.38 to 0.99) correlates with survival benefit in six types, including OV and UCEC (Supplementary Table 10a, b). IDH1 mutations are associated with improved prognosis across the Pan-Cancer set (HR = 0.67, P = 0.16) and also in GBM (HR = 0.42, P = 0.09) (Supplementary Table 10a, b), consistent with previous work.³⁵

Driver mutations and tumour clonal architecture

To understand the temporal order of somatic events, we analysed the variant allele fraction (VAF) distribution of mutations in SMGs across AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and other tumour types (Extended Data Fig. 7). To minimize the effect of copy number alterations, we focused on mutations in copy neutral segments. Mutations in TP53 have higher VAFs on average in all three cancer types, suggesting early appearance during tumorigenesis.

It is worth noting that copy neutral loss of heterozygosity is commonly found in classical tumour suppressors such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1 (P = 0.0003) and SMC3 (P = 0.05) have significantly higher VAFs than average among SMGs (Fig. 5a and Supplementary Table 11b). In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA have relatively high average VAFs. For endometrial cancer, multiple SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A) have similar median VAFs. Conversely, KRAS and/or NRAS mutations tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02 in BRCA) have a progression role in a subset of AML, BRCA and UCEC tumours. For all three cancer types, we clearly observed a shift towards higher expression VAFs in SMGs versus non-SMGs, most apparent in BRCA and UCEC (Extended Data Fig. 8a and Methods).

Previous analysis using whole-genome sequencing (WGS) detected subclones in approximately 50% of AML cases¹⁵,³⁶,³⁷; however, analysis is difficult using AML exome owing to its relatively few coding mutations. Using 50 AML WGS cases, sciClone (http://github.com/genome/sciclone) detected DNMT3A mutations in the founding clone for 100% (8 out of 8) of cases and NRAS mutations in the subclone for 75% (3 out of 4) of cases (Extended Data Fig. 8b). Among 304 and 160 of BRCA and UCEC tumours, respectively, with enough coding mutations for clustering, 35% BRCA and 44% UCEC tumours contained subclones. Our analysis provides the lower bound for tumour heterogeneity, because only coding mutations were used for clustering. In BRCA, 95% (62 out of 65) of cases contained PIK3CA mutations in the founding clone, whereas 33% (3 out of 9) of cases had MLL3 mutations in the subclone. Similar patterns were found in UCEC tumours, with 96% (65 out of 68) and 95% (62 out of 65) of tumours containing PIK3CA and PTEN mutations, respectively, in the founding clone, and 9% (2 out of 22) of KRAS and 14% (1 out of 7) of NRAS mutations in the subclone (Extended Data Fig. 8b and Supplementary Table 12).
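The reasoning behind using VAF to order events can be made concrete with a small calculation. For a heterozygous mutation in a copy-neutral diploid region, the fraction of cancer cells carrying it is approximately 2 × VAF divided by tumour purity, so a mutation present in every tumour cell has a VAF near purity / 2, and lower values point to a subclone. The Python sketch below applies that textbook approximation. It is not the sciClone algorithm, which clusters VAFs statistically; the function names, the 0.85 cut-off and the read counts are hypothetical values chosen only to illustrate the idea.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction: reads supporting the variant / all reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def cancer_cell_fraction(v, purity):
    """Approximate fraction of cancer cells carrying a heterozygous mutation.

    Valid only for copy-neutral diploid segments with one mutated copy;
    capped at 1.0 because sampling noise can push the estimate above it.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return min(1.0, 2 * v / purity)


def label(ccf, clonal_threshold=0.85):
    """Crude two-way call; the threshold is an arbitrary illustrative choice."""
    return "founding clone" if ccf >= clonal_threshold else "subclone"


# Hypothetical read counts for two mutations in a tumour of 80% purity.
purity = 0.8
early = cancer_cell_fraction(vaf(38, 100), purity)  # 0.95 of cancer cells
late = cancer_cell_fraction(vaf(10, 100), purity)   # 0.25 of cancer cells
print(label(early), label(late))
```

This is also why the analysis above was restricted to copy-neutral segments: outside them, the simple factor of 2 no longer holds and VAFs are not directly comparable between genes.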

Mutation context (-2 to +2 bp) was calculated for each somatic variant in each mutation category, and hierarchical clustering was then performed using the pairwise mutation context correlation across all cancer types. The mutational significance in cancer (MuSiC)³ package was used to identify significant genes for both individual tumour types and the Pan-Cancer collective. An R function ‘hclust’ was used for complete-linkage hierarchical clustering across mutations and samples, and Dendrix³⁰ was used to identify sets of approximately mutually exclusive mutations. Cross-cancer survival analysis was based on the Cox proportional hazards model, as implemented in the R package ‘survival’ (http://cran.r-project.org/web/packages/survival/), and the sciClone algorithm (http://github.com/genome/sciclone) generated mutation clusters using point mutations from copy number neutral segments. A complete description of the materials and methods used to generate this data set and its results is provided in the Methods.
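To show what complete-linkage clustering on pairwise correlation involves, here is a minimal Python sketch. The paper used R's 'hclust'; this is a plain re-implementation of the same linkage rule for illustration, not the authors' code. The four nucleotide-proportion vectors are made-up toy values (one flanking position only, ordered A, C, G, T), loosely shaped like the high-versus-moderate guanine pattern described in the text, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance = 1 - correlation.

    Complete linkage: the distance between two clusters is the largest
    distance between any pair of their members.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [{name} for name in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Toy proportions of A, C, G, T at one flanking position (hypothetical values).
profiles = {
    "GBM": [0.10, 0.12, 0.63, 0.15],
    "UCEC": [0.11, 0.13, 0.60, 0.16],
    "BRCA": [0.25, 0.20, 0.40, 0.15],
    "BLCA": [0.27, 0.19, 0.38, 0.16],
}
clusters = complete_linkage(profiles, n_clusters=2)
print(clusters)
```

Complete linkage tends to produce compact groups because a cluster is only as close as its two most dissimilar members, which suits a question like "which cancer types share a mutation context".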

References (20 of 38)

Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–913 (2012).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013).
Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008).
Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).
The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008).
Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).
The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 486, 353–360 (2012).
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).
Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
Downing, J. R. et al. The Pediatric Cancer Genome Project. Nature Genet. 44, 619–622 (2012).

UPDATED 4/24/2020 The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

The genomic landscape of pediatric cancers: Implications for diagnosis and treatment

BY E. ALEJANDRO SWEET-CORDERO, JACLYN A. BIEGEL

SCIENCE 15 MAR 2019 : 1170-1175

Source: https://science.sciencemag.org/content/363/6432/1170

Abstract

The past decade has witnessed a major increase in our understanding of the genetic underpinnings of childhood cancer. Genomic sequencing studies have highlighted key differences between pediatric and adult cancers. Whereas many adult cancers are characterized by a high number of somatic mutations, pediatric cancers typically have few somatic mutations but a higher prevalence of germline alterations in cancer predisposition genes. Also noteworthy is the remarkable heterogeneity in the types of genetic alterations that likely drive the growth of pediatric cancers, including copy number alterations, gene fusions, enhancer hijacking events, and chromoplexy. Because most studies have genetically profiled pediatric cancers only at diagnosis, the mechanisms underlying tumor progression, therapy resistance, and metastasis remain poorly understood. We discuss evidence that points to a need for more integrative approaches aimed at identifying driver events in pediatric cancers at both diagnosis and relapse. We also provide an overview of key aspects of germline predisposition for cancer in this age group.

Approximately 300,000 children from infancy to age 14 are diagnosed with cancer worldwide every year (1). Some of the cancer types affecting the pediatric population are also seen in adolescents and young adults (AYA), but it has become increasingly clear that cancers in the latter age group have unique biological characteristics that can affect prognosis and therapy (2). Pediatric and AYA cancer patients present with a heterogeneous set of diseases that can be broadly subclassified as leukemias, brain tumors, and non–central nervous system (CNS) solid tumors. These subgroups contain numerous distinct clinical entities, many of which are still poorly characterized from a molecular standpoint.

Recent large-scale genomic analyses have increased our understanding of the genetic drivers of pediatric cancer and have helped to identify new clinically relevant subtypes. These studies have also underscored the distinct nature of the genetic alterations in pediatric and AYA cancers versus adult cancers. Of particular note, the number of somatic mutations in most pediatric cancers is substantially lower than that in adult cancers (3, 4). Exceptions are tumors in children who carry germline mutations that compromise repair of DNA damage (5). For many pediatric cancers, driver events are conditioned on the developmental stage in which the tumor arises. For example, a mutation occurring in one developmental compartment (e.g., a muscle stem cell) may lead to cancer, whereas the same mutation in another compartment does not (6). Pediatric cancer genomes are also characterized by specific patterns of copy number alterations and structural alterations [chromoplexy (7), chromothripsis (8)] that are prognostic indicators in several cancer subtypes. Gene fusion events have long been recognized as oncogenic drivers in many pediatric cancers; however, advanced sequencing technologies have revealed that the number of fusion partners is greater than previously thought, and that previously undetected gene rearrangements may also function as drivers. Finally, germline mutations in a wide spectrum of genes that predispose to cancer appear to play a greater role in pediatric cancer than previously appreciated (9, 10).

Somatic alterations in pediatric cancers

Genome landscape studies

Early large-scale sequencing studies of pediatric cancers identified novel driver genes while also underscoring the overall low mutational burden (11–14). Whole exome sequencing studies of Wilms tumor, T-cell acute lymphoblastic leukemia (T-ALL), and acute myeloid leukemia (AML) identified some recurring mutations such as

FLT3-ITD
WT1
NUP98-NSD1 gene fusion

however, many of the driver genes were subtype specific. Other fusion events were seen (by RNA-Seq) such as

EWS-FLI1
BCR-ABL
MYB-QKI

as well as multiple epigenetic events such as methylation changes.

REFERENCES

1. E. Steliarova-Foucher, M. Colombet, L. A. G. Ries, F. Moreno, A. Dolya, F. Bray, P. Hesseling, H. Y. Shin, C. A. Stiller, IICC-3 contributors, International incidence of childhood cancer, 2001-10: A population-based registry study. Lancet Oncol. 18, 719–731 (2017). doi:10.1016/S1470-2045(17)30186-9 pmid:28410997
2. J. V. Tricoli, D. G. Blair, C. K. Anders, W. A. Bleyer, L. A. Boardman, J. Khan, S. Kummar, B. Hayes-Lattin, S. P. Hunger, M. Merchant, N. L. Seibel, M. Thurin, C. L. Willman, Biologic and clinical characteristics of adolescent and young adult cancers: Acute lymphoblastic leukemia, colorectal cancer, breast cancer, melanoma, and sarcoma. Cancer 122, 1017–1028 (2016). doi:10.1002/cncr.29871 pmid:26849082
3. M. S. Lawrence, P. Stojanov, P. Polak, G. V. Kryukov, K. Cibulskis, A. Sivachenko, S. L. Carter, C. Stewart, C. H. Mermel, S. A. Roberts, A. Kiezun, P. S. Hammerman, A. McKenna, Y. Drier, L. Zou, A. H. Ramos, T. J. Pugh, N. Stransky, E. Helman, J. Kim, C. Sougnez, L. Ambrogio, E. Nickerson, E. Shefler, M. L. Cortés, D. Auclair, G. Saksena, D. Voet, M. Noble, D. DiCara, P. Lin, L. Lichtenstein, D. I. Heiman, T. Fennell, M. Imielinski, B. Hernandez, E. Hodis, S. Baca, A. M. Dulak, J. Lohr, D.-A. Landau, C. J. Wu, J. Melendez-Zajgla, A. Hidalgo-Miranda, A. Koren, S. A. McCarroll, J. Mora, B. Crompton, R. Onofrio, M. Parkin, W. Winckler, K. Ardlie, S. B. Gabriel, C. W. M. Roberts, J. A. Biegel, K. Stegmaier, A. J. Bass, L. A. Garraway, M. Meyerson, T. R. Golub, D. A. Gordenin, S. Sunyaev, E. S. Lander, G. Getz, Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). doi:10.1038/nature12213 pmid:23770567
4. B. Vogelstein, N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz Jr., K. W. Kinzler, Cancer genome landscapes. Science 339, 1546–1558 (2013). doi:10.1126/science.1235122 pmid:23539594
5. B. B. Campbell, N. Light, D. Fabrizio, M. Zatzman, F. Fuligni, R. de Borja, S. Davidson, M. Edwards, J. A. Elvin, K. P. Hodel, W. J. Zahurancik, Z. Suo, T. Lipman, K. Wimmer, C. P. Kratz, D. C. Bowers, T. W. Laetsch, G. P. Dunn, T. M. Johanns, M. R. Grimmer, I. V. Smirnov, V. Larouche, D. Samuel, A. Bronsema, M. Osborn, D. Stearns, P. Raman, K. A. Cole, P. B. Storm, M. Yalon, E. Opocher, G. Mason, G. A. Thomas, M. Sabel, B. George, D. S. Ziegler, S. Lindhorst, V. M. Issai, S. Constantini, H. Toledano, R. Elhasid, R. Farah, R. Dvir, P. Dirks, A. Huang, M. A. Galati, J. Chung, V. Ramaswamy, M. S. Irwin, M. Aronson, C. Durno, M. D. Taylor, G. Rechavi, J. M. Maris, E. Bouffet, C. Hawkins, J. F. Costello, M. S. Meyn, Z. F. Pursell, D. Malkin, U. Tabori, A. Shlien, Comprehensive Analysis of Hypermutation in Human Cancer. Cell 171, 1042–1056.e10 (2017). doi:10.1016/j.cell.2017.09.048 pmid:29056344
6. X. Chen, A. Pappo, M. A. Dyer, Pediatric solid tumor genomics and developmental pliancy. Oncogene 34, 5207–5215 (2015). doi:10.1038/onc.2014.474 pmid:25639868
7. S. C. Baca, D. Prandi, M. S. Lawrence, J. M. Mosquera, A. Romanel, Y. Drier, K. Park, N. Kitabayashi, T. Y. MacDonald, M. Ghandi, E. Van Allen, G. V. Kryukov, A. Sboner, J.-P. Theurillat, T. D. Soong, E. Nickerson, D. Auclair, A. Tewari, H. Beltran, R. C. Onofrio, G. Boysen, C. Guiducci, C. E. Barbieri, K. Cibulskis, A. Sivachenko, S. L. Carter, G. Saksena, D. Voet, A. H. Ramos, W. Winckler, M. Cipicchio, K. Ardlie, P. W. Kantoff, M. F. Berger, S. B. Gabriel, T. R. Golub, M. Meyerson, E. S. Lander, O. Elemento, G. Getz, F. Demichelis, M. A. Rubin, L. A. Garraway, Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013). doi:10.1016/j.cell.2013.03.021 pmid:23622249
8. P. J. Stephens, C. D. Greenman, B. Fu, F. Yang, G. R. Bignell, L. J. Mudie, E. D. Pleasance, K. W. Lau, D. Beare, L. A. Stebbings, S. McLaren, M.-L. Lin, D. J. McBride, I. Varela, S. Nik-Zainal, C. Leroy, M. Jia, A. Menzies, A. P. Butler, J. W. Teague, M. A. Quail, J. Burton, H. Swerdlow, N. P. Carter, L. A. Morsberger, C. Iacobuzio-Donahue, G. A. Follows, A. R. Green, A. M. Flanagan, M. R. Stratton, P. A. Futreal, P. J. Campbell, Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011). doi:10.1016/j.cell.2010.11.055 pmid:21215367
9. D. W. Parsons, A. Roy, Y. Yang, T. Wang, S. Scollon, K. Bergstrom, R. A. Kerstein, S. Gutierrez, A. K. Petersen, A. Bavle, F. Y. Lin, D. H. López-Terrada, F. A. Monzon, M. J. Hicks, K. W. Eldin, N. M. Quintanilla, A. M. Adesina, C. A. Mohila, W. Whitehead, A. Jea, S. A. Vasudevan, J. G. Nuchtern, U. Ramamurthy, A. L. McGuire, S. G. Hilsenbeck, J. G. Reid, D. M. Muzny, D. A. Wheeler, S. L. Berg, M. M. Chintagumpala, C. M. Eng, R. A. Gibbs, S. E. Plon, Diagnostic Yield of Clinical Tumor and Germline Whole-Exome Sequencing for Children With Solid Tumors. JAMA Oncol. 2, 616 (2016). doi:10.1001/jamaoncol.2015.5699 pmid:26822237
10. J. Zhang, M. F. Walsh, G. Wu, M. N. Edmonson, T. A. Gruber, J. Easton, D. Hedges, X. Ma, X. Zhou, D. A. Yergeau, M. R. Wilkinson, B. Vadodaria, X. Chen, R. B. McGee, S. Hines-Dowell, R. Nuccio, E. Quinn, S. A. Shurtleff, M. Rusch, A. Patel, J. B. Becksfort, S. Wang, M. S. Weaver, L. Ding, E. R. Mardis, R. K. Wilson, A. Gajjar, D. W. Ellison, A. S. Pappo, C.-H. Pui, K. E. Nichols, J. R. Downing, Germline Mutations in Predisposition Genes in Pediatric Cancer. N. Engl. J. Med. 373, 2336–2346 (2015). doi:10.1056/NEJMoa1508054 pmid:26580448
11. T. J. Pugh, O. Morozova, E. F. Attiyeh, S. Asgharzadeh, J. S. Wei, D. Auclair, S. L. Carter, K. Cibulskis, M. Hanna, A. Kiezun, J. Kim, M. S. Lawrence, L. Lichenstein, A. McKenna, C. S. Pedamallu, A. H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, C. Stewart, A. Ally, I. Birol, R. Chiu, R. D. Corbett, M. Hirst, S. D. Jackman, B. Kamoh, A. H. Khodabakshi, M. Krzywinski, A. Lo, R. A. Moore, K. L. Mungall, J. Qian, A. Tam, N. Thiessen, Y. Zhao, K. A. Cole, M. Diamond, S. J. Diskin, Y. P. Mosse, A. C. Wood, L. Ji, R. Sposto, T. Badgett, W. B. London, Y. Moyer, J. M. Gastier-Foster, M. A. Smith, J. M. Guidry Auvil, D. S. Gerhard, M. D. Hogarty, S. J. M. Jones, E. S. Lander, S. B. Gabriel, G. Getz, R. C. Seeger, J. Khan, M. A. Marra, M. Meyerson, J. M. Maris, The genetic landscape of high-risk neuroblastoma. Nat. Genet. 45, 279–284 (2013). doi:10.1038/ng.2529 pmid:23334666
12. J. R. Downing, R. K. Wilson, J. Zhang, E. R. Mardis, C.-H. Pui, L. Ding, T. J. Ley, W. E. Evans, The Pediatric Cancer Genome Project. Nat. Genet. 44, 619–622 (2012). doi:10.1038/ng.2287 pmid:22641210
13. St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project, Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat. Genet. 44, 251–253 (2012). doi:10.1038/ng.1102 pmid:22286216
14. J. Zhang, L. Ding, L. Holmfeldt, G. Wu, S. L. Heatley, D. Payne-Turner, J. Easton, X. Chen, J. Wang, M. Rusch, C. Lu, S.-C. Chen, L. Wei, J. R. Collins-Underwood, J. Ma, K. G. Roberts, S. B. Pounds, A. Ulyanov, J. Becksfort, P. Gupta, R. Huether, R. W. Kriwacki, M. Parker, D. J. McGoldrick, D. Zhao, D. Alford, S. Espy, K. C. Bobba, G. Song, D. Pei, C. Cheng, S. Roberts, M. I. Barbato, D. Campana, E. Coustan-Smith, S. A. Shurtleff, S. C. Raimondi, M. Kleppe, J. Cools, K. A. Shimano, M. L. Hermiston, S. Doulatov, K. Eppert, E. Laurenti, F. Notta, J. E. Dick, G. Basso, S. P. Hunger, M. L. Loh, M. Devidas, B. Wood, S. Winter, K. P. Dunsmore, R. S. Fulton, L. L. Fulton, X. Hong, C. C. Harris, D. J. Dooling, K. Ochoa, K. J. Johnson, J. C. Obenauer, W. E. Evans, C.-H. Pui, C. W. Naeve, T. J. Ley, E. R. Mardis, R. K. Wilson, J. R. Downing, C. G. Mullighan, The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481, 157–163 (2012). doi:10.1038/nature10725 pmid:22237106

APRIL 16, 2015

Methods and Findings

Conclusions

Background

Why Was This Study Done?

What Did the Researchers Do and Find?

What Do These Findings Mean?

Additional Information

Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer

A Comparison of Manual Expert Curation and an LLM-based Analysis of Knowledge Gaps in Non-Small Cell Lung Cancer Whole Exome Sequencing Studies and a Use Case Example of Chat GPT 5

Part A: Introduction to LLM, Knowledge Graph software InfraNodus, ChatGPT5 and Background Information on curated material for Test Case

Nodus Labs InfraNodus AI Knowledge Graph Software Tools Allow Text Relationship Visualization and Integrated AI Functionality

Part B. InfraNodus Analysis of manual curation and Knowledge Graph Creation

Generated the following responses using AI to bridge the gap between

Part C: Chat GPT 5 Analysis of Manually Curated Material

Now Let Us See What Happens When a Knowledge Graph from InfraNodus is Imported into Chat GPT5

Part D: Curation entitled Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer originally published on 09/05/2014

Paper 1. Comprehensive genomic characterization of squamous cell lung cancers[1]

Summary:

Makeup of tumor samples

Results:

Gender and Smoking Habits Show different mutational patterns

Whole exome Sequencing and copy number analysis reveal Unique, Candidate Driver Genes

Paper 2. A Genomics-Based Classification of Human Lung Tumors[2]

Paper Summary

Results

Conclusions:

Paper 3. Genomic Landscape of Non-Small Cell Lung Cancer in Smokers and Never-Smokers[3]

Methodology

Results

Table. Validated Gene-Fusions Obtained from Ref-Seq Data

† Author’s Note:

Table. Significantly mutated genes (point mutations, insertions/deletions) with associated function.

Table. Literature Analysis of pathways containing significantly altered genes in NSCLC reveal putative targets and risk factors, linkage between other tumor types, and research areas for further investigation.

Paper 4. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing[9]

Highlights

Summary

Paper 5. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer[10]

Highlights

Curator Graphical Summary of Interesting Findings From the Above Studies

High Yield of RNA Sequencing for Targetable Kinase Fusions in Lung Adenocarcinomas with No Mitogenic Driver Alteration Detected by DNA Sequencing and Low Tumor Mutation Burden

Abstract

Translational Relevance

Wake Up and Smell the Fusions: Single-Modality Molecular Testing Misses Drivers

Abstract

Cancer Mutations Across the Landscape

UPDATED 4/24/2020 The genomic landscape of pediatric cancers: Curation of WES/WGS studies shows need for more data

Mutational landscape and significance across 12 major cancer types

Introduction

Standardization of mutation data

Distinct mutation frequencies and sequence context

Significantly mutated genes

Common mutations

Figure 5 | Driver initiation and progression mutations and tumour clonal mutation is in the subclone

Survival Analysis

Driver mutations and tumour clonal architecture

References (20 of 38)

Somatic alterations in pediatric cancers

Genome landscape studies
