Healthcare analytics, AI solutions for biological big data, providing an AI platform for the biotech, life sciences, medical and pharmaceutical industries, as well as for related technological approaches, i.e., curation and text analysis with machine learning and other activities related to AI applications to these industries.
2,000 human brains yield clues to how genes raise risk for mental illnesses
Reporter: Irina Robu, PhD
It’s one thing to detect sites in the genome associated with mental disorders; it’s quite another to discover the biological mechanisms by which these changes in DNA work in the human brain to boost risk. In their first concerted effort to tackle the problem, 15 collaborating research teams of the National Institutes of Health-funded PsychENCODE Consortium evaluated data from 2,000 human brains, which may yield clues to how genes raise risk for mental illnesses.
Applying newly uncovered secrets of the brain’s molecular architecture, they established an artificial intelligence model that is six times better than preceding ones at predicting risk for mental disorders. They also identified several hundred previously unknown risk genes for mental illnesses and linked many known risk variants to specific genes. In the brain tissue and single cells, the researchers identified patterns of gene expression, marks in gene regulation as well as genetic variants that can be linked to mental illnesses.
Dr. Nenad Sestan of Yale University explained that “the consortium’s integrative genomic analyses elucidate the mechanisms by which cellular diversity and patterns of gene expression change throughout development and reveal how neuropsychiatric risk genes are concentrated into distinct co-expression modules and cell types”. The implicated variants are typically small-effect genetic variations that fall within regions of the genome that don’t code for proteins, but instead are thought to regulate gene expression and other aspects of gene function.
In addition to the 2,000 postmortem human brains, researchers examined brain tissue from prenatal development as well as from people with schizophrenia, bipolar disorder, and typical development, and compared findings with parallel data from non-human primates. Their findings indicate that gene variants linked to mental illnesses exert more effects when they jointly form “modules” of communicating genes with related functions, and at specific developmental time points that seem to coincide with the course of illness. Variability in risk gene expression and cell types increases during formative stages in early prenatal development and again during the teen years. In postmortem brains of people with a mental illness, thousands of RNAs were found to have anomalies.
According to Geetha Senthil of NIMH, the multi-omic data resource generated by the PsychENCODE collaboration will pave a path for building molecular models of disease and developmental processes and may offer a platform for target identification for pharmaceutical research.
Agilent was created as a spin-off from Hewlett-Packard Company in 1999.
Agilent Technologies Inc. is engaged in the life sciences, diagnostics and applied chemical markets. The Company provides application focused solutions that include instruments, software, services and consumables for the entire laboratory workflow. The Company has three business segments:
the life sciences and applied markets business,
the diagnostics and genomics business, and
the Agilent Cross Lab business
The Company’s life sciences and applied markets business segment brings together the Company’s analytical laboratory instrumentation and informatics.
The Company’s diagnostics and genomics business segment consists of three businesses: the Dako business, the genomics business and the nucleic acid solutions business.
The Company’s Agilent Cross Lab business segment combines its analytical laboratory services and consumables business
CARPINTERIA, Calif.–(BUSINESS WIRE)–Dako, an Agilent Technologies company and a worldwide provider of cancer diagnostics, today announced the U.S. Food and Drug Administration has approved a new test that can identify PD-L1 expression levels on the surface of non-small cell lung cancer tumor cells and provide information on the survival benefit with OPDIVO® (nivolumab) for patients with non-squamous NSCLC.
Gen9 is building on advances in synthetic biology to power a scalable fabrication capability that will significantly increase the world’s capacity to produce DNA content. The privately held company’s next-generation gene synthesis technology allows for the high-throughput, automated production of DNA constructs at lower cost and higher accuracy than previous methods on the market. Founded by world leaders in synthetic biology, Gen9 aims to ensure the constructive application of synthetic biology in industries ranging from enzyme and chemical production to pharmaceuticals and biofuels.
SERVICES
Synthetic Biology
Gene Synthesis Services
Variant Libraries
Gene Sequence Design Services
INVESTORS
Agilent Technologies : Private Equity
CAMBRIDGE, Mass. and SANTA CLARA, Calif. — April 24, 2013 —Gen9 Receives $21 Million Strategic Investment from Agilent Technologies
GenScript is the largest gene synthesis provider in the USA
GenScript Corporation, a biology contract research organization, provides biological research and drug discovery services to pharmaceutical companies, biotech firms, and research institutions in the United States, Europe, and Japan. It offers bio-reagent, custom molecular biology, custom peptide, protein production, custom antibody production, drug candidates testing, assay development and screening, lead optimization, antibody drug development, gene synthesis, and assay-ready cell line production services.
The company also offers molecular biology, peptide, protein, immunoassay, chemicals, and cell biology products. It offers its products through distributors in Tokyo, Japan; and Seoul, Korea. GenScript Corporation has a strategic partnership with Immunologix, Inc. The company was founded in 2002 and is based in Piscataway, New Jersey. It has subsidiaries in France, Japan, and China.
Note: As of October 24, 2011, Immunologix, Inc. was acquired by Intrexon Corporation. Immunologix, Inc. develops and produces antibody-based therapeutics for various biological targets. It produces human monoclonal antibodies against viral, bacterial, and tumor antigens, as well as human auto antigens.
Intrexon Corporation, founded in 1998, is a leader in synthetic biology focused on collaborating with companies in Health, Food, Energy, Environment and Consumer sectors to create biologically based products that improve quality of life and the health of the planet.
PRODUCTS AND SERVICES
Gene synthesis
Antibody services
Protein Services
Peptide services
INVESTORS
Note: The Balloch Group (‘TBG’) was established in 2001 by Howard Balloch (Canada’s ambassador to China from 1996 to 2001). TBG has since grown from a market-entry consultancy working with North American clients in China to a leading advisory and merchant banking firm serving both domestic Chinese companies and multinational corporations. TBG was ranked as the number one boutique investment bank in China by ChinaVenture in 2008.
Monica Heger : SAN FRANCISCO (GenomeWeb) – Illumina today announced two new next-generation sequencing platforms, a targeted sequencing system called MiniSeq and a semiconductor sequencer that is still under development.
Illumina disclosed the initiatives during a presentation at the JP Morgan Healthcare conference held here today. During the presentation, Illumina CEO Jay Flatley also announced a new genotyping array called Infinium XT; a partnership with Bio-Rad to develop a single-cell sequencing workflow; preliminary estimates of its fourth-quarter 2015 revenues; and an update on existing products. The presentation followed the company’s announcement on Sunday that it has launched a new company called Grail to develop a next-generation sequencing test for early cancer detection from patient blood samples.
The MiniSeq system, which is based on Illumina’s current sequencing technology, will begin shipping early this quarter and has a list price of $49,500. It can perform a variety of targeted DNA and RNA applications, from single-gene to pathway sequencing, and promises “all-in” prices, including library prep and sequencing, of $200 to $300 per sample, Flatley said during the JP Morgan presentation.
Integrated DNA Technologies, Inc. (IDT), the global leader in nucleic acid synthesis, serving all areas of life sciences research and development, offers products for a broad range of genomics applications. IDT’s primary business is the production of custom, synthetic nucleic acids for molecular biology applications, including qPCR, sequencing, synthetic biology, and functional genomics. The company manufactures and ships an average of 44,000 custom nucleic acids per day to more than 82,000 customers worldwide. For more information, visit idtdna.com.
Dyes GMP for Molecular Diagnostics Large Scale Oligo Synthesis
Note : Skokie, IL – December 1, 2015. Integrated DNA Technologies Inc. (“IDT”), the global leader in custom nucleic acid synthesis, has entered into a definitive agreement to acquire the oligonucleotide synthesis business of AITbiotech Pte. Ltd. in Singapore (“AITbiotech”). With this acquisition, IDT expands its customer base across Southeast Asia making it possible for these additional customers to now have access to its broad range of products for genomic applications. AITbiotech will continue operations in its other core business areas.
With over 20 years of experience in oligonucleotide development and production, and over 1000 sequences manufactured, Avecia has played an integral role in the advancing oligo therapeutic market. Our mission is to continue to build value for our customers, as they progress through drug development into commercialization. And as a member of the Nitto Denko Corporation (nitto.com), Avecia is committed to the future of the oligonucleotide market. We are driven by innovative ideas and flexible solutions, designed to provide our customers with the best in service, quality, and technology.
OriGene Technologies, Inc. develops, manufactures, and sells genome wide research and diagnostic products for pharmaceutical, biotechnology, and academic research applications. The company offers cDNA clones, including TrueORF cDNA, viral ORF, destination vectors, TrueClones (human), TrueClones (mouse), organelle marker plasmids, MicroRNA tools, mutant and variant clones, plasmid purification kits, transfection reagents, and gene synthesis service; and HuSH shRNA, siRNA, miRNA, qPCR reagents, plasmid purification products, transfection reagents, PolyA+ and total RNA products, first-strand cDNA synthesis, and CRISPR/Cas9 genome products. It also provides proteins and lysates, such as purified human proteins, over-expression cell lysates, mass spectrometry standard proteins, and protein purification reagents; UltraMAB IHC antibodies, TrueMAB primary antibodies, anti-tag and fluorescent proteins, ELISA antibodies, luminex antibodies, secondary antibodies, and controls and others; and anatomic pathology products, including IHC antibodies, detection systems, and IHC accessories
The company offers luminex and ELISA antibody pairs, autoantibody profiling arrays, ELISA kits, cell assay kits, assay reagents, custom development, and fluorogenic cell assays; TissueFocus search tools; tissue sections; tissue microarrays, cancer protein lysate arrays, TissueScan cDNA arrays, tissue blocks, and quality control products, as well as tissue RNA, DNA, and protein lysates; and lab essentials. Its research areas include cancer biomarker research, RNAi, pathology IHC, stem cell research, ion channels, and protein kinase products. The company provides gene synthesis and molecular biology services, genome editing, custom cloning, custom shRNA, purified protein, monoclonal antibody development, and assay development. It sells its products through distributors worldwide, as well as online. OriGene Technologies, Inc. was incorporated in 1995 and is based in Rockville, Maryland.
Louis, MO – November 18, 2015 Merck KGaA, Darmstadt, Germany, Completes Sigma-Aldrich Acquisition
Merck KGaA today announced the completion of its $17 billion acquisition of Sigma-Aldrich, creating one of the leaders in the $130 billion global industry to help solve the toughest problems in life science.
Press Release: 18-Nov-2015
Letter to our Life Science Customers from Dr. Udit Batra
The life science business of Merck KGaA, Darmstadt, Germany brings together the world-class products and services, innovative capabilities and exceptional talent of EMD Millipore and Sigma-Aldrich to create a global leader in the life science industry.
“Everything we do starts with our shared purpose – to solve the toughest problems in life science by collaborating with the global scientific community.
This combination is built on complementary strengths, which will enable us to serve you even better as one organization than either company could alone.
This means providing a broader portfolio with a catalog of more than 300,000 products, including many of the most respected brands in the industry, greater geographic reach, and an unmatched combination of industry-leading capabilities.”
Thermo Fisher Scientific Inc. is a provider of analytical instruments, equipment, reagents and consumables, software and services for research, manufacturing, analysis, discovery and diagnostics. The company operates through four segments: Life Sciences Solutions, provides reagents, instruments and consumables used in biological and medical research, discovery and production of new drugs and vaccines as well as diagnosis of disease; Analytical Instruments, provides instruments, consumables, software and services that are used in the laboratory; Specialty Diagnostics, offers diagnostic test kits, reagents, culture media, instruments and associated products, and Laboratory Products and Services, offers self-manufactured and sourced products for the laboratory.
WALTHAM, Mass. & SANTA CLARA, Calif.–(BUSINESS WIRE)–Jan. 8, 2016– Thermo Fisher Scientific Inc. (NYSE:TMO), the world leader in serving science, and Affymetrix Inc. (NASDAQ:AFFX), a leading provider of cellular and genetic analysis products, today announced that their boards of directors have unanimously approved Thermo Fisher’s acquisition of Affymetrix for $14.00 per share in cash. The transaction represents a purchase price of approximately $1.3 billion.
The ability to sequence a person’s entire genome has led many researchers to hunt for the genetic causes of certain diseases. But without a larger set of genomes to compare mutations against, putting these variations into context is difficult. An international group of researchers has banked the full exomes of 60,706 individuals in a database called the Exome Aggregation Consortium (ExAC). The team’s analysis, posted last month (October 30) on the preprint server bioRxiv, was presented at the Genome Science 2015 conference in Birmingham, U.K. (September 7).
Led by Daniel MacArthur from the Broad Institute of MIT and Harvard, the research team collected exomes from labs around the world for its dataset. “The resulting catalogue of human genetic diversity has unprecedented resolution,” the authors wrote in their preprint. Many of the variants observed in the dataset occurred only once.
“This is one of the most useful resources ever created for medical testing for genetic disorders,” Heidi Rehm, a clinical lab director at Harvard Medical School, told Science News.
Among other things, the team found 3,230 genes that are highly conserved across exomes, indicating likely involvement in critical cellular functions. Of these, 2,557 are not associated with diseases. The authors hypothesized that these genes, if mutated, either lead to embryonic death—before a problem can be diagnosed—or cause rare diseases that have not yet been genetically characterized.
“We should soon be able to say, with high precision: If you have a mutation at this site, it will kill you. And we’ll be able to say that without ever seeing a person with that mutation,” MacArthur said during his Genome Science talk, according to The Atlantic.
This is not the complete set of essential genes in the human body, David Goldstein, a geneticist at Columbia University in New York City, pointed out to Nature. Only by studying more exomes will researchers be able to refine that number, he noted.
Analysis of protein-coding genetic variation in 60,706 humans
Exome Aggregation Consortium, Monkol Lek, Konrad Karczewski, Eric Minikel, Kaitlin Samocha, et al.
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human knockout variants in protein-coding genes.
Over the last five years, the widespread availability of high-throughput DNA sequencing technologies has permitted the sequencing of the whole genomes or exomes (the protein-coding regions of genomes) of over half a million humans. In theory, these data represent a powerful source of information about the global patterns of human genetic variation, but in practice, are difficult to access for practical, logistical, and ethical reasons; in addition, the inconsistent processing complicates variant-calling pipelines used by different groups. Current publicly available datasets of human DNA sequence variation contain only a small fraction of all sequenced samples: the Exome Variant Server, created as part of the NHLBI Exome Sequencing Project (ESP)1, contains frequency information spanning 6,503 exomes; and the 1000 Genomes (1000G) Project, which includes individual-level genotype data from whole-genome and exome sequence data for 2,504 individuals2.
Databases of genetic variation are important for our understanding of human population history and biology1–5, but also provide critical resources for the clinical interpretation of variants observed in patients suffering from rare Mendelian diseases6,7. The filtering of candidate variants by frequency in unselected individuals is a key step in any pipeline for the discovery of causal variants in Mendelian disease patients, and the efficacy of such filtering depends on both the size and the ancestral diversity of the available reference data.
Here, we describe the joint variant calling and analysis of high-quality variant calls across 60,706 human exomes, assembled by the Exome Aggregation Consortium (ExAC; exac.broadinstitute.org). This call set exceeds previously available exome-wide variant databases by nearly an order of magnitude, providing unprecedented resolution for the analysis of very low-frequency genetic variants. We demonstrate the application of this data set to the analysis of patterns of genetic variation including the discovery of widespread mutational recurrence, the inference of gene-level constraint against truncating variation, the clinical interpretation of variation in Mendelian disease genes, and the discovery of human “knockout” variants in protein-coding genes.
…..
Deleterious variants are expected to have lower allele frequencies than neutral ones, due to negative selection. This theoretical property has been demonstrated previously in human population sequencing data18,19 and here (Figure 1d, Figure 1e). This allows inference of the degree of natural selection against specific functional classes of variation: however, mutational recurrence as described above indicates that allele frequencies observed in ExAC-scale samples are also skewed by mutation rate, with more mutable sites less likely to be singletons (Figure 2c and Extended Data Figure 4d). Mutation rate is in turn non-uniformly distributed across functional classes – for instance, stop lost mutations can never occur at CpG dinucleotides (Extended Data Figure 4e). We corrected for mutation rates (Supplementary Information) by creating a mutability-adjusted proportion singleton (MAPS) metric. This metric reflects (as expected) strong selection against predicted PTVs, as well as missense variants predicted by conservation-based methods to be deleterious (Figure 2e).
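The idea behind a mutability-adjusted proportion singleton can be illustrated with a rough sketch. The excerpt above does not give the actual procedure (it is deferred to the Supplementary Information), so everything below is an assumption made for illustration: synonymous variants are used as a near-neutral calibration set, a straight line is fitted to their proportion of singletons as a function of mutability, and a variant class is then scored by how far its singleton count exceeds that expectation. The names `fit_line` and `maps_score` and the arbitrary mutability units are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct mutability values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def maps_score(calibration, variants):
    """Illustrative mutability-adjusted proportion singleton.

    Both arguments are lists of (mutability, is_singleton) pairs.
    `calibration` is assumed to be a near-neutral class such as
    synonymous variants (an assumption of this sketch).
    """
    # Proportion of singletons observed in each mutability bin.
    bins = defaultdict(list)
    for mutability, is_singleton in calibration:
        bins[mutability].append(1 if is_singleton else 0)
    xs = list(bins)
    ys = [sum(flags) / len(flags) for flags in bins.values()]
    a, b = fit_line(xs, ys)

    # Excess of observed singletons over the mutability-based expectation.
    observed = sum(1 for _, is_singleton in variants if is_singleton)
    expected = sum(a + b * mutability for mutability, _ in variants)
    return (observed - expected) / len(variants)
```

Under these assumptions the calibration class scores zero against itself, and a class with more singletons than its mutability predicts scores above zero, which is the direction expected for variants under stronger negative selection.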
The deep ascertainment of rare variation in ExAC also allows us to infer the extent of selection against variant categories on a per-gene basis by examining the proportion of variation that is missing compared to expectations under random mutation. Conceptually similar approaches have been applied to smaller exome datasets13,20 but have been underpowered, particularly for the analysis of depletion of PTVs. We compared the observed number of rare (MAF <0.1%) variants per gene to an expected number derived from a selection neutral, sequence-context based mutational model13. The model performs extremely well in predicting the number of synonymous variants, which should be under minimal purifying selection, per gene (r = 0.98; Extended Data Figure 5).
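The per-gene comparison of observed and expected variant counts can be sketched as follows. This is a minimal illustration, not the consortium's code: the expected count is taken as given (in the paper it comes from a mutational model), and the two summary statistics shown, a depletion fraction and a signed chi-squared-style Z value, are assumptions chosen for clarity. The function names are hypothetical.

```python
import math


def depletion(observed, expected):
    """Fraction of the expected variation that is missing.

    0.0 means the gene carries as many variants as expected;
    values near 1.0 mean almost all expected variants are absent.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 - observed / expected


def signed_z(observed, expected):
    """Signed square root of the chi-squared deviation.

    Positive values indicate fewer variants than expected
    (consistent with constraint), negative values indicate more.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    chi_squared = (observed - expected) ** 2 / expected
    return math.copysign(math.sqrt(chi_squared), expected - observed)
```

For a hypothetical gene with 100 expected and 25 observed rare variants, `depletion` reports that three quarters of the expected variation is missing, and `signed_z` gives a large positive value.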
……
Critically, we note that LoF-intolerant genes include virtually all known severe haploinsufficient human disease genes (Figure 3b), but that 79% of LoF-intolerant genes have not yet been assigned a human disease phenotype despite the clear evidence for extreme selective constraint (Supplementary Information 4.11). These likely represent either undiscovered severe dominant disease genes, or genes in which loss of a single copy results in embryonic lethality.
The most highly constrained missense (top 25% missense Z scores) and PTV (pLI ≥0.9) genes show higher expression levels and broader tissue expression than the least constrained genes24 (Figure 3c). These most highly constrained genes are also depleted for eQTLs (p < 10⁻⁹ for missense and PTV; Figure 3d), yet are enriched within genome-wide significant trait-associated loci (χ² p < 10⁻¹⁴, Figure 3e). Intuitively, genes intolerant of PTV variation are dosage sensitive: natural selection does not tolerate a 50% deficit in expression due to the loss of a single allele. It is therefore unsurprising that these genes are also depleted of common genetic variants that have a large enough effect on expression to be detected as eQTLs with current limited sample sizes. However, smaller changes in the expression of these genes, through weaker eQTLs or functional variants, are more likely to contribute to medically relevant phenotypes. Therefore, highly constrained genes are dosage-sensitive, expressed more broadly across tissues (as expected for core cellular processes), and are enriched for medically relevant variation.
Finally, we investigated how these constraint metrics would stratify mutational classes according to their frequency spectrum, corrected for mutability as in the previous section (Figure 3f). The effect was most dramatic when considering stop-gained variants in the LoF-intolerant set of genes. For missense variants, the missense Z score offers information additional to Polyphen2 and CADD classifications, indicating that gene-level measures of constraint offer additional information to variant-level metrics in assessing potential pathogenicity.
We assessed the value of ExAC as a reference dataset for clinical sequencing approaches, which typically prioritize or filter potentially deleterious variants based on functional consequence and allele frequency6. To simulate a Mendelian variant analysis, we filtered variants in 100 ExAC exomes per continental population against ESP (the previous default reference data set for clinical analysis) or the remainder of ExAC, removing variants present at ≥0.1% allele frequency, a filter recommended for dominant disease variant discovery6. Filtering on ExAC reduced the number of candidate protein-altering variants by 7-fold compared to ESP, and was most powerful when the highest allele frequency in any one population (“popmax”) was used rather than average (“global”) allele frequency (Figure 4a). ESP is not well-powered to filter at 0.1% AF without removing many genuinely rare variants, as AF estimates based on low allele counts are both upward-biased and imprecise (Figure 4b). We thus expect that ExAC will provide a very substantial boost in the power and accuracy of variant filtering in Mendelian disease projects.
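The difference between filtering on “popmax” and on “global” allele frequency can be made concrete with a small sketch. The data layout is an assumption for illustration only: each variant is described by a dictionary mapping a population label to an (allele count, allele number) pair, and the population labels and counts in the usage note are invented. Only the 0.1% threshold comes from the text above.

```python
def allele_frequency(allele_count, allele_number):
    """Allele frequency, treating a population with no calls as 0."""
    return allele_count / allele_number if allele_number else 0.0


def global_af(pop_counts):
    """Frequency pooled over all populations."""
    total_ac = sum(ac for ac, _ in pop_counts.values())
    total_an = sum(an for _, an in pop_counts.values())
    return allele_frequency(total_ac, total_an)


def popmax_af(pop_counts):
    """Highest frequency seen in any single population."""
    return max(
        (allele_frequency(ac, an) for ac, an in pop_counts.values()),
        default=0.0,
    )


def keep_as_candidate(pop_counts, threshold=0.001, use_popmax=True):
    """Keep a variant only if its reference frequency is below the threshold.

    Variants at or above the threshold (0.1% by default) are removed.
    """
    af = popmax_af(pop_counts) if use_popmax else global_af(pop_counts)
    return af < threshold
```

A hypothetical variant seen 30 times among 10,000 alleles in one population and never among 90,000 alleles in another has a global frequency of 0.03% and would survive a global filter, but its popmax frequency of 0.3% removes it. That is the behaviour the passage credits with making popmax the more powerful filter.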
…….
The above curation efforts confirm the importance of allele frequency filtering in analysis of candidate disease variants. However, literature and database errors are prevalent even at lower allele frequencies: the average ExAC exome contains 0.89 reportedly Mendelian variants in well-characterized dominant disease genes at <1% popmax AF and 0.20 at <0.1% popmax AF. This inflation likely results from a combination of false reports of pathogenicity and incomplete penetrance, as we show for PRNP in the accompanying work [Minikel et al, submitted]. The abundance of rare functional variation in many disease genes in ExAC is a reminder that such variants should not be assumed to be causal or highly penetrant without careful segregation or case-control analysis28,7.
We investigated the distribution of PTVs, variants predicted to disrupt protein-coding genes through the introduction of a stop codon or frameshift or the disruption of an essential splice site; such variants are expected to be enriched for complete loss-of-function of the impacted genes. Naturally-occurring PTVs in humans provide a model for the functional impact of gene inactivation, and have been used to identify many genes in which LoF causes severe disease31, as well as rare cases where LoF is protective against disease32.
Among the 7,404,909 HQ variants in ExAC, we found 179,774 high-confidence PTVs (as defined in Supplementary Information Section 6), 121,309 of which are singletons. This corresponds to an average of 85 heterozygous and 35 homozygous PTVs per individual (Figure 5a). The diverse nature of the cohort enables the discovery of substantial numbers of novel PTVs: out of 58,435 PTVs with an allele count greater than one, 33,625 occur in only one population. However, while PTVs as a category are extremely rare, the majority of the PTVs found in any one person are common, and each individual has only ~2 singleton PTVs, of which 0.14 are found in PTV-constrained genes (pLI >0.9). The site frequency spectrum of these variants across the populations represented in ExAC recapitulates known aspects of demographic models, including an increase in intermediate-frequency (1%-5%) PTVs in Finland33 and relatively common (>0.1%) PTVs in Africans (Figure 5b).
……
Discussion
Here we describe the generation and analysis of the most comprehensive catalogue of human protein-coding genetic variation to date, incorporating high-quality exome sequencing data from 60,706 individuals of diverse geographic ancestry. The resulting call set provides unprecedented resolution for the analysis of very low-frequency protein-coding variants in human populations, as well as a powerful resource for the clinical interpretation of genetic variants observed in disease patients. The complete frequency and annotation data from this call set has been made freely available through a public website [exac.broadinstitute.org].
bioRxiv preprint first posted online October 30, 2015; http://dx.doi.org/10.1101/030338. The copyright holder for this preprint is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
The ExAC resource provides the largest database to date for the estimation of allele frequency for protein-coding genetic variants, providing a powerful filter for analysis of candidate pathogenic variants in severe Mendelian diseases. Frequency data from ESP1 have been widely used for this purpose, but those data are limited by population diversity and by resolution at allele frequencies ≤0.1%. ExAC therefore provides substantially improved power for Mendelian analyses, although it is still limited in power at lower allele frequencies, emphasizing the need for more sophisticated pathogenic variant filtering strategies alongside on-going data aggregation efforts. ExAC also highlights an unexpected tolerance of many disease genes to functional variation, and reveals that the literature and public databases contain an inflated number of reportedly pathogenic variants across the frequency spectrum, indicating a need for stringent criteria for assertions of pathogenicity.
Finally, we show that different populations confer different advantages in the discovery of gene-disrupting PTVs, providing guidance for projects seeking to identify human “knockouts” to understand gene function. Individuals of African ancestry have more PTVs (140 on average), with this enrichment most pronounced at allele frequencies above 1% (Figure 5b). Finnish individuals, as a result of a population bottleneck, are depleted at the lowest (<0.1%) allele frequencies but have a peak in frequency at 1-5% (Figure 5b). However, these differences are diminished when considering only LoF-constrained (pLI > 0.9) genes (Extended Data Figure 10). Sampling multiple populations would likely be a fruitful strategy for a researcher investigating common PTV variation. However, discovery of homozygous PTVs is markedly enhanced in the South Asia samples, which come primarily from a Pakistani cohort with 38.3% of individuals self-reporting as having closely related parents, emphasizing the extreme value of consanguineous cohorts for “human knockout” discovery (Figure 5d) [Saleheen et al., to be co-submitted].
…..
While the ExAC dataset dramatically exceeds the scale of previously available frequency reference datasets, much remains to be gained by further increases in sample size. Indeed, the fact that even the rarest transversions have mutational rates13 on the order of 1 × 10⁻⁹ implies that almost all possible non-lethal SNVs likely exist in some person on Earth. ExAC already includes >70% of all possible protein-coding CpG transitions at well-covered sites; order of magnitude increases in sample size will eventually lead to saturation of other classes of variation.
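The arithmetic behind the claim that almost every non-lethal SNV probably exists in someone can be sketched in a few lines. The mutation rate of 1 × 10⁻⁹ is taken from the passage; the world population of roughly 7 billion, the reading of the rate as per site, per genome copy, per generation, and the Poisson approximation are assumptions added here for illustration.

```python
import math


def expected_new_occurrences(rate_per_copy, population, copies_per_person=2):
    """Expected number of people born carrying a given new mutation.

    Assumes the rate applies per site, per genome copy, per generation.
    """
    return rate_per_copy * population * copies_per_person


def probability_at_least_one(expected):
    """Poisson probability of seeing the mutation at least once."""
    return 1.0 - math.exp(-expected)


# Assumed inputs: rate from the text, population size is an approximation.
expected = expected_new_occurrences(1e-9, 7e9)
probability = probability_at_least_one(expected)
```

With these inputs the expected count is about 14 new occurrences per generation for even the rarest class of substitution, so the chance that a given tolerated SNV is absent from the entire living population is very small.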
Why Does Cytotoxic Chemotherapy Still Remain a Mainstay in Many Chemotherapeutic Regimens? [6.1.1]
Reporter: Stephen J. Williams, Ph.D.
At the 2015 AACR National Meeting, Drs. Anthony Letai, Dr. Michael Hermann, Dr. Rene Bernards, and Dr. Guido Kroemer gave The 2015 Stanley J. Korsmeyer Memorial Symposium: Cell Death and Cancer Therapy: Why Has Conventional Chemotherapy Been So Successful?
Cytotoxic chemotherapy, for which the mechanism of action is centered on the ability of the drug to kill a cell by either necrosis, genotoxic, apoptosis, or autophagy mechanisms rather than just halting cell growth, is still, in this era of personalized and cytostatic therapies, is still a mainstay in many treatment regimens for a majority of cancers. Treatment regimens such as MOPP (mechlorethamine, Oncovin, procarbazine, prednisone), CMF (cyclophosphamide, methotrexate, 5-fluorouracil) , carboplatin with taxol, and even with personalized therapies, which usually are given in combination with a cytotoxic agent. However treatment regimens containing these cytotoxic chemotherapeutics show some of the best survival rates. The abstract for the Symposium is given below:
In this current era of precisely targeted therapies and –omics technologies, it is often forgotten that no medical therapy has cured, and continues to cure, more people of cancer than conventional chemotherapy. Notwithstanding its superior performance across many cancer types, the mechanism of the therapeutic index of conventional agents, largely targeting ubiquitous elements like DNA and microtubules, is poorly understood. The textbook explanation of conventional chemotherapy’s working by killing supposedly rapidly dividing cancer cells lacks clinical evidence and flies in the face of many obvious clinical counter-examples. In the session, the speakers will describe how conventional cytotoxic chemotherapy preferentially kills cancer cells. Moreover, they will describe how clinical response to chemotherapy might be better predicted.
This post presents the speakers’ talk titles along with a brief curation of their papers related to the subject matter.
Anthony G. Letai, Dana-Farber Cancer Institute, Boston, MA. Conventional chemotherapy cures people by exploiting apoptotic priming.
Conventional chemotherapy has an amazing track record that is often under-appreciated in today’s world of genomics and targeted pathway inhibitors. Conventional chemotherapy is responsible for curing millions of cancer patients over the past 5 decades. That is, millions of patients have presented to their doctors with an otherwise fatal malignancy, were given a finite course of chemotherapy (largely DNA and microtubule perturbing agents) and had their cancer eradicated, never to return. Perhaps as remarkable as the magnitude of the achievement of conventional chemotherapy is the magnitude of our ignorance of why it should ever work, and why it works far better in some tumors than in others. Textbook explanations rely on concepts of differential proliferation rates in cancers that are incompletely supported in the clinical literature. Successful chemotherapy treatments usually kill via the mitochondrial pathway of apoptosis. We have found that simple functional measurements of the pre-treatment state of the tumor cell can be rapidly made with BH3 profiling. These measurements demonstrate that a major, if not the major, reason for a therapeutic index for cancer chemotherapy is that chemo-sensitive cancer cells are simply more primed for apoptosis than normal cells. Moreover, apoptotic priming can be measured to make clinical predictions regarding quality of response on an individualized basis. Enhancing pretreatment priming of cancer cells with selectively acting targeted agents is a promising strategy to extend the demonstrated curative power of conventional chemotherapy.
Triona Ni Chonghaile, Justine E. Roderick, Cian Glenfield, Jeremy Ryan, Stephen E. Sallan, Lewis B. Silverman, Mignon L. Loh, Stephen P. Hunger, Brent Wood, Daniel J. DeAngelo, Richard Stone, Marian Harris, Alejandro Gutierrez, Michelle A. Kelliher, Anthony Letai
Cancer Discov. Author manuscript; available in PMC 2015 March 1.
Published in final edited form as: Cancer Discov. 2014 September; 4(9): 1074–1087. Published online 2014 July 3. doi: 10.1158/2159-8290.CD-14-0353
Sidong Huang, Michael Hölzel, Theo Knijnenburg, Andreas Schlicker, Paul Roepman, Ultan McDermott, Mathew Garnett, Wipawadee Grernrum, Chong Sun, Anirudh Prahallad, Floris H. Groenendijk, Lorenza Mittempergher, Wouter Nijkamp, Jacques Neefjes, Ramon Salazar, Peter ten Dijke, Hidetaka Uramoto, Fumihiro Tanaka, Roderick L. Beijersbergen, Lodewyk F.A. Wessels, René Bernards
Cell. Author manuscript; available in PMC 2013 June 5.
Published in final edited form as: Cell. 2012 November 21; 151(5): 937–950.
Floris H Groenendijk, Wouter W Mellema, Eline van der Burg, Eva Schut, Michael Hauptmann, Hugo M Horlings, Stefan M Willems, Michel M van den Heuvel, Jos Jonkers, Egbert F Smit, René Bernards
Int J Cancer. 2015 March 15; 136(6): 1434–1444. Published online 2014 August 1.
Prashanth Kumar Bajpe, Guus J. J. E. Heynen, Lorenza Mittempergher, Wipawadee Grernrum, Iris A. de Rink, Wouter Nijkamp, Roderick L. Beijersbergen, Rene Bernards, Sidong Huang
Andreas I Papadakis, Chong Sun, Theo A Knijnenburg, Yibo Xue, Wipawadee Grernrum, Michael Hölzel, Wouter Nijkamp, Lodewyk FA Wessels, Roderick L Beijersbergen, Rene Bernards, Sidong Huang
Cell Res. 2015 April; 25(4): 445–458. Published online 2015 February 6.
Katherine Stemke-Hale, Ana Maria Gonzalez-Angulo, Ana Lluch, Richard M. Neve, Wen-Lin Kuo, Michael Davies, Mark Carey, Zhi Hu, Yinghui Guan, Aysegul Sahin, W. Fraser Symmans, Lajos Pusztai, Laura K. Nolden, Hugo Horlings, Katrien Berns, Mien-Chie Hung, Marc J. van de Vijver, Vicente Valero, Joe W. Gray, René Bernards, Gordon B. Mills, Bryan T. Hennessy
Cancer Res. Author manuscript; available in PMC 2009 August 1.
Published in final edited form as: Cancer Res. 2008 August 1; 68(15): 6084–6091.
Brigitte C. Widemann, Maria T. Acosta, Sylvia Ammoun, Allan J. Belzberg, Andre Bernards, Jaishri Blakeley, Antony Bretscher, Karen Cichowski, D. Wade Clapp, Eva Dombi, Gareth D. Evans, Rosalie Ferner, Cristina Fernandez-Valle, Michael J. Fisher, Marco Giovannini, David H. Gutmann, C. Oliver Hanemann, Robert Hennigan, Susan Huson, David Ingram, Joe Kissil, Bruce R. Korf, Eric Legius, Roger J. Packer, Andrea I McClatchey, Frank McCormick, Kathryn North, Minja Pehrsson, Scott R. Plotkin, Vijaya Ramesh, Nancy Ratner, Susann Schirmer, Larry Sherman, Elizabeth Schorry, David Stevenson, Douglas R. Stewart, Nicole Ullrich, Annette C. Bakker, Helen Morrison
Am J Med Genet A. Author manuscript; available in PMC 2014 September 1.
Published in final edited form as: Am J Med Genet A. 2014 March; 0(3): 563–578. Published
Ben S. Wittner, Dennis C. Sgroi, Paula D. Ryan, Tako J. Bruinsma, Annuska M. Glas, Anitha Male, Sonika Dahiya, Karleen Habin, Rene Bernards, Daniel A. Haber, Laura J. Van’t Veer, Sridhar Ramaswamy
Clin Cancer Res. Author manuscript; available in PMC 2011 May 7.
Guido Kroemer, INSERM U848- Institute Gustave-Roussy, Villejuif, France. A hallmark of successful cancer therapies: Reinstatement of immunosurveillance.
CRISPR/Cas9 Finds Its Way As an Important Tool For Drug Discovery & Development
UPDATED 6/11/2021
CRISPR Diagnostics: CRISPR-dx Comes of Age: Tool in Drug Development
The past five years have seen a rapid expansion of CRISPR-based tools for diagnostic testing. Recently, CRISPR has been used to detect SARS-CoV-2 in patients. An article in the journal Science describes the different classes of CRISPR diagnostics in use today.
Update near end of post
UPDATED 8/08/2020
Association to Causation: Using GWAS to Identify Druggable Targets
A Gen Webinar Thursday, August 6, 2020; 11:00am – 12:30pm EST
See at end of post
Curator: Stephen J. Williams, Ph.D.
Article 21.4.8 - CRISPR/Cas9 Finds Its Way As an Important Tool For Drug Discovery & Development
CRISPR/Cas9 Finds Its Way As an Important Tool For Drug Discovery & Development, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 2: CRISPR for Gene Editing and DNA Repair
The RNA-guided Cas9 nuclease from the microbial clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system can be used to facilitate efficient genome engineering in eukaryotic cells by simply specifying a 20-nt targeting sequence within its guide RNA.
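As a rough illustration of what "specifying a 20-nt targeting sequence" involves, the sketch below scans a DNA string for 20-nt candidate sites. It assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of the target; the PAM requirement, the function name, and the demo sequence are not from the text above and are included only for illustration. Only the given strand is searched.

```python
import re


def find_guides(seq: str, guide_len: int = 20):
    """Return (start, protospacer, pam) for every guide_len-nt site on the
    given strand that is immediately followed by an NGG PAM.

    Assumes SpCas9-style NGG PAMs; other Cas9 orthologs use different PAMs.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical demo sequence, not a real gene.
demo = "ACGTACGTACGTACGTACGTAGGTT"
for start, protospacer, pam in find_guides(demo):
    print(start, protospacer, pam)
```

A real design tool would also scan the reverse complement and score candidates for off-target matches, which this sketch deliberately omits.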
CRISPR/Cas systems are part of the adaptive immune system of bacteria and archaea, protecting them against invading nucleic acids such as viruses by cleaving the foreign DNA in a sequence-dependent manner. Although CRISPR arrays were first identified in the Escherichia coli genome in 1987 (Ishino et al., 1987), their biological function was not understood until 2005, when it was shown that the spacers were homologous to viral and plasmid sequences suggesting a role in adaptive immunity (Bolotin et al., 2005; Mojica et al., 2005; Pourcel et al., 2005). Two years later, CRISPR arrays were confirmed to provide protection against invading viruses when combined with Cas genes (Barrangou et al., 2007). The mechanism of this immune system based on RNA-mediated DNA targeting was demonstrated shortly thereafter (Brouns et al., 2008; Deltcheva et al., 2011; Garneau et al., 2010; Marraffini and Sontheimer, 2008).
Jennifer Doudna, PhD, Professor of Molecular and Cell Biology and Chemistry, University of California, Berkeley, and Investigator, Howard Hughes Medical Institute, has recently received numerous awards and accolades for the discovery of CRISPR/Cas9 as a tool for mammalian genetic manipulation, as well as for her primary research aim of understanding bacterial resistance to viral infection.
A good post on the matter and Dr. Doudna can be seen below:
In Delineating a Role for CRISPR-Cas9 in Pharmaceutical Targeting, inheritable metabolic disorders that may benefit from a CRISPR-Cas9-mediated therapy are discussed. However, this curation is meant to focus on CRISPR/CAS9 AS A TOOL IN PRECLINICAL DRUG DEVELOPMENT.
Four Areas of Importance of CRISPR/Cas9 as a TOOL in Preclinical Drug Discovery Include:
Gene-Function Studies: CRISPR/CAS9 ability to DEFINE GENETIC LESION and INSERTION SITE
CRISPR/CAS9 Use in Developing Models of Disease
CRISPR/CAS9 Use as a Diagnostic Tool
Using CRISPR/Cas9 in PRECLINICAL TOXICOLOGY STUDIES
I. Gene-Function Studies: CRISPR/CAS9 ability to DEFINE GENETIC LESION and INSERTION SITE
The advent of the first tools for manipulating genetic material (cloning, PCR, and transgenic technology, before microarray and other ’omic methods) allowed scientists to probe novel, individual gene functions as well as their variants and mutants in a “one-gene-at-a-time” process. In essence, a gene (or mutant gene) was sequenced, cloned into expression vectors and transfected into recipient cells where function was evaluated.
However, some of the experimental issues with this methodology involved:
Lack of knowledge of insertion site of the transgene – this leads to off-target effects, usually due to insertion of a transgene in front of unwanted promoters or insertion at a site resulting in gene disruption or even mutagenesis. In an extreme case, transposon-induced mutagenesis may lead to transformation, as described in an earlier post on this site, How Mobile Elements in “Junk” DNA Promote Cancer – Part 1: Transposon-mediated Tumorigenesis
Most transfection experiments result in NON ISOGENIC cell lines – by definition the insertion of a transgene alters the genetic makeup of a cell line. A simple transfection experiment that compares one transgene to a “null” transfectant compares non-isogenic lines, possibly confusing the interpretation of gene-function studies. Therefore a common technique is to develop cell lines with inducible gene expression, thereby allowing the investigator to compare a gene’s effect in ISOGENIC cell lines.
Use of CRISPR in High-Throughput Screening of Genetic Function
A very nice presentation and summary of CRISPR’s use in determining gene function in a high-throughput manner can be found below
an approach for global detection of DNA double-stranded breaks (DSBs) introduced by RGNs and potentially other nucleases. This method, called genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq), relies on capture of double-stranded oligodeoxynucleotides into DSBs. Application of GUIDE-seq to 13 RGNs in two human cell lines revealed wide variability in RGN off-target activities and unappreciated characteristics of off-target sequences. The majority of identified sites were not detected by existing computational methods or chromatin immunoprecipitation sequencing (ChIP-seq). GUIDE-seq also identified RGN-independent genomic breakpoint ‘hotspots’.
1) In this talk Dr. Tyler Jacks discussed his use of CRISPR to generate a mouse model of liver tumor in an immunocompetent mouse. Some notes from this talk are given below
B) Engineering Cancer Genomes: Tyler Jacks, Ph.D.; Director, Koch Institute for Integrative Cancer Research
Cancer GEMs (genetically engineered mouse models of cancer) had moved from transgenics to defined oncogenes
Observation that p53 -/- mice develop spontaneous tumors (lymphomas)
then GEMs moved to Cre/Lox systems to generate mice with deletions; however, these tumor models require many animals, take much time to create, and are expensive to keep
reasoned that CRISPR/Cas9 could be used as a rapid, inexpensive way to generate engineered mice and tumor models
he used CRISPR/Cas9 vectors targeting PTEN to introduce PTEN mutations in vivo in hepatocytes; when p53 mutations were also introduced, hemangiosarcomas were produced; it took ONLY THREE months to produce detectable tumors
also produced liver tumors by using CRISPR/Cas9 to introduce a gain-of-function mutation in β-catenin
2) In the upcoming meeting New Frontiers in Gene Editing, multiple uses of CRISPR technology are discussed in relation to gene knockout/function studies, tumor model development and
New Frontiers in Gene Editing
Session Spotlight: BUILDING IN VIVO MODELS FOR DRUG DISCOVERY
Genome Editing Animal Models in Drug Discovery Myung Shin, Ph.D., Senior Principal Scientist, Biology-Discovery, Genetics and Pharmacogenomics, Merck Research Laboratories
Recent advances in genome editing have greatly accelerated and expanded the ability to generate animal models. These tools allow generating mouse models in condensed timeline compared to that of conventional gene-targeting knock-out/knock-in strategies. Moreover, the genome editing methods have expanded the ability to generate animal models beyond mice. In this talk, we will discuss the application of ZFN and CRISPR to generate various animal models for drug discovery programs.
In vivo Cancer Modeling and Genetic Screening Using CRISPR/Cas9 Sidi Chen, Ph.D., Postdoctoral Fellow, Laboratories of Dr. Phillip A. Sharp and Dr. Feng Zhang, Koch Institute for Integrative Cancer Research at MIT and Broad Institute of Harvard and MIT
Here we describe a genome-wide CRISPR-Cas9-mediated loss-of-function screen in tumor growth and metastasis. We mutagenized a non-metastatic mouse cancer cell line using a genome-scale library. The mutant cell pool rapidly generates metastases when transplanted into immunocompromised mice. Enriched sgRNAs in lung metastases and late stage primary tumors were found to target a small set of genes, suggesting specific loss-of-function mutations drive tumor growth and metastasis.
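The notion of "enriched sgRNAs" in a pooled screen can be made concrete with a small sketch. It compares each guide's share of sequencing reads before and after selection and reports a log2 fold-change. The guide names and read counts are invented for illustration, and the pseudocount normalization is one common simple choice, not necessarily the analysis used by the speakers.

```python
import math


def log2_fold_changes(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Log2 ratio of each guide's read fraction after vs. before selection.

    A pseudocount keeps guides with zero reads from producing log(0).
    """
    guides = list(before)
    n = len(guides)
    total_before = sum(before[g] for g in guides) + pseudocount * n
    total_after = sum(after.get(g, 0) for g in guides) + pseudocount * n
    result = {}
    for g in guides:
        freq_before = (before[g] + pseudocount) / total_before
        freq_after = (after.get(g, 0) + pseudocount) / total_after
        result[g] = math.log2(freq_after / freq_before)
    return result


# Hypothetical read counts: plasmid library vs. lung metastasis sample.
library = {"sgGeneA_1": 100, "sgGeneB_1": 100, "sgCtrl_1": 100}
metastasis = {"sgGeneA_1": 900, "sgGeneB_1": 50, "sgCtrl_1": 50}

lfc = log2_fold_changes(library, metastasis)
ranked = sorted(lfc, key=lfc.get, reverse=True)
for guide in ranked:
    print(f"{guide}\t{lfc[guide]:+.2f}")
```

Guides at the top of such a ranking point to genes whose loss may favor growth or metastasis; in practice several independent guides per gene and replicate samples are needed before drawing that conclusion.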
FEATURED PRESENTATION: In vivo Chromosome Engineering Using CRISPR-Cas9 Andrea Ventura, M.D., Ph.D., Assistant Member, Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center
We will discuss our experience using somatic genome editing to engineer oncogenic chromosomal rearrangements in vivo. More specifically, we will present the results of our ongoing efforts aimed at modeling cancers driven by chromosomal rearrangements using viral mediated delivery of CRISPR-Cas9 to adult animals.
RNAi and CRISPR/Cas9-Based in vivo Models for Drug Discovery Christof Fellmann, Ph.D., Postdoctoral Fellow, Laboratory of Dr. Jennifer Doudna, Department of Molecular and Cell Biology, The University of California, Berkeley
Genetically engineered mouse models (GEMMs) are a powerful tool to study disease initiation, treatment response and relapse. By combining CRISPR/Cas9 and “Sensor” validated, tetracycline-regulated “miR-E” shRNA technology, we have developed a fast and scalable platform to generate RNAi GEMMs with reversible gene silencing capability. The synergy of CRISPR/Cas9 and RNAi enabled us to not only model disease pathogenesis, but also mimic drug therapy in mice, providing us capability to perform preclinical studies in vivo.
In vivo Genome Editing Using Staphylococcus aureus Cas9 Fei Ann Ran, Ph.D., Post-doctoral Fellow, Laboratory of Dr. Feng Zhang, Broad Institute and Junior Fellow, Harvard Society of Fellows
The RNA-guided Cas9 nuclease from the bacterial CRISPR/Cas system has been adapted as a powerful tool for facilitating targeted genome editing in eukaryotes. Recently, we have identified an additional small Cas9 nuclease from Staphylococcus aureus that can be packaged with its guide RNA into a single adeno-associated virus (AAV) vector for in vivo applications. We demonstrate the use of this system for effective gene modification in adult animals and further expand the Cas9 toolbox for in vivo genome editing.
OriGene, Making the Right Tools for CRISPR Research Xuan Liu, Ph.D., Senior Director, Marketing, OriGene
CRISPR technology has quickly revolutionized the scientific community. Its simplicity has democratized the genome editing technology and enabled every lab to consider its utility in gene function research. As the largest tool box for gene functional research, OriGene created a large collection of CRISPR-related tools, including various all-in-one vectors for gRNA cloning, donor vector backbones, genome-wide knockout kits, AAVS1 insertion vectors, etc. OriGene’s high quality products will accelerate CRISPR research.
A critical component of producing transgenic animals is the ability of each successive generation to pass on the transgene. In her post on this site, A NEW ERA OF GENETIC MANIPULATION, Dr. Demet Sag discusses the molecular biology of Cas9 systems and their efficiency in causing point mutations that can be passed on to subsequent generations:
“This group developed a new technology for editing genes whose changes are transferable to the next generation, by combining a microbial immune defense mechanism, CRISPR/Cas9, the latest groundbreaking technology for translational genomics, with a gene therapy-like approach.
In short, this so-called “mutagenic chain reaction” (MCR) introduces a recessive mutation defined by CRISPR/Cas9 that leads to a high rate of transmission to the next generation. They reported that when they crossed the female MCR offspring to wild type flies, the yellow phenotype was observed with more than 95 percent efficiency.”
The advantage of CRISPR/Cas9 over ZFNs or TALENs is its scalability and multiplexibility in that multiple sites within the mammalian genome can be simultaneously modified, providing a robust, high-throughput approach for gene editing in mammalian cells.
Applied StemCell, Inc. offers various services related to animal models including conventional transgenic rats, and phenotype analysis using knock-in, knock-out strategies.
Further explanation of their use of CRISPR can be found at the site below:
III. Using CRISPR/Cas9 in PRECLINICAL TOXICOLOGY STUDIES
As of now, it is unclear how pharma plans to use this technology for toxicology testing; however, a few companies have licensed the technology for use across their R&D platforms, including
A recent paper used a sister technique, TALEN, to generate knock-in pigs, which suggests that it would be possible to generate pigs with human transgenes, especially human liver isozymes, in order to study hepatotoxicity of drugs.
Associate Professor UCSF Investigator Chan Zuckerberg Biohub
Kevin Holden, PhD
Head of Science Synthego
Abhi Saharia, PhD
VP, Commercial Development Synthego
Human genetics provides perhaps the single best opportunity to innovate and improve clinical success rates, through the identification of novel drug targets for complex disease. Even as correlation identifies multiple genetic variants associated with disease, it is challenging to conduct requisite functional studies to identify the causal variants, especially since most association signals map to non-coding regions of the genome.
Genetic editing technologies, such as CRISPR, have enabled the modeling of associated variants at their native loci, including non-coding loci, empowering the identification of underlying biological mechanisms of disease with potential causal genes. However, genome editing is largely manual today, severely limiting scale and forcing the use of rational filters to prioritize which variants to investigate functionally.
In this GEN webinar, we will discuss several strategies enabling large-scale functional investigation of disease-associated variants in a cost- and time-effective manner, including different types of pooled CRISPR-based screens and the development of a fully automated genome engineering platform. We will also review how optimization of genome engineering on this platform enables the engineering of disease-associated variants at scale in pluripotent cells.
They will be presenting on the use of wide-scale CRISPR screens to validate druggable targets
The presenters will also discuss new platforms for these wide scale screens
Big gap between accumulation of genetic variant information and functions of these variants
CRISPRi or CRISPRa (CRISPR interference or CRISPR activation, which repress or activate expression of the targeted gene)
Arrayed screens: multi-plate guide RNAs with phenotype measured (phenotype can be morphology, complex biological systems like organoids, or non-autonomous functions)
For pooled screens, use of a suitable cell model is critical for this strategy
For example, iPSCs and neurons show different expression patterns upon the same CRISPR perturbation of UBA1
An advantage is using CRISPR to take iPSCs from a patient with a disease variant and make a corrected isogenic control, then introduce gRNAs and use modifier screens to determine phenotypes
Generated a platform called CRISPRbrain.org to do bioinformatics on various experiments with different guide RNAs (CRISPRs)
A Nature Medicine paper performed a GWAS and found 27 SNVs associated with high disease risk, then used a rational filter to focus on 1 SNV in a noncoding region. But why study a single variant? Had all 27 been studied, would a more representative druggable set have been identified?
Goal is to reduce or eliminate these rational filters
Synthego uses electroporation with ribonucleoprotein (RNP) complexes to give the highest efficiency and minimize off-target effects, as the complex is only in cells for a short period of time
They confirm they are doing single cell cloning by using automated microscopy to confirm single cell growth in each cloning well
Engineering genetically modified iPSCs at scale
The closer the cut site is to the target site, the more efficient the CRISPR edit, so this is a big factor when designing guides, especially for knock-in CRISPR
Adding a small molecule non-homologous end joining inhibitor increases efficiency to 95%
Cold shocking the cells also assists in homologous repair
Use cleavage resistant templates
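The note above about keeping the cut close to the target site can be sketched as a simple ranking step. This is an illustrative sketch, not Synthego's method: the guide names and coordinates are hypothetical, guides are assumed to lie on the forward strand, and the cut position assumes the usual SpCas9 behavior of cleaving about 3 bp upstream of the PAM.

```python
from typing import List, NamedTuple


class Guide(NamedTuple):
    name: str
    protospacer_start: int  # 0-based coordinate on the forward strand


def cut_position(guide: Guide, guide_len: int = 20) -> int:
    """Index of the first base after the expected cut.

    Assumes SpCas9, which typically cuts 3 bp upstream of the PAM.
    """
    return guide.protospacer_start + guide_len - 3


def rank_by_distance(guides: List[Guide], edit_position: int) -> List[Guide]:
    """Order guides so the one cutting nearest the intended edit comes first."""
    return sorted(guides, key=lambda g: abs(cut_position(g) - edit_position))


# Hypothetical candidate guides around a knock-in site at coordinate 160.
candidates = [Guide("g1", 100), Guide("g2", 140), Guide("g3", 180)]
for g in rank_by_distance(candidates, edit_position=160):
    print(g.name, abs(cut_position(g) - 160))
```

Distance is only one design criterion; predicted on-target activity and off-target risk would normally be weighed alongside it.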
IV. CRISPR/CAS9 AS A DIAGNOSTIC TOOL
In the journal Science, Omar Abudayyeh and Jonathan Gootenberg discuss how CRISPR-based diagnostic (CRISPR-dx) tools offer a solution to the need for economical, easily scaled nucleic acid tests; multiple CRISPR-dx products for detection of the SARS-CoV-2 RNA genome have been authorized by the US Food and Drug Administration (FDA). In addition, they discuss the work by Jiao et al., who describe a new CRISPR-based tool to distinguish several SARS-CoV-2 variants in a single reaction.
Omar O. Abudayyeh, Jonathan S. Gootenberg. Science 28 May 2021: CRISPR Diagnostics Vol. 372, Issue 6545, pp. 914-915; DOI: 10.1126/science.abi9335
Summary
Although clinical diagnostics take many forms, nucleic acid–based testing has become the gold standard for sensitive detection of many diseases, including pathogenic infections. Quantitative polymerase chain reaction (qPCR) has been widely adopted for its ability to detect only a few DNA or RNA molecules that can unambiguously specify a particular disease. However, the complexity of this technique restricts application to laboratory settings. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has underscored the need for the development and deployment of nucleic acid tests that are economical, easily scaled, and capable of being run in low-resource settings, without sacrifices in speed, sensitivity or specificity. CRISPR-based diagnostic (CRISPR-dx) tools offer a solution, and multiple CRISPR-dx products for detection of the SARS-CoV-2 RNA genome have been authorized by the US Food and Drug Administration (FDA). On page 941 of this issue, Jiao et al. (1) describe a new CRISPR-based tool to distinguish several SARS-CoV-2 variants in a single reaction.
There are multiple types of CRISPR systems comprising basic components of a single protein or protein complex, which cuts a specific DNA or RNA target programmed by a complementary guide sequence in a CRISPR-associated RNA (crRNA). The type V and VI systems and the CRISPR-associated endonucleases Cas12 (2, 3) and Cas13 (4, 5) bind and cut DNA or RNA, respectively. Furthermore, upon recognizing a target DNA or RNA sequence, Cas12 and Cas13 proteins exhibit “collateral activity” whereby any DNA or RNA, respectively, in the sample is cleaved regardless of its nucleic acid sequence (4, 6). Thus, reporter DNAs or RNAs, which allow for visual or fluorescent detection upon cleavage, can be added to a sample to infer the presence or absence of specific DNA or RNA species (4–8).
Initial versions of CRISPR-dx utilizing Cas13 alone were sensitive to the low picomolar range, corresponding to a limit of detection of millions of molecules in a microliter sample. To improve sensitivity, preamplification methods, such as recombinase polymerase amplification (RPA), PCR, loop-mediated isothermal amplification (LAMP), or nucleic acid sequence–based amplification (NASBA), can be used with Cas12 or Cas13 to enable a limit of detection down to a single molecule (8). This preamplification approach, applicable to both Cas12 and Cas13 (6, 7), enabled a suite of detection methods and multiplexing up to four orthogonal targets (7). Additional developments expanded CRISPR-dx readouts beyond fluorescence, including lateral flow (7), colorimetric (9), and electronic or material responsive readouts (10), allowing for instrument-free approaches. In addition, post–collateral-cleavage amplification methods, such as the use of the CRISPR-associated enzyme Csm6, have been combined with Cas13 to further increase the speed of CRISPR-dx tests (7). As an alternative to collateral-cleavage–based detection, type III CRISPR systems, which involve large multiprotein complexes capable of targeting both DNA and RNA, have been used for SARS-CoV-2 detection through production of colorimetric or fluorometric readouts (11).
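The link between "low picomolar" sensitivity and "millions of molecules in a microliter" follows from Avogadro's number. A short worked check is sketched below; the specific concentrations of 1 pM and 5 pM are illustrative choices, not figures from the article.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_sample(conc_molar: float, volume_liters: float) -> float:
    """Number of molecules in a sample of the given molarity and volume."""
    return conc_molar * volume_liters * AVOGADRO


MICROLITER = 1e-6  # liters

one_pm = molecules_in_sample(1e-12, MICROLITER)
five_pm = molecules_in_sample(5e-12, MICROLITER)

print(f"1 pM in 1 uL: {one_pm:.2e} molecules")
print(f"5 pM in 1 uL: {five_pm:.2e} molecules")
```

At 1 pM a microliter holds roughly six hundred thousand molecules, so a detection limit of a few picomolar corresponds to millions of target molecules, which is why preamplification is needed to approach single-molecule sensitivity.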
FDA-authorized CRISPR-dx tests are currently only for use in centralized labs, because the most common CRISPR detection protocols require fluid handling steps and two different incubations, precluding their immediate use at the point of care. Single-step formulations have been developed to overcome this limitation, and these “one-pot” versions of CRISPR-dx are simple to run, operate at a single temperature, and run without complex equipment, producing either fluorescence or lateral flow readouts. The programmability of CRISPR makes new diagnostic tests easier to develop, and within months of the release of the SARS-CoV-2 genome, many COVID-19–specific CRISPR tests were reported and distributed around the world.
The broader capability for Cas enzyme–enhanced nucleic acid binding or cleavage has led to several other detection modalities. Cas9-based methods for cleaving nucleic acids in solution for diagnostic purposes have been combined with other detection platforms, such as destruction of undesired amplicons for preparation of next-generation sequencing libraries (12), or selective removal of alleles for nucleotide-specific detection (13). Alternatively, the programmable cleavage event from the Cas nuclease can be used to initiate an amplification reaction (14). Cas9-based DNA targeting has also been used for nucleotide detection in combination with solid-state electronics, promising an amplification-free platform for detection. In this platform, called CRISPR-Chip, the Cas9 protein binds nucleotide targets of interest (often in the context of the native genome) to graphene transistors, where the presence of these targets alters either current or voltage (15). By utilizing additional Cas9 orthologs and specific guide designs, CRISPR-Chip approaches have been tuned for single–base-pair sensitivity (15). Because they are integrated with electronic readers, CRISPR-Chip platforms may allow facile point-of-care detection with handheld devices.
Different classes of CRISPR diagnostics. GRAPHIC: ERIN DANIEL
Jiao et al. use a distinct characteristic of type II CRISPR systems, which involve Cas9, to develop a new type of noncollateral based CRISPR detection. Unlike Cas12s and Cas13, Cas9-crRNA complex formation requires an additional RNA known as the trans-activating CRISPR RNA (tracrRNA). By sequencing RNAs bound to Cas9 from Campylobacter jejuni in its natural host, the authors identified unexpected crRNAs, called noncanonical crRNA (ncrRNA), that corresponded to endogenous transcripts. Upon investigation of this surprising observation, it became clear that the tracrRNA was capable of hybridizing to semi-complementary sequences from a variety of RNA sources, leading to biogenesis of ncrRNAs of various sizes. Recognizing that they could program tracrRNAs to target a transcript of interest, the authors generated a reprogrammed tracrRNA (Rptr) that could bind and cleave a desired transcript, converting a piece of that transcript into a functional guide RNA. By then creating fluorescent DNA sensors that would be cleaved by the Rptr and ncrRNAs, the sensing of RNA by Cas9 could be linked to a detectable readout. This platform, called LEOPARD (leveraging engineered tracrRNAs and on-target DNAs for parallel RNA detection), can be combined with gel-based readouts and enables multiplexed detection of several different sequences in a single reaction (see the figure).
Jiao et al. also combined LEOPARD with PCR in a multistep workflow to detect SARS-CoV-2 genomes from patients with COVID-19. Although more work is needed to integrate this Cas9-based detection modality into a single step with RPA or LAMP to create a portable and sensitive isothermal test, an advantage of this approach is the higher-order multiplexing that can be achieved, allowing multiple pathogens, diseases, or variants to be detected simultaneously. More work is also needed to combine this technology with extraction-free methods for better ease of use; alternative readouts to gel-based readouts, such as lateral flow and colorimetric readouts, would be beneficial for point-of-care detection.
In just 5 years, the CRISPR-dx field has rapidly expanded, growing from a set of peculiar molecular biology discoveries to multiple FDA-authorized COVID-19 tests and spanning four of the six major subtypes of CRISPR systems. Despite the tremendous promise of CRISPR-dx, substantial challenges remain to adapting these technologies for point-of-care and at-home settings. Simplification of the chemistries to operate as a single reaction in a matter of minutes would be revolutionary, especially if the reaction could be run at room temperature without any complex or expensive equipment. These improvements to CRISPR-dx assays can be achieved by identification or engineering of additional Cas enzymes with lower-temperature requirements, higher sensitivity, or faster kinetics, enabling rapid and simple amplification-free detection with single-molecule sensitivity.
Often overlooked is the necessity for a sample DNA or RNA preparation step that is simple enough to be added directly to the CRISPR reaction to maintain a simple workflow for point-of-care testing. In addition, higher-order multiplexing developments would allow for expansive testing menus and approach the possibility of testing for all known diseases. As these advancements are realized, innovative uses of CRISPR-dx will continue in areas such as surveillance, integration with biomaterials, and environmental monitoring. In future years, CRISPR-dx assays may become universal in the clinic and at home, reshaping how diseases are diagnosed.
Cancer Biology and Genomics for Disease Diagnosis (Vol. I) Now Available for Amazon Kindle
Reporter: Stephen J Williams, PhD
Article ID #179: Cancer Biology and Genomics for Disease Diagnosis (Vol. I) Now Available for Amazon Kindle. Published on 8/14/2015
WordCloud Image Produced by Adam Tubman
Leaders in Pharmaceutical Business Intelligence would like to announce the first volume of their BioMedical E-Book Series C: e-Books on Cancer & Oncology
This e-Book is a comprehensive review of recent Original Research on Cancer & Genomics, including related opportunities for Targeted Therapy, written by Experts, Authors, Writers. This e-Book highlights some of the recent trends and discoveries in cancer research and cancer treatment, with particular attention to how new technological and informatics advancements have ushered in paradigm shifts in how we think about, diagnose, and treat cancer. The Methodology of Curation adds value to the results of Original Research for the e-Reader. The e-Book’s articles have been published on the Open Access Online Scientific Journal since April 2012. All new articles on this subject will continue to be incorporated, as published, with periodical updates.
We invite e-Readers to write an Article Review for this e-Book on Amazon. All forthcoming BioMed e-Book Titles can be viewed at:
Leaders in Pharmaceutical Business Intelligence, launched in April 2012 as an Open Access Online Scientific Journal, is a scientific, medical, and business multi-expert authoring environment in several domains of the life sciences, pharmaceutical, healthcare, and medicine industries. The venture operates as an online scientific intellectual exchange at its website, http://pharmaceuticalintelligence.com, for curation and reporting on frontiers in biomedical and biological sciences, healthcare economics, pharmacology, pharmaceuticals, and medicine. In addition, the venture publishes a Medical E-book Series available on Amazon’s Kindle platform.
Analyzing and sharing the vast and rapidly expanding volume of scientific knowledge has never been so crucial to innovation in the medical field. We are addressing the need to overcome this scientific information overload by:
delivering curation and summary interpretations of latest findings and innovations
on an open-access, Web 2.0 platform, with the near-term goal of providing primarily concept-driven search
providing a social platform for scientists and clinicians to enter into discussion using social media
compiling recent discoveries and issues in yearly-updated Medical E-book Series on Amazon’s mobile Kindle platform
This curation offers better organization and visibility to the critical information useful for the next innovations in academic, clinical, and industrial research by providing these hybrid networks.
Table of Contents for Cancer Biology and Genomics for Disease Diagnosis
Preface
Introduction The evolution of cancer therapy and cancer research: How we got here?
Part I. Historical Perspective of Cancer Demographics, Etiology, and Progress in Research
Chapter 1: The Occurrence of Cancer in World Populations
Chapter 2. Rapid Scientific Advances Changes Our View on How Cancer Forms
Chapter 3: A Genetic Basis and Genetic Complexity of Cancer Emerge
Chapter 4: How Epigenetic and Metabolic Factors Affect Tumor Growth
Chapter 5: Advances in Breast and Gastrointestinal Cancer Research Supports Hope for Cure
Part II. Advent of Translational Medicine, “omics”, and Personalized Medicine Ushers in New Paradigms in Cancer Treatment and Advances in Drug Development
Chapter 6: Treatment Strategies
Chapter 7: Personalized Medicine and Targeted Therapy
Part III. Translational Medicine, Genomics, and New Technologies Converge to Improve Early Detection
Chapter 8: Diagnosis
Chapter 9: Detection
Chapter 10: Biomarkers
Chapter 11: Imaging In Cancer
Chapter 12: Nanotechnology Imparts New Advances in Cancer Treatment, Detection, & Imaging
Epilogue by Larry H. Bernstein, MD, FACP: Envisioning New Insights in Cancer Translational Biology
Variant ASGR1 Associated with a Reduced Risk of Coronary Artery Disease
Paul Nioi, Ph.D., Asgeir Sigurdsson, B.Sc., Gudmar Thorleifsson, Ph.D., Hannes Helgason, Ph.D., Arna B. Agustsdottir, B.Sc., Gudmundur L. Norddahl, Ph.D., Anna Helgadottir, M.D., Audur Magnusdottir, Ph.D., Aslaug Jonasdottir, M.Sc., Solveig Gretarsdottir, Ph.D., Ingileif Jonsdottir, Ph.D., Valgerdur Steinthorsdottir, Ph.D., Thorunn Rafnar, Ph.D., Dorine W. Swinkels, M.D., Ph.D., Tessel E. Galesloot, Ph.D., Niels Grarup, Ph.D., Torben Jørgensen, D.M.Sc., Henrik Vestergaard, D.M.Sc., Torben Hansen, Ph.D., Torsten Lauritzen, D.M.Sc., Allan Linneberg, Ph.D., Nele Friedrich, Ph.D., Nikolaj T. Krarup, Ph.D., Mogens Fenger, Ph.D., Ulrik Abildgaard, D.M.Sc., Peter R. Hansen, D.M.Sc., Anders M. Galløe, Ph.D., Peter S. Braund, Ph.D., Christopher P. Nelson, Ph.D., Alistair S. Hall, F.R.C.P., Michael J.A. Williams, M.D., Andre M. van Rij, M.D., Gregory T. Jones, Ph.D., Riyaz S. Patel, M.D., Allan I. Levey, M.D., Ph.D., Salim Hayek, M.D., Svati H. Shah, M.D., Muredach Reilly, M.B., B.Ch., Gudmundur I. Eyjolfsson, M.D., Olof Sigurdardottir, M.D., Ph.D., Isleifur Olafsson, M.D., Ph.D., Lambertus A. Kiemeney, Ph.D., Arshed A. Quyyumi, F.R.C.P., Daniel J. Rader, M.D., William E. Kraus, M.D., Nilesh J. Samani, F.R.C.P., Oluf Pedersen, D.M.Sc., Gudmundur Thorgeirsson, M.D., Ph.D., Gisli Masson, Ph.D., Hilma Holm, M.D., Daniel Gudbjartsson, Ph.D., Patrick Sulem, M.D., Unnur Thorsteinsdottir, Ph.D., and Kari Stefansson, M.D., Ph.D.
N Engl J Med 2016; 374:2131-2141. June 2, 2016. DOI: 10.1056/NEJMoa1508419
Several sequence variants are known to have effects on serum levels of non–high-density lipoprotein (HDL) cholesterol that alter the risk of coronary artery disease.
We sequenced the genomes of 2636 Icelanders and found variants that we then imputed into the genomes of approximately 398,000 Icelanders. We tested for association between these imputed variants and non-HDL cholesterol levels in 119,146 samples. We then performed replication testing in two populations of European descent. We assessed the effects of an implicated loss-of-function variant on the risk of coronary artery disease in 42,524 case patients and 249,414 controls from five European ancestry populations. An augmented set of genomes was screened for additional loss-of-function variants in a target gene. We evaluated the effect of an implicated variant on protein stability.
We found a rare noncoding 12-base-pair (bp) deletion (del12) in intron 4 of ASGR1, which encodes a subunit of the asialoglycoprotein receptor, a lectin that plays a role in the homeostasis of circulating glycoproteins. The del12 mutation activates a cryptic splice site, leading to a frameshift mutation and a premature stop codon that renders a truncated protein prone to degradation. Heterozygous carriers of the mutation (1 in 120 persons in our study population) had a lower level of non-HDL cholesterol than noncarriers, a difference of 15.3 mg per deciliter (0.40 mmol per liter) (P=1.0×10^−16), and a lower risk of coronary artery disease (by 34%; 95% confidence interval, 21 to 45; P=4.0×10^−6). In a larger set of sequenced samples from Icelanders, we found another loss-of-function ASGR1 variant (p.W158X, carried by 1 in 1850 persons) that was also associated with lower levels of non-HDL cholesterol (P=1.8×10^−3).
ASGR1 haploinsufficiency was associated with reduced levels of non-HDL cholesterol and a reduced risk of coronary artery disease. (Funded by the National Institutes of Health and others.)
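The figures in the abstract above can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' analysis: the function names are hypothetical, the allele-frequency estimate assumes that essentially all carriers of a rare variant are heterozygous, the cholesterol conversion uses the standard factor of about 38.67 mg/dL per mmol/L, and "lower risk by 34%" is read as a risk ratio of about 0.66.

```python
# Back-of-the-envelope checks on numbers quoted in the abstract.
# All function names are hypothetical helpers for illustration only.

CHOLESTEROL_MG_DL_PER_MMOL_L = 38.67  # standard conversion factor for cholesterol


def allele_frequency_from_carriers(carrier_fraction: float) -> float:
    """Approximate allele frequency for a rare variant.

    Assumes nearly all carriers are heterozygous, so each carrier
    contributes one variant allele out of two.
    """
    return carrier_fraction / 2


def mg_dl_to_mmol_l(mg_dl: float) -> float:
    """Convert a cholesterol difference from mg/dL to mmol/L."""
    return mg_dl / CHOLESTEROL_MG_DL_PER_MMOL_L


def risk_ratio_from_reduction(reduction_percent: float) -> float:
    """Express an 'X% lower risk' statement as a ratio relative to noncarriers."""
    return 1 - reduction_percent / 100


if __name__ == "__main__":
    print(f"del12 allele frequency ~ {allele_frequency_from_carriers(1 / 120):.4f}")
    print(f"15.3 mg/dL ~ {mg_dl_to_mmol_l(15.3):.2f} mmol/L")
    print(f"34% lower risk ~ ratio of {risk_ratio_from_reduction(34):.2f}")
```

The converted cholesterol difference rounds to the 0.40 mmol per liter given in the abstract, which is a quick way to confirm the two units are reported consistently.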
Amgen’s deCODE Genetics Publishes Largest Human Genome Population Study to Date
Mark Terry, BioSpace.com Breaking News Staff reported on results of one of the largest genome sequencing efforts to date, sequencing of the genomes of 2,636 people from Iceland by deCODE genetics, Inc., a division of Thousand Oaks, Calif.-based Amgen (AMGN).
A total of four studies were published on March 25, 2015 in the online version of Nature Genetics, titled “Large-scale whole-genome sequencing of the Icelandic population[1],” “Identification of a large set of rare complete human knockouts[2],” “The Y-chromosome point mutation rate in humans[3]” and “Loss-of-function variants in ABCA7 confer risk of Alzheimer’s disease[4].”
The project identified some new genetic variants which increase risk of Alzheimer’s disease and confirmed some variants known to increase risk of diabetes and atrial fibrillation. A more in-depth post will curate these findings, but there was an interesting discrete geographic distribution of certain rare variants located around Iceland. The dataset offers a treasure trove of meaningful genetic information about the Icelandic population, as well as numerous new targets for breast and ovarian cancer and for Alzheimer’s disease.
“This work is a demonstration of the unique power sequencing gives us for learning more about the history of our species,” said Kari Stefansson, founder and chief executive officer of deCode and one of the lead authors in a statement, “and for contributing to new means of diagnosing, treating and preventing disease.”
The scale and ambition of the study are impressive, but perhaps more important, the research identified a new genetic variant that increases the risk of Alzheimer’s disease; deCODE had already identified an APP variant that is associated with decreased risk of Alzheimer’s disease. It also confirmed variants that increase the risk of diabetes and a variant that results in atrial fibrillation.
The database of human genetic variation (dbSNP) contained over 50 million unique sequence variants, yet this database represents only a small proportion of the single nucleotide variants that are thought to exist. These “private” or rare variants undoubtedly contribute to important phenotypes, such as disease susceptibility. Non-SNV variants, like indels and structural variants, are also under-represented in public databases. The only way to fully elucidate the genetic basis of a trait is to consider all of these types of variants, and the only way to find them is by large-scale sequencing.
Curation of Population Genomic Sequencing Programs/Corporate Partnerships
Click on “Curation of genomic studies” below for full Table
Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, Besenbacher S, Magnusson G, Halldorsson BV, Hjartarson E et al: Large-scale whole-genome sequencing of the Icelandic population. Nature genetics 2015, advance online publication.
Sulem P, Helgason H, Oddson A, Stefansson H, Gudjonsson SA, Zink F, Hjartarson E, Sigurdsson GT, Jonasdottir A, Jonasdottir A et al: Identification of a large set of rare complete human knockouts. Nature genetics 2015, advance online publication.
Helgason A, Einarsson AW, Gumundsdottir VB, Sigursson A, Gunnarsdottir ED, Jagadeesan A, Ebenesersdottir SS, Kong A, Stefansson K: The Y-chromosome point mutation rate in humans. Nature genetics 2015, advance online publication.
Steinberg S, Stefansson H, Jonsson T, Johannsdottir H, Ingason A, Helgason H, Sulem P, Magnusson OT, Gudjonsson SA, Unnsteinsdottir U et al: Loss-of-function variants in ABCA7 confer risk of Alzheimer’s disease. Nature genetics 2015, advance online publication.
Other posts related to deCODE, population genomics, and NGS on this site include:
War on Cancer Needs to Refocus to Stay Ahead of Disease Says Cancer Expert
Writer, Curator: Stephen J. Williams, Ph.D.
Article ID #171: War on Cancer Needs to Refocus to Stay Ahead of Disease Says Cancer Expert. Published on 3/27/2015
WordCloud Image Produced by Adam Tubman
UPDATED 1/08/2020
Is one of the world’s most prominent cancer researchers throwing in the towel on the War On Cancer? Not throwing in the towel, just reminding us that cancer is more complex than just a genetic disease, and in the process, giving kudos to those researchers who focus on non-genetic aspects of the disease (see Dr. Larry Bernstein’s article Is the Warburg Effect the Cause or the Effect of Cancer: A 21st Century View?).
National Public Radio (NPR) has been conducting an interview series with MIT cancer biology pioneer, founding member of the Whitehead Institute for Biomedical Research, and National Academy of Sciences member and National Medal of Science awardee Robert A. Weinberg, Ph.D., who co-discovered one of the first human oncogenes (Ras)[1], isolated the first tumor suppressor (Rb)[2], and (with Dr. Bill Hahn) first proved that cells could become tumorigenic after discrete genetic lesions[3]. In the latest NPR piece, Why The War On Cancer Hasn’t Been Won (seen on NPR’s blog by Richard Harris), Dr. Weinberg discusses a comment in an essay he wrote in the journal Cell[4], basically that, in recent years, cancer research may have focused too much on the genetic basis of cancer at the expense of the multifaceted etiology of cancer, including the roles of metabolism, immunity, and physiology.
Cancer is the second most common cause of medically related deaths in the developed world. However, concerted efforts among most developed nations to eradicate the disease, such as increased government funding for cancer research and a mandated ‘war on cancer’ in the mid-1970s, have translated into remarkable improvements in diagnosis, early detection, and survival rates for many individual cancers. For example, survival rates for breast and colon cancer have improved dramatically over the last 40 years. In the UK, overall median survival times have improved from one year in 1972 to 5.8 years for patients diagnosed in 2007. In the US, overall 5-year survival improved from 50% for all adult cancers and 62% for childhood cancers in 1972 to 68% and 82%, respectively, in 2007. However, for some cancers, including lung, brain, pancreatic, and ovarian cancer, there has been little improvement in survival rates since the “war on cancer” started.
As Weinberg said, in the 1950s, medical researchers saw cancer as “an extremely complicated process that needed to be described in hundreds, if not thousands of different ways.” Then scientists tried to find a unifying principle, first focusing on viruses as the cause of cancer (for example, Rous sarcoma virus; see also Dr. Gallo’s book on his early research on cancer, virology, and HIV, Virus Hunting: AIDS, Cancer & the Human Retrovirus: A Story of Scientific Discovery).
However (as the blog article goes on) “that idea was replaced by the notion that cancer is all about wayward genes.”
“The thought, at least in the early 1980s, was that there were a small number of these mutant, cancer-causing oncogenes, and therefore that one could understand a whole disparate group of cancers simply by studying these mutant genes that seemed to be present in many of them,” Weinberg says. “And this gave the notion, the illusion over the ensuing years, that we would be able to understand the laws of cancer formation the way we understand, with some simplicity, the laws of physics, for example.”
According to Weinberg, this gene-directed unifying theory has given way as recent evidence points back once again to a multi-faceted view of cancer etiology.
But this is not a revolutionary or conflicting idea for Dr. Weinberg, who received the 2007 Otto Warburg Medal and has focused his latest research on complex systems such as angiogenesis, cell migration, and epithelial-stromal interactions.
In fact, it was Dr. Weinberg and Dr. Douglas Hanahan who formulated eight governing principles or Hallmarks of cancer:
Maintaining Proliferative Signals
Avoiding Immune Destruction
Evading Growth Suppressors
Resisting Cell Death
Becoming Immortal
Angiogenesis
Deregulating Cellular Energy
Activating Invasion and Metastasis
Taken together, these hallmarks represent the common features that tumors have, and may involve genetic or non-genetic (epigenetic) lesions … a multi-modal view of cancer that spans time and disciplines. As reviewed by both Dr. Larry Bernstein and me in the e-book Volume One: Cancer Biology and Genomics for Disease Diagnosis, each scientific discipline, whether that of the pharmacologist, toxicologist, virologist, molecular biologist, physiologist, or cell biologist, has contributed greatly to our total understanding of this disease, each from its own unique perspective. This leads to a “multi-modal” view of cancer etiology, diagnosis, and treatment. Many of the improvements in survival rates are a direct result of the massive increase in the knowledge of tumor biology obtained through ardent basic research. Breakthrough discoveries regarding oncogenes, cancer cell signaling, survival, and regulated death mechanisms, tumor immunology, genetics and molecular biology, biomarker research, and now nanotechnology and imaging, have directly led to the advances we now see in early detection, chemotherapy, and personalized medicine, as well as new therapeutic modalities such as cancer vaccines, immunotherapies, and combination chemotherapies. Molecular and personalized therapies such as trastuzumab and aromatase inhibitors for breast cancer, imatinib for CML and GIST-related tumors, and bevacizumab for advanced colorectal cancer have been a direct result of molecular discoveries into the nature of cancer. This then leads to an interesting question (one to be tackled in another post):
Would shifting focus away from the cancer genome and back to cancer biology limit the progress we’ve made in personalized medicine?
In a 2012 post, Genomics And Targets For The Treatment Of Cancer: Is Our New World Turning Into “Pharmageddon” Or Are We On The Threshold Of Great Discoveries?, Dr. Leonard Lichtenfeld, MD, Deputy Chief Medical Officer for the ACS, comments on the changes that genomics and personalized strategies have brought to oncology drug development. As he notes, in the past, chemotherapy development was sort of ‘hit or miss’, and the dream and promise of genomics suggested an era of targeted therapy, where drug development was more ‘rational’ and targets were easily identifiable.
To quote his post:
“That was the dream, and there have been some successes–even apparent cures or long term control–with the use of targeted medicines with biologic drugs such as Gleevec®, Herceptin® and Avastin®. But I think it is fair to say that the progress and the impact hasn’t been quite what we thought it would be. Cancer has proven a wily foe, and every time we get answers to questions what we usually get are more questions that need more answers. The complexity of the cancer cell is enormous, and its adaptability and the genetic heterogeneity of even primary cancers (as recently reported in a research paper in the New England Journal of Medicine) has been surprising, if not (realistically) unexpected.”
In addition, Dr. Lichtenfeld makes some interesting observations including:
A “pharmageddon” where drug development risks/costs exceed the reward, so drug developers keep their ‘wallets shut’. For example, even for targeted therapies, it takes $12 billion US to develop a drug versus $2 billion years ago
Drugs are still drugs and failure in clinical trials is still a huge risk
“Eroom’s Law” (like “Moore’s Law” but opposite effect) – increasing costs with decreasing success
Limited market for drugs targeted to a select mutant; what he called “slice and dice”
Andrea Califano, PhD – Precision Medicine predictions are based on statistical associations, whereas systems biology predictions are based on a physical regulatory model
Spyro Mousses, PhD – open biomedical knowledge and private patient data should be combined in a systems oncology clearinghouse to form an evolving network linking drugs, genomic data, and evolving multiscalar models
Razelle Kurzrock, MD – What if every patient with metastatic disease is genomically unique? Problem with model of smaller trials (so-called N=1 studies) of genetically similar disease: drugs may not be easily acquired or re-purposed, and greater regulatory burdens
So, discoveries of oncogenes, tumor suppressors, mutant variants, high-end sequencing, and the genomics and bioinformatics era may have led to the advent of targeted chemotherapies with genetically well-defined patient populations, a different focus in chemotherapy development
… but as long as we have the conversation open I have no fear of myopia within the field, and multiple viewpoints on origins and therapeutic strategies will continue to develop for years to come.
References
Parada LF, Tabin CJ, Shih C, Weinberg RA: Human EJ bladder carcinoma oncogene is homologue of Harvey sarcoma virus ras gene. Nature 1982, 297(5866):474-478.
Friend SH, Bernards R, Rogelj S, Weinberg RA, Rapaport JM, Albert DM, Dryja TP: A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature 1986, 323(6089):643-646.
Hahn WC, Counter CM, Lundberg AS, Beijersbergen RL, Brooks MW, Weinberg RA: Creation of human tumour cells with defined genetic elements. Nature 1999, 400(6743):464-468.
Weinberg RA: Coming full circle-from endless complexity to simplicity and back again. Cell 2014, 157(1):267-271.
The cancer death rate in the United States fell 2.2 percent in 2017 — the biggest single-year drop ever reported — propelled by gains against lung cancer, the American Cancer Society said Wednesday.
Declines in the mortality rate for lung cancer have accelerated in recent years in response to new treatments and falling smoking rates, said Rebecca Siegel, lead author of Cancer Statistics 2020, the latest edition of the organization’s annual report on cancer trends.
The improvement in 2017, the most recent year for which data is available, is part of a long-term drop in cancer mortality that reflects, to a large extent, the smoking downturn. Since peaking in 1991, the cancer death rate has fallen 29 percent, which translates into 2.9 million fewer deaths.
Norman “Ned” Sharpless, director of the National Cancer Institute, which was not involved in the report, said the data reinforces that “we are making steady progress” on cancer. For lung cancer, he pointed to new immunotherapy treatments and so-called targeted therapies that stop the action of molecules key to cancer growth. He predicted that the mortality rate would continue to fall “as we get better at using these therapies.” Multiple clinical trials are exploring how to combine the new approaches with older ones, such as chemotherapy.
Sharpless expressed concern, however, that progress against cancer would be undermined by increased obesity, which is a risk factor for several malignancies.
The cancer society report projected 1.8 million new cases of cancer in the United States this year and more than 606,000 deaths. Nationally, cancer is the second-leading cause of death after heart disease in both men and women. It is the No. 1 cause in many states, and among Hispanic and Asian Americans and people younger than 80, the report said.
The cancer death rate is defined as deaths per 100,000 people. The cancer society has been reporting the rate since 1930.
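As a small worked example of this definition, the sketch below computes a crude death rate per 100,000 people and the percent decline between two rates. It is an illustration only: the function names are hypothetical, the population figure of 330 million is an assumed round number that does not come from the report, the two rates used for the decline are made-up values chosen to give a 29 percent drop, and published rates are typically age-adjusted, which this crude calculation is not.

```python
# Crude (not age-adjusted) death-rate arithmetic; helper names are hypothetical.


def death_rate_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 people in a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def percent_decline(peak_rate: float, current_rate: float) -> float:
    """Percent drop from a peak rate to a current rate."""
    if peak_rate <= 0:
        raise ValueError("peak_rate must be positive")
    return (peak_rate - current_rate) / peak_rate * 100


if __name__ == "__main__":
    # 606,000 projected deaths comes from the report; the population
    # of 330 million is an assumption made only for this illustration.
    crude = death_rate_per_100k(606_000, 330_000_000)
    print(f"Crude rate: {crude:.1f} deaths per 100,000")

    # Illustrative rates only: a fall from 200 to 142 per 100,000 is a 29% decline.
    print(f"Decline: {percent_decline(200.0, 142.0):.0f}%")
```

Because lung cancer accounts for about 1 in 4 cancer deaths, a change in its rate moves the overall figure much more than an equal relative change in a rarer cancer would.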
Because lung cancer is the leading cause of cancer deaths, accounting for 1 in 4, any change in the mortality rate has a large effect on the overall cancer death rate, Siegel noted.
She described the gains against lung cancer, and against another often deadly cancer, melanoma, as “exciting.” But, she added, “the news this year is mixed” because of slower progress against colorectal, breast and prostate cancers. Those cancers often can be detected early by screening, she said.
The report said substantial racial and geographic disparities remain for highly preventable cancers, such as cervical cancer, and called for “the equitable application” of cancer control measures.
In recent years, melanoma has shown the biggest mortality-rate drop of any cancer. That’s largely a result of breakthrough treatments such as immunotherapy, which unleashes the patient’s own immune system to fight the cancer and was approved for advanced melanoma in 2011.
Other posts on this site on The War on Cancer and Origins of Cancer include:
This update was performed by the following methods:
A. GPT 5 Text analysis and Reasoning
B. Insertion of Knowledge Graph on topic Curation of Genomic Analysis from Non Small Cell Lung Cancer Studies from Nodus Labs using InfraNodus software
C. Domain Knowledge Expert evaluation of the Update outcomes
This article has the following Structure:
Part A: Introduction to LLM, Knowledge Graph software InfraNodus, ChatGPT5 and Background Information on curated material for Test Case
Part B: InfraNodus Analysis of manual curation and Knowledge Graph Creation
Part C: Chat GPT 5 Analysis of Manually Curated Material
Part D: Curation entitled Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer originally published on 09/05/2014
Results of Article Update with GPT 5
1. GPT5 alone was not able to understand the goal of the article, namely to determine knowledge gaps in a particular research area involving 5 genomic studies on lung cancer patients
2. GPT5 alone was not able to group concepts or commonalities between biological pathways unless supplied with a manually curated list of KEGG pathways derived from a list of mutated genes. However, this precluded any effect that fusion proteins had on the analysis, and so GPT5 would only concentrate on mutated genes commonly found in the literature
3. GPT was not able to access some of the open-access databases, such as the NCBI Gene Ontology database
Results of Article Update with KnowledgeGraph presentation to GPT 5
4. Because the knowledge graph captured the importance of fusion proteins and transversions, it augmented the GPT analysis, enriching the known pathways and correctly identifying the pathways that were less represented in the knowledge graph
5. This led to the identification of many novel signaling pathways not identified in the original analysis, and the combined approach performed this task with ease and speed
6. GPT with InfraNodus Analysis was able to propose pertinent questions for future research (the goal of the original curation) such as:
How does the interaction between [[EGFR]] mutations and sex-specific gene alterations, including [[RBM10]], influence treatment outcomes in lung adenocarcinoma?
How does the intersection of mutational patterns from smoking influence pathway activation in NSCLC, and can identifying these interactions improve targeted therapy development?
Novelty in comparison to Original article published on 09/05/2014
7. It appears that manual curation is necessary to assist in the building of relevant knowledge graphs in the biomedical fields to augment generative AI analysis
8. By itself, generative AI is not optimized for inference of higher concepts from biomedical text, and therefore, at this point, requires the input from human curators developing domain-specific knowledge graphs
9. The combination of ChatGPT5 and Knowledge graphs of this manually curated biomedical text added a further layer of complexity of gaps of knowledge not seen in the original curations including the need to study noncanonical signaling pathways like WNT and Hedgehog in smoker versus nonsmoker cohorts of lung cancer patients
A Comparison of Manual Expert Curation and LLM-based Analysis of Knowledge Gaps in Non Small Cell Lung Cancer Whole Exome Sequencing Studies and a Use Case Example of Chat GPT 5
Part A: Introduction to LLM, Knowledge Graph software InfraNodus, ChatGPT5 and Background Information on curated material for Test Case
The development of Large Language Models (LLMs), together with the development of knowledge graphs, has facilitated the ability to analyze text and determine the relationships among the various concepts contained within a series of texts. These concepts and relationships can be visualized, and new insights inferred from these visualizations. As a result, this type of analysis suggests new directions and lines of research.
Alternatively, these types of visualizations can also reveal gaps in knowledge which should be addressed. A new type of LLM and visualization tools have been developed to understand the gaps in knowledge in biomedical text.
Nodus Labs InfraNodus AI Knowledge Graph Software Tools Allow Text Relationship Visualization and Integrated AI Functionality
InfraNodus makes knowledge graphs from text and then visualizes the relationships between concepts (or nodes). In doing so, the tool also highlights the various knowledge gaps (or large differences between nodes), which can be used to investigate new hypotheses and research directions involving previously uninvestigated relationships between concepts. This generates new research questions, as these gaps can be used as prompts in the software’s integrated AI tool. The AI tool, much like a GPT, returns recommendations for research to be conducted in the area.
In addition, the InfraNodus software can detect whether text is too biased toward a particular concept or conclusion and, using a GPT3 or GPT4 model, can determine whether the nodes are too dispersed and recommend which gaps should be focused on.
The software can upload any biomedical text in various formats
A full demonstration is on their website but a good summary is found on their Youtube site at
Previously, we had manually curated and analyzed the knowledge gaps from a series of publications on whole exome sequencing of biopsied tumors from cohorts of non-small cell lung cancer patients. This curation (from 2016) is seen in the lower half of the updated link below; I have separated it with a bar and highlighted it in yellow as Text for AI Analysis.
Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J et al: Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 2012, 150(6):1121-1134.
Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A et al: Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 2012, 150(6):1107-1120.
Peifer M, Fernandez-Cuesta L, Sos ML, George J, Seidel D, Kasper LH, Plenker D, Leenders F, Sun R, Zander T et al: Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nature genetics 2012, 44(10):1104-1110.
were performed.
The purpose of this analysis was to uncover biological functions related to the sets of mutated genes with limited research publications in the area of non small cell lung cancer. The identification of such biological functions would represent a gap in knowledge in this disease. In addition, this analysis attempted to find new lines of research or potential new biotargets to investigate for lung cancer therapy.
However, this manual method is time-consuming and may miss relationships not defined in the Gene Ontology (GO) or in gene knowledgebases.
Therefore we turned to an AI-driven approach:
Use InfraNodus’s ability to develop a knowledge graph based on our curation and determine whether the AI platform could infer knowledge gaps
Utilize Chat GPT5 to analyze the same curated set to determine whether OpenAI analysis would lead to a similar analysis of the curated material
Determine if combining a knowledge graph within GPT would lead to a higher level of analysis
See below (Part D) of this update for the curated studies which were included in this analysis and the text which was entered into both InfraNodus and Chat GPT5.
In summary, manual curation appears necessary for building relevant knowledge graphs in the biomedical fields to augment generative AI analysis. In addition, generative AI by itself does not appear to be optimized for inferring higher-level concepts from biomedical text and therefore, at this point, requires input from human curators who develop domain-specific knowledge graphs.
Part B. InfraNodus Analysis of manual curation and Knowledge Graph Creation
Methods:
The text of the curation was copied and pasted directly into the text analysis module of InfraNodus. No words were edited; however, genes in the curation were linked to their GeneCards entries. GeneCards is a database run by the Weizmann Institute. InfraNodus uses a combination of LLMs and its own GraphRAG system to provide insights from text analysis. It draws on various models, including those from OpenAI and Anthropic, and is not limited to a single LLM. InfraNodus integrates these models within its GraphRAG framework, which enhances their capabilities by adding a relational understanding of the context through a knowledge graph.
InfraNodus then autogenerates a knowledge graph and returns the entities and the relationships between them. InfraNodus allows the knowledge graph to be modified; however, for this analysis we used the first graph it generated. On inspection, this graph (shown below) was deemed reasonable.
Results
The knowledge graph of the input text is shown below:
InfraNodus-generated knowledge graph of 5 WES non-small cell lung cancer studies involving smokers and non-smokers
Four main concepts were returned: tumors, genes, literature, and mutations.
A snapshot of the Analysis window is given below. Note that InfraNodus suggested that more connections were needed between Pathway and Mutational Patterns.
An InfraNodus report with knowledge graph on whole-exome sequencing studies in NSCLC to determine the mutational spectrum in smokers versus non-smokers
alk clinical [[egfr]] mutational pathway [[paper]] found key literature study [[genomic]] reveal [[transversion]]
Top relations / ngrams:
1) [[lung]] [[tumors]]
2) alk fusion
3) link function
4) eml alk
5) function [[gene_ontology]]
Modularity: 0.47
Relations:
InfraNodus identified 744 relations between entities (nodes)
Some of the more frequent relations are listed here:
source | target | occurrences | weight | betweenness
[[lung]] | [[tumors]] | 8 | 24 | 0.4676
analysis | pathway | 5 | 12 | 0.2291
significantly | [[genes]] | 5 | 9 | 0.1074
significantly | [[mutated]] | 4 | 12 | 0.0281
[[mutated]] | [[genes]] | 4 | 12 | 0.0847
[[transversion]] | high | 3 | 12 | 0.0329
[[smoking]] | history | 3 | 10 | 0.0352
study | identify | 3 | 9 | 0.2051
mutational | pattern | 3 | 9 | 0.0921
[[rbm10]] | [[mutations]] | 3 | 8 | 0.1776
literature | analysis | 3 | 7 | 0.2218
[[egfr]] | [[mutations]] | 3 | 7 | 0.2139
[[transversion]] | group | 3 | 7 | 0.0259
enriched | cohort | 3 | 6 | 0.0219
[[whole_exome_sequencing]] | [[tumors]] | 3 | 6 | 0.3485
identify | [[genes]] | 3 | 6 | 0.2268
including | analysis | 3 | 5 | 0.1985
alteration | [[genes]] | 3 | 4 | 0.1298
[[tumors]] | analysis | 3 | 4 | 0.5192
alk | fusion | 2 | 15 | 0.0671
link | function | 2 | 14 | 0.0269
function | [[gene_ontology]] | 2 | 13 | 0.0054
Notice how betweenness (the importance of a connection in bridging disparate concepts) varies. It is high between concepts such as tumors and analysis, or lung and tumors. However, many important linked concepts, such as alk and fusion, have low betweenness yet carry a much higher weight, or closeness to each other. Gene, mutation, transversion, and smoking concepts appear to correspond closely to each other.
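The contrast between weight and betweenness can be checked directly against the relations listed above. The following is a minimal sketch, not part of InfraNodus: it copies a subset of rows from the table, ranks them by each metric, and flags pairs that are heavily weighted but peripheral. The function names and the two thresholds (weight of at least 12, betweenness of at most 0.1) are assumptions chosen only for illustration.

```python
# Subset of rows copied from the relations table:
# (source, target, occurrences, weight, betweenness)
RELATIONS = [
    ("lung", "tumors", 8, 24, 0.4676),
    ("analysis", "pathway", 5, 12, 0.2291),
    ("smoking", "history", 3, 10, 0.0352),
    ("transversion", "high", 3, 12, 0.0329),
    ("tumors", "analysis", 3, 4, 0.5192),
    ("alk", "fusion", 2, 15, 0.0671),
    ("link", "function", 2, 14, 0.0269),
    ("function", "gene_ontology", 2, 13, 0.0054),
]


def rank(rows, column):
    """Return rows sorted from highest to lowest value in the given column."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


def heavy_but_peripheral(rows, min_weight, max_betweenness):
    """Pairs with a high weight but a low betweenness (hypothetical thresholds)."""
    return [
        (source, target)
        for source, target, _occ, weight, betweenness in rows
        if weight >= min_weight and betweenness <= max_betweenness
    ]


by_weight = rank(RELATIONS, 3)        # column 3 = weight
by_betweenness = rank(RELATIONS, 4)   # column 4 = betweenness
flagged = heavy_but_peripheral(RELATIONS, min_weight=12, max_betweenness=0.1)

print("Top by weight:", by_weight[0][:2])
print("Top by betweenness:", by_betweenness[0][:2])
print("High weight, low betweenness:", flagged)
```

With these rows, lung/tumors ranks first by weight and tumors/analysis ranks first by betweenness, while alk/fusion, link/function, function/gene_ontology and transversion/high fall into the flagged group.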
Genetic Alterations: identify, [[genes]], study: The recent comprehensive studies on lung adenocarcinoma have significantly advanced our understanding of the genetic landscape by identifying key mutations and their intricate interactions. Notably, EGFR and RBM10 exhibit distinct mutational patterns, with RBM10 inactivations being notably enriched in male cohorts. This gender-linked enrichment underscores a potential differential oncogenic pathway involving ERBB2 and RB1 alterations. Moreover, these projects emphasize the quest to map significant gene alterations within lung adenocarcinoma. The identification of such genes not only corroborates prior reports but also expands upon them by highlighting new connections between mutation signatures and clinical factors like smoking history. These findings are crucial as they can inform future therapeutic targeting strategies, ensuring that personalized treatment approaches consider both gender-specific genomic enrichments and mutation-driven tumorigenesis pathways elucidated through rigorous analyses.
Questions generated using AI to help you explore “alk, clinical, [[egfr]], mutational, pathway, [[paper]], found, key, literature, study, [[genomic]], reveal, [[transversion]]…”: How do mutational patterns, specifically EGFR mutations and transversions related to smoking history, influence the effectiveness of targeted therapies in NSCLC patients?
Ideas generated using AI to help you explore “alk, clinical, [[egfr]], mutational, pathway, [[paper]], found, key, literature, study, [[genomic]], reveal, [[transversion]]…”: Develop a predictive model that utilizes genomic data and smoking history to forecast patient response to targeted therapies. This model would identify key mutational signatures linked to EGFR and other genes, highlighting the impact of smoking-induced transversions on drug efficacy.
Project Notes
“
The recent comprehensive studies on lung adenocarcinoma have significantly advanced our understanding of the genetic landscape by identifying key mutations and their intricate interactions. Notably, EGFR and RBM10 exhibit distinct mutational patterns, with RBM10 inactivations being notably enriched in male cohorts. This gender-linked enrichment underscores a potential differential oncogenic pathway involving ERBB2 and RB1 alterations.
Moreover, these projects emphasize the quest to map significant gene alterations within lung adenocarcinoma. The identification of such genes not only corroborates prior reports but also expands upon them by highlighting new connections between mutation signatures and clinical factors like smoking history. These findings are crucial as they can inform future therapeutic targeting strategies, ensuring that personalized treatment approaches consider both gender-specific genomic enrichments and mutation-driven tumorigenesis pathways elucidated through rigorous analyses.”
<ConceptualGateways>
alk
clinical
[[egfr]]
mutational
pathway
[[paper]]
found
key
literature
study
[[genomic]]
reveal
[[transversion]]
</ConceptualGateways>
How do mutational patterns, specifically EGFR mutations and transversions related to smoking history, influence the effectiveness of targeted therapies in NSCLC patients?
The report from the NCI Bulletin outlines significant advancements in understanding lung cancer through genome sequencing projects. These studies have revealed a plethora of genetic and epigenetic alterations across various forms of lung tumors, including adenocarcinomas, squamous cell carcinomas, and small cell lung cancers. Notably, some identified alterations could be targeted by existing therapies, providing potential new avenues for treatment.
Dr. Meyerson emphasizes the complexity of these genetic changes, highlighting that distinct mechanisms inactivating genes can vary between tumors. The report also notes gaps in knowledge regarding non-coding DNA alterations, which comprise a major part of the human genome.
Key findings include:
1. Comprehensive genomic analyses revealing unique driver mutations in lung adenocarcinoma, such as those affecting MET and ERBB2, alongside significant mutations in known cancer drivers like TP53 and KRAS.
2. A classification system based on genomic data enabling more accurate patient stratification, achieving a 75% classification rate of lung cancer subtypes.
3. Smoking history is shown to influence mutational patterns significantly, with smokers exhibiting a higher incidence of point mutations compared to never-smokers.
Moreover, the integration of genomic data and pathway analysis highlighted recurrent mutations across various pathways related to tumorigenesis, suggesting new therapeutic targets and underscoring the importance of personalized medicine approaches that factor in gender-specific mutation distributions.
This synthesis of findings not only corroborates earlier studies but also extends our understanding of the interplay between genomic alterations, smoking habits, and clinical outcomes in lung cancer. Future research is needed to explore the implications of these findings further and to develop targeted therapies that leverage the identified genetic alterations.
Tumor Genomics
Genetic Alterations
Pathway Analysis
Mutation Patterns
Main Ideas Summary from InfraNodus:
The report from the NCI Bulletin highlights significant developments in lung cancer genomics through five genome sequencing projects. These studies have uncovered various genetic and epigenetic alterations across multiple forms of lung cancer, including adenocarcinomas, squamous cell cancers, and small cell lung cancers. Notably, these alterations may correspond to existing drugs, offering new avenues for targeted therapy.
Key findings include the identification of unique mutations in genes like NF1, MET, and ERBB2, with particular attention to smoking’s role in mutational patterns. The analysis shows that smokers exhibit a higher incidence of specific point mutations, marking a distinct divergence in mutation profiles compared to never-smokers.
The research also emphasizes the need for continued exploration of the data to link mutation signatures with clinical outcomes, particularly regarding gender-specific differences in mutation prevalence. Emerging insights suggest that pathways involving MYC and alterations in splicing factors could be vital targets for future therapeutic strategies.
Overall, the studies collectively underscore an urgent need for integrating genomic data into clinical practice, which can lead to more personalized treatment approaches for lung cancer patients by considering both the genetic landscape of tumors and their histories, such as smoking exposure. The ongoing efforts to map gene alterations within lung adenocarcinoma not only reinforce existing knowledge but also reveal new potential therapeutic targets that can significantly impact patient outcomes.
Questions to Explore from InfraNodus:
How does the interaction between [[EGFR]] mutations and sex-specific gene alterations, including [[RBM10]], influence treatment outcomes in lung adenocarcinoma?
How does the intersection of mutational patterns from smoking influence pathway activation in NSCLC, and can identifying these interactions improve targeted therapy development?
Part C. ChatGPT5 Analysis of Manually Curated Material
Now let us see how ChatGPT5 analyzed the same material.
Methods
Nice — I’ll work from the KEGG-pathway list shown in your text (the table of pathways with PubMed counts) and call out major KEGG pathways that are not represented there.
Important caveat: “Not represented in the table” below means not listed in the KEGG-pathway hits shown in your pasted literature analysis, not that these pathways never play a role in lung cancer. Absence from that table can reflect lack of reporting in the specific WES/WGS studies or the PubMed keyword search used — it’s not proof the pathway is uninvolved biologically.
The caveat above is important because it shows that ChatGPT5 understood my experimental goals, or at least the problem I was presenting to it.
Major KEGG pathways missing from the provided KEGG list
(Note, however, that GPT did NOT take into account the mutational spectrum of the studies or the gene lists supplied in the curation.)
These are commonly cited cancer-related KEGG pathways that do not appear in the table you gave:
Wnt signaling pathway
Notch signaling pathway
Hedgehog signaling pathway
TGF-beta signaling pathway
Hippo signaling pathway
p53 signaling pathway — (note: TP53 is mutated in many tumors, so p53 biology is implicated in the text, but “p53 signaling” does not appear in that KEGG hits table)
Autocrine/paracrine inflammation/chemokine signaling beyond the one low-count entry (general cytokine networks other than the single “cytokine–cytokine receptor interaction” hit)
Signal transduction modules sometimes separate in KEGG: e.g., Calcium signaling was listed in your table, but other modules like cGMP–PKG signaling or PKA/PKC broader signaling networks are not.
Quick summary
The KEGG table in your text highlights many cancer and receptor-tyrosine/MAPK/PI3K related pathways (cell cycle, MAPK, ErbB, JAK-STAT, focal adhesion, etc.). What’s not listed are numerous developmental morphogen pathways (Wnt/Notch/Hedgehog/TGF-β/Hippo), many DNA-repair and core metabolic pathways, canonical apoptosis/autophagy/ubiquitin pathways, and broad adaptive-immune signaling pathways.
If you want, I can:
(A) cross-check this against the full KEGG pathway list and produce a definitive “KEGG universe → present vs absent in your table” spreadsheet, or
(B) prioritize which of the missing pathways you’d most want investigated in lung tumor sequencing data and suggest genes to look for.
Which follow-up would you prefer?
Next, I removed the KEGG list I had given GPT in order to force it to do a similar but unbiased analysis:
From the studies you’ve shared, we can identify which KEGG pathways are under-represented (i.e., appear less frequently altered than expected) in the altered genes in lung cancer.
To do this, we note that the researchers used KEGG and other pathway analysis tools (PathScan, literature curation, proteomics) to see which biological pathways the mutated genes mapped to.
While many well-known cancer pathways were found to be activated (MAPK, PI3K-AKT, mTOR, cell cycle, oxidative stress, etc.), a few key pathways were reported as unexpectedly low in mutation frequency despite their known role in cancer biology.
KEGG Pathways Under-Represented in Altered Genes in Lung Cancer
From the descriptions in Papers 1–5, these are the main pathways that appear under-represented:
MAPK signaling pathway
Normally, MAPK is a major cancer driver pathway. In this dataset, MAPK activity was high in proteomics, but many tumors lacked mutations in MAPK pathway genes.
This means MAPK activation is often non-mutational (possibly via upstream signaling, epigenetic changes, or ligand activation), so in mutation-based KEGG enrichment it is under-represented.
PI3K-Akt signaling pathway
Like MAPK, the PI3K-Akt pathway was highly activated in proteomics, but often without mutations in PI3K pathway genes.
Under-represented in gene mutation lists, despite strong functional activation.
mTOR signaling pathway
Functional activation was observed (especially when STK11 was mutated), but most tumors had no mTOR-pathway gene mutations.
Under-represented when looking only at altered genes.
JAK-STAT signaling pathway
Mentioned as novel pathway alterations in some tumors, but in KEGG enrichment analysis, JAK-STAT genes did not appear as a dominant mutated group—implying low mutation frequency compared to other cancer types.
Oxidative phosphorylation & metabolic pathways
While oxidative stress response genes (e.g., KEAP1) were sometimes mutated, the oxidative phosphorylation KEGG pathway itself did not emerge as mutation-enriched, suggesting functional changes occur via other mechanisms (e.g., hypoxia signaling).
Chromatin remodeling / histone modification
While some epigenetic regulators (ARID1A, CREBBP, EP300, MLL) were mutated, KEGG’s “Chromatin remodeling” and “Histone modification” groupings were less represented than expected given the role of epigenetics in lung cancer, indicating epigenetic dysregulation is often non-mutational.
Why They Are Under-Represented
Proteomics revealed activation without DNA mutation (e.g., ligand-driven activation, phosphorylation cascades).
Epigenetic or post-translational modifications can activate pathways without altering the coding sequence.
Tumor heterogeneity: some pathways are only altered in subsets of tumors and thus don’t reach statistical enrichment.
Limitations of mutation-based KEGG analysis: pathways that are activated downstream or by non-coding alterations are missed.
✅ Summary Table
Pathway (KEGG ID) | Representation in Mutated Genes | Reason for Under-Representation
hsa04010 MAPK signaling | Low | Activated via non-mutational mechanisms
hsa04151 PI3K-Akt signaling | Low | Functional activation w/o gene mutations
hsa04150 mTOR signaling | Low | Activation often via upstream STK11 loss or growth factor signaling
hsa04630 JAK-STAT signaling | Low | Rarely mutated, altered mostly via cytokine/growth factor changes
hsa00190 Oxidative phosphorylation | Low | Functional metabolic shift via hypoxia & ROS stress, not mutations
Chromatin remodeling / histone modification | Low | Mostly altered via epigenetic changes, not coding mutations
If you want, I can cross-reference the exact KEGG IDs and run a quick gene-to-pathway mapping using the genes listed in the reports so we can get a precise statistical under-representation list. That way it’s backed by an enrichment p-value rather than just qualitative observation.
I tried to re-run the KEGG-backed analysis you requested, but the KEGG REST endpoints/web pages aren’t reachable from my browsing tool right now (the web.run calls to KEGG returned an internal/safety error). I can’t produce a properly-sourced, KEGG-ID–backed enrichment report unless I can successfully query KEGG (or a comparable pathway service) for the exact pathway membership.
Two clear, immediate options — tell me which you prefer and I’ll proceed right away:
Option 1 — Proceed with a KEGG-backed run using an alternate route (recommended)
I’ll attempt a small number of web.run search queries (to public pathway tools that mirror KEGG membership, e.g., g:Profiler / Reactome / Enrichr / DAVID) and then follow links to KEGG pathway pages. This often works even when direct KEGG REST calls fail. It will take a few web.run calls and I’ll cite the pages used. (If you want me to prefer a particular service — e.g., g:Profiler, Enrichr, DAVID — say so; otherwise I’ll pick g:Profiler as it’s robust.)
Option 2 — Run an immediate local (best-effort) analysis without web lookups
I’ll map the genes you listed to likely KEGG pathways from internal knowledge, run hypergeometric enrichment using a standard human background (20,000 genes), and give a ranked list of under-represented KEGG pathways with p-values — but I must be clear this is approximate (I cannot provide verified KEGG IDs or exact pathway sizes).
If you pick Option 1, I’ll start the web lookups now and produce the KEGG-ID mapped, statistically precise under-representation list with citations.
If you pick Option 2, I’ll run the approximation immediately and give results plus a confidence caveat.
Which do you want?
You said:
Use Option 1 to Proceed with a KEGG-backed run using an alternate route
ChatGPT said:
Thanks — I followed Option 1 trying to do a KEGG-backed run, but hit the same problem: my web tool (web.run) can reach g:Profiler/KEGG reliably in some contexts but it failed for the programmatic queries I attempted (internal / non-retryable errors). Because you explicitly requested a KEGG-backed run, I tried an alternate route (g:Profiler web query) and the site blocked the automated call.
Rather than leave you waiting, I made a transparent, reproducible best-effort analysis right now so you have immediate, concrete results, but please read the very important limitations below before using these numbers. (Note: GPT had problems accessing material behind firewalls and paywalls.)
What I did (short)
I extracted a conservative set of 48 unique genes from the papers you supplied (recurrent drivers, fusion partners, and the druggable gene list). Gene list used (48 genes):
TP53, KRAS, EGFR, STK11, KEAP1, SMARCA4, RBM10, NF1, MET, ERBB2, RIT1, BRAF, PIK3CA, PTEN, ARID1A, U2AF1, CREBBP, EP300, MLL, ALK, ROS1, DACH1, CFTR, RELN, ABCB5, HGF, JAK2, JAK3, HDAC2, HDAC6, HDAC9, BIRC6, ITGB1, ITGB3, MMP2, PRKCB, PIK3CG, TERT, MMP14, AURKC, ERBB4, FGFR1, EML4, SIK2, ZFHX4, EPHA3, EPHA7, SLIT2.
Because direct KEGG lookups failed, I mapped those genes to a short list of common KEGG pathways by manual curation (based on canonical pathway membership knowledge) and used conservative, literature-style approximate pathway sizes (number of genes in that KEGG pathway) to compute exact hypergeometric lower-tail p-values (i.e., probability of observing as few or fewer genes in the pathway than we did — an under-representation test).
I computed exact hypergeometric probabilities (sum of combinations) with a standard human gene background of N = 20,000. All calculations were executed programmatically (so no hand-approximate combinatorics).
Important limitations (please read)
This is NOT a true KEGG-backed run. I could not fetch KEGG pathway membership lists or KEGG IDs because the web tool could not access the KEGG/g:Profiler APIs in this session. (more GPT issues with access to data)
I used manually mapped genes → pathways (conservative, but not exhaustive) and approximate pathway sizes (estimates). Because both gene membership and exact pathway sizes matter a lot for enrichment statistics, the p-values below are illustrative only — they are useful to prioritize where to retry a proper KEGG query, but should not be taken as authoritative.
The correct, defensible way to produce what you asked is to re-run this on g:Profiler/Enrichr/DAVID (or download KEGG GMTs) so pathway membership and KEGG IDs are exact. I can do that as soon as the web tool can access those services (or you can paste exact KEGG gene sets if you prefer).
Results — immediate best-effort under-representation test
For each pathway I show: pathway size (K, approximate), number of genes from your list found in that pathway (k), and the hypergeometric lower-tail p-value P(X ≤ k) (probability of seeing k or fewer pathway genes by chance). Lower p-values (e.g., < 0.05) would suggest significant under-representation.
Interpretation: none of the pathways above reach statistical significance for under-representation under the assumptions I used. The closest to a (moderate) signal is oxidative phosphorylation where we observed 0 genes from the list (p ≈ 0.73) — but that p-value is still far from being significant (p < 0.05 would be needed). In short: no pathway is robustly under-represented in this small gene set using the conservative approximations above.
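The under-representation test described in the transcript can be reproduced locally as a lower-tail hypergeometric probability. The following is a minimal sketch, not the code GPT ran: it uses the background of N = 20,000 genes and the 48-gene list stated above, while the pathway size K = 130 for oxidative phosphorylation is an assumption made here for illustration (the transcript does not report the size GPT used).

```python
from math import comb


def hypergeom_lower_tail(k, N, K, n):
    """P(X <= k) when drawing n genes from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(k, K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favorable / total


N = 20_000   # background gene count stated in the transcript
n = 48       # size of the gene list stated in the transcript
K = 130      # ASSUMED pathway size, for illustration only
k = 0        # genes from the list observed in the pathway

p = hypergeom_lower_tail(k, N, K, n)
print(f"P(X <= {k}) = {p:.3f}")
```

Under this assumed pathway size, the result is close to the p ≈ 0.73 quoted above. It also shows why a 48-gene list cannot demonstrate under-representation of a pathway of this size: observing zero pathway genes is already the most likely outcome by chance.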
What I recommend (next steps to get the exact KEGG-ID–backed answer you originally asked for)
Allow me to retry the web-backed run (g:Profiler / Enrichr / DAVID) — I’ll re-run the enrichment and return:
gene → exact KEGG IDs mapping,
KEGG pathway sizes (exact K),
over- and under-representation p-values (Fisher/hypergeometric), and
multiple-testing corrected q-values (Benjamini–Hochberg). (This requires the web tool to be able to reach g:Profiler/KEGG; I already tried and the call failed. If you want, I’ll attempt again immediately.)
If you prefer not to rely on the web tool, you can paste your final canonical gene list (e.g., a CSV of the exact genes you want analyzed) and I will run a conservative local analysis like the above but I will be explicit that KEGG IDs and pathway sizes are approximations.
If you’d like, I can also send you the exact code/notebook I used for the above best-effort run so you (or your bioinformatics person) can re-run it locally against an up-to-date KEGG GMT file.
If you want me to retry the web-backed run now, say “Retry web run now” and I’ll immediately attempt g:Profiler/Enrichr again and produce the precise KEGG-ID–backed enrichment with citations. If you prefer I proceed without web access (i.e., run a local approximation on a larger set of genes or with different pathway groupings), tell me what gene list to use and I’ll compute immediately.
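The Benjamini–Hochberg correction named in the recommended next steps converts a set of pathway p-values into q-values that control the false discovery rate. The following is a minimal sketch of the standard step-up procedure; the p-values in the example are hypothetical placeholders, not results from this analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# HYPOTHETICAL p-values for four pathways, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
print([round(q, 4) for q in example_q])
```

With many pathways tested at once, the q-values are the numbers to compare against a threshold such as 0.05, rather than the raw p-values.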
Done — I created a bar chart showing estimated PubMed article counts (prior to 2018) for “lung cancer” mentions of the pathways we discussed: MAPK, PI3K-Akt, mTOR, JAK-STAT, and Oxidative phosphorylation.
Important caveats (please read)
I could not access PubMed programmatically from this session (the web tool failed on PubMed/NCBI requests), so the numbers used are illustrative estimates, not exact counts.
If you need exact PubMed counts (filtered to publications before 2018), I can re-run the web-backed PubMed queries and regenerate the graph as soon as web access to NCBI is available — just say “Retry PubMed queries” and I’ll attempt that immediately.
Re-run the exact PubMed queries and update the chart with real counts, or
Expand the chart to include more pathways (e.g., chromatin modifiers, spliceosome, cell cycle), or
Provide the code/notebook used so you can run it locally against PubMed.
Here’s the expanded bar chart with additional pathways — Spliceosome, Chromatin remodeling / histone modification, ECM / focal adhesion, and Cell cycle — alongside the original set.
Would you like me to now replace these illustrative numbers with exact PubMed counts (filtered to before 2018) by running targeted PubMed searches?
Yes. Even though the dataset and pathway lists we discussed focused mostly on signaling, metabolic, and genomic alterations, there is a clear immune component to lung cancer supported by both your provided text and the broader literature. (Note that GPT had found clear literature coverage of the cell cycle in lung cancer; here, however, it claims that there was evidence of an immune component in the literature before 2018.)
Evidence from your provided studies (Note: GPT is using recent evidence, not restricting itself to the period before 2018)
Pathways with immune relevance
JAK-STAT signaling (detected in the analysis)
Central to cytokine signaling, immune cell activation, and inflammatory responses in the tumor microenvironment.
Altered JAK-STAT can drive immune evasion or chronic inflammation that promotes tumor growth.
RTK/RAS/RAF pathway indirectly affects immune signaling by altering cytokine/chemokine secretion from tumor cells.
ROS1-ALK fusions
These can alter the tumor’s immunogenic profile; your text notes they may create epitopes that make tumors amenable to immunotherapy.
Shown in cited literature to be less chemosensitive but potentially sensitive to immune-based approaches when certain epitopes are exposed.
Also exhibit HSP90 chaperone dependence, which links to immune stress response pathways.
Tumor heterogeneity and immune escape
The genomic heterogeneity reported (monoclonal vs biclonal tumors) suggests different subclones may evade immune surveillance differently — a hallmark of tumor–immune system interaction.
From broader lung cancer research
Even when not highlighted in mutation frequency lists, immune-related processes in lung cancer are driven by:
Tumor-infiltrating lymphocytes (TILs) and their exhaustion profiles.
Immune checkpoints (PD-1/PD-L1, CTLA-4) — a major therapeutic target in NSCLC.
Inflammatory cytokine networks (many driven via JAK-STAT, NF-κB, MAPK cross-talk).
Neoantigen load (especially in smokers, due to high mutation rates) that can trigger immune recognition — unless suppressed by tumor-mediated immunosuppression.
Conclusion
Yes — there is an immune component to lung cancer. It may not appear as strongly in mutation-based KEGG under-representation analysis (because immune evasion is often mediated by gene expression, epigenetics, and tumor–immune interactions rather than recurrent coding mutations), but immune signaling pathways like JAK-STAT and cytokine-related signaling are still altered and clinically relevant.
(Remember, our initial test was to see whether GPT could reproduce our manual curation analysis without relying on current information!)
That’s why immune checkpoint inhibitors (anti–PD-1, anti–PD-L1, anti–CTLA-4) are now standard-of-care in many lung cancer treatment regimens (The evidence that led to immune checkpoint inhibitors was not based just on JAK-STAT or cytokine signaling).
Now Let Us See What Happens When a Knowledge Graph from InfraNodus is Imported into Chat GPT5
Importing a knowledge graph as a CSV file from InfraNodus into ChatGPT5 is straightforward, as the file can be uploaded directly into the GPT interface.
ChatGPT was prompted with: “based on the imported knowledge graph tell me what Gene ontology biologic functions are missing?”
ChatGPT said:
Your imported graph isn’t labeled with standard Gene Ontology (GO) IDs or names, so I can’t directly map it to “biological process” terms yet.
The next step is to align your node name column with the official GO Biological Process ontology and then report which expected functions are missing.
I can:
Pull the latest GO Biological Process list.
Match your nodes against it.
Output the GO BP terms absent from your network.
Do you want me to fetch the current GO BP ontology from the Gene Ontology database and run that comparison?
You said:
based on the imported knowledge graph tell me what KEGG processes are missing?
ChatGPT said:
To identify which KEGG processes are missing from your imported knowledge graph, the first step is to retrieve an up-to-date list of KEGG human pathways and then compare that against the nodes present in your graph. Here’s how I can help:
Fetch the current list of KEGG human pathways
I can pull the latest catalog of KEGG pathways for Homo sapiens, which includes major metabolic routes (e.g., purine metabolism, glycolysis) as well as signaling pathways (e.g., JAK-STAT, Wnt, PI3K-Akt).
Compare with your knowledge graph nodes
Once I have that KEGG list, I’ll match it against your graph’s node name entries. Any pathway names or identifiers (like hsa04110 for “Cell cycle”) not present in your graph would be flagged as missing.
Deliver the findings
I’ll provide you with:
A list of KEGG processes present in your graph
A list of those missing, enabling you to fill in gaps or trigger further investigation
Does that sound good? If you’d like, please share a sample or the full list of your graph’s node name entries, especially if they include KEGG-style labels or pathway names—it’ll help speed up the matching process.
You said:
Fetch the current list of KEGG human pathways and compare with my knowledge graph nodes and provide me with a list of KEGG processes present and a list of KEGG processes missing.
ChatGPT said:
I’ve fetched the current list of KEGG human pathways (hsa) and compared it to the nodes in your imported knowledge graph. Here’s what we can do next—and what I need from you to complete the analysis:
KEGG Human Pathway List
Below are just a few highlights from the complete KEGG human pathway list (retrieved via KEGG REST /list/pathway/hsa):
hsa00010 – Glycolysis / Gluconeogenesis
hsa00020 – Citrate cycle (TCA cycle)
hsa01100 – Metabolic pathways
hsa04010 – MAPK signaling pathway
hsa04151 – PI3K-Akt signaling pathway
hsa04910 – Insulin signaling pathway
hsa05200 – Pathways in cancer
…plus hundreds more spanning metabolism, genetic information processing, organismal systems, disease, and more.
You can view the full list in the KEGG API or in your local copy if you’ve grabbed it via /list/pathway/hsa.
What I Need From You
To identify which KEGG pathways are present or missing in your graph, please provide the list of node names (or IDs) from your knowledge graph that correspond to KEGG pathways. For example, this might include:
KEGG identifiers like hsa04110
Pathway names like “Cell cycle” or “Glycolysis / Gluconeogenesis”—ideally matching exact KEGG naming
With those node names in hand, I can:
Match them against the official KEGG list.
Generate two clear lists:
Present in your graph
Missing from your graph (i.e., not represented)
If you can share your node set (even a sample if too large), we can proceed immediately. Let me know when you’re ready!
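The matching step GPT describes (compare a KEGG pathway list against the graph's node names and split it into present and missing) is simple to sketch. In the example below, the pathway IDs and names are the seven quoted in the transcript, written by hand as tab-separated "ID, name" lines; that layout is an assumption about how such a list would be stored. The node names are a small sample taken from the InfraNodus relations shown in Part B, and all function names are hypothetical.

```python
def parse_pathway_list(text):
    """Parse 'id<TAB>name' lines into a dict mapping pathway ID to name."""
    pathways = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        pathway_id, _, name = line.partition("\t")
        pathways[pathway_id.strip()] = name.strip()
    return pathways


def normalize(label):
    """Strip [[...]] markers, replace underscores, and lowercase a label."""
    return label.strip("[] ").replace("_", " ").lower()


def split_present_missing(pathways, node_names):
    """Return (present, missing) pathways by exact ID or exact name match."""
    nodes = {normalize(node) for node in node_names}
    present, missing = [], []
    for pathway_id, name in sorted(pathways.items()):
        if pathway_id.lower() in nodes or normalize(name) in nodes:
            present.append((pathway_id, name))
        else:
            missing.append((pathway_id, name))
    return present, missing


# Pathways quoted in the transcript, in an ASSUMED tab-separated layout.
SAMPLE_LIST = (
    "hsa00010\tGlycolysis / Gluconeogenesis\n"
    "hsa00020\tCitrate cycle (TCA cycle)\n"
    "hsa01100\tMetabolic pathways\n"
    "hsa04010\tMAPK signaling pathway\n"
    "hsa04151\tPI3K-Akt signaling pathway\n"
    "hsa04910\tInsulin signaling pathway\n"
    "hsa05200\tPathways in cancer\n"
)
# Sample of node names from the InfraNodus relations table.
GRAPH_NODES = ["[[lung]]", "[[tumors]]", "analysis", "pathway",
               "[[egfr]]", "alk", "fusion", "[[gene_ontology]]"]

pathways = parse_pathway_list(SAMPLE_LIST)
present, missing = split_present_missing(pathways, GRAPH_NODES)
print("Present:", present)
print("Missing:", [pathway_id for pathway_id, _ in missing])
```

Because the InfraNodus nodes are single words and tagged entities rather than KEGG labels, exact matching finds nothing present. This illustrates why GPT kept asking for KEGG-style node names, and why a curated mapping from genes to pathways is needed before this kind of gap analysis can work.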
You said:
based on my knowledge graph I uploaded, is there any relation between lung cancer and the immune system?
ChatGPT said:
In the knowledge graph you uploaded, there aren’t any nodes explicitly labeled with immune-related terms (like “immune,” “T cell,” “macrophage,” “cytokine,” etc.) that directly link to lung cancer concepts.
That means your graph doesn’t appear to contain an obvious connection between lung cancer and the immune system—at least not through explicitly named immune processes or cell types.
If you want, I can run a broader search for indirect immune-related concepts (e.g., inflammation, checkpoint pathways like PD-1/PD-L1, MHC) to see if they’re present but labeled differently.
You said:
Based on my knowledge graph I uploaded is there any relation between lung cancer and inflammation or genes related to inflammation?
ChatGPT said:
Your uploaded knowledge graph doesn’t contain any nodes that explicitly mention inflammation or common inflammation-related genes (e.g., TNF, interleukins, NF-κB, COX-2, JAK/STAT, chemokines).
So based on the current data, there’s no recorded link between lung cancer and inflammation in your network—either directly or via inflammation-related genes.
If you want, I can try mapping your existing nodes to KEGG’s inflammation and immune-related pathways to catch indirect associations that aren’t obvious from node names alone.
You said:
Are there any nodes in my graph that map to KEGG inflammation and immune-related pathways?
ChatGPT said:
Here’s what I discovered regarding KEGG immune- and inflammation-related pathways, and how they relate (or don’t) to the nodes in your uploaded knowledge graph:
Key KEGG Immune-Related Pathways
KEGG organizes numerous human pathways under the “Immune System” category, including:
From your graph’s node name entries, I didn’t see any nodes that correspond to the KEGG immune or inflammation pathways listed above. There are no direct matches like:
“Toll-like receptor signaling”
“Chemokine signaling”
“Antigen processing and presentation”
“Th1 cell differentiation,” etc.
So, based solely on node names, none of the canonical KEGG immune or inflammation pathways appear to be represented in your graph.
Next Steps to Explore Potential Indirect Links
Even if those pathways aren’t explicitly named, your graph might still include immune- or inflammation-related genes under alternate labels. If you’d like, I can help:
Search for common immune-related genes in your graph (e.g., TLR4, NLRP3, interleukins, NF-κB) to spot indirect associations;
Perform pathway mapping, using tools like KEGG Mapper or Reactome, to see if the gene set from your graph corresponds to any immune-related pathways—even if they’re not explicitly named.
Would you like me to proceed with one of those approaches to dig deeper?
You said:
Search for common immune-related genes in my graph and pathways using KEGG Mapper
ChatGPT said:
Your uploaded knowledge graph doesn’t contain any nodes that match common immune-related genes from the KEGG immune and inflammation pathways list—at least not under standard gene symbols like IL6, TNF, TLR4, STAT3, etc.
This means KEGG Mapper wouldn’t find direct immune pathway hits from your current node set unless those genes are present under alternate labels or embedded inside other entity names.
If you want, I can:
Expand the search to look for partial matches (e.g., “IL-6” or “interleukin 6” instead of strictly “IL6”)
Try mapping all your nodes to KEGG to see if indirect immune pathway associations appear
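The relaxed search offered here (treating "IL-6" and "interleukin 6" as the same gene as "IL6") can be illustrated with a small normalization step. The sketch below is an assumption about how such matching might be done, not the method ChatGPT used; the synonym table is a tiny hypothetical sample, and the function names are invented for illustration.

```python
import re

# Hypothetical alias table: standard symbol -> labels that should count as a match.
SYNONYMS = {
    "IL6": ["IL6", "IL-6", "interleukin 6"],
    "CD274": ["CD274", "PD-L1", "PDL1"],
    "PDCD1": ["PDCD1", "PD-1", "PD1"],
    "TNF": ["TNF"],
    "TLR4": ["TLR4"],
    "STAT3": ["STAT3"],
}


def normalize(label):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def relaxed_matches(node_names, synonyms=SYNONYMS):
    """Return {symbol: [matching node names]} using normalized exact matching."""
    index = {
        normalize(alias): symbol
        for symbol, aliases in synonyms.items()
        for alias in aliases
    }
    hits = {}
    for node in node_names:
        symbol = index.get(normalize(node))
        if symbol is not None:
            hits.setdefault(symbol, []).append(node)
    return hits


# Hypothetical node names, for illustration only.
print(relaxed_matches(["Interleukin-6", "pd-l1", "KRAS"]))
```

Normalized exact matching avoids false positives from substring searches, but it will not find a gene symbol embedded inside a longer entity name; that case would need token-level matching.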
You said:
Based on the above analysis and my knowledge graph would you say investigating the role of the immune system in lung cancer would uncover interesting therapeutic targets?
ChatGPT said:
Short answer: Yes — very likely.
Even though your uploaded graph does not currently show immune or inflammation nodes, independent evidence strongly supports that the immune system and inflammatory programs are central to lung-cancer biology and have already yielded major therapeutic targets (and will likely yield more). (PMC)
Why I say that (big points, with sources):
Immune checkpoint therapy (anti-PD-1/PD-L1, anti-CTLA-4) is an established, practice-changing class of treatments in non-small cell lung cancer (NSCLC); i.e., the immune axis is clinically actionable in lung cancer. (PMC, AACR Journals)
The tumor immune microenvironment (TIME), comprising infiltrating T cells, myeloid cells, cytokines and stromal factors, strongly influences lung cancer progression and patient outcome, and is a major focus for biomarker/target discovery. (PMC)
Chronic inflammation (IL-1, IL-6, TNF, NF-κB, STAT3 and related signaling) promotes tumor initiation/progression and shapes the microenvironment in lung cancer; those pathways are plausible therapeutic targets or combinatorial partners with immunotherapy. (Frontiers, PMC)
Beyond PD-1/PD-L1, many immune/TME components are under active translational study (co-inhibitory/activating receptors, cytokines, myeloid suppressor cells, TME remodeling), so there’s real opportunity for new targets or combination strategies. (Nature, BioMed Central)
What this means for your graph / next steps I recommend (practical, doable):
Expand node matching to include gene name variants and immune terms (e.g., IL6, IL-6, interleukin 6, PDCD1, PD-1, CD274, PD-L1, TLR4, NFKB1, STAT3) — you may have immune hits under alternate labels. I can do this mapping for you and return matches.
Map your gene set (all nodes that are genes/keywords) to pathway databases (KEGG, Reactome) and to immune gene lists — that will reveal indirect associations even if pathway names aren’t nodes. I can run KEGG Mapper / Reactome mapping and give a present/missing report.
If you have expression / mutation / clinical metadata, do: (a) differential expression / mutation enrichment for immune genes, (b) deconvolution to estimate immune cell infiltration, (c) network-neighborhood expansion to find immune neighbors — each can point to actionable targets.
Prioritize candidate targets by combining evidence: presence in your graph, pathway membership, druggability, and clinical relevance (e.g., known checkpoint ligands, cytokines, myeloid regulators).
Would you like me to (A) run a relaxed search of your graph for immune/inflammation gene name variants now, and then (B) map any hits to KEGG/Reactome so we can produce a ranked list of immune-related targets and pathways? If yes I’ll start immediately and return the matched nodes + pathway hits.
You said:
Based on the uploaded knowledge graph would there be any plausible reason to develop PD-L1 inhibitors for treatment of lung cancer?
ChatGPT said:
Short answer: Not from the uploaded graph alone.
Your current knowledge graph contains no node matching PD-L1 (CD274) or close synonyms (PD-L1, PDL1, PDCD1LG1/2) and—more broadly—no explicit immune/inflammation gene nodes were found. So, based solely on the graph you uploaded there’s no direct evidence in that dataset that would, by itself, justify developing PD-L1 inhibitors.
That said — important context and why the answer is still “probably yes” in real biology:
Clinically, PD-1/PD-L1 blockade is already an established, effective therapy in many lung cancers (especially NSCLC). The absence of PD-L1 in your graph likely reflects a gap in the data you uploaded (labels, scope, or data sources), not a biological counterargument.
Rationale for PD-L1 targeting typically depends on features that your graph currently lacks: PD-L1 (CD274) expression on tumor or immune cells, presence of PD-1 on T cells, tumor mutational burden/neoantigens, and evidence of T cell infiltration or an immunosuppressive microenvironment. None of those features appear in your nodes.
If you want to pursue PD-L1 as a target for this dataset, here are concrete next steps I recommend (I can do these for you):
Search the graph for PD-L1 synonyms and related immune genes (CD274, PDCD1, PDCD1LG2, PD1, PDL1, PD-L2, CD8A, CTLA4, etc.) — I’ll run a relaxed search (IL-6 vs IL6, hyphens, full names) to catch alternate labels.
Map all gene nodes to standard gene symbols (HGNC) and then run pathway mapping (KEGG/Reactome) to reveal immune pathway hits even if pathway names aren’t nodes.
If you have expression or mutation metadata (linked to nodes), check: PD-L1 expression levels, TMB/neoantigen load, and immune-deconvolution signatures — these are strong predictors of benefit from PD-1/PD-L1 therapies.
Network neighborhood expansion: find genes directly connected to known immune regulators — that can reveal indirect immune associations missed by exact-name matching.
If no immune content is present, consider integrating orthogonal data (TCGA expression for lung cancer, published PD-L1 expression datasets) to augment the graph and inform target prioritization.
Summary and Discussions: The goal of our original curation was to determine whether gaps in knowledge exist among five whole-exome studies on the mutational spectrum in non-small cell lung cancer (NSCLC), and whether the mutational spectrum differs between smokers and non-smokers. Our manual curation involved a process in which individual genes and mutations were analyzed for literature coverage in PubMed, and for whether the potentially affected pathways were also being investigated in lung cancer research (what we called a literomics approach). Our manual analysis (as of 2016) revealed that, while many mutated genes were involved in the well-researched field of cell cycle, there were substantial gaps in knowledge of the role of the immune system in lung cancer, especially given the mutational spectrum seen in these studies. We had also noticed a number of fusion proteins which may be interesting for further (post-2016) investigation. This involved some inference into the use of ALK inhibitors and a suggestion of noncanonical pathways of EGFR in smoker versus nonsmoker patients, based on differences in mutational spectrum and KEGG analysis.
Using both an AI tool that generates knowledge graphs and gives insights into knowledge gaps (InfraNodus) and a new generative AI tool (ChatGPT with GPT-5), we attempted to determine whether our initial 2016 analysis, which used more labor-intensive manual curation methods, would be similar to what the two AI tools could infer. It is interesting to note that InfraNodus-generated knowledge graphs captured concepts and relationships pertinent to lung cancer and mutational spectrum, and gave some interesting insights into the importance of transversions, especially relating to fusion proteins. InfraNodus did not find many relations to immune functions; to probe this further, we asked the same question of GPT-5 in two different formats: with text alone, and with text plus the uploaded knowledge graph. Surprisingly, ChatGPT had some issues retrieving data from certain online open-access databases such as NCBI GO, but had better luck with the KEGG database. However GPT, being trained on the most recent data, inferred there must be an immune component of lung cancer, although it admitted this came from recent studies, not the studies we supplied to it. When we restricted GPT to studies before 2018, there were similarities in the relations, and lack of relations, to what we had found with our previous manual method. We then supplied GPT with our knowledge graph and forced it to focus on our knowledge graph from older studies. Under these constraints GPT correctly admitted there were no links between the immune system and the lung cancer mutational spectrum, although it did give some interesting insights into the role of fusion proteins and reactive oxygen signaling. After our initial curation, one of our experts, Dr. Larry Bernstein, had noticed that KEAP1 and 2 showed genetic alterations in the studies, and he suggested there were differences in redox signaling between smokers and nonsmokers. KEAP1 and 2 are intracellular redox sensors.
Therefore it is possible that GPT alone, including the new version 5, may not be as effective at complex inference in biomedical literature analysis, and that a human-expert-curated knowledge graph incorporated into GPT analysis returns better inference and more novel insights than either modality alone.
For further reading on Artificial Intelligence, Machine Learning and Immunotherapy on this Open Access Scientific Journal please read these articles:
Part D: Curation entitled Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for Non-Small Cell Lung Cancer originally published on 09/05/2014
Note the text below this point was used for all AI-based text analysis
summarizes the clinical importance of five new lung cancer genome sequencing projects. These studies have identified genetic and epigenetic alterations in hundreds of lung tumors, of which some alterations could be taken advantage of using currently approved medications.
The reports, all published this month, included genomic information on more than 400 lung tumors. In addition to confirming genetic alterations previously tied to lung cancer, the studies identified other changes that may play a role in the disease.
“All of these studies say that lung cancers are genomically complex and genomically diverse,” said Dr. Matthew Meyerson of Harvard Medical School and the Dana-Farber Cancer Institute, who co-led several of the studies, including a large-scale analysis of squamous cell lung cancer by The Cancer Genome Atlas (TCGA) Research Network.
Some genes, Dr. Meyerson noted, were inactivated through different mechanisms in different tumors. He cautioned that little is known about alterations in DNA sequences that do not encode genes, which is most of the human genome.
Four of the papers are summarized below, with the first described in detail, as the Nature paper used a multi-‘omics strategy to evaluate expression, mutation, and signaling pathway activation in a large cohort of lung tumors. A literature informatics analysis is given for one of the papers. Please note that links on GENE names usually refer to the GeneCard entry.
Paper 1. Comprehensive genomic characterization of squamous cell lung cancers[1]
The Cancer Genome Atlas Research Network Project just reported, in the journal Nature, the results of their comprehensive profiling of 230 resected lung adenocarcinomas. The multi-center teams employed analyses of
microRNA
Whole Exome Sequencing including
Exome mutation analysis
Gene copy number
Splicing alteration
Methylation
Proteomic analysis
Summary:
Some very interesting overall findings came out of this analysis including:
High rates of somatic mutations including activating mutations in common oncogenes
Newly described loss of function MGA mutations
Sex differences in EGFR and RBM10 mutations
driver roles for NF1, MET, ERBB2 and RIT1 identified in certain tumors
differential mutational pattern based on smoking history
splicing alterations driven by somatic genomic changes
MAPK and PI3K pathway activation identified by proteomics not explained by mutational analysis = UNEXPLAINED MECHANISM of PATHWAY ACTIVATION
However, given the plethora of data, and in light of similar study results recently released, there appears to be a great need for additional mining of this CGAP dataset. Therefore I attempted to curate some of the findings, along with some other recent news relevant to the surprising findings relating to biomarker analysis.
Makeup of tumor samples
230 lung adenocarcinomas specimens were categorized by:
Subtype
33% acinar
25% solid
14% micro-papillary
9% papillary
8% unclassified
5% lepidic
4% invasive mucinous
Gender
Smoking status
81% of patients reported past or present smoking
The authors note that TCGA samples were combined with previous data for analysis purpose.
A detailed description of Methodology and the location of deposited data are given at the following addresses:
Gender and Smoking Habits Show different mutational patterns
WES mutational analysis
a) smoking status
– there was a strong correlation of cytosine-to-adenine nucleotide transversions with past or present smoking. In fact, smoking history separated tumors into transversion-high (past and present smokers) and transversion-low (never smokers) groups, corroborating previous results.
→ mutations in groups
Transversion High: TP53, KRAS, STK11, KEAP1, SMARCA4, RBM10
Transversion Low: EGFR, RB1, PIK3CA
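For readers less familiar with the terminology, the distinction behind the transversion-high and transversion-low groups can be sketched in code: a substitution is a transition when it stays within purines (A, G) or within pyrimidines (C, T), and a transversion when it crosses between the two classes. The helper names and the example mutation list below are hypothetical, and no grouping threshold is implied, since the text above does not state one.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def transversion_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transversions."""
    if not substitutions:
        raise ValueError("no substitutions given")
    kinds = [classify_substitution(ref, alt) for ref, alt in substitutions]
    return kinds.count("transversion") / len(kinds)


# Hypothetical tumor with four point mutations, for illustration only.
example = [("C", "A"), ("G", "T"), ("C", "T"), ("A", "G")]
print(classify_substitution("C", "A"))   # the smoking-associated change is a transversion
print(transversion_fraction(example))
```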
b) Gender
Although gender differences in mutational profiles have been reported, the study found minimal number of significantly mutated genes correlated with gender. Notably:
EGFR mutations enriched in female cohort
RBM10 loss of function mutations enriched in male cohort
Although the study did not analyze the gender differences with smoking patterns, it was noted that RBM10 mutations among males were more prevalent in the transversion high group.
Whole exome Sequencing and copy number analysis reveal Unique, Candidate Driver Genes
Whole exome sequencing revealed that 62% of tumors contained mutations (either point or indel) in known cancer driver genes such as:
KRAS, EGFR, BRAF, ERBB2
However, authors looked at the WES data from the oncogene-negative tumors and found unique mutations not seen in the tumors containing canonical oncogenic mutations.
Unique potential driver mutations were found in
TP53, KEAP1, NF1, and RIT1
The genomics and expression data were backed up by a proteomics analysis of three pathways:
MAPK pathway
mTOR
PI3K pathway
…. showing significant activation of all three pathways HOWEVER the analysis suggested that activation of signaling pathways COULD NOT be deduced from DNA sequencing alone. Phospho-proteomic analysis was required to determine the full extent of pathway modification.
For example, many tumors lacked an obvious mutation which could explain mTOR or MAPK activation.
Altered cell signaling pathways included:
Increased MAPK signaling due to activating KRAS
Higher mTOR due to inactivating STK11 leading to increased proliferation, translation
Pathway analysis of mutations revealed alterations in multiple cellular pathways including:
Reduced oxidative stress response
Nucleosome remodeling
RNA splicing
Cell cycle progression
Histone methylation
Summary:
Authors noted some interesting conclusions including:
MET and ERBB2 amplification and mutations in NF1 and RIT1 may be unique driver events in lung adenocarcinoma
Possible new drug development could be targeted to the RTK/RAS/RAF pathway
MYC pathway as another important target
Cluster analysis using multimodal omics approach identifies tumors based on single-gene driver events while other tumor have multiple driver mutational events (TUMOR HETEROGENEITY)
Paper 2. A Genomics-Based Classification of Human Lung Tumors[2]
3,726 point mutations and more than 90 indels in the coding sequence
Smokers with lung cancer show 10× the number of point mutations than never-smokers
Novel lung cancer genes, including DACH1, CFTR, RELN, ABCB5, and HGF were identified
Tumor samples from males showed a high frequency of MYCBP2 alterations; MYCBP2 is involved in transcriptional regulation of MYC.
Variant allele frequency analysis revealed that 10/17 tumors were at least biclonal while 7/17 tumors were monoclonal, indicating that the majority of tumors displayed tumor heterogeneity
Novel pathway alterations in lung cancer include cell-cycle and JAK-STAT pathways
14 fusion proteins found, including ROS1-ALK fusion. ROS1-ALK fusions have been frequently found in lung cancer and are indicative of poor prognosis[4].
Novel metabolic enzyme fusions
Alterations were identified in 54 genes for which targeted drugs are available. Druggable mutant targets include: AURKC, BRAF, HGF, EGFR, ERBB4, FGFR1, MET, JAK2, JAK3, HDAC2, HDAC6, HDAC9, BIRC6, ITGB1, ITGB3, MMP2, PRKCB, PIK3CG, TERT, KRAS, MMP14
Table. Validated Gene-Fusions Obtained from Ref-Seq Data
Note: Gene columns contain links for GeneCard while Gene function links are to the gene’s GO (Gene Ontology) function.
There is recent literature on the importance of the EML4-ALK fusion protein in lung cancer. EML4-ALK positive lung tumors were found to be less chemosensitive to cytotoxic therapy[5], and these tumor cells may exhibit an epitope rendering these tumors amenable to immunotherapy[6]. In addition, inhibition of the PI3K pathway has sensitized EML4-ALK fusion positive tumors to ALK-targeted therapy[7]. EML4-ALK fusion positive tumors show dependence on the HSP90 chaperone, suggesting this cohort of patients might benefit from the new HSP90 inhibitors recently being developed[8].
Table. Significantly mutated genes (point mutations, insertions/deletions) with associated function.
Table. Literature Analysis of pathways containing significantly altered genes in NSCLC reveal putative targets and risk factors, linkage between other tumor types, and research areas for further investigation.
Note: Significantly mutated genes, obtained from WES, were subjected to pathway analysis (KEGG Pathway Analysis) to see which pathways contained significantly altered gene networks. Each pathway term was then used in a PubMed literature search together with the terms “lung cancer”, “gene”, and “NOT review” to determine the frequency of literature coverage for each pathway in lung cancer. Links are to the PubMed search results.
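The search described in this note can be reproduced programmatically. The sketch below builds one query per pathway name and the corresponding NCBI E-utilities ESearch URL that returns only a hit count. The exact query syntax used in the original curation is not recorded here, so the `review[Publication Type]` filter and the function names are assumptions; no request is actually sent.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(pathway_name):
    """Combine a KEGG pathway name with the fixed terms from the note above."""
    # Assumed syntax: quoted phrases joined by AND, reviews excluded by publication type.
    return f'"{pathway_name}" AND "lung cancer" AND gene NOT review[Publication Type]'


def build_count_url(pathway_name):
    """Return an ESearch URL asking PubMed for the number of matching records."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(pathway_name),
        "rettype": "count",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Pathway names taken from the KEGG examples mentioned earlier in this post.
for pathway in ["Cell cycle", "MAPK signaling pathway"]:
    print(build_pubmed_query(pathway))
    print(build_count_url(pathway))
```

Running the resulting URLs for every pathway returned by the KEGG analysis would give the per-pathway literature counts that the table summarizes.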
KEGG pathway name
# of PubMed entries containing Pathway Name, Gene AND Lung Cancer
A few interesting genetic risk factors and possible additional targets for NSCLC were deduced from analysis of the above table of literature including HIF1-α, mIR-31, UBQLN1, ACE, mIR-193a, SRSF1. In addition, glioma, melanoma, colorectal, and prostate and lung cancer share many validated mutations, and possibly similar tumor driver mutations.
Paper 4. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing[9]
Exome and genome characterization of somatic alterations in 183 lung adenocarcinomas
12 somatic mutations/megabase
U2AF1, RBM10, and ARID1A are among newly identified recurrently mutated genes
Structural variants include activating in-frame fusion of EGFR
Epigenetic and RNA deregulation proposed as a potential lung adenocarcinoma hallmark
Summary
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
Paper 5. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer[10]
Highlights
Whole exome and transcriptome (RNASeq) sequencing of 29 small-cell lung carcinomas
High mutation rate 7.4 protein-changing mutations/million base pairs
Inactivating mutations in TP53 and RB1
Functional mutations in CREBBP, EP300, MLL, PTEN, SLIT2, EPHA7, FGFR1 (determined by literature and database mining)
The mutational spectrum seen in human data is also present in a Tp53-/- Rb1-/- mouse lung tumor model
Curator Graphical Summary of Interesting Findings From the Above Studies
The above figure represents themes and findings resulting from the aforementioned studies, including questions which will be addressed in future posts on this site.
UPDATED 10/10/2021
The following article uses RNASeq to screen lung adenocarcinomas for fusion proteins in patients with either low or high tumor mutational burden. Findings included the presence of MET fusion proteins, in addition to other fusion proteins, irrespective of whether tumors were driver negative by DNASeq screening.
High Yield of RNA Sequencing for Targetable Kinase Fusions in Lung Adenocarcinomas with No Mitogenic Driver Alteration Detected by DNA Sequencing and Low Tumor Mutation Burden
Ryma Benayed, Michael Offin, Kerry Mullaney, Purvil Sukhadia, Kelly Rios, Patrice Desmeules, Ryan Ptashkin, Helen Won, Jason Chang, Darragh Halpenny, Alison M. Schram, Charles M. Rudin, David M. Hyman, Maria E. Arcila, Michael F. Berger, Ahmet Zehir, Mark G. Kris, Alexander Drilon and Marc Ladanyi
Purpose: Targeted next-generation sequencing of DNA has become more widely used in the management of patients with lung adenocarcinoma; however, no clear mitogenic driver alteration is found in some cases. We evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions and MET exon 14 (METex14) alterations in DNA sequencing (DNAseq) driver–negative lung cancers.
Experimental Design: Lung cancers driver negative by MSK-IMPACT underwent further analysis using a custom RNAseq panel (MSK-Fusion). Tumor mutation burden (TMB) was assessed as a potential prioritization criterion for targeted RNAseq.
Results: As part of prospective clinical genomic testing, we profiled 2,522 lung adenocarcinomas using MSK-IMPACT, which identified 195 (7.7%) fusions and 119 (4.7%) METex14 alterations. Among 275 driver-negative cases with available tissue, 254 (92%) had sufficient material for RNAseq. A previously undetected alteration was identified in 14% (36/254) of cases, 33 of which were actionable (27 in-frame fusions, 6 METex14). Of these 33 patients, 10 then received matched targeted therapy, which achieved clinical benefit in 8 (80%). In the 32% (81/254) of DNAseq driver–negative cases with low TMB [0–5 mutations/Megabase (mut/Mb)], 25 (31%) were positive for previously undetected gene fusions on RNAseq, whereas, in 151 cases with TMB >5 mut/Mb, only 7% were positive for fusions (P < 0.0001).
Conclusions: Targeted RNAseq assays should be used in all cases that appear driver negative by DNAseq assays to ensure comprehensive detection of actionable gene rearrangements. Furthermore, we observed a significant enrichment for fusions in DNAseq driver–negative samples with low TMB, supporting the prioritization of such cases for additional RNAseq.
Translational Relevance
Inhibitors targeting kinase fusions have shown dramatic and durable responses in lung cancer patients, making their comprehensive detection critical. Here, we evaluated the incremental benefit of targeted RNA sequencing (RNAseq) in the identification of gene fusions in patients where no clear mitogenic driver alteration is found by DNA sequencing (DNAseq)–based panel testing. We found actionable alterations (kinase fusions or MET exon 14 skipping) in 13% of cases apparently driver negative by previous DNAseq testing. Among the driver-negative samples tested by RNAseq, those with low tumor mutation burden (TMB) were significantly enriched for gene fusions when compared with the ones with higher TMB. In a clinical setting, such patients should be prioritized for RNAseq. Thus, a rational, algorithmic approach to the use of targeted RNA-based next-generation sequencing (NGS) to complement large panel DNA-based NGS testing can be highly effective in comprehensively uncovering targetable gene fusions or oncogenic isoforms not just in lung cancer but also more generally across different tumor types.
Wake Up and Smell the Fusions: Single-Modality Molecular Testing Misses Drivers
by Kurtis D. Davies and Dara L. Aisner
Abstract
Multitarget assays have become common in clinical molecular diagnostic laboratories. However, all assays, no matter how well designed, have inherent gaps due to technical and biological limitations. In some clinical cases, testing by multiple methodologies is needed to address these gaps and ensure the most accurate molecular diagnoses.
In this issue of Clinical Cancer Research, Benayed and colleagues illustrate the growing need to consider multiple molecular testing methodologies for certain clinical specimens (1). The rapidly expanding list of actionable molecular alterations across cancer types has resulted in the wide adoption of multitarget testing approaches, particularly those based on next-generation sequencing (NGS). NGS-based assays are commonly viewed as “one-stop shops” to detect a vast array of molecular variants. However, as Benayed and colleagues discuss, even well-designed and highly vetted NGS assays have inherent gaps that, under certain circumstances, are ideally addressed by analyzing the sample using an alternative approach.
In the article, the authors examined a cohort of lung adenocarcinoma patient samples that had been deemed “driver-negative” via MSK-IMPACT, an FDA-cleared test that is widely considered by experts in the field to be one of the best examples of a DNA-based large gene panel NGS assay (2). Of 589 driver-negative cases, 254 had additional material amenable for a different approach: RNA-based NGS designed specifically for gene fusion and oncogenic gene isoform detection. After accounting for quality control failures, 232 samples were successfully sequenced, and, among these, 36 samples (representing an astonishing 15.5% of tested cases) were found to be positive for a driver gene fusion or oncogenic isoform that had not been detected by DNA-based NGS. The real-world value derived from this orthogonal testing schema was more than theoretical, with 8 of 10 (80%) patients demonstrating clinical benefit when treated according to the alteration identified via the RNA-based approach.
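The same 36 positive samples are reported as 14% in the abstract and 15.5% in this commentary; the difference is only the denominator (254 cases with sufficient material versus 232 successfully sequenced). The short calculation below recomputes the proportions quoted in the two texts from the counts they give. The `pct` helper and the dictionary labels are added here for illustration.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts as quoted in the abstract and the commentary above.
rates = {
    "fusions found by MSK-IMPACT (195/2522)": pct(195, 2522),
    "METex14 found by MSK-IMPACT (119/2522)": pct(119, 2522),
    "sufficient material for RNAseq (254/275)": pct(254, 275),
    "new alteration, per case with material (36/254)": pct(36, 254),
    "actionable alteration (33/254)": pct(33, 254),
    "new alteration, per sequenced sample (36/232)": pct(36, 232),
    "low-TMB share of cases (81/254)": pct(81, 254),
    "fusion-positive among low-TMB cases (25/81)": pct(25, 81),
    "ROS1 share of discrepant alterations (10/36)": pct(10, 36),
    "clinical benefit on matched therapy (8/10)": pct(8, 10),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

The 33/254 figure also matches the 13% of actionable alterations cited under Translational Relevance.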
To detect gene rearrangements that lead to oncogenic gene fusions (and to detect mutations and insertions/deletions that lead to MET exon 14 skipping), MSK-IMPACT employs hybrid capture-based enrichment of selected intronic regions from genomic DNA. While this approach has proven to be successful in a variety of settings, there are associated limitations that were determined in this study to underlie the discrepancies between MSK-IMPACT and the RNA-based assay. First, some introns that are involved in clinically actionable rearrangement events are very large, thus requiring substantial sequencing capital that can represent a disproportionate fraction of the assay. Despite the ability via NGS to perform sequencing at a large scale, this sequencing capacity is still finite, and thus decisions must be made to sacrifice coverage of certain large genomic regions to ensure sufficient sequencing depth for other desired genomic targets. In the case of MSK-IMPACT (and most other DNA-based NGS assays), certain important introns in NTRK3 and NRG1 are not included in covered content, simply because they are too large (>90 Kb each). The second primary problem with DNA-based analysis of introns is that they often contain highly repetitive elements that are extremely difficult to assess via NGS due to their recurring presence across the genome. Attempts to sequence these regions are largely unfruitful because any sequencing data obtained cannot be specifically aligned/mapped to the desired targeted region of the genome (3). This is particularly true for intron 31 of ROS1, because it contains two repetitive long interspersed nuclear elements, and many DNA-based assays, including MSK-IMPACT, poorly cover this intron (4). In this study by Benayed and colleagues, the most common discrepant alteration was fusion involving ROS1, which accounted for 10 of 36 (28%) cases. At least six of these, those that demonstrated fusion to ROS1 exon 32, were likely directly explained by incomplete intron 31 sequencing. RNA-based analysis is able to overcome the above described limitations owing to the simple fact that sequencing is focused on exons post-splicing and the need to sequence introns is entirely avoided (Fig. 1).
Figure 1. Schematic representation of underlying genomic complexities that can lead to false-negative gene fusion results in DNA-based NGS analysis. In some cases, RNA-based approaches may overcome the limitations of DNA-based testing.
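The mapping problem described above for repetitive introns can be illustrated with a toy example. The Python sketch below is not how a production aligner works, and every sequence in it is invented for illustration; it only shows that a read drawn from a repeated element matches more than one location, whereas a read from unique sequence matches exactly one.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Hypothetical toy sequences, not real genomic data.
REPEAT = "ACGTACGTTAGC"    # stands in for a repetitive element
UNIQUE_A = "GGATCCTTAAGC"
UNIQUE_B = "CTGAAGTCCATG"
UNIQUE_C = "TTGACCGATAGG"

# Toy "genome": the repeat occurs twice, separated by unique sequence.
reference = UNIQUE_A + REPEAT + UNIQUE_B + REPEAT + UNIQUE_C

read_from_repeat = REPEAT[2:10]    # read lying entirely inside the repeat
read_from_unique = UNIQUE_B[1:9]   # read lying in unique sequence

repeat_hits = find_all(reference, read_from_repeat)
unique_hits = find_all(reference, read_from_unique)

print("repeat read maps to positions:", repeat_hits)
print("unique read maps to positions:", unique_hits)
```

The read taken from the repeat is placed at two positions and cannot be assigned to either copy with confidence, which is the situation the authors describe for intron 31 of ROS1. A read taken from spliced exonic sequence does not face this ambiguity.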
Lack of sufficient intronic coverage could not, however, account for all of the discrepancies between DNA-based and RNA-based analysis. Six samples in the cohort were found to be positive for MET exon 14 skipping based on RNA. In five of these, genomic alterations in MET introns 13 or 14 were observed; however, they did not conform to canonical splice site alterations and thus were not initially called (although this was addressed by bioinformatics updates). In RNA-based testing, determination of exon skipping is simplified such that, regardless of the specific genomic alteration that interferes with splicing, absence of the exon in the transcript is directly observed (5). In another two of the discrepant cases, tumor purity was observed to be low in the sample, meaning that the expected variant allele frequency (VAF) for a genomic event would also likely be low, potentially below detectable levels. However, overexpression of the fusions at the transcript level was theorized to compensate for low VAF (Fig. 1). Additional explanations for discordant findings between the assays included sample-specific poor sequencing in selected introns and complex rearrangements that hindered proper capture (Fig. 1).
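The link between tumor purity and expected VAF can be made concrete with a commonly used simplification. The Python sketch below assumes a clonal alteration present on a fixed number of copies in every tumor cell and absent from normal cells; the function name and default copy numbers are illustrative assumptions, not values taken from the study.

```python
def expected_vaf(
    purity: float,
    mutant_copies: int = 1,
    tumor_copy_number: int = 2,
    normal_copy_number: int = 2,
) -> float:
    """Expected VAF of a clonal alteration in a mixed tumor/normal sample.

    Simplifying assumptions: the alteration is present in every tumor cell
    on `mutant_copies` copies and is absent from all normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    tumor_alleles = purity * tumor_copy_number
    normal_alleles = (1.0 - purity) * normal_copy_number
    return purity * mutant_copies / (tumor_alleles + normal_alleles)


# Expected VAF for a heterozygous event in a diploid region.
for purity in (0.80, 0.40, 0.20, 0.10):
    print(f"purity {purity:.0%}: expected VAF {expected_vaf(purity):.1%}")
```

Under these assumptions the expected VAF of a heterozygous event is half the tumor purity, so a sample with 10% tumor content yields an expected VAF of about 5%. Whether that is detectable depends on the sequencing depth and calling thresholds of the particular assay.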
The take-home message from Benayed and colleagues is simply this: there is no perfect assay that will detect 100% of the potential actionable alterations in patient samples. Even an extremely well designed, thoroughly vetted, and FDA-cleared assay such as MSK-IMPACT will have inherent and unavoidable “holes” due to intrinsic limitations. The solution to this dilemma, as adeptly described by Benayed and colleagues, is additional testing using a different approach. While in an ideal world every clinical tumor sample would be tested by multiple modalities to ensure the most comprehensive clinical assessment, the reality is that these samples are often scant and testing is fiscally burdensome (and often not reimbursed). Therefore, algorithms to determine which samples should be reflexed to secondary assays after testing with a primary assay are critical for maximizing benefit. In this study, the first algorithmic step was lack of an identified driver (because activated oncogenic drivers tend to be mutually exclusive), which amounted to 23% of samples tested with the primary assay. In addition, the authors found a significantly higher rate of actionable gene fusions in samples with a low (<5 mut/Mb) tumor mutational burden, meaning that this metric, which was derived from the primary assay, could also be used to help inform decision making regarding additional testing. While this scenario is somewhat specific to lung cancer, similar approaches could be prescribed on a cancer type–specific basis.
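The reflex-testing logic described above can be sketched as a simple decision rule. The Python example below is a hypothetical illustration, not the authors' algorithm: the record fields, return strings, and the material check are assumptions, and only the two criteria (no identified driver, tumor mutational burden below 5 mut/Mb) come from the text.

```python
from dataclasses import dataclass

LOW_TMB_CUTOFF = 5.0  # mut/Mb, the "low" threshold quoted in the text


@dataclass
class PrimaryResult:
    """Hypothetical summary of a primary DNA-based NGS report."""
    sample_id: str
    driver_found: bool
    tmb_mut_per_mb: float
    material_remaining: bool


def reflex_decision(result: PrimaryResult) -> str:
    """Decide whether a sample should be reflexed to RNA-based NGS."""
    # Step 1: a sample with an identified driver is not reflexed, because
    # activated oncogenic drivers tend to be mutually exclusive.
    if result.driver_found:
        return "no reflex: driver already identified"
    # Assumed practical constraint: a second assay needs leftover material.
    if not result.material_remaining:
        return "no reflex: insufficient material"
    # Step 2: low TMB was associated with a higher rate of actionable fusions.
    if result.tmb_mut_per_mb < LOW_TMB_CUTOFF:
        return "reflex to RNA-based NGS: high priority (low TMB)"
    return "reflex to RNA-based NGS"


samples = [
    PrimaryResult("S1", driver_found=True, tmb_mut_per_mb=3.0, material_remaining=True),
    PrimaryResult("S2", driver_found=False, tmb_mut_per_mb=2.5, material_remaining=True),
    PrimaryResult("S3", driver_found=False, tmb_mut_per_mb=12.0, material_remaining=True),
    PrimaryResult("S4", driver_found=False, tmb_mut_per_mb=1.0, material_remaining=False),
]

for sample in samples:
    print(sample.sample_id, "->", reflex_decision(sample))
```

A laboratory adopting such a rule would need to set its own criteria for each cancer type, since the text notes that this scenario is somewhat specific to lung cancer.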
These findings should be considered a “wake-up call” for oncologists in regard to the ordering and interpretation of molecular testing. It is clear from these and other published findings that advanced molecular analysis has limitations that require nuanced technical understanding. As this arena evolves, it is critical for oncologists (and trainees) to gain an increased comprehension of how to identify when the “gaps” in a test might be most clinically relevant. This requires a level of technical cognizance that has been previously unexpected of clinical practitioners, yet is underscored by the reality that opportunities for effective targeted therapy can and will be missed if the treating oncologist is unaware of how to best identify patients for whom additional testing is warranted. This study also highlights the mantra of “no test is perfect” regardless of prestige of the testing institution, number of past tests performed, or regulatory status. NGS, despite its benefits, does not mean all-encompassing. It is only through the adaptability of laboratories to utilize knowledge such as is provided by Benayed and colleagues that advances in laboratory medicine can be quickly deployed to maximize benefits for oncology patients.
CSHL, UCLA & Einstein to Lead Roundtable Discussions on Single-Cell Sequencing
Interactive discussions on three of the key questions researchers are facing when considering single-cell analysis will be held on the second day of the Single-Cell Sequencing Conference at Next Generation Dx Summit, taking place August 20-21, 2014 in Washington, DC. For full program details and to register, please visit NextGenerationDx.com/Single-Cell-Sequencing.
Making Single-Cell Analysis Cost Effective for Clinical Use
Moderator: James Hicks, Ph.D., Research Professor, Cancer Genomics, Cold Spring Harbor Laboratory
Methods for capture: What are the tradeoffs?
Combining RNA, DNA and protein analysis
What genomic assays are most informative?
Can assays be certifiable?
Finding a Needle in a Haystack: Towards Diagnosing Rare Soft Tissue Cancer Stem Cells (CSCs)
Moderator: Michael Masterman-Smith, Ph.D., Entrepreneurial Scientist, UCLA California NanoSystems Institute
Rethinking companion diagnostics for cancer to incorporate analysis of CSCs
Current direct methodologies of CSC detection/isolation
Current proxy methodologies of CSC detection/isolation
The hope and promise of single-cell assay tools and technologies
Why Single-Cell Sequencing?
Moderator: Jan Vijg, Ph.D., Professor and Chairman, Genetics, Albert Einstein College of Medicine
Sample limitations, e.g., prenatal diagnostics and CTCs
To study cell-to-cell variation, e.g., in tumors as well as normal tissues
To overcome technological constraints, e.g., detecting somatic mutations
Cell-to-cell fluctuations in gene expression can easily impair function, yet can be undetectable by measuring averages
Sequencing data from bulk DNA or RNA from multiple cells provide global information on average states of cell populations. But with whole-genome amplification and NGS, researchers can detect variation in individual cancer cells and dissect tumor evolution. Such cancer genome sequencing will improve oncology by detecting rare tumor cells early, measuring intra-/intertumor heterogeneity, guiding chemotherapy and controlling drug resistance. The Single-Cell Sequencing conference explores the latest strategies, data analyses and clinical considerations that influence and aid cancer diagnosis, prognosis and prediction and will lead to individualized cancer therapy.
Sessions include presentations spanning the opportunities of clinical single-cell analysis from:
Sunney Xie, Ph.D., Mallinckrodt Professor, Chemistry and Chemical Biology, Harvard University
Maximilian Diehn, M.D., Ph.D., Assistant Professor, Radiation Oncology, Stanford Cancer Institute, Institute for Stem Cell Biology & Regenerative Medicine, Stanford University
Denis Smirnov, Associate Scientific Director, US Biomarker Oncology, Janssen R&D US
James Hicks, Ph.D., Research Professor, Cancer Genomics, Cold Spring Harbor Laboratory
Jan Vijg, Ph.D., Professor and Chairman, Genetics, Albert Einstein College of Medicine
John F. Zhong, Ph.D., Associate Professor, Pathology, University of Southern California School of Medicine
Mark Hills, Ph.D., Research Scientist, Peter M. Lansdorp Laboratory, BC Cancer Research Centre
Michael Masterman-Smith, Ph.D., Entrepreneurial Scientist, UCLA California NanoSystems Institute
Parveen Kumar, Research Scientist, Thierry Voet Laboratory, Human Genetics, University of Leuven
Peter Nemes, Ph.D., Assistant Professor, Chemistry, George Washington University
Theresa Zhang, Ph.D., Vice President, Research Services, Personal Genome Diagnostics
Yong Wang, Ph.D., Senior Postdoctoral Fellow, Nicholas E. Navin Laboratory, Genetics, Bioinformatics, MD Anderson Cancer Center
Zivana Tezak, Ph.D., Associate Director, Science and Technology, Personalized Medicine, Office of In Vitro Diagnostic Device Evaluation and Safety (OIVD), Center for Devices and Radiological Health (CDRH), FDA
Recommended Pre-Conference Courses
NGS Data Analysis – Determining Clinical Utility of Genome Variants
Monday, August 18 | 9:00am – 12:00pm
This course will explore the strategies of genomic data analysis and interpretation, an emergent discipline that seeks to deliver better answers from NGS data so that patients and their physicians can make informed healthcare decisions.
NGS as a Diagnostics Platform
Monday, August 18 | 2:00pm – 5:00pm
The focus of this short course will be on understanding the use of NGS in clinical diagnosis, practical implementation of NGS in clinical laboratories and analysis of large data sets by using bioinformatics tools to parse and interpret data in relation to the clinical phenotype. The concluding presentation will be dedicated to quality and standardization of NGS assays.