Healthcare analytics, AI solutions for biological big data, providing an AI platform for the biotech, life sciences, medical and pharmaceutical industries, as well as for related technological approaches, i.e., curation and text analysis with machine learning and other activities related to AI applications to these industries.
Cancer Genomics: Multiomic Analysis of Single Cells and Tumor Heterogeneity
Curator: Stephen J. Williams, PhD
4.3.7 Cancer Genomics: Multiomic Analysis of Single Cells and Tumor Heterogeneity, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
To better design treatments for cancer, it is important to understand the heterogeneity in tumors and how this contributes to metastasis. To examine this process, Bian et al. used a single-cell triple omics sequencing (scTrio-seq) technique to examine the mutations, transcriptome, and methylome within colorectal cancer tumors and metastases from 10 individual patients. The analysis provided insights into tumor evolution, linked DNA methylation to genetic lineages, and showed that DNA methylation levels are consistent within lineages but can differ substantially among clones.
Although genomic instability, epigenetic abnormality, and gene expression dysregulation are hallmarks of colorectal cancer, these features have not been simultaneously analyzed at single-cell resolution. Using optimized single-cell multiomics sequencing together with multiregional sampling of the primary tumor and lymphatic and distant metastases, we developed insights beyond intratumoral heterogeneity. Genome-wide DNA methylation levels were relatively consistent within a single genetic sublineage. The genome-wide DNA demethylation patterns of cancer cells were consistent in all 10 patients whose DNA we sequenced. The cancer cells’ DNA demethylation degrees clearly correlated with the densities of the heterochromatin-associated histone modification H3K9me3 of normal tissue and those of repetitive element long interspersed nuclear element 1. Our work demonstrates the feasibility of reconstructing genetic lineages and tracing their epigenomic and transcriptomic dynamics with single-cell multiomics sequencing.
Fig. 1. Reconstruction of genetic lineages with scTrio-seq2. Global SCNA patterns (250-kb resolution) of CRC01. Each row represents an individual cell. The subclonal SCNAs used for identifying genetic sublineages were marked and indexed; for details, see fig. S6B. On the top of the heatmap, the amplification or deletion frequency of each genomic bin (250 kb) of the non-hypermutated CRC samples from the TCGA Project and patient CRC01’s cancer cells are shown.
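To make the idea of a 250-kb SCNA profile concrete, the following is a minimal sketch of how per-cell read positions could be binned and compared with a reference cell. It is not the pipeline used by Bian et al.; the function names, the pseudocount, and the single-chromosome simplification are assumptions made for illustration.

```python
import math
from collections import Counter

BIN_SIZE = 250_000  # 250-kb bins, matching the resolution quoted in the caption


def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Count reads per fixed-width genomic bin (single chromosome only)."""
    return Counter(pos // bin_size for pos in read_starts)


def log2_ratios(cell_counts, ref_counts, n_bins, pseudo=1.0):
    """Depth-normalized log2 ratio of a cell against a reference, per bin.

    A pseudocount (an assumption of this sketch) avoids division by zero
    in bins that received no reads.
    """
    cell_total = sum(cell_counts.values()) + pseudo * n_bins
    ref_total = sum(ref_counts.values()) + pseudo * n_bins
    ratios = []
    for b in range(n_bins):
        cell_frac = (cell_counts.get(b, 0) + pseudo) / cell_total
        ref_frac = (ref_counts.get(b, 0) + pseudo) / ref_total
        ratios.append(math.log2(cell_frac / ref_frac))
    return ratios


# Hypothetical read start positions: 10 reads per bin in the reference,
# and twice as many reads in the second bin of the tumor cell.
reference_reads = [b * BIN_SIZE + i for b in range(4) for i in range(10)]
tumor_reads = reference_reads + [1 * BIN_SIZE + 100 + i for i in range(10)]

profile = log2_ratios(bin_counts(tumor_reads), bin_counts(reference_reads), n_bins=4)
print([round(r, 2) for r in profile])
```

Bins with a positive ratio are candidates for amplification and bins with a negative ratio for deletion. Real analyses also correct for GC content and mappability and segment neighboring bins, which this sketch omits.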
Single-cell RNA-seq helps in finding intra-tumoral heterogeneity in pancreatic cancer
Reporter and Curator: Dr. Sudipta Saha, Ph.D.
4.3.6 Single-cell RNA-seq helps in finding intra-tumoral heterogeneity in pancreatic cancer, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
Pancreatic cancer is a significant cause of cancer mortality; therefore, the development of early diagnostic strategies and effective treatment is essential. Improvements in imaging technology and the use of biomarkers are changing the way that pancreatic cancer is diagnosed and staged. Although progress in treatment for pancreatic cancer has been incremental, development of combination therapies involving both chemotherapeutic and biologic agents is ongoing.
Cancer is an evolutionary disease, containing the hallmarks of an asexually reproducing unicellular organism subject to evolutionary paradigms. Pancreatic ductal adenocarcinoma (PDAC) is a particularly robust example of this phenomenon. Genomic features indicate that pancreatic cancer cells are selected for fitness advantages when encountering the geographic and resource-depleted constraints of the microenvironment. Phenotypic adaptations to these pressures help disseminated cells to survive in secondary sites, a major clinical problem for patients with this disease.
The immune system varies in cell types, states, and locations. The complex networks, interactions, and responses of immune cells produce diverse cellular ecosystems composed of multiple cell types, accompanied by genetic diversity in antigen receptors. Within this ecosystem, innate and adaptive immune cells maintain and protect tissue function, integrity, and homeostasis upon changes in functional demands and diverse insults. Characterizing this inherent complexity requires studies at single-cell resolution. Recent advances such as massively parallel single-cell RNA sequencing and sophisticated computational methods are catalyzing a revolution in our understanding of immunology.
PDAC is the most common type of pancreatic cancer and is characterized by high intra-tumoral heterogeneity and poor prognosis. To comprehensively delineate PDAC intra-tumoral heterogeneity and the mechanisms underlying PDAC progression, the present study employed single-cell RNA-seq (scRNA-seq) to acquire the transcriptomic atlas of 57,530 individual pancreatic cells from primary PDAC tumors and control pancreases. Diverse malignant and stromal cell types, including two ductal subtypes with abnormal and malignant gene expression profiles respectively, were identified in PDAC.
The researchers found that the heterogeneous malignant subtype was composed of several subpopulations with differential proliferative and migratory potentials. Cell trajectory analysis revealed that components of multiple tumor-related pathways and transcription factors (TFs) were differentially expressed along PDAC progression. Furthermore, a subset of ductal cells with unique proliferative features was found to be associated with an inactivation state in tumor-infiltrating T cells, providing novel markers for the prediction of antitumor immune response. Together, the findings provided a valuable resource for deciphering the intra-tumoral heterogeneity in PDAC and uncovered a connection between tumor-intrinsic transcriptional state and T cell activation, suggesting potential biomarkers for anticancer treatment such as targeted therapy and immunotherapy.
scPopCorn: A New Computational Method for Subpopulation Detection and their Comparative Analysis Across Single-Cell Experiments
Reporter and Curator: Dr. Sudipta Saha, Ph.D.
4.2.5 scPopCorn: A New Computational Method for Subpopulation Detection and their Comparative Analysis Across Single-Cell Experiments, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
Present day technological advances have facilitated unprecedented opportunities for studying biological systems at single-cell level resolution. For example, single-cell RNA sequencing (scRNA-seq) enables the measurement of transcriptomic information of thousands of individual cells in one experiment. Analyses of such data provide information that was not accessible using bulk sequencing, which can only assess average properties of cell populations. Single-cell measurements, however, can capture the heterogeneity of a population of cells. In particular, single-cell studies allow for the identification of novel cell types, states, and dynamics.
One of the most prominent uses of the scRNA-seq technology is the identification of subpopulations of cells present in a sample and comparing such subpopulations across samples. Such information is crucial for understanding the heterogeneity of cells in a sample and for comparative analysis of samples from different conditions, tissues, and species. A frequently used approach is to cluster every dataset separately, inspect marker genes for each cluster, and compare these clusters in an attempt to determine which cell types were shared between samples. This approach, however, relies on the existence of predefined or clearly identifiable marker genes and their consistent measurement across subpopulations.
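The cluster-then-compare approach described above can be sketched as follows. This is a simplified illustration and not scPopCorn or any published tool: clusters from two samples are matched by the Jaccard overlap of their marker-gene lists. The function names, the 0.3 threshold, and the toy marker lists are assumptions.

```python
def jaccard(genes_a, genes_b):
    """Jaccard index of two marker-gene collections (0.0 if both are empty)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(markers_a, markers_b, min_score=0.3):
    """For each cluster in sample A, return the best-overlapping cluster in
    sample B and its score, or (None, score) when the best overlap is below
    min_score."""
    matches = {}
    for name_a, genes_a in markers_a.items():
        best_name, best_score = None, 0.0
        for name_b, genes_b in markers_b.items():
            score = jaccard(genes_a, genes_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score < min_score:
            best_name = None
        matches[name_a] = (best_name, best_score)
    return matches


# Toy marker lists for two independently clustered samples (illustration only).
sample_1 = {"c0": ["INS", "IAPP", "MAFA"], "c1": ["GCG", "ARX", "TTR"]}
sample_2 = {
    "k0": ["GCG", "ARX", "IRX2"],
    "k1": ["INS", "IAPP", "PDX1"],
    "k2": ["KRT19", "SOX9"],
}

for cluster, (partner, score) in match_clusters(sample_1, sample_2).items():
    print(cluster, "->", partner, round(score, 2))
```

The sketch also shows the weakness noted in the text: the matching is only as good as the marker lists, so a subpopulation without clearly identifiable or consistently measured markers is left unmatched.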
Although the aligned data can then be clustered to reveal subpopulations and their correspondence, solving the subpopulation-mapping problem by performing global alignment first and clustering second overlooks the original information about subpopulations existing in each experiment. In contrast, an approach addressing this problem directly might represent a more suitable solution. With this in mind, the researchers developed a computational method, single-cell subpopulations comparison (scPopCorn), that allows for comparative analysis of two or more single-cell populations.
The performance of scPopCorn was tested in three distinct settings. First, its potential was demonstrated in identifying and aligning subpopulations from human and mouse pancreatic single-cell data. Next, scPopCorn was applied to the task of aligning biological replicates of mouse kidney single-cell data, where it achieved the best performance among previously published tools. Finally, it was applied to compare populations of cells from cancer and healthy brain tissues, revealing the relation of neoplastic cells to neural cells and astrocytes. As a result of this integrative approach, scPopCorn provides a powerful tool for comparative analysis of single-cell populations.
scPopCorn is a computational method for identifying subpopulations of cells within individual single-cell experiments and mapping these subpopulations across experiments. Unlike other approaches, scPopCorn performs population identification and mapping simultaneously by optimizing a function that combines both objectives. When applied to complex biological data, scPopCorn outperforms previous methods. However, scPopCorn assumes that the input single-cell data consist of separable subpopulations, and it is not designed for comparative analysis of single-cell trajectory datasets that do not fulfill this constraint.
Several innovations developed in this work contributed to the performance of scPopCorn. First, unifying the above-mentioned tasks into a single problem statement allowed for integrating the signal from different experiments while identifying subpopulations within each experiment. Such integration helps reduce biological and experimental noise. The researchers believe that the ideas introduced in scPopCorn not only enabled the design of a highly accurate approach to subpopulation identification and mapping, but can also provide a stepping stone for other tools to interrogate the relationships between single-cell experiments.
Featuring Computational and Systems Biology Program at Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute (SKI), The Dana Pe’er Lab
Reporter: Aviva Lev-Ari, PhD, RN
Article ID #270: Featuring Computational and Systems Biology Program at Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute (SKI), The Dana Pe’er Lab. Published on 6/16/2019
WordCloud Image Produced by Adam Tubman
4.2.2 Featuring Computational and Systems Biology Program at Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute (SKI), The Dana Pe’er Lab, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
A lecture by Dana Pe’er is included below in the eProceedings, which I generated in real time on 6/14/2019 at MIT
eProceeding 2019 Koch Institute Symposium – 18th Annual Cancer Research Symposium – Machine Learning and Cancer, June 14, 2019, 8:00 AM-5:00 PM ET MIT Kresge Auditorium, 48 Massachusetts Ave, Cambridge, MA
The Pe’er lab combines single cell technologies, genomic datasets and machine learning algorithms to address fundamental questions in biomedical science. Empowered by recent breakthrough technologies like massive parallel single cell RNA-sequencing, we ask questions such as: How do multi-cellular organisms develop from a single cell, resulting in the vast diversity of progenitor and terminal cell types? How does a cell’s regulatory circuit control the dynamics of signal processing and how do these circuits rewire over the course of development? How does an ensemble of cells function together to execute a multi-cellular response, such as an immune response to pathogen or cancer? We will also address more medically oriented questions such as: How do regulatory circuits go awry in disease? What is the consequence of intra-tumor heterogeneity? Can we characterize the tumor immune eco-system to gain a better understanding of when or why immunotherapy works or does not work? A key goal is to use this characterization of the tumor immune eco-system to personalize immunotherapy.
Dana Pe’er, PhD
Chair, Computational and Systems Biology Program, SKI; Scientific Director, Metastasis & Tumor Ecosystems Center
Research Focus
Computational biologist Dana Pe’er combines single cell technologies, genomic datasets and machine learning techniques to address fundamental questions about regulatory cell circuits, cellular development, the tumor immune eco-system, genotype to phenotype relations and precision medicine.
Computational biologists combine findings in biology with computer algorithms and databases to conduct biological research on powerful computers, using sophisticated software — so-called “dry” laboratories — in ways that complement and strengthen traditional laboratory and clinical research. The aim is to build computer models that simulate biological processes from the molecular level up to the organism as a whole and to use these models to make useful predictions.
Computational biology can help interpret detailed molecular profiles of cancerous and noncancerous cells, molecular response profiles of therapeutic agents, and a person’s genetic profile to assist in the development of better diagnostics and prognostics, as well as improved therapies. Intelligent use of computational methods using detailed molecular and genomic data is expected to reduce the trial and error of drug development and possibly lead to shorter, more accurate clinical trials.
Three Technology Leaders in Single Cell Sequencing: 10X Genomics, Illumina and MissionBio
Reporter: Aviva Lev-Ari, PhD, RN
We review below only Three Technology Leaders in Single Cell Sequencing. There are other players in this research technology space.
WHAT IS SINGLE CELL GENOMICS?
By Nicole Davis, Ph.D.
4.2.4 Three Technology Leaders in Single Cell Sequencing: 10X Genomics, Illumina and MissionBio, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
The average read-out from conventional bulk sequencing misses the rare events and underlying genetic diversity within and across cell populations. To resolve heterogeneity and improve patient stratification, therapy selection, and disease monitoring, we need insights into mutation co-occurrence, cell population, and zygosity within every single cell.
Clonal Resolution with Single-Cell Precision
Complex disease evolves, so understanding genetic variability — including mutation co-occurrence at the single-cell level — is vitally important for clinical researchers to break the cycle of treatment response, resistance and relapse.
CLONAL DIVERSITY REVEALED
High Sensitivity to Reveal True Heterogeneity
The Tapestri Platform revolutionizes the capability to directly assess the clonal architecture of a sample with detection of mutation co-occurrence patterns. Rather than inferring variants that co-occur within a subclone from comparable bulk variant allele frequencies, single-cell resolution uncovers the true distribution of genotypes and their segregation patterns across subclones.
Platform Features
Targeted and accurate SNVs and indel variant calling
Single-cell DNA throughput up to 10,000 cells
Simple workflow
User-friendly bioinformatics software
Customizable content
Detect rare subclones down to 0.1%
Resolve clonal architecture
Identify mutation co-occurrence
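The difference between bulk allele frequencies and single-cell co-occurrence can be illustrated with a small sketch. It is not Mission Bio's software; the function names and the variant labels "mutA" and "mutB" are hypothetical, and zygosity is ignored for simplicity.

```python
from collections import Counter


def variant_cell_fractions(cells):
    """Fraction of cells carrying each variant (a bulk-like summary)."""
    carriers = Counter(v for genotype in cells for v in genotype)
    return {v: n / len(cells) for v, n in carriers.items()}


def clone_frequencies(cells):
    """Fraction of cells with each exact combination of variants."""
    clones = Counter(frozenset(genotype) for genotype in cells)
    return {clone: n / len(cells) for clone, n in clones.items()}


# Two hypothetical samples of 100 cells each.
# In the first, both variants sit in the same cells; in the second,
# they sit in different cells.
co_occurring = [{"mutA", "mutB"}] * 50 + [set()] * 50
exclusive = [{"mutA"}] * 50 + [{"mutB"}] * 50

for name, sample in (("co-occurring", co_occurring), ("exclusive", exclusive)):
    print(name, variant_cell_fractions(sample))
    for clone, freq in clone_frequencies(sample).items():
        print("   clone", sorted(clone), freq)
```

Both samples show each variant in half of the cells, so a bulk summary cannot tell them apart, while the per-cell genotypes reveal one double-mutant clone in the first sample and two separate single-mutant clones in the second.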
How the Tapestri Platform Works
Step 3: After running your sequencer, proceed with our downstream analysis and visualization software. Mission Bio provides dedicated bioinformatics support to help you discover biological and clinical insights
PAGES 54 – 56 Data Analysis – most important pages in the Report, $669
Single-cell sequencing poses unique challenges for data analysis. Individual mammalian cells contain 50,000–300,000 transcripts, and gene expression values among individual cells can vary significantly.206 Although several hundred thousand transcripts may be expressed per individual cell, up to 85% of these are present at only 1–100 copies.207 Therefore, it is critically important in scRNA-Seq to capture low-abundance mRNA transcripts and amplify the synthesized cDNA to ensure that all transcripts are ultimately represented uniformly in the library.208,209 Spike-in quantification standards of known abundance can help distinguish technical variability/noise from biologically meaningful gene expression changes.210 Molecular indexing can also correct for sequencing biases,211,212 and recent improvements in automated sample handling can reduce technical variability even more.213
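The molecular indexing mentioned above can be sketched as follows: reads that share a cell, a gene, and a unique molecular identifier (UMI) are treated as amplification copies of one original molecule and counted once. The tuple layout, the names, and the toy reads are assumptions of this illustration, and the sketch does not correct sequencing errors within the UMI itself.

```python
from collections import Counter, defaultdict


def read_counts(reads):
    """Raw reads per (cell, gene), inflated by amplification duplicates."""
    return dict(Counter((cell, gene) for cell, gene, _umi in reads))


def umi_counts(reads):
    """Unique UMIs per (cell, gene): one count per original molecule."""
    molecules = defaultdict(set)
    for cell, gene, umi in reads:
        molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


# Hypothetical reads as (cell barcode, gene, UMI).
reads = [
    ("cell1", "GeneA", "AACG"),
    ("cell1", "GeneA", "AACG"),  # amplification copy of the read above
    ("cell1", "GeneA", "AACG"),  # another copy
    ("cell1", "GeneA", "TTGC"),
    ("cell1", "GeneB", "GGAT"),
    ("cell2", "GeneA", "AACG"),  # same UMI, different cell: a separate molecule
]

print(read_counts(reads))
print(umi_counts(reads))
```

In this toy input GeneA in cell1 has four reads but only two distinct UMIs, so the molecule count is two; the over-amplified molecule no longer dominates the estimate.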
PAGE 64
Genome & Transcriptome Sequencing
Genome & transcriptome sequencing (G&T-Seq) is a protocol that can separate and sequence genomic DNA and full-length mRNA from single cells.276 In this method, single cells are isolated and lysed. RNA is captured using biotinylated oligo(dT) capture primers and separated from DNA using streptavidin-coated magnetic beads. Smart-Seq2 is used to amplify captured RNA on the bead, while MDA is used to amplify DNA. After sequencing, integrating DNA and RNA sequences provides insights into the gene-expression profile of single cells (Table 4).
PAGE 66
Genomic DNA and mRNA Sequencing
DR-Seq studies the genomic and transcriptomic relationship of single cells via sequencing. Nucleic acid amplification prior to physical separation reduces sample loss and the risk of contamination. DR-Seq involves multiple amplification steps, including the quasilinear amplification technique similar to MALBAC. First, mRNAs are reverse-transcribed from lysed single cells using poly(dT) primers with Ad-1x adapters, producing single-stranded cDNA (sscDNA). The Ad-1x adapter sequence contains cell-identifying barcodes, 5’ Illumina adapters, and a T7 promoter. Next, both gDNA and sscDNA are amplified simultaneously via quasilinear WGA with Ad-2 primers. These primers are similar to MALBAC adapters, containing 8 random nucleotides for random priming followed by a constant 27-nucleotide
PAGE 73
Single-Cell Methylome & Transcriptome Sequencing
scM&T-Seq allows parallel analysis of both epigenetic and gene expression patterns from single cells using Smart-Seq2 and scBS-Seq. scM&T-Seq is built upon G&T-Seq, but instead of using MDA for DNA sequencing, it uses scBS-Seq to interrogate DNA methylation patterns. First, single cells are isolated and individually lysed. Then, mRNAs are isolated using streptavidin-coupled mRNA capture primers, physically separating them from DNA strands. Smart-Seq2 is used to generate cDNA libraries from the mRNA, which involves reverse transcription with template switching and tagmentation. DNA libraries are prepared via scBS-Seq, which involves bisulfite conversion of DNA strands to identify methylated cytosines. Both libraries are now ready for sequencing (Table 9).
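The principle behind reading methylation from bisulfite-converted DNA can be shown with a short sketch: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The code below is an illustration only, not the scBS-Seq pipeline; it assumes a forward-strand read already aligned to the reference without gaps, and the sequences are made up.

```python
def cpg_calls(reference, read):
    """Call methylation at CpG sites from one aligned bisulfite read.

    Returns (position, is_methylated) pairs. A C in the read means the
    cytosine was protected (methylated); a T means it was converted
    (unmethylated). Other bases at a CpG are skipped as uninformative.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and equal in length")
    calls = []
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls.append((i, True))
            elif read[i] == "T":
                calls.append((i, False))
    return calls


def methylation_level(calls):
    """Fraction of informative CpG sites called methylated (None if no sites)."""
    if not calls:
        return None
    return sum(1 for _pos, methylated in calls if methylated) / len(calls)


reference = "ACGTCGAACG"  # hypothetical reference with three CpG sites
read = "ACGTTGAACG"       # the middle CpG reads as TG after conversion

calls = cpg_calls(reference, read)
print(calls)
print(methylation_level(calls))
```

Real single-cell data are sparse, so levels are usually aggregated over many sites or genomic windows rather than taken from a single read.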
PAGE 78
RNA METHODS
Low-level RNA detection refers to both detection of rare RNA molecules in a cell-free environment (such as circulating tumor RNA) and the expression patterns of single cells. Tissues consist of a multitude of different cell types, each with a distinctly different set of functions. Even within a single cell type, the transcriptomes are highly dynamic and reflect temporal, spatial, and cell cycle–dependent changes. Cell harvesting, handling, and technical issues with sensitivity and bias during amplification add additional levels of complexity. To resolve this multitiered complexity would require analyzing many thousands of cells. The use of unique barcodes has greatly increased the number of samples that can be multiplexed and pooled at little to no decrease in reads associated with each sample. Recent improvements in cell capture and sample preparation will provide more information, faster, and at lower cost.310,311 This development promises to expand our understanding of cell function fundamentally, with significant implications for research and human health.312
PAGE 87
Genome & Transcriptome Sequencing
G&T-Seq is a protocol that can separate and sequence genomic DNA and full-length mRNA from single cells.327 In this method, single cells are isolated and lysed. RNA is captured using biotinylated oligo(dT) capture primers and separated from DNA using streptavidin-coated magnetic beads. Smart-Seq2 is used to amplify captured RNA on the bead, while MDA is used to amplify DNA. After sequencing, integrating DNA and RNA sequences provides insights into the gene-expression profile of single cells.
PAGE 88
Genomic DNA and mRNA Sequencing
DR-Seq studies the genomic and transcriptomic relationship of single cells via sequencing. Nucleic acid amplification prior to physical separation reduces sample loss and the risk of contamination. DR-Seq involves multiple amplification steps, including the quasilinear amplification technique similar to MALBAC.
PAGE 89
T Cell–Receptor Chain Pairing
Functional TCRs are heterodimeric proteins composed of unique combinations of α and β chains. For an accurate functional analysis, both subunits must be sequenced together to avoid disrupting the α- and β-chain pairing during the cell lysis step.333
Flow cell–surface reverse-transcription sequencing (FRT-Seq) is a transcriptome-sequencing technique developed in 2010.339 It is strand-specific, free of amplification, and is compatible with paired-end sequencing. To begin with, poly(A)+ RNA samples are fragmented by metal-ion hydrolysis and dephosphorylated.
PAGE 97
Single-Cell RNA Barcoding and Sequencing
Single-cell RNA barcoding and sequencing (SCRB-Seq) is a cost-efficient, multiplexed scRNA-Seq technique. SCRB-Seq isolates single cells into wells using FACS. After cell lysis, poly(A)+ mRNAs are annealed to a custom primer containing a poly(T) tract, UMI, well barcode, and biotin. Template-switching reverse transcription and PCR amplification are carried out on the mRNA, generating barcoded full-length cDNA. cDNA strands from all wells are pooled together to be purified. They are amplified by PCR and purified further. cDNA libraries are prepared using the Nextera XT kit with modified i5 primers. The resultant cDNA fragments are size-selected for 300–800 bp and sequenced (Table 31).
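Because cDNA from all wells is pooled, each read has to be assigned back to its well from the barcode it carries. The sketch below shows one way this could look. The read layout (well barcode first, then UMI), the lengths of 6 and 10 nucleotides, and all names are assumptions of this illustration and should be checked against the actual library design.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class ReadTag(NamedTuple):
    well_barcode: str
    umi: str


def parse_tag(read1: str, barcode_len: int = 6, umi_len: int = 10) -> Optional[ReadTag]:
    """Split the start of a barcode read into well barcode and UMI.

    Returns None when the read is too short or the tag contains an
    uncalled base (N). The default lengths are assumptions of this sketch.
    """
    needed = barcode_len + umi_len
    if len(read1) < needed:
        return None
    tag = read1[:needed].upper()
    if "N" in tag:
        return None
    return ReadTag(tag[:barcode_len], tag[barcode_len:])


def umis_by_well(read1_seqs):
    """Group the distinct UMIs observed in each well, skipping unusable reads."""
    wells = defaultdict(set)
    for seq in read1_seqs:
        tag = parse_tag(seq)
        if tag is not None:
            wells[tag.well_barcode].add(tag.umi)
    return dict(wells)


# Hypothetical barcode reads.
example_reads = [
    "ACGTACTTGCAAGGCTTTTT",
    "ACGTACTTGCAAGGCTTTTT",  # same well and UMI: duplicate of the first read
    "ACGTACGGGCAAGGCTTTTT",  # same well, different UMI
    "TTGACANNGCAAGGCTTTTT",  # contains N in the tag: discarded
    "ACG",                   # too short: discarded
]

print(umis_by_well(example_reads))
```

In practice the parsed barcode would also be checked against the list of barcodes actually used on the plate, which this sketch leaves out.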
10x Genomics is building tools for scientific discovery that reveal and address the true complexities of biology and disease. Through a combination of novel microfluidics, chemistry and bioinformatics, our award-winning Chromium System is enabling researchers around the world to more fully understand the fundamentals of biology at unprecedented resolution and scale.
For the first time, scRNA-Seq is enabling a cell-by-cell molecular and cellular characterization of hundreds of thousands of cells within the same sample. Complex systems, like those found in the immune system, can be explored without limits.
Transcriptome analysis has made the leap from bulk population-based studies to the single cell, and scientists are harnessing this new degree of resolution with remarkable ingenuity.
Single-cell RNA sequencing (scRNA-Seq) allows you to ask and answer questions that require single-cell resolution on a scale that suits your experimental needs, from hundreds to millions of cells. Are you truly tracking your cells’ transcriptomes, or are you just reading into the averages?
SINGLE CELL GENOMICS 2019, September 24-26, 2019, Djurönäset, Stockholm, Sweden
Reporter: Aviva Lev-Ari, PhD, RN
4.1.6 SINGLE CELL GENOMICS 2019 – sometimes the sum of the parts is greater than the whole, September 24-26, 2019, Djurönäset, Stockholm, Sweden http://www.weizmann.ac.il/conferences/SCG2019/single-cell-genomics-2019, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
Single cell genomics has emerged as a revolutionary technology transforming nearly every field of biomedical research. Through its many applications (single cell genome sequencing, single cell transcriptomics, various single cell epigenetic profiling approaches, and spatially resolved methods), researchers can characterize the genetic and functional properties of individual cells in their native conditions, leading to numerous experimental and clinical opportunities. As technology is leaping forward, many critical questions are arising:
• How can the behavior of groups of thousands or tens of thousands of single cells be analyzed and modeled?
• How can samples of precise single-cell-states be converted to inferred cellular behaviour, in space and time?
• How can multimodal single-cell datasets be integrated?
• What can we learn about cell-cell interactions?
• What are the immediate implications to fields like neuroscience, immunology, cancer research and stem cells?
• What will the longer-term impacts be for clinical research and practice?
The conference will bring together many of the pioneers and leading experts in the field to three days of extensive, interdisciplinary and informal discussion. Our goal is to create a forum where knowledge is shared, hoping to define together the agenda of this new community. The meeting will include presentations from invited leaders and several selected abstracts, a poster session and many opportunities for interaction. We encourage students and postdocs to participate by presenting abstracts.
The second annual PureTech Health BIG Summit brings together an elite ensemble of leading scientific researchers, investors, and CEOs and R&D leaders from major pharmaceutical, technology, and biotech companies.
The BIG Summit is designed to stimulate ideas that will have an impact on existing pipelines and catalyze future interactions among a group of delegates that represent leaders and innovators in their fields.
Please follow the discussion on Twitter using #BIGAxisSummit
By invitation only; registration is non-transferable.
For more information, please contact PureTechHealthSummit@PureTechHealth.com
Back for final sessions at #BIGAxisSummit. @PureTechH Jim Harper of Sonde Health talking about how voice data — pacing, fine motor articulation, oscillation — can point the way to objective, quantitative measures for detecting and monitoring depression.
Paul Biondi at #BIGAxisSummit : What makes big deals happen is financial, and *deep conviction* of a big future fit. Disproportionate valuation from bidders is expected.
Love this. We often reduce everything to mathematical analyses to champion or ridicule deals. Not that simple
Bob Langer (@MIT) asks how #lymphatics affected by #aging. Santambrogio: typically blame aging #immune cells for increased disease, but aging affects lymphatics too (less efficient trafficking shown). Rejuvenating these could affect several aging-related diseases #BigAxisSummit
Single-cell Genomics: Directions in Computational and Systems Biology – Contributions of Prof. Aviv Regev @Broad Institute of MIT and Harvard, Cochair, the Human Cell Atlas Organizing Committee with Sarah Teichmann of the Wellcome Trust Sanger Institute
Curator: Aviva Lev-Ari, PhD, RN
4.1.3 Single-cell Genomics: Directions in Computational and Systems Biology – Contributions of Prof. Aviv Regev @Broad Institute of MIT and Harvard, Cochair, the Human Cell Atlas Organizing Committee with Sarah Teichmann of the Wellcome Trust Sanger Institute, Volume 2 (Volume Two: Latest in Genomics Methodologies for Therapeutics: Gene Editing, NGS and BioInformatics, Simulations and the Genome Ontology), Part 4: Single Cell Genomics
According to Dana Pe’er, PhD, now chair of computational and systems biology at the Sloan Kettering Institute at the Memorial Sloan Kettering Cancer Center and a member of the Human Cell Atlas Organizing Committee, what really sets Regev apart is the elegance of her work. Regev, says Pe’er, “has a rare, innate ability of seeing complex biology and simplifying it and formalizing it into beautiful, abstract, describable principles.”
Dr. Aviv Regev, an MIT biology professor who is also chair of the faculty of the Broad and director of its Klarman Cell Observatory and Cell Circuits Program, was reviewing a newly published white paper detailing how the Human Cell Atlas is expected to change the way we diagnose, monitor, and treat disease at a gathering of international scientists at Israel’s Weizmann Institute of Science, 10/2017.
For Regev, the importance of the Human Cell Atlas goes beyond its promise to revolutionize biology and medicine. As she once put it, without an atlas of our cells, “we don’t really know what we’re made of.”
Regev turned to a technique known as RNA interference (she now uses CRISPR), which allowed her to systematically shut genes down. Then she looked at which genes were expressed to determine how the cells’ response changed in each case. Her team singled out 100 different genes that were involved in regulating the response to the pathogens, some of which weren’t previously known to be involved in immune function. The study, published in Science, generated headlines.
The project, the Human Cell Atlas, aims to create a reference map that categorizes all the approximately 37 trillion cells that make up a human. The Human Cell Atlas is often compared to the Human Genome Project, the monumental scientific collaboration that gave us a complete readout of human DNA, or what might be considered the unabridged cookbook for human life. In a sense, the atlas is a continuation of that project’s work. But while the same DNA cookbook is found in every cell, each cell type reads only some of the recipes—that is, it expresses only certain genes, following their DNA instructions to produce the proteins that carry out a cell’s activities. The promise of the Human Cell Atlas is to reveal which specific genes are expressed in every cell type, and where the cells expressing those genes can be found.
Regev says the final product will amount to nothing less than a “periodic table of our cells,” a tool that is designed not to answer one specific question but to make countless new discoveries possible.
Sequencing the RNA of the cells she’s studying can tell her only so much. To understand how the circuits change under different circumstances, Regev subjects cells to different stimuli, such as hormones or pathogens, to see how the resulting protein signals change.
“the modeling step”—creating algorithms that try to decipher the most likely sequence of molecular events following a stimulus. And just as someone might study a computer by cutting out circuits and seeing how that changes the machine’s operation, Regev tests her model by seeing if it can predict what will happen when she silences specific genes and then exposes the cells to the same stimulus.
By sequencing the RNA of individual cancer cells in recent years—“Every cell is an experiment now,” she says—she has found remarkable differences between the cells of a single tumor, even when they have the same mutations. (Last year that work led to Memorial Sloan Kettering’s Paul Marks Prize for Cancer Research.) She found that while some cancers are thought to develop resistance to therapy, a subset of melanoma cells were resistant from the start. And she discovered that two types of brain cancer, oligodendroglioma and astrocytoma, harbor the same cancer stem cells, which could have important implications for how they’re treated.
As a 2017 overview of the Human Cell Atlas by the project’s organizing committee noted, an atlas “is a map that aims to show the relationships among its elements.” Just as corresponding coastlines seen in an atlas of Earth offer visual evidence of continental drift, compiling all the data about our cells in one place could reveal relationships among cells, tissues, and organs, including some that are entirely unexpected. And just as the periodic table made it possible to predict the existence of elements yet to be observed, the Human Cell Atlas, Regev says, could help us predict the existence of cells that haven’t been found.
This year alone it will fund 85 Human Cell Atlas grants. Early results are already pouring in.
In March, Swedish researchers working on cells related to human development announced they had sequenced 250,000 individual cells.
In May, a team at the Broad made a data set of more than 500,000 immune cells available on a preview site.
The goal, Regev says, is for researchers everywhere to be able to use the open-source platform of the Human Cell Atlas to perform joint analyses.
Eric Lander, PhD, the founding director and president of the Broad Institute and a member of the Human Cell Atlas Organizing Committee, likens it to genomics.
“People thought at the beginning they might use genomics for this application or that application,” he says. “Nothing has failed to be transformed by genomics, and nothing will fail to be transformed by having a cell atlas.”
“How did we ever imagine we were going to solve a problem without single-cell resolution?”
NIH to Award Up to $12M to Fund DNA, RNA Sequencing Research: single-cell genomics, sample preparation, transcriptomics and epigenomics, and genome-wide functional analysis.
Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. It routinely uses machine learning methods, such as feature learning, clustering, and classification, to assist in uncovering novel information from scRNA-seq data. However, current methods are not well suited to deal with the substantial amounts of noise created by the experiments or the variation that occurs due to differences in the cells of the same type. Here, we develop a new hybrid approach, Deep Unsupervised Single-cell Clustering (DUSC), that integrates feature generation based on a deep learning architecture with a model-based clustering algorithm, to find a compact and informative representation of the single-cell transcriptomic data generating robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. We applied DUSC to a single-cell transcriptomics dataset obtained from a triple-negative breast cancer tumor to identify potential cancer subclones accentuated by copy-number variation and investigate the role of clonal heterogeneity. Our method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.
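The two-stage idea in this abstract, learning a compact representation and then applying model-based clustering, can be illustrated with a small sketch. This is not the DUSC implementation: a one-dimensional principal-component projection (computed by power iteration) stands in for the deep learning feature generator, and a two-component Gaussian mixture fitted by expectation-maximization stands in for the model-based clustering step. The expression values are synthetic, and all function names are hypothetical.

```python
import math
import random

def pc1_scores(rows, iters=50, seed=0):
    """Project each cell onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start vector
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]

def fit_two_gaussians(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture; returns a hard label per point."""
    mu = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) or 1.0
    var = [(spread / 2) ** 2] * 2
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability
        resp = []
        for x in xs:
            logp = [math.log(weight[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(2)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update means, variances and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
            weight[k] = nk / len(xs)
    return [0 if r[0] >= r[1] else 1 for r in resp]

# Synthetic data (an assumption for illustration): 40 cells x 6 genes,
# two cell states that switch on different halves of the gene set.
rng = random.Random(42)

def make_cell(high_genes):
    return [(5.0 if g in high_genes else 1.0) + rng.gauss(0, 0.5) for g in range(6)]

cells = [make_cell({0, 1, 2}) for _ in range(20)] + [make_cell({3, 4, 5}) for _ in range(20)]
labels = fit_two_gaussians(pc1_scores(cells))
print(labels)
```

On real scRNA-seq data the representation would need many more than one latent dimension, which is the problem the abstract's latent-feature estimation technique addresses.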
Jenny Mjösberg and Rickard Sandberg are principal investigators at Karolinska Institutet’s Department of Medicine, Huddinge and Department of Cell and Molecular Biology, respectively. Credit: Stefan Zimmerman.
A relatively newly discovered group of immune cells known as ILCs have been examined in detail in a new study published in the journal Nature Immunology. By analysing the gene expression in individual tonsil cells, scientists at Karolinska Institutet have found three previously unknown subgroups of ILCs, and revealed more about how these cells function in the human body.
Innate lymphoid cells (ILCs) are a group of immune cells that have only relatively recently been discovered in humans. Most current knowledge about ILCs stems from animal studies of, for example, inflammation or infection in the gastrointestinal tract. There is therefore an urgent need to learn more about these cells in humans.
Previous studies have shown that ILCs are important for maintaining the barrier function of the mucosa, which serves as a first line of defence against microorganisms in the lungs, intestines and elsewhere. However, while there is growing evidence to suggest that ILCs are involved in diseases such as inflammatory bowel disease, asthma and intestinal cancer, basic research still needs to be done to ascertain exactly what part they play.
Two research groups, led by Rickard Sandberg and Jenny Mjösberg, collaborated on a study of ILCs from human tonsils. To date, three main groups of human ILCs have been characterized. In the present study, the teams used a novel approach that enabled them to sort individual tonsil cells and measure their expression across thousands of genes. This way, the researchers managed to categorise hundreds of cells, one by one, to define the types of ILCs found in the human tonsils.
Unique gene expression profiles
“We used cluster analyses to demonstrate that ILCs congregate into ILC1, ILC2, ILC3 and NK cells, based on their unique gene expression profiles,” says Professor Sandberg at Karolinska Institutet’s Department of Cell and Molecular Biology, and the Stockholm branch of Ludwig Cancer Research. “Our analyses also discovered the expression of numerous genes of previously unknown function in ILCs, highlighting that these cells are likely doing more than what we previously knew.”
By analysing the gene expression profiles (or transcriptome) of individual cells, the researchers found that one of the formerly known main groups could be subdivided.
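As a rough illustration of how cells can be grouped by their expression profiles, the sketch below applies average-linkage hierarchical clustering with a Pearson-correlation distance, a common choice for single-cell data. The study's actual clustering method is not described here, so this is an assumed stand-in. The six cells and four gene values are invented, and the function names are hypothetical.

```python
import statistics

def pearson(a, b):
    """Pearson correlation between two expression profiles (non-constant inputs)."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def average_linkage(profiles, a, b):
    """Mean correlation distance (1 - r) over all cell pairs between two clusters."""
    pairs = [(i, j) for i in a for j in b]
    return sum(1 - pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

def cluster_cells(profiles, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(profiles, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Invented expression values: rows are cells, columns are four hypothetical genes.
profiles = [
    [9, 1, 1, 2], [8, 2, 1, 1],   # cells 0-1: first gene high
    [1, 9, 2, 1], [2, 8, 1, 1],   # cells 2-3: second gene high
    [1, 1, 9, 8], [2, 1, 8, 9],   # cells 4-5: third and fourth genes high
]
clusters = cluster_cells(profiles, k=3)
print(clusters)
```

Running the same procedure on only the cells of one resulting cluster is one simple way to look for subgroups inside a known population.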
“We’ve identified three new subgroups of ILC3s that evince different gene expression patterns and that differ in how they react to signalling molecules and in their ability to secrete proteins,” says Dr Mjösberg at Karolinska Institutet’s Department of Medicine in Huddinge, South Stockholm. “All in all, our study has taught us a lot about this relatively uncharacterised family of cells and our data will serve as an important resource for other researchers.”
The study was financed by grants from a number of bodies, including the Swedish Research Council, the Swedish Cancer Society, the EU Framework Programme for Research and Innovation, the Swedish Society for Medical Research, the Swedish Foundation for Strategic Research and Karolinska Institutet.
Using a new assay method to study tumor cells, researchers at the University of California, San Diego School of Medicine and UC San Diego Moores Cancer Center have found evidence of clonal evolution in chronic lymphocytic leukemia (CLL). The assay method distinguishes features of leukemia cells that indicate whether the disease will be aggressive or slow-moving, a key factor in when and how patients are treated.
The findings are published in the July 26, 2012 First Edition online issue of Blood.
The progression of CLL is highly variable, dependent upon the rate and effects of accumulating monoclonal B cells in the blood, marrow, and lymphoid tissues. Some patients are symptom-free for years and do not require treatment, which involves the use of drugs that can cause significant side effects and are not curative. In other patients, however, CLL is relatively aggressive and demands therapeutic intervention soon after diagnosis.
“Our study shows that there may not be a sharp dividing line between the more aggressive and less aggressive forms of CLL,” said Thomas J. Kipps, MD, PhD, Evelyn and Edwin Tasch Chair in Cancer Research and senior author of the study. “Instead, it seems that over time the leukemia cells of patients with indolent disease begin to use genes similar to those that are generally used by CLL cells of patients with aggressive disease. In other words, prior to requiring therapy, the patterns of genes expressed by CLL cells appear to converge, regardless of whether or not the patient had aggressive versus indolent disease at diagnosis.”
Existing markers for aggressive or indolent disease are mostly fixed and have declining predictive value the longer the patient is from his or her initial diagnosis. When the blood sample is collected, these markers cannot reliably predict whether a CLL patient will need therapy soon, particularly when the patient has had the diagnosis of CLL for many years.
Kipps and colleagues studied thousands of genes, particularly those that code for proteins, in a group of 130 CLL patients with varying risks of disease progression. They identified 38 prognostic subnetworks of interacting genes and proteins that, at the time of sample collection, indicate the relative aggressiveness of the disease and predict when the patient will require therapy. They confirmed their work using the method on two other, smaller CLL patient cohorts in Germany and Italy.
The subnetworks offer greater predictive value because they are based not on expression levels of individual genes or proteins, but on how they dynamically interact and change over time, influencing the course of the CLL and patient symptoms.
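One common way to score a gene subnetwork, shown here only as a sketch, is to standardize each gene across patients and then combine the z-scores of the member genes into a single activity value per patient. Whether the study used exactly this aggregate is an assumption. The gene names, expression values and patient grouping below are invented for illustration.

```python
import math
import statistics

def zscore_genes(expr):
    """Standardize each gene's expression across patients (zero mean, unit variance)."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def subnetwork_activity(z, genes):
    """Aggregate member-gene z-scores into one activity score per patient."""
    k = len(genes)
    n_patients = len(z[genes[0]])
    return [sum(z[g][i] for g in genes) / math.sqrt(k) for i in range(n_patients)]

# Invented data: six patients; the first three are labelled indolent and the
# last three aggressive. Each gene shifts only modestly between the groups.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 1.6, 1.8, 1.5],
    "GENE_B": [2.0, 2.1, 1.8, 2.6, 2.4, 2.9],
    "GENE_C": [0.5, 0.7, 0.6, 0.9, 1.1, 0.8],
}
subnetwork = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical interacting genes

activity = subnetwork_activity(zscore_genes(expr), subnetwork)
for label, score in zip(["indolent"] * 3 + ["aggressive"] * 3, activity):
    print(f"{label:10s} {score:+.2f}")
```

Because the score pools several interacting genes, a consistent shift across the subnetwork stands out even when each gene's individual change is small.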
“In a sense, we looked at families rather than individuals,” said Kipps. “If you find in an interconnected family where most genes or proteins are expressed at higher levels, it becomes more likely that these genes and proteins have functional significance.”
He added that while the subnetworks abound in data, their complexity actually makes them easy to interpret and understand. “It’s like when you look out of a window and see the sky, clouds, trees, people, cars. You’re getting tremendous amounts of information that individually doesn’t tell you much. But when you look at the scene as a whole, you see patterns and networks. This work is similar. We’re taking all of the individual gene expression patterns and making sense of them as a whole. We’re more able to more clearly see how they control and regulate function.”
The findings help define how CLL — and perhaps other cancers — evolve over time, becoming more aggressive and deadly. “It’s as if each tumor has a clock which determines how frequently it may acquire the chance changes that make it behave more aggressively. Although the rates can vary, it appears that tumors march down similar pathways, which converge over time to a point where they become aggressive enough to require therapy.”
The study may alter how scientists think about CLL and how clinicians treat the disease: whether it is better to wait for later stages of the disease when tumor cells are more fragile and easier to kill, or treat early-stage indolent tumor cells aggressively, when they are fewer in number but harder to find and more resistant to therapy.
Abstract:
The clinical course of patients with chronic lymphocytic leukemia (CLL) is heterogeneous. Several prognostic factors have been identified that can stratify patients into groups that differ in their relative tendency for disease progression and/or survival. Here, we pursued a subnetwork-based analysis of gene expression profiles to discriminate between groups of patients with disparate risks for CLL progression. From an initial cohort of 130 patients, we identified 38 prognostic subnetworks that could predict the relative risk for disease progression requiring therapy from the time of sample collection, more accurately than established markers. The prognostic power of these subnetworks then was validated on two other cohorts of patients. We noted reduced divergence in gene expression between leukemia cells of CLL patients classified at diagnosis with aggressive versus indolent disease over time. The predictive subnetworks vary in levels of expression over time but exhibit increased similarity at later time points prior to therapy, suggesting that degenerate pathways apparently converge into common pathways that are associated with disease progression. As such, these results have implications for understanding cancer evolution and for the development of novel treatment strategies for patients with CLL.