
Posts Tagged ‘RNA sequencing’


RNA in synthetic biology

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

RNA May Surpass DNA in Precision Medicine

http://www.genengnews.com/gen-news-highlights/rna-may-surpass-dna-in-precision-medicine/81252507/

 

Scientists based at the Translational Genomics Research Institute have published a review heralding the promise of RNA sequencing (RNA-seq) for precision medicine. The scientists also note that progress will be needed on analytical, bioinformatics, and regulatory fronts, particularly in light of the transcriptome’s variety, dynamism, and wealth of detail. In this image, one aspect of RNA-seq is shown, the alignment with intron-split short reads. It reflects the alignment of mRNA sequence obtained via high-throughput sequencing and the expected behavior of the alignment to the reference genome when the read falls in an exon–exon junction. [Rgocs, Wikipedia]
http://www.genengnews.com/Media/images/GENHighlight/thumb_Mar22_2016_Rgocs_RNASeqAlignment1872484040.jpg
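The junction-spanning alignment described in the caption can be illustrated with a toy sketch in Python. Everything here is an assumption for illustration: the sequences are made up, the function name spliced_align is hypothetical, and the brute-force search with a canonical GT...AG intron check is a simplification, not how production RNA-seq aligners work.

```python
def spliced_align(read, genome, min_anchor=3, min_intron=4):
    """Toy spliced alignment.

    Returns a tuple of (genome_start, length) blocks, or None if no
    placement is found. One block means the read sits inside an exon;
    two blocks mean the read was split across an intron.
    """
    # Case 1: the read lies entirely inside one exon and matches contiguously.
    pos = genome.find(read)
    if pos != -1:
        return ((pos, len(read)),)

    # Case 2: the read may span an exon-exon junction; try every split point.
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        i = genome.find(left)
        while i != -1:
            intron_start = i + k
            j = genome.find(right, intron_start + min_intron)
            while j != -1:
                # Accept only gaps that look like a canonical GT...AG intron.
                starts_gt = genome[intron_start:intron_start + 2] == "GT"
                ends_ag = genome[j - 2:j] == "AG"
                if starts_gt and ends_ag:
                    return ((i, k), (j, len(right)))
                j = genome.find(right, j + 1)
            i = genome.find(left, i + 1)
    return None


# Made-up reference: two exons separated by one intron.
exon1 = "ATGGCCAAGT"
intron = "GTAAGTTTTTCTTTCAG"
exon2 = "CTGACCGGAT"
genome = exon1 + intron + exon2

# The mature mRNA lacks the intron, so a read can straddle the junction.
junction_read = exon1[-5:] + exon2[:5]   # "CAAGTCTGAC"
exon_read = "GGCCAA"                      # lies fully inside exon 1

junction_hit = spliced_align(junction_read, genome)
exon_hit = spliced_align(exon_read, genome)
print(junction_hit, exon_hit)
```

In this toy case the junction read cannot be placed contiguously, so it is reported as two blocks separated by the intron, while the exon-internal read is reported as a single block.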

 

It’s not an either/or situation. Both DNA sequencing and RNA sequencing hold clinical promise—diagnostically, prognostically, and therapeutically. It must be said, however, that RNA sequencing reflects the dynamic nature of gene expression, shifting with the vagaries of health and disease. Also, RNA sequencing captures more biochemical complexity, in the sense that it allows for the detection of a wide variety of RNA species, including mRNA, noncoding RNA, pathogen RNA, chimeric gene fusions, transcript isoforms, and splice variants, and provides the capability to quantify known, predefined RNA species and rare RNA transcript variants within a sample.

All these potential advantages were cited in a paper that appeared March 21 in Nature Reviews Genetics, entitled “Translating RNA Sequencing into Clinical Diagnostics: Opportunities and Challenges.” The paper, contributed by scientists based at the Translational Genomics Research Institute (TGen), was optimistic about the clinical utility of RNA sequencing, but it also highlighted the advances that must occur if RNA sequencing is to achieve its promise.

In general, the very things that make RNA sequencing so interesting are the same things that make it so challenging. RNA sequencing would take the measure of a world, the transcriptome, that is incredibly rich. To capture all the relevant subtleties of the transcriptome, scientists will have to develop sensitive, precise, and trustworthy analytical techniques. What’s more, scientists will need to find efficient and reliable means of processing and interpreting all of the transcriptome data they will collect. Finally, they will need to continue integrating RNA-based knowledge with DNA-based knowledge. For instance, RNA sequencing results can be used to guide the interpretation of DNA sequencing results.

In their Nature Reviews Genetics paper, the TGen scientists review the state of RNA sequencing and offer specific recommendations to enhance its clinical utility. The TGen scientists make a special point about the promise held by extracellular RNA (exRNA). Because exRNA can be monitored by simply taking a blood sample, as opposed to taking a tumor biopsy, it could serve as a noninvasive diagnostic indicator of disease.

“Detection of gene fusions and differential expression of known disease-causing transcripts by RNA-seq represent some of the most immediate opportunities,” wrote the authors. “However, it is the diversity of RNA species detected through RNA-seq that holds new promise for the multi-faceted clinical applicability of RNA-based measures, including the potential of extracellular RNAs as non-invasive diagnostic indicators of disease.”

The first test measuring exRNA was released earlier this year, the paper said, to measure specific exRNAs in lung cancer patients. The potential for using RNA-seq in cancer is also expanding rapidly. Commercial RNA-seq tests are now available, and they give clinicians the opportunity to profile cancer more comprehensively and use this information to guide treatment selection for their patients.

In addition, the authors reported on several recent applications for RNA-seq in the diagnosis and management of infectious diseases, such as monitoring for drug-resistant populations during therapy and tracking the origin and spread of the Ebola virus.

Despite these advances, the authors also sounded a few cautionary notes. “There are currently few agreed upon methods for isolation or quantitative measurements and a current lack of quality controls that can be used to test platform accuracy and sample preparation quality,” they wrote. “Analytical, bioinformatics, and regulatory challenges exist, and ongoing efforts toward the establishment of benchmark standards, assay optimization for clinical conditions and demonstration of assay reproducibility are required to expand the clinical utility of RNA-seq.”

Overall, the authors remain hopeful that precision medicine will embrace RNA sequencing. For example, lead author Sara Byron, research assistant professor in TGen’s Center for Translational Innovation, said, “RNA is a dynamic and diverse biomolecule with an essential role in numerous biological processes. From a molecular diagnostic standpoint, RNA-based measurements have the potential for broad application across diverse areas of human health, including disease diagnosis, prognosis, and therapeutic selection.”

 

RNA Bacteriophages May Open New Path to Fighting Antibiotic-Resistant Infections

http://www.genengnews.com/gen-news-highlights/rna-bacteriophages-may-open-new-path-to-fighting-antibiotic-resistant-infections/81252521/

http://www.genengnews.com/Media/images/GENHighlight/thumb_Mar25_2016_Wikimedia_RNABacteriophages2091791481.jpg

Micrograph image of RNA bacteriophages attached to part of the bacterium E. coli. A new study at Washington University School of Medicine in St. Louis suggests that bacteriophages made of RNA, a close chemical cousin of DNA, likely play a much larger role in shaping the bacterial makeup of worldwide habitats than previously recognized. [Graham Beards/Wikimedia]

Scientists at Washington University School of Medicine in St. Louis report that bacteriophages made of RNA likely play a much larger role in shaping the bacterial makeup of worldwide habitats than previously recognized. Their study (“Hyperexpansion of RNA Bacteriophage Diversity”), published in PLOS Biology, identified 122 new types of RNA bacteriophages in diverse ecological niches, providing an opportunity for scientists to define their contributions to ecology and potentially to exploit them as novel tools to fight bacterial infections, particularly those that are resistant to antibiotics.

“Lots of DNA bacteriophages have been identified, but there’s an incredible lack of understanding about RNA bacteriophages,” explained senior author David Wang, Ph.D., associate professor of molecular microbiology. “They have been largely ignored—relatively few were known to exist, and for the most part, scientists haven’t bothered to look for them. This study puts RNA bacteriophages on the map and opens many new avenues of exploration.”

Dr. Wang estimates that of the more than 1,500 bacteriophages that have been identified, 99% have DNA genomes. The advent of large-scale genome sequencing has helped scientists identify DNA bacteriophages in the human gut, skin, and blood, as well as in the environment, but few researchers have looked for RNA bacteriophages in those samples (doing so requires that RNA be isolated from the samples and then reverse-transcribed into DNA before sequencing).

As part of the new study, first author and graduate student Siddharth Krishnamurthy and colleagues, including Dan Barouch, M.D., Ph.D., of Beth Israel Deaconess Medical Center and Harvard Medical School, identified RNA bacteriophages by analyzing data from samples taken from the environment, such as oceans, sewage, and soils, and from aquatic invertebrates including crabs, sponges, and barnacles, as well as insects, mice, and rhesus macaques.

RNA bacteriophages have been shown to infect Gram-negative bacteria, which have become increasingly resistant to antibiotics and are the source of many infections in health care settings. But the researchers also showed for the first time that these bacteriophages may infect Gram-positive bacteria, which are responsible for strep and staph infections as well as MRSA (methicillin-resistant Staphylococcus aureus).

“What we know about RNA bacteriophages in any environment is limited,” Dr. Wang said. “But you can think of bacteriophages and bacteria as having a predator–prey relationship. We need to understand the dynamics of that relationship. Eventually, we’d like to manipulate that dynamic to use phages to selectively kill particular bacteria.”

 

Hyperexpansion of RNA Bacteriophage Diversity

Siddharth R. Krishnamurthy, Andrew B. Janowski, Guoyan Zhao, Dan Barouch
24 Mar 2016 | PLOS Biology
http://dx.doi.org/10.1371/journal.pbio.1002409

Bacteriophage modulation of microbial populations impacts critical processes in ocean, soil, and animal ecosystems. However, the role of bacteriophages with RNA genomes (RNA bacteriophages) in these processes is poorly understood, in part because of the limited number of known RNA bacteriophage species. Here, we identify partial genome sequences of 122 RNA bacteriophage phylotypes that are highly divergent from each other and from previously described RNA bacteriophages. These novel RNA bacteriophage sequences were present in samples collected from a range of ecological niches worldwide, including invertebrates and extreme microbial sediment, demonstrating that they are more widely distributed than previously recognized. Genomic analyses of these novel bacteriophages yielded multiple novel genome organizations. Furthermore, one RNA bacteriophage was detected in the transcriptome of a pure culture of Streptomyces avermitilis, suggesting for the first time that the known tropism of RNA bacteriophages may include gram-positive bacteria. Finally, reverse transcription PCR (RT-PCR)-based screening for two specific RNA bacteriophages in stool samples from a longitudinal cohort of macaques suggested that they are generally acutely present rather than persistent.

Bacteriophages (viruses that infect bacteria) can alter biological processes in numerous ecosystems. While there are numerous studies describing the role of bacteriophages with DNA genomes in these processes, the role of bacteriophages with RNA genomes (RNA bacteriophages) is poorly understood. This gap in knowledge is in part because of the limited diversity of known RNA bacteriophages. Here, we begin to address the question by identifying 122 novel RNA bacteriophage partial genome sequences present in metagenomic datasets that are highly divergent from each other and previously described RNA bacteriophages. Additionally, many of these sequences contained novel properties, including novel genes, segmentation, and host range, expanding the frontiers of RNA bacteriophage genomics, evolution, and tropism. These novel RNA bacteriophage sequences were globally distributed from numerous ecological niches, including animal-associated and environmental habitats. These findings will facilitate our understanding of the role of the RNA bacteriophage in microbial communities. Furthermore, there are likely many more unrecognized RNA bacteriophages that remain to be discovered.

 



Insights into Brain Structure

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

Can Big Genomic Data Reveal the Fundamental Units of the Brain?

Aaron Krol   http://www.bio-itworld.com/2016/1/20/can-big-genomic-data-reveal-fundamental-units-brain.html

January 20, 2016 | An adult mouse’s brain, an object not much bigger than the last joint of your pinky finger, contains around 75 million neurons. At the Allen Institute for Brain Science in Seattle, the Mouse Cell Types program, led by Hongkui Zeng, is trying to figure out just how many varieties of neurons make up this vast complex, and what makes each one unique.

Zeng’s research focuses on the primary visual cortex, a tiny sliver of the brain where signals from the eyes are processed and interpreted. Because vision is a relatively well-defined process, it’s thought to be a good model for connecting the behavior of individual neurons to larger brain functions.

“You really can’t understand a system until you understand its parts,” says Bosiljka Tasic, a founding member of the Mouse Cell Types program.

To a shocking extent, those parts are still a mystery. Many supposed cell types are based on little more than what you can see through a microscope: a neuron’s shape, or the pattern of rootlike dendrites extending from its body. These morphological traits, though important, are hard to see in full, and even harder to track methodically across thousands or millions of cells.

This month, Zeng’s team published a study in Nature Neuroscience that takes advantage of new technological developments to get a fine-grained look at the molecular toolkits of single neurons. Using newly refined methods to isolate single cells, Zeng’s lab collected over 1,600 brain cells from the visual cortexes of adult mice, intact and in good shape for sequencing. With advances in highly parallel, unbiased RNA sequencing, the group was able to measure each cell’s entire “transcriptome” (the array of RNA molecules that indicate which genes are actively producing proteins) at a depth that reveals even the scarcest RNA traces.

“We think this is probably the most comprehensive survey of a cortical area,” says Tasic, who co-led the study with her colleague Vilas Menon. “Many studies that are coming out now do very shallow sequencing… We wanted to go deeper.” With a median of 8.7 million sequencing reads per cell, the authors discovered a wealth of new RNA markers that define discrete groups of neurons. Some of these markers suggest that known cell types in the brain can be split into smaller sub-categories. A few even stake out rare types of neurons that may be new to science.

Yet the data collected for this study also confirms that the brain’s biology is neither tidy nor easy to unravel.

“There is this obsession in the field, and in many other areas of biology, that people always want cleanliness and discreteness,” Tasic says. Instead, her efforts to classify neurons have shown that “types” can be slippery, and many cells straddle the line between closely related groups. As projects like this one seek to redefine cell types for the genomics age, scientists will have to face these ambiguities and consider what they can tell us about the nature of the brain.

Patterns within Patterns

Whole transcriptomes provide an impressive amount of data with which to organize cells, but that data is hard to interpret in an unbiased way. “We’re trying, in some sense, to solve two problems simultaneously,” says Vilas Menon, co-lead author of the paper. “We’re trying to cluster the genes, and also to cluster the cells.”

To disentangle these problems, the team performed an iterative analysis. First, their software looked for RNA markers that diverged most widely between different cells, using those markers to sort all the cells in the study into large clusters. Then, they wiped the slate clean, looking for brand-new markers within each cluster to split the cells step by step into smaller groups. The smallest possible divisions, in which no new RNA markers could strongly distinguish cells from one another, became the group’s proposed “cell types.”
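The iterative procedure described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the study's pipeline: the marker choice (highest-variance gene), the midpoint split, the min_gap stopping rule, the gene names, and the expression values are all assumptions made up for illustration.

```python
from statistics import mean, pvariance


def most_variable_gene(cells, expr):
    """Pick the gene whose expression varies most across the given cells."""
    genes = expr[cells[0]].keys()
    return max(genes, key=lambda g: pvariance([expr[c][g] for c in cells]))


def split_cluster(cells, expr, min_gap=2.0, min_size=2):
    """Recursively split cells, choosing a fresh marker inside each cluster.

    Splitting stops when no marker separates the cells strongly
    (mean difference below min_gap) or a child would be too small.
    The leaves of the recursion play the role of proposed "cell types".
    """
    if len(cells) < 2 * min_size:
        return [sorted(cells)]

    # "Wipe the slate clean": the marker is re-selected within this cluster.
    gene = most_variable_gene(cells, expr)
    values = [expr[c][gene] for c in cells]
    cut = (min(values) + max(values)) / 2
    low = [c for c in cells if expr[c][gene] <= cut]
    high = [c for c in cells if expr[c][gene] > cut]

    if len(low) < min_size or len(high) < min_size:
        return [sorted(cells)]
    gap = mean(expr[c][gene] for c in high) - mean(expr[c][gene] for c in low)
    if gap < min_gap:
        return [sorted(cells)]

    return split_cluster(low, expr, min_gap, min_size) + \
        split_cluster(high, expr, min_gap, min_size)


# Made-up expression values for eight hypothetical cells and three genes.
expr = {
    "c1": {"geneA": 9, "geneB": 1, "geneC": 8},
    "c2": {"geneA": 10, "geneB": 1, "geneC": 9},
    "c3": {"geneA": 9, "geneB": 0, "geneC": 1},
    "c4": {"geneA": 10, "geneB": 1, "geneC": 0},
    "c5": {"geneA": 0, "geneB": 9, "geneC": 1},
    "c6": {"geneA": 1, "geneB": 10, "geneC": 1},
    "c7": {"geneA": 0, "geneB": 10, "geneC": 0},
    "c8": {"geneA": 1, "geneB": 9, "geneC": 1},
}

cell_types = split_cluster(list(expr), expr)
print(cell_types)
```

In this toy data the first split is driven by geneA, and only the geneA-high branch splits again, on geneC. The other branch has no marker that separates its cells strongly, so it is left as a single type.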

The researchers used two different computational methods to define clusters, but both revealed the same basic hierarchy of types. “In general, the higher level splits correspond to what’s already known for these broad classes of neurons,” says Menon. For instance, the first split simply divided all the neurons in their data from a handful of other cell types present in the brain, like the glial cells that support the brain’s physical structure. The second split separated GABAergic cells, which mostly damp down chemical signals in the brain, from glutamatergic cells, which mostly spark and amplify signals.

Beyond this point, the patterns became more revealing. Within the glutamatergic cells, for example, later clustering tended to split neurons according to how deeply they were embedded in the cortex. A mouse’s primary visual cortex is organized in six layers, and the Allen Institute’s transcriptome data suggests that the neurons in each layer may be closely related to one another, or have similar functions that require the same genes to be activated. Yet the GABAergic cells did not split out so naturally by layer, implying that their development may follow very different rules.

At the narrowest levels of clustering, the genes that defined cell types sometimes came as complete surprises. Within a group of GABAergic neurons known for producing high levels of the hormone somatostatin, the authors found a subtype of cells expressing an additional gene called Chodl. “Nobody has ever heard of this marker Chodl,” says Tasic. “But it’s the most beautiful pattern you’ve ever seen, because it’s only in that cell type. This is the beauty of transcriptomics.”

With luck, genes like Chodl will provide new clues to the roles of specific cell types. If no other neurons make use of this gene, it’s reasonable to think it may have a very specialized function. But even if that’s not the case, highly unique markers like Chodl are invaluable for studying neurons more closely, letting scientists design new molecular and genetic tools to target single cell types for follow-up research.

“I see this as a first step in allowing us to selectively manipulate cell types,” says Tasic. “And then you can do all sorts of things to those cells. You can label them specifically, and study their morphology. You can perturb them. You can inactivate them. I think this will be the way to truly understand what these different cells do.”

Mountains and Ridges

“Technically, this is a very impressive achievement,” says Joshua Sanes, a neurobiologist at the Harvard Center for Brain Science. “It’s using a really nice combination of state-of-the-art methods to address what, to me, is a big problem in neurobiology.”

Like the researchers at the Allen Institute, Sanes is interested in the problem of defining cell types. (Both his group and Hongkui Zeng’s receive funding from the national BRAIN Initiative, which has provided grants for big data-gathering projects to attack this question.) It’s a vexing issue, both because it requires such an immense amount of data to address, and because biology again and again rejects easy categories.

To Sanes, one of the most interesting aspects of Tasic and Menon’s paper is their decision to point out neurons with traits of more than one cell type. Unlike other groups that may exclude ambiguous data from analysis, the Allen Institute accepted cells with “intermediate” transcriptomes as important findings of their study. In some cases―most notably, a class of glutamatergic neurons in layer four of the cortex―these intermediate cells are so abundant that two or more supposedly separate “types” almost seem to merge together.
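One way to picture an "intermediate" cell is as a point that sits almost as close to a second cluster as to its nearest one. The Python sketch below illustrates only that idea. The nearest-centroid rule, the ratio threshold, the function name classify, and all coordinates are assumptions made up here, not the method used in the study.

```python
from math import dist


def classify(cell, centroids, ratio=0.75):
    """Label a cell as "core" or "intermediate".

    cell: tuple of expression values (one per marker gene).
    centroids: dict mapping type name to a tuple of the same length.
    A cell is "intermediate" when its distance to the nearest centroid
    is more than `ratio` times its distance to the second nearest.
    """
    ranked = sorted(centroids, key=lambda name: dist(cell, centroids[name]))
    best, second = ranked[0], ranked[1]
    d_best = dist(cell, centroids[best])
    d_second = dist(cell, centroids[second])
    if d_second == 0 or d_best / d_second > ratio:
        return ("intermediate", best, second)
    return ("core", best, None)


# Two hypothetical types defined by two hypothetical marker genes.
centroids = {"typeA": (10.0, 0.0), "typeB": (0.0, 10.0)}

core_call = classify((9.0, 1.0), centroids)    # sits on one "mountain"
ridge_call = classify((5.5, 4.5), centroids)   # sits on the "ridge" between
print(core_call, ridge_call)
```

Lowering the ratio threshold labels more cells as intermediate. Where to draw that line is a judgment call, which is part of why the boundary between closely related types can look slippery.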

“That could mean that, although some cells are in types, there’s a certain amount of slipperiness,” says Sanes. “It’s been pretty hard to define neurons in a way that will help research move forward.”

It’s possible that some classes of neurons don’t exist in discrete types at all, but include a spectrum of cells expressing different mixes of the same genes. Or transcriptomes may just not be the best way to define cell types―because neurons of the same type change their RNA arsenals depending on their stage of development, or the chemical signals they’re responding to.

“Some parts of the overall phenotypic landscape may have features of a continuum,” says Tasic, but that doesn’t mean that her group’s proposed cell types are not useful ways of thinking about neurobiology. “If there are two mountains that are connected by a ridge, there are still two mountains. The fact that you have a ridge is fine. Maybe that’s biology.”

From Rosetta Stones to Searchable Databases

Tasic, Menon, and their colleagues identified 49 cell types altogether, but the number is less important than the process that produced it. Almost certainly, there are still new cell types to discover, and perhaps further divisions within the types the Allen Institute has identified.

“I think it’s extremely unlikely they’ve gotten all the types,” says Sanes. “It’s terrific, but it’s not like you should think of this as a complete catalogue.” To isolate single neurons, the Allen Institute used a method called FACS, which relies on sampling many different strains of transgenic mice to collect both abundant and rare cell types. The authors agree that this approach leaves open the possibility that some rare types were not sampled, and future studies will use different methods of capturing single cells, adding yet more data to the mix. (At his lab, Sanes is working with a new method called Drop-seq, which the Allen Institute also plans to adopt.)

For work like this to be meaningful, it’s not necessary for the Allen Institute to come up with a complete encyclopedia of cell types on its own. What is essential is that the data be made easily available to neuroscientists everywhere, to compare with their own studies and gradually refine with new discoveries.

Today, this is far from assured. A lot of research on cell types is only available through journal articles, and there are few standards for formatting data so it can be shared and understood across institutions. This is apparent in some of the detective work that Zeng’s team did to see if their proposed cell types matched any previously identified types. Tasic, Menon, and colleagues trawled through the scientific literature looking for what they called “Rosetta stones,” unique molecular features that could clearly be seen in their own transcriptome data.

In the future, this work could be made almost automatic, especially as objective data types like RNA sequencing information become more common. Just a few weeks ago, many of the first recipients of BRAIN Initiative grants―including both Zeng and Sanes―met in Bethesda, Md., to discuss plans for sharing neurobiological data, and ways to make that data more uniform and searchable.

“I think the BRAIN Initiative has been helpful in drawing attention and funding,” says Sanes. “The NIH is doing everything it can to ensure data sharing, and I think the community is going along with that well.”

In the meantime, Zeng’s group has released their raw transcriptome data to GEO, an NIH-supported database of RNA information, and made an annotated version of their data available online on the Allen Institute website. Tasic and Menon hope that outside researchers will use these resources to design more detailed studies of specific neuron types. Neuroscience is still in the earliest stages of data gathering, but to truly understand the brain, scientists will eventually have to make the leap into exploring function, cell type by cell type.

“We can find genes that are differentially expressed at the level of the whole brain, but we really don’t know what these genes do,” Tasic says. “Once you see that this gene is expressed in a specific type, you can formulate a hypothesis.”

 

http://casestudies.brain-map.org/celltaxb

 

Adult mouse cortical cell taxonomy revealed by single cell transcriptomics

Bosiljka Tasic, et al.

Nature Neuroscience (2016)   http://dx.doi.org/10.1038/nn.4216

Nervous systems are composed of various cell types, but the extent of cell type diversity is poorly understood. We constructed a cellular taxonomy of one cortical region, primary visual cortex, in adult mice on the basis of single-cell RNA sequencing. We identified 49 transcriptomic cell types, including 23 GABAergic, 19 glutamatergic and 7 non-neuronal types. We also analyzed cell type–specific mRNA processing and characterized genetic access to these transcriptomic types by many transgenic Cre lines. Finally, we found that some of our transcriptomic cell types displayed specific and differential electrophysiological and axon projection properties, thereby confirming that the single-cell transcriptomic signatures can be associated with specific cellular properties.




Today, this is far from assured. A lot of research on cell types is only available through journal articles, and there are few standards for formatting data so it can be shared and understood across institutions. This is apparent in some of the detective work that Zeng’s team did to see if their proposed cell types matched any previously identified types. Tasic, Menon, and colleagues trawled through the scientific literature looking for what they called “Rosetta stones,” unique molecular features that could clearly be seen in their own transcriptome data.

In the future, this work could be made almost automatic, especially as objective data types like RNA sequencing information become more common. Just a few weeks ago, many of the first recipients of BRAIN Initiative grants―including both Zeng and Sanes―met in Bethesda, Md., to discuss plans for sharing neurobiological data, and ways to make that data more uniform and searchable.

“I think the BRAIN Initiative has been helpful in drawing attention and funding,” says Sanes. “The NIH is doing everything it can to ensure data sharing, and I think the community is going along with that well.”

In the meantime, Zeng’s group has released their raw transcriptome data to GEO, an NIH-supported database of RNA information, and made an annotated version of their data available online on the Allen Institute website. Tasic and Menon hope that outside researchers will use these resources to design more detailed studies of specific neuron types. Neuroscience is still in the earliest stages of data gathering, but to truly understand the brain, scientists will eventually have to make the leap into exploring function, cell type by cell type.

“We can find genes that are differentially expressed at the level of the whole brain, but we really don’t know what these genes do,” Tasic says. “Once you see that this gene is expressed in a specific type, you can formulate a hypothesis.”

 

Adult mouse cortical cell taxonomy revealed by single cell transcriptomics

Bosiljka Tasic, et al.   Nature Neuroscience (2016)   http://dx.doi.org:/10.1038/nn.4216

Nervous systems are composed of various cell types, but the extent of cell type diversity is poorly understood. We constructed a cellular taxonomy of one cortical region, primary visual cortex, in adult mice on the basis of single-cell RNA sequencing. We identified 49 transcriptomic cell types, including 23 GABAergic, 19 glutamatergic and 7 non-neuronal types. We also analyzed cell type–specific mRNA processing and characterized genetic access to these transcriptomic types by many transgenic Cre lines. Finally, we found that some of our transcriptomic cell types displayed specific and differential electrophysiological and axon projection properties, thereby confirming that the single-cell transcriptomic signatures can be associated with specific cellular properties.

 

Cell types summary and relationships.

(a–c) Constellation diagrams showing core and intermediate cells for all cell types. Core cells (N = 1,424 total, 664 GABAergic, 609 glutamatergic, 151 non-neuronal) are represented by colored disks

 


Size Matters

Larry H. Bernstein, MD, FCAP, Curator

LPBI

 

MinION Sequencing Untangles RNA Transcripts in a Difficult Gene

By Aaron Krol

http://www.bio-itworld.com/2015/11/3/minion-sequencing-untangles-rna-transcripts-difficult-gene.html

 

 

November 3, 2015 | Brenton Graveley received his first MinION shipment in April 2014, at his lab at the University of Connecticut’s Institute of Systems Genomics. His lab was among the first to unwrap one of the candy bar-sized DNA sequencers made by Oxford Nanopore Technologies, and although its accuracy was shaky and its throughput low, right away Graveley and his colleagues could see it was producing real DNA data.

“I’m still amazed to this day that it works at all,” Graveley says. “It’s like Star Trek.”

A lot of buzz around the MinION has focused on its tiny size: early adopters have plotted to take MinIONs into outbreak zones and species-hunting tromps through the rainforest, working with bare-bones labs and laptop computers. But for Graveley, the size of the DNA strands the MinION reads is just as exciting as the size of the sequencer itself. That’s because most other sequencers rely on picking up chemical reactions that become more error-prone over time, meaning DNA can only be read in short fragments. The MinION, which reads genetic material by observing single molecules of DNA as they pass through extremely narrow “nanopores,” keeps producing data for as long as DNA is moving through the pore.

“You get the read length of whatever fragment you put into the MinION,” he says. “We’ve gotten reads that are over 100 kilobases,” hundreds or even thousands of times longer than researchers can expect with most other technologies.

Now, in a paper published in Genome Biology, Graveley and two of his lab members, post-doc Mohan Bolisetty and PhD student Gopinath Rajadinakaran, have shown how these read lengths can help explain the cellular behavior of Dscam1, one of the most difficult-to-study genes known to science. Related to a gene in humans that has been linked to Down syndrome (the name stands for “Down Syndrome Cell Adhesion Molecule”), Dscam1 plays a fundamental role in forming the architecture of insect brains. This single gene can produce thousands of subtly different proteins, an ability that makes it both a fascinating subject of research and almost impossible to understand using standard sequencing technology.

 

Determining exon connectivity in complex mRNAs by nanopore sequencing

Mohan T. Bolisetty, Gopinath Rajadinakaran and Brenton R. Graveley
Genome Biology 2015, 16:204       http://dx.doi.org:/10.1186/s13059-015-0777-z                    http://genomebiology.com/2015/16/1/204

Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.

High throughput RNA sequencing has revolutionized genomics and our understanding of the transcriptomes of many organisms. Most eukaryotic genes encode pre-mRNAs that are alternatively spliced [1]. In many genes, alternative splicing occurs at multiple places in the transcribed pre-mRNAs that are often located farther apart than the read lengths of most current high throughput sequencing platforms. As a result, several transcript assembly and quantitation software tools have been developed to address this [2], [3]. While these computational approaches do well with many transcripts, they generally have difficulty assembling transcripts of genes that express many isoforms. In fact, we have been unable to successfully assemble transcripts of complex alternatively spliced genes such as Dscam1 or Mhc using any transcript assembly software (data not shown). These software tools also have difficulty quantitating transcripts that have many isoforms, and for genes with distantly located alternatively spliced regions, they can only infer, and not directly measure, which isoforms may have been present in the original RNA sample [4]. For example, consider a gene containing two alternatively spliced exons located 2 kbp away from one another in the mRNA. If each exon is observed to be included at a frequency of 50 % from short read sequence data, it is impossible to determine whether there are two equally abundant isoforms that each contain or lack both exons, or four equally abundant isoforms that contain both, neither, or only one or the other exon.
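
The two-exon example above can be made concrete with a small sketch. It assumes that short reads report only the per-exon inclusion frequency (the two exons never fall on the same read); the dictionaries and the `exon_inclusion` helper are illustrative names, not part of the paper's pipeline.

```python
from fractions import Fraction


def exon_inclusion(isoforms):
    """Per-exon inclusion frequency: the only quantity short reads can
    measure when the alternative exons never share a read."""
    n_exons = len(next(iter(isoforms)))
    return tuple(
        sum(freq for iso, freq in isoforms.items() if iso[i])
        for i in range(n_exons)
    )


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Keys are (exon 1 included?, exon 2 included?); values are isoform abundances.
two_isoforms = {(True, True): half, (False, False): half}
four_isoforms = {
    (True, True): quarter,
    (True, False): quarter,
    (False, True): quarter,
    (False, False): quarter,
}

marginals_two = exon_inclusion(two_isoforms)
marginals_four = exon_inclusion(four_isoforms)

# Both mixtures give 50 % inclusion for each exon, so the marginals alone
# cannot tell the two-isoform sample from the four-isoform sample.
print(marginals_two, marginals_four)
```

A read that spans both exons observes the key itself rather than the marginals, which is why long reads resolve the ambiguity.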

Pacific Bioscience sequencing can generate read lengths sufficient to sequence full length cDNA isoforms and several groups have recently reported the use of this approach to characterize the transcriptome [5]. However, the large capital expense of this platform can be a prohibitive barrier for some users. Thus, it remains difficult to accurately and directly determine the connectivity of exons within the same transcript. The MinION nanopore sequencer from Oxford Nanopore requires a small initial financial investment, can generate extremely long reads, and has the potential to revolutionize transcriptome characterization, as well as other areas of genomics.

Several eukaryotic genes can encode hundreds to thousands of isoforms. For example, in Drosophila, 47 genes encode over 1,000 isoforms each [6]. Of these, Dscam1 is the most extensively alternatively spliced gene known and contains 115 exons, 95 of which are alternatively spliced and organized into four clusters [7]. The exon 4, 6, 9, and 17 clusters contain 12, 48, 33, and 2 exons, respectively. The exons within each cluster are spliced in a mutually exclusive manner and Dscam1 therefore has the potential to generate 38,016 different mRNA and protein isoforms. The variable exon clusters are also located far from one another in the mRNA and the exons within each cluster are up to 80 % identical to one another at the nucleotide level. Together, these characteristics make it challenging to characterize exon connectivity within full-length Dscam1 transcripts on any sequencing platform. Furthermore, though no other gene is as complex as Dscam1, many other genes have similar issues that confound the determination of exon connectivity.

We are interested in developing methods to perform simple and robust long-read sequencing of individual isoforms of Dscam1 and other complex alternatively spliced genes. Here, we use the Oxford Nanopore MinION to sequence ‘full-length’ cDNAs from four Drosophila genes – Rdl, MRP, Mhc, and Dscam1 – and identify a total of 7,899 distinct isoforms expressed by these four genes.

 

Similarity between alternative exons

We were interested in determining the feasibility of using the MinION nanopore sequencer to characterize the connectivity of distantly located exons in the mRNAs expressed from genes with complex splicing patterns. For the purposes of these experiments, we have focused on four Drosophila genes with increasingly complex patterns of alternative splicing (Fig. 1). Resistant to dieldrin (Rdl) contains two clusters, each containing two mutually exclusive exons and therefore has the potential to generate four different isoforms (Fig. 1a). Multidrug-Resistance like Protein 1 (MRP) contains two mutually exclusive exons in cluster 1 and eight mutually exclusive exons in cluster 2, and can generate 16 possible isoforms (Fig. 1b). Myosin heavy chain (Mhc) can potentially generate 180 isoforms due to five clusters of mutually exclusive exons – clusters 1 and 5 contain two exons, clusters 2 and 3 each contain three exons, and cluster 4 contains five exons. Finally, Dscam1 contains 12 exon 4 variants, 48 exon 6 variants, 33 exon 9 variants (Fig. 1d), and two exon 17 variants (not shown) and can potentially express 38,016 isoforms. For this study, however, we have focused only on the exon 3 through exon 10 region of Dscam1, which encompasses the 93 exon 4, 6, and 9 variants, and 19,008 potential isoforms (Fig. 1d).
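
Because exactly one exon is chosen from each mutually exclusive cluster, the number of potential isoforms is the product of the cluster sizes. The short sketch below reproduces the counts quoted above; the cluster sizes come from the text, while the dictionary layout and variable names are only illustrative.

```python
from math import prod

# Number of mutually exclusive exons in each cluster, as stated in the text.
clusters = {
    "Rdl": [2, 2],
    "MRP": [2, 8],
    "Mhc": [2, 3, 3, 5, 2],
    "Dscam1": [12, 48, 33, 2],  # exon 4, 6, 9, and 17 clusters
}

# One exon is picked per cluster, so the choices multiply.
potential = {gene: prod(sizes) for gene, sizes in clusters.items()}

# The study sequences only exon 3 through exon 10, which omits the exon 17 cluster.
dscam1_exon3_to_10 = prod(clusters["Dscam1"][:3])

for gene, count in potential.items():
    print(gene, count)
print("Dscam1 exon 3-10 region", dscam1_exon3_to_10)
```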

Fig. 1. Schematic of the exon-intron structures of the genes examined in this study. a The Rdl gene contains two clusters (cluster one and two) which each contain two mutually exclusive exons. b The MRP gene contains two and eight mutually exclusive exons in clusters 1 and 2, respectively. c Mhc contains two mutually exclusive exons in clusters 1 and 5, three mutually exclusive exons in clusters 2 and 3, and five mutually exclusive exons in cluster 4. d The Dscam1 gene contains 12, 48, and 33 mutually exclusive exons in the exon 4, 6, and 9 clusters, respectively. For each gene, the constitutive exons are colored blue, while the variable exons are colored yellow, red, orange, green, or light blue

Because our nanopore sequence analysis pipeline uses LAST to perform alignments [8], we aligned all of the Rdl, MRP, Mhc, and Dscam1 exons within each cluster to one another using LAST to determine the extent of discrimination needed to accurately assign nanopore reads to a specific exon variant. For Rdl, each variable exon was only aligned to itself, and not to the other exon in the same cluster (data not shown). For MRP, the two exons within cluster 1 only align to themselves, and though the eight variable exons in cluster 2 do align to other exons, there is sufficient specificity to accurately assign nanopore reads to individual exons (Fig. 2a). For Mhc, the variable exons in cluster 1 and cluster 5 do not align to other exons, and the variable exons in cluster 2, cluster 3, and cluster 4 again align with sufficient discrimination to identify the precise exon present in the nanopore reads (Fig. 2b). Finally, for Dscam1, the difference in the LAST alignment scores between the best alignment (each exon to itself) and the second, third, and fourth best alignments are sufficient to identify the Dscam1 exon variant (Fig. 2c). This analysis indicates that for each gene in this study, LAST alignment scores are sufficiently distinct to identify the variable exons present in each nanopore read.
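
The discrimination idea described above (the best alignment must clearly outscore the runner-up) can be sketched as follows. This is not the authors' pipeline: the `assign_variant` function, the `min_margin` threshold, and the scores in the usage lines are assumptions made for illustration, and the paper does not state a numeric cutoff.

```python
def assign_variant(scores, min_margin=1):
    """Pick the exon variant with the best alignment score.

    scores maps a variant name to its alignment score for one read.
    Returns None when the runner-up is within min_margin of the best
    score, i.e. when the read cannot be assigned with confidence.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best if best_score - second_score >= min_margin else None


# Hypothetical scores for one read against three exon 6 variants.
clear_call = assign_variant({"6.1": 120, "6.2": 80, "6.3": 60}, min_margin=20)
ambiguous_call = assign_variant({"6.1": 100, "6.2": 95}, min_margin=20)
print(clear_call, ambiguous_call)
```

A larger margin trades sensitivity for specificity, which matters most for clusters whose exons are highly similar at the nucleotide level.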

Fig. 2. Similarity distance between the variable alternative exons of MRP, Mhc, and Dscam1. a Violin plots of the LAST alignment scores of each variable exon within MRP cluster 1 and MRP cluster 2 to themselves and the second (2nd) best alignments. b Violin plots of the LAST alignment scores of each variable exon within each Mhc cluster to themselves and the second (2nd) best alignments. c Violin plots of the LAST alignment scores of each variable exon within each Dscam1 cluster to themselves (1st), and to the exons with the second (2nd), third (3rd) and fourth (4th) best alignments

Optimizing template switching in Dscam1 cDNA libraries

Template switching can occur frequently when libraries are prepared by PCR and can confound the interpretation of results [9], [10]. For example, CAM-Seq [11] and a similar method we independently developed called Triple-Read sequencing [12] to characterize Dscam1 isoforms, were found to have excessive template switching due to amplification during the library prep protocols. To assess template switching in our current study, we generated a spike-in mixture of in vitro transcribed RNAs representing six unique Dscam1 isoforms, each named by its exon 4, 6, and 9 variants: Dscam1 4.2,6.32,9.31; Dscam1 4.1,6.46,9.30; Dscam1 4.3,6.33,9.9; Dscam1 4.12,6.44,9.32; Dscam1 4.7,6.8,9.15; and Dscam1 4.5,6.4,9.4. We used 10 pg of this control spike-in mixture and prepared libraries for MinION sequencing by amplifying the exon 3 through exon 10 region for 20, 25, or 30 cycles of RT-PCR. We then end-repaired and dA-tailed the fragments, ligated adapters, and sequenced the samples on a MinION (7.3) for 12 h each. We obtained 33,736, 8,961, and 7,511 base-called reads from the 20, 25, and 30 cycle libraries, respectively. Consistent with the size of the exon 3 to 10 cDNA fragment being 1,806–1,860 bp in length, depending on the precise combination of exons it contains, most reads we observed were in this size range (Fig. 3a). We used Poretools [13] to convert the raw output files into fasta format and then used LAST to align the reads to a LAST database containing each variable exon. From these alignments, we identified reads that mapped to all three exon clusters, as well as the exon with the best alignment score within each cluster. When examining the alignments to each cluster independently, we found that for these spike-in libraries, all reads mapped uniquely to the exons present in the input isoforms. Therefore, any observed isoforms that were not present in the input pool were a result of template switching during the RT-PCR and library prep protocol and not due to false alignments or sequencing errors.

Fig. 3. Optimized RT-PCR minimizes template-switching for MinION sequencing. a Histogram of read lengths from MinION sequencing of Dscam1 spike-ins from the library generated using 25 cycles of PCR. b Bar plot indicating the extent of template switching in Dscam1 spike-ins at different PCR cycles (left). The blue portions indicate the fraction of reads corresponding to input isoforms while the red portions correspond to the fraction of reads corresponding to template-switched isoforms. On the right, plots of the rank order versus number of reads (log10) for the 20, 25, and 30 cycle libraries. The blue dots indicate input isoforms while the red portions correspond to template-switched isoforms

When comparing the combinations of exons within each read to the input isoforms, we observed that 32 % of the reads from the 30 cycle library corresponded to isoforms generated by template switching (Fig. 3b). The template-switched isoforms observed by the greatest number of reads in the 30 cycle library were due to template switching between the two most frequently sequenced input isoforms. In most cases, template switching occurred somewhere within exon 7 or 8 and resulted in a change in exon 9. However, the extent of template switching was reduced to only 1 % in the libraries prepared using 25 cycles, and to 0.2 % in the libraries prepared using 20 cycles of PCR (Fig. 3b). Again, for these two libraries the most frequently sequenced template-switched isoforms involved the input isoforms that were also the most frequently sequenced. These experiments demonstrate that the MinION nanopore sequencer can be used to sequence ‘full length’ Dscam1 cDNAs with sufficient accuracy to identify isoforms and that the cDNA libraries can be prepared in a manner that results in a very small amount of template switching.
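
The spike-in logic lends itself to a short sketch: any read whose exon 4, 6, 9 combination is not one of the six input isoforms is counted as template-switched. The six input combinations below are the ones listed for the spike-in mixture; the function name and the toy reads are hypothetical and do not reproduce the paper's data.

```python
from collections import Counter

# The six spike-in isoforms as (exon 4, exon 6, exon 9) variant numbers.
INPUT_ISOFORMS = {
    (2, 32, 31), (1, 46, 30), (3, 33, 9),
    (12, 44, 32), (7, 8, 15), (5, 4, 4),
}


def template_switch_fraction(read_isoforms, inputs=INPUT_ISOFORMS):
    """Fraction of reads whose exon combination is absent from the input pool."""
    counts = Counter(read_isoforms)
    total = sum(counts.values())
    switched = sum(n for iso, n in counts.items() if iso not in inputs)
    return switched / total if total else 0.0


# Toy reads (invented): eight match an input isoform, two carry exon 9.30
# joined to the exon 4 and exon 6 variants of a different input isoform.
toy_reads = [(2, 32, 31)] * 5 + [(1, 46, 30)] * 3 + [(2, 32, 30)] * 2
fraction = template_switch_fraction(toy_reads)
print(fraction)
```

The approach only works because every exon in the spike-in pool maps uniquely, so an unexpected combination cannot be blamed on misalignment.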

Dscam1 isoforms observed in adult heads

To explore the diversity of Dscam1 isoforms expressed in a biological sample, we prepared a Dscam1 library from RNA isolated from D. melanogaster heads prepared from mixed male and female adults using 25 cycles of PCR and sequenced it for 12 h on the MinION nanopore sequencer obtaining a total of 159,948 reads of which 78,097 were template reads, 48,474 were complement reads, and 33,377 were 2D reads (Fig. 4a). We aligned the reads individually to the exon 4, 6, and 9 variants using LAST. A total of 28,971 reads could be uniquely or preferentially aligned to a single variant in all three clusters. For further analysis, we used all 16,419 2D read alignments and 31 1D reads when both template and complement aligned to the same variant exons (not all reads with both a template and complement yield a 2D read). The remaining 12,521 aligned reads were 1D reads where there was either only a template or complement read, or when the template and complement reads disagreed with one another and were therefore not used further. We observed 92 of the 93 potential exon 4, 6, or 9 variants – only exon 6.11 was not observed in any read (Fig. 4f). To assess the accuracy of the results we performed RT-PCR using primers in the flanking constitutive exons that contained Illumina sequencing primers to separately amplify the Dscam1 exon 4, 6, and 9 clusters from the same RNA used to prepare the MinION libraries, and sequenced the amplicons on an Illumina MiSeq. The frequency of variable exon use in each cluster was extremely consistent between the two methods (R² = 0.95, Fig. 5a).
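
The read-selection rule described above can be written as a small decision function. This is a sketch of the rule as stated in the text, not the authors' code; the function and argument names are hypothetical, and each argument stands for the (exon 4, exon 6, exon 9) assignment obtained from that read type, or None when it is missing.

```python
def usable_assignment(two_d=None, template=None, complement=None):
    """Return the isoform assignment to keep for one molecule, or None.

    A 2D read is always used. Without a 2D read, the 1D reads are used
    only when both strands exist and agree on the same variant exons.
    """
    if two_d is not None:
        return two_d
    if template is not None and template == complement:
        return template
    return None


kept_2d = usable_assignment(two_d=(1, 12, 30))
kept_1d = usable_assignment(template=(1, 1, 30), complement=(1, 1, 30))
dropped_single = usable_assignment(template=(1, 1, 30))
dropped_conflict = usable_assignment(template=(1, 1, 30), complement=(1, 2, 30))

# Bookkeeping from the text: 16,419 2D reads plus 31 concordant 1D reads
# are kept, and the rest of the 28,971 aligned reads are set aside.
kept = 16_419 + 31
set_aside = 28_971 - kept
print(kept, set_aside)
```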

Fig. 4. MinION sequencing of Dscam1 identified 7,874 isoforms. a Histogram of read length distribution for Drosophila head samples. b The total number of Dscam1 isoforms identified from MinION sequencing. c Cumulative distribution of Dscam1 isoforms with respect to expression. d Violin plot of the number of isoforms identified using 100 random pools of the indicated number of reads. e Plot of the estimated number of total isoforms present in the library using the capture-recapture method with two random pools of the indicated number of reads. The shaded blue area indicates the 95 % confidence interval. f Deconvoluted expression of Dscam1 exon cluster variants (top) and the isoform connectivity of two highly expressed Dscam1 isoforms (bottom)

Fig. 5. Accuracy of Dscam1 sequencing results. a Comparison of the frequency of variable exon inclusion for the Dscam1 exon 4 (yellow), 6 (red), and 9 (orange) clusters as determined by nanopore sequencing or by amplicon sequencing using an Illumina MiSeq. b Percent identities (left) or LAST alignment scores (right) of full-length template, complement, and two directions (sequencing both template and complements) nanopore read alignments

Over their entire lengths, the 2D reads that map specifically to one exon 4, 6, and 9 variant map with an average 90.37 % identity and an average LAST score of approximately 1,200 (Fig. 5b). The 16,450 full length reads correspond to 7,874 unique isoforms, or 42 % of the 18,612 possible isoforms given the exon 4, 6, and 9 variants observed. We note, however, that while 4,385 isoforms were represented by more than one read, 3,516 isoforms were represented by only one read indicating that the depth of sequencing has not reached saturation (Fig. 4b and c). This was further confirmed by performing a bootstrapped subsampling analysis (Fig. 4d) and by using the capture-recapture method to attempt to assess the complexity of isoforms present in the library (Fig. 4e), which suggests that over 11,000 isoforms are likely to be present, though even this analysis has not yet reached saturation. The most frequently observed isoforms were Dscam1 4.1,6.12,9.30 and Dscam1 4.1,6.1,9.30 which were observed with 30 and 25 reads, respectively (Fig. 4e). In conclusion, these results demonstrate the practical application of using the MinION nanopore sequencer to identify thousands of distinct Dscam1 isoforms in a single biological sample.
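
For readers unfamiliar with capture-recapture, the sketch below shows the idea with the classic Lincoln-Petersen estimator applied to two pools of reads. The text does not say which estimator the authors used, so the choice of Lincoln-Petersen, the function name, and the toy pools are all assumptions.

```python
def lincoln_petersen(pool_a, pool_b):
    """Estimate the total number of distinct isoforms in a library.

    pool_a and pool_b are two independent samples of reads, each read
    given as its isoform label. With n_a and n_b distinct isoforms in the
    pools and m isoforms seen in both, the estimate is n_a * n_b / m.
    """
    seen_a, seen_b = set(pool_a), set(pool_b)
    shared = len(seen_a & seen_b)
    if shared == 0:
        raise ValueError("no isoforms shared between pools; estimate undefined")
    return len(seen_a) * len(seen_b) / shared


# Toy pools (invented labels): 4 distinct isoforms in each, 2 in common.
pool_a = ["iso1", "iso2", "iso3", "iso4", "iso1"]
pool_b = ["iso3", "iso4", "iso5", "iso6"]
estimate = lincoln_petersen(pool_a, pool_b)
print(estimate)
```

The estimator assumes every isoform is equally likely to be sampled. Isoform abundances are strongly skewed, so an estimate of this kind is best read as a rough lower bound, which fits the observation above that the analysis has not reached saturation.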

Nanopore sequencing of ‘full-length’ Rdl, MRP, and Mhc isoforms

To extend this approach to other genes with complex splicing patterns, we focused on Rdl, MRP, and Mhc which have the potential to generate four, 16, and 180 isoforms, respectively. We prepared libraries for each of these genes by RT-PCR using primers in the constitutive exons flanking the most distal alternative exons using 25 cycles of PCR, pooled the three libraries and sequenced them together on the MinION nanopore sequencer for 12 h obtaining a total of 22,962 reads. The input libraries for Rdl, MRP, and Mhc were 567 bp, 1,769-1,772 bp, and 3,824 bp, respectively. The raw reads were aligned independently to LAST indexes of each cluster of variable exons. The alignment results were then used to assign reads to their respective libraries, identify reads that mapped to all variable exon clusters for each gene, and the exon with the best alignment score within each cluster. In total, we obtained 301, 337, and 112 full length reads for Rdl (Fig. 6), MRP (Fig. 7), and Mhc (Fig. 8), respectively. For Rdl, both variable exons in each cluster were observed, and accordingly all four possible isoforms were observed, though in each case the first exon was observed at a much higher frequency than the second exon (Fig. 6d). Interestingly, the ratio of isoforms containing the first versus second exon in the second cluster is similar for isoforms containing either the first exon or the second exon in the first cluster indicating that the splicing of these two clusters may be independent. For MRP, both exons in the first cluster were observed and all but one of the exons in the second cluster (exon B) were observed, though the frequency at which the exons in both clusters were used varied dramatically (Fig. 7d). For example, within the first cluster, exon B was observed 333 times while exon A was observed only four times. Similarly, in the second cluster, exon A was observed 157 times whereas exons B, E, F, and G were observed 0 times, thrice, once, and twice, respectively, and exons D, E, and H were observed between 40 and 76 times. As a result, we observed only nine MRP isoforms. For Mhc, we again observed strong biases in the exons observed in each of the five clusters (Fig. 8d). In the first cluster, exon B was observed more frequently than exon A. In the second cluster, 109 of the reads corresponded to exon A, while exons B and C were observed by only two and one read, respectively. In the third cluster, exon A was not observed at all while exons B and C were observed in roughly 80 % and 20 % of reads, respectively. In the fourth cluster, exon A was observed only once, exons B and C were not observed at all, exon E was observed 13 times while exon D was present in all of the remaining reads. Finally, in the fifth cluster, only exon B was observed. As with MRP, these strong biases and near or complete absences of exons in some of the clusters severely reduce the number of possible isoforms that can be observed. In fact, of the 180 potential isoforms encoded by Mhc, we observed only 12 isoforms. Various Mhc isoforms are known to be expressed in striking spatially and temporally restricted patterns [14] and thus it is likely that other Mhc isoforms that we did not observe could be observed by sequencing other tissue samples.
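
The independence argument made for Rdl (the exon ratio in the second cluster is similar whichever exon is used in the first) can be illustrated with a two-by-two table of isoform read counts. The counts below are invented for illustration and are not the paper's data; the function names are hypothetical.

```python
def second_cluster_ratio(counts, first_exon):
    """Reads with exon 1 versus exon 2 in cluster 2, given the cluster 1 exon."""
    return counts[(first_exon, 1)] / counts[(first_exon, 2)]


def odds_ratio(counts):
    """Odds ratio of the 2x2 table; a value near 1 is what independent
    splicing of the two clusters would produce."""
    return (counts[(1, 1)] * counts[(2, 2)]) / (counts[(1, 2)] * counts[(2, 1)])


# Invented read counts keyed by (cluster 1 exon, cluster 2 exon).
toy_counts = {(1, 1): 240, (1, 2): 30, (2, 1): 24, (2, 2): 3}

ratio_given_first = second_cluster_ratio(toy_counts, first_exon=1)
ratio_given_second = second_cluster_ratio(toy_counts, first_exon=2)
toy_odds_ratio = odds_ratio(toy_counts)
print(ratio_given_first, ratio_given_second, toy_odds_ratio)
```

Only long reads that span both clusters yield such a table; short reads would give the row and column totals alone. With real counts, a formal test such as Fisher's exact test would be the natural follow-up.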

Fig. 6. MinION sequencing of Rdl identified four isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 7. MinION sequencing of MRP identified nine isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Fig. 8. MinION sequencing of Mhc identified 12 isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)

Conclusions

Here we have demonstrated that nanopore sequencing with the Oxford Nanopore MinION can be used to easily determine the connectivity of exons in a single transcript, including Dscam1, the most complicated alternatively spliced gene known in nature. This is an important advance for several reasons. First, because short-read sequence data cannot be used to conclusively determine which exons are present in the same RNA molecule, especially for complex alternatively spliced genes, long-read sequence data are necessary to fully characterize the transcript structure and exon connectivity of eukaryotic transcriptomes. Second, although the Pacific Bioscience platform can perform long-read sequencing, there are several differences between it and the Oxford Nanopore MinION that could cause users to choose one platform over the other. In general, the quality of the sequence generated by the Pacific Bioscience is higher than that currently generated by the Oxford Nanopore MinION. This is largely due to the fact that each molecule is sequenced multiple times on the Pacific Bioscience platform yielding a high quality consensus sequence whereas on the Oxford Nanopore MinION, each molecule is sequenced at most twice (in the template and complement). We have previously used the Pacific Bioscience platform to characterize Dscam1 isoforms and found that it works well, though due to the large amount of cDNA needed to generate the libraries, many cycles of PCR are necessary and we observed an extensive amount of template switching, making it impractical to use for these experiments (BRG, unpublished data). However, over the past year that we have been involved in the MAP, the quality of sequence has steadily increased. As this trend is likely to continue, the difference in sequence quality between these two platforms is almost certain to shrink. 
Nonetheless, as we demonstrate, the current quality of the data is more than sufficient to allow us to accurately distinguish between highly similar alternatively spliced isoforms of the most complex gene in nature. Third, the ability to accurately characterize alternatively spliced transcripts with the Oxford Nanopore MinION makes this technology accessible to a much broader range of researchers than was previously possible. This is in part due to the fact that, in contrast to all other sequencing platforms, very little capital expense is needed to acquire the sequencer. Moreover, the MinION is truly a portable sequencer that could literally be used in the field (provided one has access to an Internet connection), and due to its size, almost no laboratory space is required for its use.

Although nanopore sequencing has many exciting and potentially disruptive advantages, there are several areas in which improvement is needed. First, although we were able to accurately identify over 7,000 Dscam1 isoforms with an average identity of full-length alignments >90 %, there are several situations in which this level of accuracy will be insufficient to determine transcript structure. For instance, there are many micro-exons in the human genome [15], and these exons would be difficult to identify if they overlapped a portion of a read that contained errors. Additionally, small unannotated exons could be difficult to identify for similar reasons. Second, the current number of usable reads is lower than that which will be required to perform whole transcriptome analysis. One issue that plagues transcriptome studies is that the majority of the sequence generated comes from the most abundant transcripts. Thus, with the current throughput, numerous runs would be needed to generate a sufficient number of reads necessary to sample transcripts expressed at a low level. In fact, this is one reason that we chose in this study, to begin by targeting specific genes rather than attempting to sequence the entire transcriptome. We do note, however, that over the past year of our participation in the MAP, the throughput of the Oxford Nanopore MinION has increased, and it is reasonable to expect additional improvements in throughput that should make it possible to generate a sufficient number of long reads to deeply interrogate even the most complex transcriptome.
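
The throughput concern can be made quantitative with a back-of-the-envelope model. Assuming each read is drawn independently and a transcript makes up a fraction f of the library (a simplification introduced here, not a claim from the paper), the chance of seeing it at least once in n reads is 1 - (1 - f)^n. The function names and the example abundance are hypothetical.

```python
import math


def detection_probability(fraction, n_reads):
    """Probability of at least one read from a transcript that makes up
    `fraction` of the library, assuming independent sampling of reads."""
    return 1.0 - (1.0 - fraction) ** n_reads


def reads_needed(fraction, target_probability):
    """Smallest read count whose detection probability reaches the target."""
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - fraction))


# Example: a transcript at 1 in 10,000 molecules, 95 % chance of detection.
rare_fraction = 1e-4
n_required = reads_needed(rare_fraction, 0.95)
print(n_required, detection_probability(rare_fraction, n_required))
```

Targeting specific genes by RT-PCR, as done in this study, raises f for the transcripts of interest and therefore lowers the number of reads required.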

In conclusion, we anticipate that nanopore sequencing of whole transcriptomes, rather than targeted genes as we have performed here, will be a rapid and powerful approach for characterizing isoforms, especially with improvements in the throughput and accuracy of the technology, and the simplification and/or elimination of the time-consuming library preparations.

 

The Tangled Transcriptome

Graveley’s lab studies the transcriptome, the full set of RNA molecules in a living cell, including the messenger RNAs that carry instructions from DNA to the protein-making machinery. The transcriptome is a sort of snapshot of which parts of the genome are active at a given time and place. Which genes are transcribed into RNA, and in what quantities, changes from organ to organ and even cell to cell, and can vary over an organism’s lifetime or in response to environmental changes.

Of particular interest to Graveley are those RNA molecules that can take different shapes, or “isoforms,” depending on random chance or what the cell needs at a particular time. RNA isoforms are distinct versions of the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior, in effect expanding the protein arsenal of the genome.

“For the entire field of transcriptomics and gene function, knowing what isoforms are expressed is critical,” says Graveley. “Most genes are complicated, especially in humans, and have alternative splicing that occurs at multiple places.”

That brings us to the challenge of Dscam1, the world record holder for alternative splicing. In fruit flies, a particularly well-studied model organism, Dscam1 is made up of 115 exons, only 20 of which are always transcribed into RNA. The other 95 exist in four “clusters” of mutually exclusive exons, and as a result, over 38,000 possible isoforms of Dscam1 have been predicted.
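The figure of more than 38,000 comes from multiplying the number of choices in each cluster. The article does not list the cluster sizes. The sketch below assumes the commonly cited breakdown of 12, 48, 33, and 2 mutually exclusive variants for the exon 4, 6, 9, and 17 clusters, which adds up to the 95 alternative exons mentioned above.

```python
from math import prod

# Assumed cluster sizes (not given in the article): number of mutually
# exclusive variants in each alternatively spliced cluster of fly Dscam1.
CLUSTER_SIZES = {"exon4": 12, "exon6": 48, "exon9": 33, "exon17": 2}


def isoform_count(cluster_sizes):
    """One variant is chosen from each cluster, so the counts multiply."""
    return prod(cluster_sizes.values())


total_alternative_exons = sum(CLUSTER_SIZES.values())
total_isoforms = isoform_count(CLUSTER_SIZES)

print(total_alternative_exons)  # 95
print(total_isoforms)           # 38016
```

Because the clusters combine multiplicatively, dropping even the smallest cluster halves the number of distinguishable combinations.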

“This is by far, an order of magnitude, more than any other gene,” Graveley explains. This flexibility makes sense in light of Dscam1’s function. The protein it makes helps to “identify” single neurons in the insect brain, making them distinct enough from their neighbors for these cells to assemble a neural circuit on principles of like avoiding like. In experiments where Dscam1 has been altered to make fewer RNA isoforms, the neural wiring breaks down during development, sometimes severely enough to kill the flies.

Dscam1 also plays a role in the insect immune system, another reason for it to produce a huge variety of isoforms. Each of these molecules might be more or less effective at fighting certain pathogens.

It’s frustratingly hard, however, to figure out exactly which isoforms are in a specific sample. Graveley has been working on Dscam1 in fruit flies for more than a decade, but very basic questions remain unanswered: are some isoforms more common, or more important, than others? Are all the theoretical isoforms expressed? Do the isoforms have different behaviors, or are they just arbitrary ways of tagging neurons?

Size Matters

The trouble is the current state of the art in sequencing technology, which reads just a couple of hundred DNA bases at a time. That works great for identifying which exons are present in the transcriptome, but it’s no good for saying which mix of exons any specific strand of RNA is carrying. Different exons can lie thousands of bases apart on the RNA molecule, and there’s no way to bridge the gap between reads.

Graveley has tried a lot of solutions. He’s used the outdated Sanger sequencing method, which is much slower and more labor-intensive than modern sequencers, but does span longer reads. His lab also worked out a roundabout way of reconstructing RNA transcripts with contemporary Illumina sequencers, through a combination of chemistry and computational approaches.

“It worked,” he says, “but it was complicated by a lot of library preparation artifacts, and you basically had to jury-rig a genome analyzer to do something it was not supposed to do.”

Graveley’s preferred method is to use a sequencer produced by Pacific Biosciences, which, like the MinION, is built on long-read, single-molecule technology. PacBio sequencing is much better established than nanopores, and its results are known to be reliable; it also has the high throughput typical of modern instruments. For researchers working on alternative splicing, it’s clearly the technology to beat.

Unfortunately, it’s also very expensive. So Graveley’s team set out to learn whether the MinION, a low-throughput but extremely cheap alternative, could be an adequate substitute.

For the Genome Biology paper, the team focused on a 1.8-kilobase region of Dscam1 RNA that covers 93 of the gene’s 95 alternatively spliced exons. To get their samples, they crushed fruit fly heads, isolated Dscam1 RNA from the sample using a polymerase, and reverse-transcribed it into cDNA for sequencing. They also sequenced transcripts of three other alternatively spliced genes, Rdl, MRP, and Mhc.


The biggest concern for new applications of the MinION is its shaky accuracy. While most sequencers can achieve comfortably over 99% consensus with reference sequences, Graveley’s group has seen only about 90% identity with the MinION. That’s actually a little better than most MinION users have managed, although the device’s accuracy has been steadily improving. Users have had to pick their projects carefully to account for this: the device is pretty reliable in resequencing studies that map DNA reads to known references, but it’s still a dubious choice for sequencing unknown genetic material from scratch (although it’s been tried).

To accurately pin down the exact isoforms in the transcriptome, the MinION didn’t have to read every RNA molecule perfectly, but it did have to come close enough to decisively tell one exon from another, and in Dscam1, those exons could be as much as 80% identical.
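One way to picture this requirement: a noisy read segment only has to be clearly closer to one exon variant than to every other. The sketch below is an illustration, not the method used in the paper. The toy sequences, the `assign_variant` helper, and the `min_margin` threshold are all hypothetical, and it compares pre-aligned sequences of equal length position by position, whereas real nanopore reads contain insertions and deletions and need a proper aligner.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_variant(read_segment, variants, min_margin=0.05):
    """Return the best-matching variant name, or None when the runner-up
    scores within min_margin of the best (an ambiguous call)."""
    scored = sorted(
        ((identity(read_segment, seq), name) for name, seq in variants.items()),
        reverse=True,
    )
    (best_score, best_name), (second_score, _) = scored[0], scored[1]
    if best_score - second_score < min_margin:
        return None
    return best_name


# Two made-up exon variants that are 80% identical to each other.
VARIANTS = {
    "exon_A": "ACGTACGTTAGCCTAGGATC",
    "exon_B": "ACTTACGATAGCGTAGGCTC",
}

# A read of exon_A with two errors (90% identity to its true source).
NOISY_READ = "ACGTAGGTTAGCCTTGGATC"
# A read whose errors leave it equally close to both variants.
AMBIGUOUS_READ = "ACGTACGTTAGCGTAGGCTC"

print(assign_variant(NOISY_READ, VARIANTS))      # exon_A
print(assign_variant(AMBIGUOUS_READ, VARIANTS))  # None
```

In this toy case the noisy read is 90% identical to its true source and only 70% identical to the other variant, so the call is unambiguous even though the two variants share 80% of their bases.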

In fact, Graveley and his co-authors found that the MinION was very capable of this. Out of around 33,000 high-quality Dscam1 reads pulled off the sequencer, almost 29,000 were a strong match for one and only one combination of exons. To further check their accuracy, the team also sequenced the same sample on Illumina technology. While the Illumina sequencer could not give whole isoforms, it did show the same proportions of different exons, suggesting that the MinION gave a complete and unbiased picture of the sample.

“Alternative splicing, it turns out, is probably one of the ideal applications for this platform,” Graveley says. “Even with a gene as complicated as this one, we’re able to accurately distinguish the isoforms from one another. Unless you have very, very small exons, or two exons that are almost identical to each other, the accuracy is good enough.”

Make Way for PromethION

The results are good news for researchers studying the transcriptome, but the MinION probably won’t push out other methods for dealing with alternative splicing just yet. Its low throughput means that at best it can cover a very small portion of the transcriptome with each run ― and that means isolating targeted RNA transcripts, a process that can introduce new biases into the data.

“You need a lot of reads to get the whole transcriptome, and what happens is you end up sequencing boring genes like actin and tubulin, the really abundantly expressed things,” Graveley explains. Still, his data from this experiment was good enough to replicate a few earlier findings: for instance, that Dscam1 does appear to make every predicted isoform. In this experiment, his lab observed almost half the possible isoforms, containing 92 of 93 possible exons.
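The throughput problem can be made concrete with a back-of-the-envelope calculation. If a transcript makes up a fraction f of the molecules in a library and reads are sampled independently, the chance of seeing it at least once in N reads is 1 - (1 - f)^N. The fractions and the 95% target below are illustrative assumptions, not values from the study.

```python
import math


def prob_detected(fraction, n_reads):
    """Probability that at least one of n_reads comes from a transcript
    making up `fraction` of the library (independent sampling assumed)."""
    return 1.0 - (1.0 - fraction) ** n_reads


def reads_needed(fraction, target=0.95):
    """Smallest read count giving at least `target` detection probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))


# Illustrative abundances: an abundant, a moderate, and a rare transcript.
for fraction in (1e-2, 1e-4, 1e-6):
    print(f"fraction {fraction:g}: {reads_needed(fraction):,} reads")
```

Under these assumptions, a transcript present at one part per million needs roughly three million reads for a 95% chance of a single observation, which is why abundant genes dominate a low-throughput run.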

Meanwhile, Oxford Nanopore Technologies is working on a new instrument, the PromethION, which will contain 48 MinION-style flow cells in a battery. Graveley has already signed on to be one of the first recipients, in an access program that is likely to start in the winter.

Judging by studies like this one, the PromethION stands a good chance of becoming the instrument of choice for large-scale RNA sequencing. With Dscam1, Graveley hopes to reach high enough throughput to do functional studies, seeking to learn whether different combinations of isoforms give rise to physical or behavioral differences. He also wants to look at human genes with high levels of alternative splicing, and to test whether the MinION can accurately count total numbers of RNA isoforms.

“The fact that you can use this technology to characterize whole isoforms is very exciting,” Graveley says. “It’s going to help us start characterizing the transcriptome in ways that have been very difficult.”

 

 

 



Reporter: Ritu Saxena, Ph.D.

Diabetes currently affects more than 336 million people worldwide, and healthcare costs arising from diabetes and its complications reach up to $612 million per day in the US alone. The islets of Langerhans, miniature endocrine organs within the pancreas, are essential regulators of blood glucose homeostasis and play a key role in the pathogenesis of diabetes. Islets of Langerhans are composed of several types of endocrine cells. The α- and β-cells are the most abundant and also the most important, in that they secrete the hormones (glucagon and insulin, respectively) crucial for glucose homeostasis (Bosco D, et al, Diabetes, May 2010;59(5):1202-10).

Diabetes is a ‘bihormonal’ disease, involving both insulin deficiency and excess glucagon. For decades, insulin deficiency was considered to be the sole reason for diabetes; however, recent studies emphasize excess glucagon as an important part of diabetes etiology. Thus, insulin-secreting β cells and glucagon-secreting α cells maintain physiological blood glucose levels, and their malfunction drives diabetes development. Increasing the number of insulin-producing β cells while decreasing the number of glucagon-producing α cells, either in vitro in donor pancreatic islets before transplantation into type 1 diabetics or in vivo in type 2 diabetics, is a promising therapeutic avenue. Researchers at the University of Pennsylvania (Philadelphia, PA), in collaboration with Oregon Health and Science University (Portland, OR), USA, have taken a huge leap in this direction by demonstrating that α to β cell reprogramming could be promoted by manipulating the histone methylation signature of human pancreatic islets. In fact, treating cultured pancreatic islets with a histone methyltransferase inhibitor leads to colocalization of glucagon with insulin, and of glucagon with insulin promoter factor 1 (PDX1), in human islets, and to colocalization of glucagon with insulin in mouse islets. The research findings were published in the Journal of Clinical Investigation.

Study design: The first step was to analyze the epigenetic and transcriptional landscape of human pancreatic α, β, and exocrine cells using ChIP and RNA sequencing. To determine the transcriptome and differential histone marks, human islets were dispersed and sorted by FACS to obtain cell populations highly enriched for α, β, and exocrine (duct and acinar) cells. Chromatin was then prepared for ChIP analysis using antibodies against the histone modifications H3K4me3 (a mark of gene activation) and H3K27me3 (a mark of gene repression). RNA sequencing was then performed to profile mRNA and lncRNA. qRT-PCR of insulin and glucagon expression in the individual α and β cell populations confirmed high sample purity.

Results:

  • Long noncoding transcripts: Long noncoding RNA molecules have been implicated as important developmental regulators, cell lineage allocators, and contributors to disease development. The authors discovered 12 β cell–specific and 5 α cell–specific long noncoding (lnc) transcripts, which illustrates the value of the transcriptome data as a research resource. Recently discovered lncRNA molecules in islets are regulated during development and dysregulated in type 2 diabetic islets.
  • Monovalent histone modification landscapes shared among three cell types: Monovalent H3K4me3-enriched regions, indicative of gene activation, were identified and compared in α, β, and exocrine cells. Strikingly, the vast majority of monovalently H3K4me3-marked genes were shared among the 3 pancreatic cell lineages (83%–95%), reflecting both their related function in protein secretion and common embryonic descent. Similarly, a high degree of overlap was observed in H3K27me3 modification patterns in all three cell types (73%–83%).
  • Bivalent histone modifications (H3K4me3 and H3K27me3) were high in α cells: Bernstein and colleagues observed bivalent marks to be common in undifferentiated cells, such as ES cells and pluripotent progenitor cells; in most cases, one of the histone modification marks was lost during differentiation, accompanying lineage specification (Bernstein BE, et al, Cell, 21 Apr 2006; 125(2):315-26). α cells exhibited the most bivalently marked genes, followed by β cells and exocrine cells. The bivalent state of α cells was remarkably similar to that of hESCs, suggesting a more plastic epigenomic state for α cells.
  • Monovalent histone modifications were high in β cells: Thousands of the genes that were in a bivalent state in α cells were in a monovalent state in β cells, carrying only the activating or the repressing mark.
  • Inhibition of histone methyltransferases led to partial cell-fate conversion: Adenosine dialdehyde (Adox), a drug that interferes with histone methylation and decreases H3K27me3, when administered to human islet tissue, decreased H3K27me3 enrichment at 3 gene loci that are bivalently marked in α cells and monovalently marked in β cells: MAFA, PDX1, and ARX. Adox resulted in the occasional co-occurrence of glucagon and insulin granules within the same islet cell, which was not observed in untreated islets. Thus, inhibition of histone methyltransferases leads to partial endocrine cell-fate conversion.
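The monovalent and bivalent categories used in these results amount to simple set bookkeeping: a gene is bivalent if it carries both H3K4me3 and H3K27me3, and monovalent if it carries exactly one of them. The sketch below illustrates that classification, plus the kind of overlap percentage quoted for shared marks. It is not the authors' analysis pipeline, and the mark calls assigned to each cell type are hypothetical placeholders chosen only to mirror the pattern described for MAFA, PDX1, and ARX.

```python
def classify(genes, k4me3, k27me3):
    """Label each gene by which of the two histone marks it carries."""
    states = {}
    for gene in genes:
        active, repressed = gene in k4me3, gene in k27me3
        if active and repressed:
            states[gene] = "bivalent"
        elif active:
            states[gene] = "monovalent-active"
        elif repressed:
            states[gene] = "monovalent-repressed"
        else:
            states[gene] = "unmarked"
    return states


def percent_shared(a, b):
    """Percentage of the genes in set `a` that are also in set `b`."""
    return 100.0 * len(a & b) / len(a) if a else 0.0


GENES = ["MAFA", "PDX1", "ARX", "GCG"]

# Hypothetical mark calls (placeholders for illustration, not study data).
alpha = classify(GENES,
                 k4me3={"MAFA", "PDX1", "ARX", "GCG"},
                 k27me3={"MAFA", "PDX1", "ARX"})
beta = classify(GENES,
                k4me3={"MAFA", "PDX1"},
                k27me3={"ARX"})

# Genes that are bivalent in alpha cells but resolved to one mark in beta cells.
resolved = [g for g in GENES
            if alpha[g] == "bivalent" and beta[g].startswith("monovalent")]
print(resolved)  # ['MAFA', 'PDX1', 'ARX']
```

In a real analysis the mark sets would come from ChIP-seq peak calls per cell type, and the thresholds used to call a gene "marked" would strongly affect the counts.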

Conclusion: α cells have been reprogrammed into a β cell fate in various mouse models. The reason, as proposed by the authors, might be that α cells carry more bivalently marked genes, which confers a more plastic epigenomic state that probably predisposes them to conversion to the β cell fate. Therefore, epigenomic information from the different cell types in pancreatic islets could be harnessed to manipulate their epigenetic signature, reprogram cells, and hence provide a path toward diabetes therapy.

Source: Bramswig NC, et al, Epigenomic plasticity enables human pancreatic α to β cell reprogramming. J Clin Invest, 22 Feb 2013. pii: 66514.

Related reading on Pharmaceutical Intelligence:

Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes

Therapeutic Targets for Diabetes and Related Metabolic Disorders

Reprogramming cell fate

CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease – Part IIC

2013 Genomics: The Era Beyond the Sequencing of the Human Genome: Francis Collins, Craig Venter, Eric Lander, et al.

Genome-Wide Detection of Single-Nucleotide and Copy-Number Variation of a Single Human Cell

SNAP: Predict Effect of Non-synonymous Polymorphisms: How well Genome Interpretation Tools could Translate to the Clinic

Genomic Endocrinology and its Future



Curator: Aviva Lev-Ari, PhD, RN

Sunitinib brings Adult acute lymphoblastic leukemia (ALL) to Remission – RNA Sequencing – FLT3 Receptor Blockade

https://pharmaceuticalintelligence.com/2012/07/09/sunitinib-brings-adult-all-to-remission-rna-sequencing/

 


