
Size Matters

Larry H. Bernstein, MD, FCAP, Curator



MinION Sequencing Untangles RNA Transcripts in a Difficult Gene

By Aaron Krol




November 3, 2015 | Brenton Graveley received his first MinION shipment in April 2014, at his lab at the University of Connecticut’s Institute of Systems Genomics. His lab was among the first to unwrap one of the candy bar-sized DNA sequencers made by Oxford Nanopore Technologies, and although its accuracy was shaky and its throughput low, right away Graveley and his colleagues could see it was producing real DNA data.

“I’m still amazed to this day that it works at all,” Graveley says. “It’s like Star Trek.”

A lot of buzz around the MinION has focused on its tiny size: early adopters have plotted to take MinIONs into outbreak zones and species-hunting tromps through the rainforest, working with bare-bones labs and laptop computers. But for Graveley, the size of the DNA strands the MinION reads is just as exciting as the size of the sequencer itself. That’s because most other sequencers rely on picking up chemical reactions that become more error-prone over time, meaning DNA can only be read in short fragments. The MinION, which reads genetic material by observing single molecules of DNA as they pass through extremely narrow “nanopores,” keeps producing data for as long as DNA is moving through the pore.

“You get the read length of whatever fragment you put into the MinION,” he says. “We’ve gotten reads that are over 100 kilobases,” hundreds or even thousands of times longer than researchers can expect with most other technologies.

Now, in a paper published in Genome Biology, Graveley and two of his lab members, post-doc Mohan Bolisetty and PhD student Gopinath Rajadinakaran, have shown how these read lengths can help explain the cellular behavior of Dscam1, one of the most difficult-to-study genes known to science. Related to a human gene that has been linked to Down syndrome (the name stands for “Down Syndrome Cell Adhesion Molecule”), Dscam1 plays a fundamental role in forming the architecture of insect brains. This single gene can produce thousands of subtly different proteins, an ability that makes it both a fascinating subject of research and almost impossible to understand using standard sequencing technology.


Determining exon connectivity in complex mRNAs by nanopore sequencing

Mohan T. Bolisetty, Gopinath Rajadinakaran and Brenton R. Graveley
Genome Biology 2015, 16:204          

Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.

High throughput RNA sequencing has revolutionized genomics and our understanding of the transcriptomes of many organisms. Most eukaryotic genes encode pre-mRNAs that are alternatively spliced [1]. In many genes, alternative splicing occurs at multiple places in the transcribed pre-mRNAs that are often located farther apart than the read lengths of most current high throughput sequencing platforms. As a result, several transcript assembly and quantitation software tools have been developed to address this [2], [3]. While these computational approaches do well with many transcripts, they generally have difficulty assembling transcripts of genes that express many isoforms. In fact, we have been unable to successfully assemble transcripts of complex alternatively spliced genes such as Dscam1 or Mhc using any transcript assembly software (data not shown). These software tools also have difficulty quantitating transcripts that have many isoforms, and for genes with distantly located alternatively spliced regions, they can only infer, and not directly measure, which isoforms may have been present in the original RNA sample [4]. For example, consider a gene containing two alternatively spliced exons located 2 kbp away from one another in the mRNA. If each exon is observed to be included at a frequency of 50 % from short read sequence data, it is impossible to determine whether there are two equally abundant isoforms that each contain or lack both exons, or four equally abundant isoforms that contain both, neither, or only one or the other exon.
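The two-exon thought experiment above can be made concrete. The minimal sketch below (the function name `marginal_inclusion` is illustrative, not part of the authors' pipeline) computes the per-exon inclusion frequencies, which are all that short reads unable to span both exons can measure, for the two hypothetical scenarios described in the text.

```python
from itertools import product


def marginal_inclusion(isoform_abundance):
    """Per-exon inclusion frequencies implied by a set of isoforms.

    isoform_abundance maps (includes_exon_a, includes_exon_b) to a relative
    abundance. Short reads that cannot span both exons only reveal these
    two marginal frequencies, not the per-molecule combinations.
    """
    total = sum(isoform_abundance.values())
    inc_a = sum(n for (a, _b), n in isoform_abundance.items() if a) / total
    inc_b = sum(n for (_a, b), n in isoform_abundance.items() if b) / total
    return inc_a, inc_b


# Scenario 1: two equally abundant isoforms that contain both exons or neither.
two_isoforms = {(True, True): 1, (False, False): 1}

# Scenario 2: four equally abundant isoforms covering every combination.
four_isoforms = {combo: 1 for combo in product([True, False], repeat=2)}

print(marginal_inclusion(two_isoforms))   # (0.5, 0.5)
print(marginal_inclusion(four_isoforms))  # (0.5, 0.5)
```

Both scenarios yield exactly 50 % inclusion for each exon, so the frequencies alone cannot distinguish them; only a read that covers both exons in one molecule records which combination was present.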

Pacific Biosciences sequencing can generate reads long enough to sequence full-length cDNA isoforms, and several groups have recently reported the use of this approach to characterize the transcriptome [5]. However, the large capital expense of this platform can be a prohibitive barrier for some users. Thus, for many laboratories, it remains difficult to accurately and directly determine the connectivity of exons within the same transcript. The MinION nanopore sequencer from Oxford Nanopore requires a small initial financial investment, can generate extremely long reads, and has the potential to revolutionize transcriptome characterization, as well as other areas of genomics.

Several eukaryotic genes can encode hundreds to thousands of isoforms. For example, in Drosophila, 47 genes encode over 1,000 isoforms each [6]. Of these, Dscam1 is the most extensively alternatively spliced gene known; it contains 115 exons, 95 of which are alternatively spliced and organized into four clusters [7]. The exon 4, 6, 9, and 17 clusters contain 12, 48, 33, and 2 exons, respectively. The exons within each cluster are spliced in a mutually exclusive manner, and Dscam1 therefore has the potential to generate 38,016 different mRNA and protein isoforms. The variable exon clusters are also located far from one another in the mRNA, and the exons within each cluster are up to 80 % identical to one another at the nucleotide level. Together, these characteristics present numerous challenges to characterizing exon connectivity within full-length Dscam1 transcripts for any sequencing platform. Furthermore, though no other gene is as complex as Dscam1, many other genes have similar issues that confound the determination of exon connectivity.

We are interested in developing methods to perform simple and robust long-read sequencing of individual isoforms of Dscam1 and other complex alternatively spliced genes. Here, we use the Oxford Nanopore MinION to sequence ‘full-length’ cDNAs from four Drosophila genes (Rdl, MRP, Mhc, and Dscam1) and identify a total of 7,899 distinct isoforms expressed by these four genes.


Similarity between alternative exons

We were interested in determining the feasibility of using the MinION nanopore sequencer to characterize the connectivity of distantly located exons in the mRNAs expressed from genes with complex splicing patterns. For the purposes of these experiments, we have focused on four Drosophila genes with increasingly complex patterns of alternative splicing (Fig. 1). Resistant to dieldrin (Rdl) contains two clusters, each containing two mutually exclusive exons, and therefore has the potential to generate four different isoforms (Fig. 1a). Multidrug-Resistance like Protein 1 (MRP) contains two mutually exclusive exons in cluster 1 and eight mutually exclusive exons in cluster 2, and can generate 16 possible isoforms (Fig. 1b). Myosin heavy chain (Mhc) can potentially generate 180 isoforms due to five clusters of mutually exclusive exons: clusters 1 and 5 each contain two exons, clusters 2 and 3 each contain three exons, and cluster 4 contains five exons (Fig. 1c). Finally, Dscam1 contains 12 exon 4 variants, 48 exon 6 variants, 33 exon 9 variants (Fig. 1d), and two exon 17 variants (not shown), and can potentially express 38,016 isoforms. For this study, however, we have focused only on the exon 3 through exon 10 region of Dscam1, which encompasses the 93 exon 4, 6, and 9 variants and 19,008 potential isoforms (Fig. 1d).
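The potential isoform counts quoted above follow from multiplying the number of mutually exclusive exons in each cluster, since exactly one exon is chosen per cluster. The short sketch below reproduces that arithmetic; it treats every combination as allowed, so the results are upper bounds rather than observed counts, and the dictionary layout is only for illustration.

```python
from math import prod

# Number of mutually exclusive exons per variable cluster, as given in the text.
CLUSTER_SIZES = {
    "Rdl": [2, 2],
    "MRP": [2, 8],
    "Mhc": [2, 3, 3, 5, 2],
    "Dscam1": [12, 48, 33, 2],         # exon 4, 6, 9, and 17 clusters
    "Dscam1 exon 3-10": [12, 48, 33],  # region examined in this study
}


def potential_isoforms(cluster_sizes):
    """Upper bound on isoforms: one exon chosen from each mutually
    exclusive cluster, with every combination assumed possible."""
    return prod(cluster_sizes)


for gene, sizes in CLUSTER_SIZES.items():
    print(f"{gene}: {potential_isoforms(sizes):,}")
# Rdl: 4
# MRP: 16
# Mhc: 180
# Dscam1: 38,016
# Dscam1 exon 3-10: 19,008
```

The same product gives the 18,612 combinations possible when exon 6.11, which was not seen in the adult head sample, is excluded: `potential_isoforms([12, 47, 33])`.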


Fig. 1. Schematic of the exon-intron structures of the genes examined in this study. a The Rdl gene contains two clusters (clusters 1 and 2), each of which contains two mutually exclusive exons. b The MRP gene contains two and eight mutually exclusive exons in clusters 1 and 2, respectively. c Mhc contains two mutually exclusive exons in clusters 1 and 5, three mutually exclusive exons in clusters 2 and 3, and five mutually exclusive exons in cluster 4. d The Dscam1 gene contains 12, 48, and 33 mutually exclusive exons in the exon 4, 6, and 9 clusters, respectively. For each gene, the constitutive exons are colored blue, while the variable exons are colored yellow, red, orange, green, or light blue

Because our nanopore sequence analysis pipeline uses LAST to perform alignments [8], we aligned all of the Rdl, MRP, Mhc, and Dscam1 exons within each cluster to one another using LAST to determine the extent of discrimination needed to accurately assign nanopore reads to a specific exon variant. For Rdl, each variable exon was only aligned to itself, and not to the other exon in the same cluster (data not shown). For MRP, the two exons within cluster 1 only align to themselves, and though the eight variable exons in cluster 2 do align to other exons, there is sufficient specificity to accurately assign nanopore reads to individual exons (Fig. 2a). For Mhc, the variable exons in cluster 1 and cluster 5 do not align to other exons, and the variable exons in cluster 2, cluster 3, and cluster 4 again align with sufficient discrimination to identify the precise exon present in the nanopore reads (Fig. 2b). Finally, for Dscam1, the difference in the LAST alignment scores between the best alignment (each exon to itself) and the second, third, and fourth best alignments are sufficient to identify the Dscam1 exon variant (Fig. 2c). This analysis indicates that for each gene in this study, LAST alignment scores are sufficiently distinct to identify the variable exons present in each nanopore read.
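The reasoning above rests on the gap between a read's best alignment score and its second-best score within a cluster. A minimal sketch of such a per-cluster assignment rule follows. It is not the authors' code: it assumes LAST scores for one read have already been parsed into a dictionary keyed by exon name, and the `min_margin` cutoff is a hypothetical parameter, not a value reported in the paper.

```python
def assign_exon(scores, min_margin=20):
    """Assign a read to one variable exon within a single cluster.

    scores: dict of exon name -> alignment score for this read
            (e.g. parsed from LAST output; parsing is not shown here).
    min_margin: hypothetical minimum gap between the best and second-best
                scores required for an unambiguous call.
    Returns the best exon name, or None if nothing aligned or the call is
    ambiguous.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_exon, best_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best_score - second_score < min_margin:
        return None
    return best_exon


# Hypothetical scores for one read against three exon 6 variants.
print(assign_exon({"6.12": 310, "6.1": 250, "6.30": 240}))  # 6.12
# Two near-identical scores are treated as ambiguous.
print(assign_exon({"9.30": 200, "9.31": 195}))  # None
```

Requiring a margin, rather than simply taking the top hit, is one way to encode the observation that similar exons (up to 80 % identical in Dscam1) still separate cleanly by score.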


Fig. 2. Similarity distance between the variable alternative exons of MRP, Mhc, and Dscam1. a Violin plots of the LAST alignment scores of each variable exon within MRP cluster 1 and MRP cluster 2 to themselves and the second (2nd) best alignments. b Violin plots of the LAST alignment scores of each variable exon within each Mhc cluster to themselves and the second (2nd) best alignments. c Violin plots of the LAST alignment scores of each variable exon within each Dscam1 cluster to themselves (1st), and to the exons with the second (2nd), third (3rd) and fourth (4th) best alignments

Minimizing template switching in Dscam1 cDNA libraries

Template switching can occur frequently when libraries are prepared by PCR and can confound the interpretation of results [9], [10]. For example, CAM-Seq [11], and a similar method that we independently developed to characterize Dscam1 isoforms called Triple-Read sequencing [12], were both found to have excessive template switching due to amplification during the library preparation protocols. To assess template switching in our current study, we generated a spike-in mixture of in vitro transcribed RNAs representing six unique Dscam1 isoforms: Dscam1 4.2,6.32,9.31; Dscam1 4.1,6.46,9.30; Dscam1 4.3,6.33,9.9; Dscam1 4.12,6.44,9.32; Dscam1 4.7,6.8,9.15; and Dscam1 4.5,6.4,9.4. We used 10 pg of this control spike-in mixture and prepared libraries for MinION sequencing by amplifying the exon 3 through exon 10 region for 20, 25, or 30 cycles of RT-PCR. We then end-repaired and dA-tailed the fragments, ligated adapters, and sequenced the samples on a MinION (7.3) for 12 h each. We obtained 33,736, 8,961, and 7,511 base-called reads from the 20, 25, and 30 cycle libraries, respectively. Consistent with the exon 3 to 10 cDNA fragment being 1,806–1,860 bp long, depending on the precise combination of exons it contains, most reads were in this size range (Fig. 3a). We used Poretools [13] to convert the raw output files into fasta format and then used LAST to align the reads to a LAST database containing each variable exon. From these alignments, we identified reads that mapped to all three exon clusters, as well as the exon with the best alignment score within each cluster. When examining the alignments to each cluster independently, we found that for these spike-in libraries, all reads mapped uniquely to the exons present in the input isoforms. Therefore, any observed isoforms that were not present in the input pool were a result of template switching during the RT-PCR and library prep protocol and not due to false alignments or sequencing errors.
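The step of keeping only reads that map to all three clusters, and recording the best exon in each, can be sketched as below. The data layout (one dictionary of per-cluster exon calls per read, with a cluster simply absent when no exon was called) is an assumption made for illustration; it is not the format used by the authors' pipeline.

```python
from collections import Counter

# Variable clusters covered by the exon 3 to exon 10 amplicon.
CLUSTERS = ("exon4", "exon6", "exon9")


def full_length_isoform(calls):
    """Return (exon4, exon6, exon9) for a read, or None if any cluster
    has no exon call (i.e. the read does not span all three clusters)."""
    isoform = tuple(calls.get(cluster) for cluster in CLUSTERS)
    return None if None in isoform else isoform


def count_isoforms(reads):
    """Tally full-length isoforms across reads, skipping partial reads."""
    counts = Counter()
    for calls in reads:
        isoform = full_length_isoform(calls)
        if isoform is not None:
            counts[isoform] += 1
    return counts


# Hypothetical per-read exon calls.
reads = [
    {"exon4": "4.2", "exon6": "6.32", "exon9": "9.31"},
    {"exon4": "4.2", "exon6": "6.32", "exon9": "9.31"},
    {"exon4": "4.1", "exon6": "6.46"},  # no exon 9 call: not full length
]
print(count_isoforms(reads))
```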


Fig. 3. Optimized RT-PCR minimizes template switching for MinION sequencing. a Histogram of read lengths from MinION sequencing of Dscam1 spike-ins from the library generated using 25 cycles of PCR. b Bar plot indicating the extent of template switching in Dscam1 spike-ins at different PCR cycles (left). The blue portions indicate the fraction of reads corresponding to input isoforms, while the red portions correspond to the fraction of reads corresponding to template-switched isoforms. On the right, plots of the rank order versus number of reads (log10) for the 20, 25, and 30 cycle libraries. The blue dots indicate input isoforms, while the red dots correspond to template-switched isoforms

When comparing the combinations of exons within each read to the input isoforms, we observed that 32 % of the reads from the 30 cycle library corresponded to isoforms generated by template switching (Fig. 3b). The template-switched isoforms observed by the greatest number of reads in the 30 cycle library were due to template switching between the two most frequently sequenced input isoforms. In most cases, template switching occurred somewhere within exon 7 or 8 and resulted in a change in exon 9. However, the extent of template switching was reduced to only 1 % in the libraries prepared using 25 cycles, and to 0.2 % in the libraries prepared using 20 cycles of PCR (Fig. 3b). Again, for these two libraries the most frequently sequenced template-switched isoforms involved the input isoforms that were also the most frequently sequenced. These experiments demonstrate that the MinION nanopore sequencer can be used to sequence ‘full length’ Dscam1 cDNAs with sufficient accuracy to identify isoforms and that the cDNA libraries can be prepared in a manner that results in a very small amount of template switching.
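The template-switching percentages above are the share of full-length reads whose exon combination is not one of the six spike-in isoforms. A minimal sketch of that calculation follows; the input set is taken from the text, while the read counts in the example are hypothetical and chosen only to illustrate an exon 9 swap between two inputs.

```python
# The six in vitro transcribed spike-in isoforms listed in the text,
# written as (exon 4, exon 6, exon 9) variants.
INPUT_ISOFORMS = {
    ("4.2", "6.32", "9.31"),
    ("4.1", "6.46", "9.30"),
    ("4.3", "6.33", "9.9"),
    ("4.12", "6.44", "9.32"),
    ("4.7", "6.8", "9.15"),
    ("4.5", "6.4", "9.4"),
}


def template_switch_fraction(isoform_counts, inputs=INPUT_ISOFORMS):
    """Fraction of full-length reads whose exon combination is not an input
    isoform. Because every spike-in read mapped uniquely to input exons,
    such combinations are attributed to template switching."""
    total = sum(isoform_counts.values())
    if total == 0:
        return 0.0
    switched = sum(n for iso, n in isoform_counts.items() if iso not in inputs)
    return switched / total


# Hypothetical counts: the third isoform keeps exons 4 and 6 of one input
# but carries exon 9 of another.
counts = {
    ("4.2", "6.32", "9.31"): 60,
    ("4.1", "6.46", "9.30"): 36,
    ("4.2", "6.32", "9.30"): 4,
}
print(template_switch_fraction(counts))  # 0.04
```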

Dscam1 isoforms observed in adult heads

To explore the diversity of Dscam1 isoforms expressed in a biological sample, we prepared a Dscam1 library, using 25 cycles of PCR, from RNA isolated from the heads of mixed male and female D. melanogaster adults. We sequenced it for 12 h on the MinION nanopore sequencer, obtaining a total of 159,948 reads, of which 78,097 were template reads, 48,474 were complement reads, and 33,377 were 2D reads (Fig. 4a). We aligned the reads individually to the exon 4, 6, and 9 variants using LAST. A total of 28,971 reads could be uniquely or preferentially aligned to a single variant in all three clusters. For further analysis, we used all 16,419 2D read alignments and 31 1D reads for which the template and complement aligned to the same variant exons (not all reads with both a template and a complement yield a 2D read). The remaining 12,521 aligned reads were 1D reads for which there was only a template or only a complement read, or for which the template and complement reads disagreed with one another; these were not used further. We observed 92 of the 93 potential exon 4, 6, or 9 variants; only exon 6.11 was not observed in any read (Fig. 4f). To assess the accuracy of the results, we performed RT-PCR on the same RNA used to prepare the MinION libraries, using primers in the flanking constitutive exons that contained Illumina sequencing primers to separately amplify the Dscam1 exon 4, 6, and 9 clusters, and sequenced the amplicons on an Illumina MiSeq. The frequency of variable exon use in each cluster was extremely consistent between the two methods (R² = 0.95, Fig. 5a).
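The read-selection rule described above (keep every 2D call; keep a 1D call only when both the template and complement reads exist and agree) can be written as a small filter. This is a sketch under assumptions: each read is represented as a dictionary with the hypothetical keys `"2d"`, `"template"`, and `"complement"` holding an isoform call or `None`.

```python
def usable_isoform_call(read):
    """Decide which isoform call, if any, to keep for one MinION read.

    read: dict that may hold isoform calls under the hypothetical keys
    "2d", "template", and "complement" (None or absent if unavailable).
    Keeps every 2D call; otherwise keeps a 1D call only when template and
    complement both exist and agree.
    """
    if read.get("2d") is not None:
        return read["2d"]
    template, complement = read.get("template"), read.get("complement")
    if template is not None and template == complement:
        return template
    return None


iso = ("4.1", "6.12", "9.30")
reads = [
    {"2d": iso},                                              # 2D call: kept
    {"template": iso, "complement": iso},                     # agreeing 1D pair: kept
    {"template": iso, "complement": ("4.1", "6.1", "9.30")},  # disagree: dropped
    {"template": iso},                                        # template only: dropped
]
kept = [call for call in map(usable_isoform_call, reads) if call is not None]
print(len(kept))  # 2
```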

Fig. 4. MinION sequencing of Dscam1 identified 7,874 isoforms. a Histogram of read length distribution for Drosophila head samples. b The total number of Dscam1 isoforms identified from MinION sequencing. c Cumulative distribution of Dscam1 isoforms with respect to expression. d Violin plot of the number of isoforms identified using 100 random pools of the indicated number of reads. e Plot of the estimated number of total isoforms present in the library using the capture-recapture method with two random pools of the indicated number of reads. The shaded blue area indicates the 95 % confidence interval. f Deconvoluted expression of Dscam1 exon cluster variants (top) and the isoform connectivity of two highly expressed Dscam1 isoforms (bottom)


Fig. 5. Accuracy of Dscam1 sequencing results. a Comparison of the frequency of variable exon inclusion for the Dscam1 exon 4 (yellow), 6 (red), and 9 (orange) clusters as determined by nanopore sequencing or by amplicon sequencing using an Illumina MiSeq. b Percent identities (left) or LAST alignment scores (right) of full-length template, complement, and two directions (sequencing both template and complements) nanopore read alignments

Over their entire lengths, the 2D reads that map specifically to one exon 4, one exon 6, and one exon 9 variant map with an average identity of 90.37 % and an average LAST score of approximately 1,200 (Fig. 5b). The 16,450 full-length reads correspond to 7,874 unique isoforms, or 42 % of the 18,612 isoforms possible given the exon 4, 6, and 9 variants observed. We note, however, that while 4,385 isoforms were represented by more than one read, 3,516 isoforms were represented by only one read, indicating that the depth of sequencing has not reached saturation (Fig. 4b and c). This was further confirmed by a bootstrapped subsampling analysis (Fig. 4d) and by using the capture-recapture method to estimate the complexity of isoforms present in the library (Fig. 4e), which suggests that over 11,000 isoforms are likely to be present, though even this analysis has not yet reached saturation. The most frequently observed isoforms were Dscam1 4.1,6.12,9.30 and Dscam1 4.1,6.1,9.30, which were observed in 30 and 25 reads, respectively (Fig. 4f). In conclusion, these results demonstrate the practical application of using the MinION nanopore sequencer to identify thousands of distinct Dscam1 isoforms in a single biological sample.
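The capture-recapture idea can be illustrated with the classic Lincoln-Petersen estimator, treating each distinct isoform as an "animal" and two random pools of reads as two capture sessions. The paper does not state which capture-recapture estimator was used, so this is an assumed, simplified version; the function names and the disjoint-pool split are illustrative.

```python
import random


def lincoln_petersen(pool1, pool2):
    """Estimate the total number of distinct isoforms from two samples.

    With n1 and n2 distinct isoforms seen in each pool and m seen in both,
    the estimate is n1 * n2 / m. It assumes every isoform is equally likely
    to be sampled, an assumption that large abundance differences violate.
    """
    seen1, seen2 = set(pool1), set(pool2)
    recaptured = len(seen1 & seen2)
    if recaptured == 0:
        raise ValueError("no isoform seen in both pools; estimate undefined")
    return len(seen1) * len(seen2) / recaptured


def estimate_from_reads(read_isoforms, pool_size, seed=0):
    """Draw two disjoint random pools of reads and apply Lincoln-Petersen."""
    if 2 * pool_size > len(read_isoforms):
        raise ValueError("not enough reads for two disjoint pools")
    rng = random.Random(seed)
    shuffled = rng.sample(read_isoforms, len(read_isoforms))
    return lincoln_petersen(shuffled[:pool_size], shuffled[pool_size:2 * pool_size])


print(lincoln_petersen(["a", "b", "c", "d"], ["c", "d", "e", "f"]))  # 8.0
```

Repeating `estimate_from_reads` for increasing `pool_size` and watching whether the estimate levels off is one way to look for the saturation the text describes.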

Nanopore sequencing of ‘full-length’ Rdl, MRP, and Mhc isoforms

To extend this approach to other genes with complex splicing patterns, we focused on Rdl, MRP, and Mhc, which have the potential to generate four, 16, and 180 isoforms, respectively. We prepared libraries for each of these genes by RT-PCR using primers in the constitutive exons flanking the most distal alternative exons and 25 cycles of PCR, pooled the three libraries, and sequenced them together on the MinION nanopore sequencer for 12 h, obtaining a total of 22,962 reads. The input libraries for Rdl, MRP, and Mhc were 567 bp, 1,769–1,772 bp, and 3,824 bp, respectively. The raw reads were aligned independently to LAST indexes of each cluster of variable exons. The alignment results were then used to assign reads to their respective libraries, to identify reads that mapped to all variable exon clusters for each gene, and to identify the exon with the best alignment score within each cluster. In total, we obtained 301, 337, and 112 full-length reads for Rdl (Fig. 6), MRP (Fig. 7), and Mhc (Fig. 8), respectively. For Rdl, both variable exons in each cluster were observed, and accordingly all four possible isoforms were observed, though in each case the first exon was observed at a much higher frequency than the second exon (Fig. 6d). Interestingly, the ratio of isoforms containing the first versus second exon in the second cluster is similar for isoforms containing either the first exon or the second exon in the first cluster, indicating that the splicing of these two clusters may be independent. For MRP, both exons in the first cluster were observed and all but one of the exons in the second cluster (exon B) were observed, though the frequency at which the exons in both clusters were used varied dramatically (Fig. 7d). For example, within the first cluster, exon B was observed 333 times while exon A was observed only four times.
Similarly, in the second cluster, exon A was observed 157 times whereas exons B, E, F, and G were observed 0 times, thrice, once, and twice, respectively, and exons D, E, and H were observed between 40 and 76 times. As a result, we observed only nine MRP isoforms. For Mhc, we again observed strong biases in the exons observed in each of the five clusters (Fig. 8d). In the first cluster, exon B was observed more frequently than exon A. In the second cluster, 109 of the reads corresponded to exon A, while exons B and C were observed in only two reads and one read, respectively. In the third cluster, exon A was not observed at all, while exons B and C were observed in roughly 80 % and 20 % of reads, respectively. In the fourth cluster, exon A was observed only once, exons B and C were not observed at all, exon E was observed 13 times, and exon D was present in all of the remaining reads. Finally, in the fifth cluster, only exon B was observed. As with MRP, these strong biases and the near or complete absence of some exons in several clusters severely reduce the number of possible isoforms that can be observed. In fact, of the 180 potential isoforms encoded by Mhc, we observed only 12. Various Mhc isoforms are known to be expressed in striking spatially and temporally restricted patterns [14], and thus it is likely that other Mhc isoforms that we did not observe could be detected by sequencing other tissue samples.
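The Rdl independence argument compares how often each second-cluster exon is used, separately for reads carrying each first-cluster exon. The sketch below makes that comparison explicit and also computes the counts expected if the two clusters were spliced independently (row total × column total / grand total). The read counts are hypothetical, exon A and exon B stand for the first and second exon of each cluster, and this is a descriptive check only; it is not a statistical test reported in the paper.

```python
def second_cluster_ratio(joint):
    """For each first-cluster exon, the ratio of reads using exon A versus
    exon B of the second cluster. joint maps (cluster1_exon, cluster2_exon)
    to a read count. Similar ratios across first-cluster exons are what
    independent splicing of the two clusters would predict."""
    ratios = {}
    for first in sorted({c1 for c1, _ in joint}):
        a = joint.get((first, "A"), 0)
        b = joint.get((first, "B"), 0)
        ratios[first] = a / b if b else float("inf")
    return ratios


def expected_if_independent(joint):
    """Expected counts if the clusters splice independently:
    row total * column total / grand total."""
    total = sum(joint.values())
    rows, cols = {}, {}
    for (c1, c2), n in joint.items():
        rows[c1] = rows.get(c1, 0) + n
        cols[c2] = cols.get(c2, 0) + n
    return {(r, c): rows[r] * cols[c] / total for r in rows for c in cols}


# Hypothetical Rdl read counts (not the paper's data).
joint = {("A", "A"): 80, ("A", "B"): 20, ("B", "A"): 16, ("B", "B"): 4}
print(second_cluster_ratio(joint))     # {'A': 4.0, 'B': 4.0}
print(expected_if_independent(joint))  # numerically equal to the observed counts here
```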


Fig. 6. MinION sequencing of Rdl identified four isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)


Fig. 7. MinION sequencing of MRP identified nine isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)


Fig. 8. MinION sequencing of Mhc identified 12 isoforms. a Histogram of read lengths. b The number of reads per isoform. c Cumulative distribution of isoforms with respect to expression. d The number of reads per alternative exon (top) and per isoform (below)


Here we have demonstrated that nanopore sequencing with the Oxford Nanopore MinION can be used to easily determine the connectivity of exons in a single transcript, including Dscam1, the most complicated alternatively spliced gene known in nature. This is an important advance for several reasons. First, because short-read sequence data cannot be used to conclusively determine which exons are present in the same RNA molecule, especially for complex alternatively spliced genes, long-read sequence data are necessary to fully characterize the transcript structure and exon connectivity of eukaryotic transcriptomes. Second, although the Pacific Biosciences platform can perform long-read sequencing, there are several differences between it and the Oxford Nanopore MinION that could cause users to choose one platform over the other. In general, the quality of the sequence generated by the Pacific Biosciences platform is higher than that currently generated by the Oxford Nanopore MinION. This is largely because each molecule is sequenced multiple times on the Pacific Biosciences platform, yielding a high-quality consensus sequence, whereas on the Oxford Nanopore MinION each molecule is sequenced at most twice (as the template and complement strands). We have previously used the Pacific Biosciences platform to characterize Dscam1 isoforms and found that it works well, though because of the large amount of cDNA needed to generate the libraries, many cycles of PCR are necessary, and we observed an extensive amount of template switching, making it impractical to use for these experiments (BRG, unpublished data). However, over the past year that we have been involved in the MinION Access Programme (MAP), the quality of MinION sequence has steadily increased. As this trend is likely to continue, the difference in sequence quality between these two platforms is almost certain to shrink.
Nonetheless, as we demonstrate, the current quality of the data is more than sufficient to allow us to accurately distinguish between highly similar alternatively spliced isoforms of the most complex gene in nature. Third, the ability to accurately characterize alternatively spliced transcripts with the Oxford Nanopore MinION makes this technology accessible to a much broader range of researchers than was previously possible. This is in part due to the fact that, in contrast to all other sequencing platforms, very little capital expense is needed to acquire the sequencer. Moreover, the MinION is truly a portable sequencer that could literally be used in the field (provided one has access to an Internet connection), and due to its size, almost no laboratory space is required for its use.

Although nanopore sequencing has many exciting and potentially disruptive advantages, there are several areas in which improvement is needed. First, although we were able to accurately identify over 7,000 Dscam1 isoforms with an average identity of full-length alignments >90 %, there are several situations in which this level of accuracy will be insufficient to determine transcript structure. For instance, there are many micro-exons in the human genome [15], and these exons would be difficult to identify if they overlapped a portion of a read that contained errors. Additionally, small unannotated exons could be difficult to identify for similar reasons. Second, the current number of usable reads is lower than that required to perform whole-transcriptome analysis. One issue that plagues transcriptome studies is that the majority of the sequence generated comes from the most abundant transcripts. Thus, with the current throughput, numerous runs would be needed to generate enough reads to sample transcripts expressed at low levels. In fact, this is one reason that we chose, in this study, to begin by targeting specific genes rather than attempting to sequence the entire transcriptome. We do note, however, that over the past year of our participation in the MAP, the throughput of the Oxford Nanopore MinION has increased, and it is reasonable to expect additional improvements in throughput that should make it possible to generate a sufficient number of long reads to deeply interrogate even the most complex transcriptome.
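The throughput argument can be made quantitative with a back-of-the-envelope model. Assuming reads are drawn at random in proportion to transcript abundance (a Poisson approximation that ignores transcript length and capture biases), a transcript that makes up a fraction f of the library receives on average N·f of N reads and is missed entirely with probability e^(-N·f). The read totals and abundance below are hypothetical.

```python
import math


def prob_detected(n_reads, fraction, min_reads=1):
    """Probability that a transcript at relative abundance `fraction`
    receives at least `min_reads` of `n_reads` reads, using a Poisson
    approximation with mean n_reads * fraction."""
    mean = n_reads * fraction
    p_fewer = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(min_reads))
    return 1.0 - p_fewer


def reads_needed(fraction, min_reads=1, target=0.95):
    """Smallest read total (searched in steps of 1,000) at which the
    detection probability reaches `target`."""
    n = 0
    while prob_detected(n, fraction, min_reads) < target:
        n += 1000
    return n


# Hypothetical transcript present at 1 in 100,000 molecules.
print(round(prob_detected(20_000, 1e-5), 3))  # 0.181
print(reads_needed(1e-5))                     # 300000
```

Under these assumptions, a rare transcript is usually missed at tens of thousands of reads, which is consistent with the text's point that targeting specific genes concentrates the available reads where they are needed.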

In conclusion, we anticipate that nanopore sequencing of whole transcriptomes, rather than targeted genes as we have performed here, will be a rapid and powerful approach for characterizing isoforms, especially with improvements in the throughput and accuracy of the technology, and the simplification and/or elimination of the time-consuming library preparations.


The Tangled Transcriptome

Graveley’s lab studies the transcriptome, the mass of RNA molecules in living cells, including the messenger RNAs that carry the instructions encoded in DNA to the cell’s protein-making machinery. The transcriptome is a sort of snapshot of which parts of the genome are active at a given time and place. Which genes are transcribed into RNA, and in what quantities, changes from organ to organ and even cell to cell, and can vary over an organism’s lifetime or in response to environmental changes.

Of particular interest to Graveley are those RNA molecules that can take different shapes, or “isoforms,” depending on random chance or what the cell needs at a particular time. RNA isoforms are distinct versions of the RNA made from the same gene. Through a process called alternative splicing, the different subunits, or “exons,” that make up a gene can be reshuffled in new combinations. Many genes have two or more mutually exclusive exons, and which ones are actually expressed as RNA and protein can have big effects on cellular behavior, in effect expanding the protein arsenal of the genome.

“For the entire field of transcriptomics and gene function, knowing what isoforms are expressed is critical,” says Graveley. “Most genes are complicated, especially in humans, and have alternative splicing that occurs at multiple places.”

That brings us to the challenge of Dscam1, the world record holder for alternative splicing. In fruit flies, a particularly well-studied model organism, Dscam1 is made up of 115 exons, only 20 of which are always transcribed into RNA. The other 95 exist in four “clusters” of mutually exclusive exons, and as a result, over 38,000 possible isoforms of Dscam1 have been predicted.

“This is by far, an order of magnitude, more than any other gene,” Graveley explains. This flexibility makes sense in light of Dscam1’s function. The protein it makes helps to “identify” single neurons in the insect brain, making them distinct enough from their neighbors for these cells to assemble a neural circuit on principles of like avoiding like. In experiments where Dscam1 has been altered to make fewer RNA isoforms, the neural wiring breaks down during development, sometimes severely enough to kill the flies.

Dscam1 also plays a role in the insect immune system, another reason for it to produce a huge variety of isoforms. Each of these molecules might be more or less effective at fighting certain pathogens.

It’s frustratingly hard, however, to figure out exactly which isoforms are in a specific sample. Graveley has been working on Dscam1 in fruit flies for more than a decade, but very basic questions remain unanswered: are some isoforms more common, or more important, than others? Are all the theoretical isoforms expressed? Do the isoforms have different behaviors, or are they just arbitrary ways of tagging neurons?

Size Matters

The trouble is the current state of the art in sequencing technology, which reads just a couple of hundred DNA bases at a time. That works great for identifying which exons are present in the transcriptome, but it’s no good for saying which mix of exons any specific strand of RNA is carrying. Different exons can lie thousands of bases apart on the RNA molecule, and there’s no way to bridge the gap between reads.

Graveley has tried a lot of solutions. He’s used the outdated Sanger sequencing method, which is much slower and more labor-intensive than modern sequencers, but does span longer reads. His lab also worked out a roundabout way of reconstructing RNA transcripts with contemporary Illumina sequencers, through a combination of chemistry and computational approaches.

“It worked,” he says, “but it was complicated by a lot of library preparation artifacts, and you basically had to jury-rig a genome analyzer to do something it was not supposed to do.”

Graveley’s preferred method is to use a sequencer produced by Pacific Biosciences, which, like the MinION, is built on long-read, single-molecule technology. PacBio sequencing is much better established than nanopores, and its results are known to be reliable; it also has the high throughput typical of modern instruments. For researchers working on alternative splicing, it’s clearly the technology to beat.

Unfortunately, it’s also very expensive. So Graveley’s team set out to learn whether the MinION, a low-throughput but extremely cheap alternative, could be an adequate substitute.

For the Genome Biology paper, the team focused on a 1.8-kilobase region of Dscam1 RNA that covers 93 of the gene’s 95 alternatively spliced exons. To get their samples, they crushed fruit fly heads, reverse-transcribed the RNA into cDNA, and used a polymerase to amplify the Dscam1 region for sequencing. They also sequenced transcripts of three other alternatively spliced genes, Rdl, MRP, and Mhc.
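The arithmetic behind Dscam1’s "huge variety of isoforms" can be checked in a few lines. The cluster sizes below are an assumption, not stated in this post. They are the figures commonly reported for Drosophila Dscam1: 12 alternative versions of exon 4, 48 of exon 6, 33 of exon 9 and 2 of exon 17. These sizes are consistent with the post’s numbers. They sum to 95 alternatively spliced exons, and the first three clusters account for the 93 exons in the sequenced region.

```python
from math import prod

# Assumed sizes of Dscam1's mutually exclusive exon clusters
# (commonly cited values, not taken from this post).
CLUSTERS = {"exon4": 12, "exon6": 48, "exon9": 33, "exon17": 2}

total_exons = sum(CLUSTERS.values())                  # alternatively spliced exons overall
region = ["exon4", "exon6", "exon9"]                   # clusters assumed inside the 1.8 kb region
region_exons = sum(CLUSTERS[c] for c in region)       # exons covered by the region
region_isoforms = prod(CLUSTERS[c] for c in region)   # one exon chosen from each cluster
all_isoforms = prod(CLUSTERS.values())                # full-length combinations

print(total_exons, region_exons, region_isoforms, all_isoforms)  # 95 93 19008 38016
```

Under these assumptions, "almost half the possible isoforms" in the sequenced region would mean somewhat fewer than 9,504 distinct exon combinations.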


The biggest concern for new applications of the MinION is its shaky accuracy. While most sequencers can achieve comfortably over 99% consensus with reference sequences, Graveley’s group has seen only about 90% identity with the MinION. That’s actually a little better than most MinION users have managed, although the device’s accuracy has been steadily improving. Users have had to pick their projects carefully to account for this: the device is pretty reliable in resequencing studies that map DNA reads to known references, but it’s still a dubious choice for sequencing unknown genetic material from scratch (although it’s been tried).

To accurately pin down the exact isoforms in the transcriptome, the MinION didn’t have to read every RNA molecule perfectly. It did, however, have to come close enough to decisively tell one exon from another, and in Dscam1, those exons could be as much as 80% identical.
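The requirement that a read must land decisively on one exon out of several similar ones can be illustrated with a toy classifier. This is not the pipeline used in the paper. It scores a read segment against each candidate exon with Python’s difflib similarity ratio, which is only a crude stand-in for a proper alignment score. The function name, the threshold and margin values, and the example sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def assign_exon(segment, candidates, min_score=0.7, min_margin=0.05):
    """Return the best-matching exon name, or None if no candidate wins decisively.

    segment    -- stretch of a read expected to cover one variable exon
    candidates -- dict mapping exon name to reference sequence
    min_score  -- hypothetical minimum similarity for the best match
    min_margin -- hypothetical lead the best match needs over the runner-up
    """
    scores = sorted(
        ((SequenceMatcher(None, segment, seq).ratio(), name) for name, seq in candidates.items()),
        reverse=True,  # highest similarity first
    )
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_name
    return None  # too noisy, or too close to call between similar exons


# Hypothetical exon variants and a read segment copied exactly from one of them.
exons = {"6.1": "ACGTTGCAAGTCCGATAGCT", "6.2": "ACGTAGCTAGTCGGATTGCA"}
print(assign_exon("ACGTTGCAAGTCCGATAGCT", exons))  # 6.1
print(assign_exon("ACGT", {"x": "ACGT", "y": "ACGT"}))  # None: identical candidates tie
```

Reads that return None correspond to the ambiguous reads that fail to match "one and only one combination of exons". The margin test is what makes near-identical exons hard to call.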

In fact, Graveley and his co-authors found that the MinION was very capable of this. Out of around 33,000 high-quality Dscam1 reads pulled off the sequencer, almost 29,000 were a strong match for one and only one combination of exons. To further check their accuracy, the team also sequenced the same sample on Illumina technology. While the Illumina sequencer could not give whole isoforms, it did show the same proportions of different exons, suggesting that the MinION gave a complete and unbiased picture of the sample.

“Alternative splicing, it turns out, is probably one of the ideal applications for this platform,” Graveley says. “Even with a gene as complicated as this one, we’re able to accurately distinguish the isoforms from one another. Unless you have very, very small exons, or two exons that are almost identical to each other, the accuracy is good enough.”

Make Way for PromethION

The results are good news for researchers studying the transcriptome, but the MinION probably won’t push out other methods for dealing with alternative splicing just yet. Its low throughput means that at best it can cover a very small portion of the transcriptome with each run ― and that means isolating targeted RNA transcripts, a process that can introduce new biases into the data.

“You need a lot of reads to get the whole transcriptome, and what happens is you end up sequencing boring genes like actin and tubulin, the really abundantly expressed things,” Graveley explains. Still, his data from this experiment were good enough to replicate a few earlier findings: for instance, that Dscam1 appears to draw on nearly all of its predicted exons. In this experiment, his lab observed almost half the possible isoforms, which together contained 92 of the 93 possible exons.

Meanwhile, Oxford Nanopore Technologies is working on a new instrument, the PromethION, which will run 48 MinION-style flow cells in parallel. Graveley has already signed on to be one of the first recipients, in an access program that is likely to start this winter.

Judging by studies like this one, the PromethION stands a good chance of becoming the instrument of choice for large-scale RNA sequencing. With Dscam1, Graveley hopes to reach high enough throughput to do functional studies, seeking to learn whether different combinations of isoforms give rise to physical or behavioral differences. He also wants to look at human genes with high levels of alternative splicing, and to test whether the MinION can accurately count total numbers of RNA isoforms.

“The fact that you can use this technology to characterize whole isoforms is very exciting,” Graveley says. “It’s going to help us start characterizing the transcriptome in ways that have been very difficult.”






Infinity and AbbVie partner to develop and commercialise Duvelisib for cancer… for the treatment of chronic lymphocytic leukemia.

Duvelisib is a dual phosphoinositide-3-kinase (PI3K) delta and PI3K gamma inhibitor. The delta and gamma isozymes are selectively expressed in leukocytes. This article (at Dr. Melvin Crasto’s blog) discusses the synthesis of Duvelisib and mentions additional clinical trials underway. These include phase II trials in patients with mild asthma undergoing allergen challenge, in rheumatoid arthritis, and in refractory indolent non-Hodgkin’s lymphoma. There are also phase I trials in advanced hematological malignancies (including T-cell lymphoma and mantle cell lymphoma). The drug was originally developed at the Takeda subsidiary Intellikine.


Larry H Bernstein, MD, FCAP, Author and Curator – The Pathway to Understanding and Decision-making in Medicine

This dialogue is a series of discussions introducing several perspectives on proteomics discovery. Proteomics is an emerging scientific enterprise in the -OMICS- family of disciplines that aims to clarify many of the challenges in understanding disease, aiding diagnosis, and guiding treatment decisions. Beyond that focus, it will contribute to personalized medical treatment by facilitating the identification of treatment targets for the pharmaceutical industry. Despite enormous advances in genomics research over the last two decades, the anticipated goals for introducing new targeted treatments have still not been reached. There have been repeated failures in phase III clinical trials, and even when success has been achieved, it has often been temporary. The other problem has been the toxicity of agents widely used in chemotherapy. Conventional agents are organic chemistry derivatives that act by blocking reactions. The genomic approach brings some relief from their toxicity, but specificity for the target cell without an effect on normal cells has remained elusive.

This is not confined to cancer chemotherapy; it can also be seen in pain medication, and it has been a growing problem in antimicrobial therapy. The stumbling block has been the inability to manage a multiplicity of reactions. These reactions must also be modulated in a changing environment shaped by the three-dimensional structure of proteins, pH changes, ionic balance, micro- and macrovascular circulation, and protein-protein and protein-membrane interactions. There is reason to think that these problems can be overcome through much better modification of target cellular metabolism. Multivariable control of these imbalances would peel away the confounding and obscuring factors, like removing the layers of an onion.

This is the first of a series of articles, and for convenience we shall emphasize here only the application of proteomics to cardiovascular disease.

growth in funding proteomics 1990-2010

Part I.

Panomics: Decoding Biological Networks  (Clinical OMICs 2014; 5)

Technological advances such as high-throughput sequencing are transforming medicine from symptom-based diagnosis and treatment to personalized medicine as scientists employ novel rapid genomic methodologies to gain a broader comprehension of disease and disease progression. As next-generation sequencing becomes more rapid, researchers are turning toward large-scale pan-omics, the collective use of all omics such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, lipidomics and lipoprotein proteomics, to better understand, identify, and treat complex disease.

Genomics has been a cornerstone in understanding disease, and the sequencing of the human genome has led to the identification of numerous disease biomarkers through genome-wide association studies (GWAS). The goal was for these biomarkers to predict individual disease risk, enable early detection of disease, inform treatment decisions, and identify new therapeutic targets. In reality, however, only a few have become established in clinical practice. For example, human GWAS of heart failure have identified at least 35 biomarkers. Only the natriuretic peptides have moved into clinical practice, where they are used primarily as diagnostic tools.

Proteomics Advances Will Rival the Genetics Advances of the Last Ten Years

Seventy percent of the decisions made by physicians today are influenced by results of diagnostic tests, according to N. Leigh Anderson, founder of the Plasma Proteome Institute and CEO of SISCAPA Assay Technologies. Imagine the changes that will come about when future diagnostics tests are more accurate, more useful, more economical, and more accessible to healthcare practitioners. For Dr. Anderson, that’s the promise of proteomics, the study of the structure and function of proteins, the principal constituents of the protoplasm of all cells.

In explaining why proteomics is likely to have such a major impact, Dr. Anderson starts with a major difference between the genetic testing common today and the proteomic testing that is fast coming on the scene. “Most genetic tests are aimed at measuring something that’s constant in a person over his or her entire lifetime. These tests provide information on the probability of something happening, and they can help us understand the basis of various diseases and their potential risks. What’s missing is that a genetic test is not going to tell you what’s happening to you right now.”

Mass Spec-Based Multiplexed Protein Biomarkers

Clinical proteomics applications rely on the translation of targeted protein quantitation technologies and methods to develop robust assays that can guide diagnostic, prognostic, and therapeutic decision-making. The development of a clinical proteomics-based test begins with the discovery of disease-relevant biomarkers, followed by validation of those biomarkers.

“In common practice, the discovery stage is performed on a MS-based platform for global unbiased sampling of the proteome, while biomarker qualification and clinical implementation generally involve the development of an antibody-based protocol, such as the commonly used enzyme linked ELISA assays,” state López et al. in Proteome Science (2012; 10: 35–45). “Although this process is potentially capable of delivering clinically important biomarkers, it is not the most efficient process as the latter is low-throughput, very costly, and time-consuming.”

Part II. Proteomics for Clinical and Research Use: Combining Protein Chips, 2D Gels and Mass Spectrometry, in The Next Step: Exploring the Proteome: Translation and Beyond

N. Leigh Anderson, Ph.D., Chief Scientific Officer, Large Scale Proteomics Corporation

Three streams of technology will play major roles in quantitative (expression) proteomics over the coming decade. Two-dimensional electrophoresis and mass spectrometry represent well-established methods for, respectively, resolving and characterizing proteins, and both have now been automated to enable the high-throughput generation of data from large numbers of samples.

These methods can be powerfully applied to discover proteins of interest as diagnostics, small molecule therapeutic targets, and protein therapeutics. However, neither offers a simple, rapid, routine way to measure many proteins in common samples like blood or tissue homogenates.

Protein chips do offer this possibility, and thus complete the triumvirate of technologies that will deliver the benefits of proteomics to both research and clinical users. The integration of efforts across all three approaches is discussed, highlighting the application of the Human Protein Index® database as a source of protein leads.



N. Leigh Anderson, Ph.D., is Chief Scientific Officer of the Proteomics subsidiary of Large Scale Biology Corporation (LSBC). Dr. Anderson obtained his B.A. in Physics with honors from Yale and a Ph.D. in Molecular Biology from Cambridge University (England), where he worked with M. F. Perutz as a Churchill Fellow at the MRC Laboratory of Molecular Biology. He subsequently co-founded the Molecular Anatomy Program at the Argonne National Laboratory (Chicago). There, his work in the development of 2D electrophoresis and molecular database technology earned him several distinctions. These include the American Association for Clinical Chemistry’s Young Investigator Award for 1982, the 1983 Pittsburgh Analytical Chemistry Award, the 2008 AACC Outstanding Research Award, and the 2013 National Science Medal.

In 1985 Dr. Anderson co-founded LSBC in order to pursue commercial development and large-scale applications of 2-D electrophoretic protein mapping technology. This effort has resulted in a large-scale proteomics analytical facility supporting research work for LSBC and its pharmaceutical industry partners. Dr. Anderson’s current primary interests are the automation of proteomics technologies and the expansion of LSBC’s proteomics databases describing drug effects and disease processes in vivo and in vitro. Large Scale Biology went public in August 2000.

Part II. Plasma Proteomics: Lessons in Biomarkers and Diagnostics

Exposome Workshop
N Leigh Anderson
Washington 8 Dec 2011



The Clinical Plasma Proteome
• Plasma and serum are the dominant non-invasive clinical sample types
– standard materials for in vitro diagnostics (IVD)
• Proteins measured in clinically available tests in the US
– 109 proteins via FDA-cleared or approved tests
  • Clinical test costs range from $9 (albumin) to $122 (Her2)
  • 90% of those ever approved are still in use
– 96 additional proteins via laboratory-developed tests (not FDA cleared or approved)
– Total 205 proteins (≅ products of 211 genes, excluding Ig’s)
• Clinically applied proteins thus account for
– About 1% of the baseline human proteome (1 gene : 1 protein)
– About 10% of the 2,000+ proteins observed in deep discovery plasma proteome datasets
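The percentages in the slide follow from simple ratios, which the sketch below reproduces. The counts of 109 and 96 proteins and the 2,000+ discovery figure come from the slide. The denominator of about 20,000 protein-coding genes for the baseline proteome is an assumption, a commonly cited round number that does not appear on the slide.

```python
FDA_CLEARED = 109            # proteins measured by FDA-cleared or approved tests (slide)
LDT_ONLY = 96                # additional proteins via laboratory-developed tests only (slide)
BASELINE_PROTEOME = 20_000   # assumption: ~20,000 protein-coding genes, 1 gene : 1 protein
DEEP_PLASMA = 2_000          # proteins seen in deep discovery plasma datasets (slide: "2,000+")

clinical_total = FDA_CLEARED + LDT_ONLY
print(clinical_total)                                  # 205
print(100 * clinical_total / BASELINE_PROTEOME)        # 1.025 (percent of baseline proteome)
print(100 * clinical_total / DEEP_PLASMA)              # 10.25 (percent of deep plasma datasets)
```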

“New” Protein Diagnostics Are FDA-Cleared at a Rate of ~1.5/yr: Insufficient to Meet Dx or Rx Development Needs

FDA clearance of protein diagnostics

A Major Technology Gulf Exists Between Discovery Proteomics and Routine Diagnostic Platforms

Two Streams of Proteomics
A.  Problem Technology
Basic biology: maximum proteome coverage (including PTM’s, splices) to
provide unbiased discovery of mechanistic information
• Critical: Depth and breadth
• Not critical: Cost, throughput, quant precision

B.  Discovery proteomics
Specialized proteomics field,
large groups,
complex workflows and informatics

Part III.  Addressing the Clinical Proteome with Mass Spectrometric Assays

N. Leigh Anderson, PhD, SISCAPA Assay Technologies, Inc.

protein changes in biological mechanisms

No Increase in FDA Cleared Protein Tests in 20 yr

“New” Protein Tests in Plasma Are FDA-Cleared at a Rate of ~1.5/yr: Insufficient to Meet Dx or Rx Development Needs

See figure above

An Explanation: the Biomarker Pipeline is Blocked at the Verification Step

Immunoassay Weaknesses Impact Biomarker Verification

1) Specificity: what actually forms the immunoassay sandwich – or prevents its formation – is not directly visualized

2) Cost: an assay developed to FDA approvable quality costs $2-5M per



Immunoassay vs Hybrid MS-based assays

MASS SPECTROMETRY: MRMs provide what is missing in IMMUNOASSAYS


MRM of Proteotypic Tryptic Peptides Provides Highly Specific Assays for Proteins > 1 µg/mL in Plasma

Peptide-Level MS Provides High Structural Specificity
Multiple Reaction Monitoring (MRM) Quantitation



SISCAPA combines best features of immuno and MS

SISCAPA Process Schematic Diagram
Stable Isotope-labeled Standards with Capture on Anti-Peptide Antibodies
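The quantitation step that stable isotope-labeled standards make possible can be sketched as a ratio calculation. A known amount of heavy-labeled peptide is spiked into the sample, then captured and measured together with the endogenous (light) peptide. The light-to-heavy peak-area ratio then scales the spike back to an endogenous amount. The function and parameter names, the assumption that light and heavy forms give identical MS response, and the example numbers are illustrative only. They are not taken from the SISCAPA protocol.

```python
def endogenous_concentration(light_areas, heavy_areas, spike_conc):
    """Estimate endogenous peptide concentration from MRM peak areas (illustrative sketch).

    light_areas -- peak areas of transitions from the endogenous (unlabeled) peptide
    heavy_areas -- peak areas of the matching transitions from the labeled standard
    spike_conc  -- known concentration of labeled standard added to the sample
    Assumes light and heavy forms ionize and fragment identically, so their
    summed area ratio equals their concentration ratio.
    """
    if not light_areas or len(light_areas) != len(heavy_areas):
        raise ValueError("need matched, non-empty transition lists")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("labeled standard not detected")
    return spike_conc * sum(light_areas) / heavy_total


# Hypothetical example: three transitions, standard spiked at 10.0 (any concentration unit).
print(endogenous_concentration([1200.0, 800.0, 400.0], [2400.0, 1600.0, 800.0], 10.0))  # 5.0
```

Summing the transitions before taking the ratio is one design choice. Averaging per-transition ratios is another, and it makes it easier to flag a transition disturbed by interference.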

An automated process for SISCAPA targeted protein quantitation utilizes high affinity capture antibodies that are immobilized on magnetic beads

Antibodies: sequence-specific peptide binding

SISCAPA target enrichment

Multiple reaction monitoring (MRM) quantitation



First SISCAPA assay: thyroglobulin

personalized reference range within population range

Glycemic control in DM

Part IV. National Heart, Lung, and Blood Institute Clinical Proteomics Working Group Report
Christopher B. Granger, MD; Jennifer E. Van Eyk, PhD; Stephen C. Mockrin, PhD;
N. Leigh Anderson, PhD; on behalf of the Working Group Members*
Circulation. 2004;109:1697-1703 doi: 10.1161/01.CIR.0000121563.47232.2A

Abstract—The National Heart, Lung, and Blood Institute (NHLBI) Clinical Proteomics Working Group
was charged with identifying opportunities and challenges in clinical proteomics and using these as a
basis for recommendations aimed at directly improving patient care. The group included representatives
of clinical and translational research, proteomic technologies, laboratory medicine, bioinformatics, and
2 of the NHLBI Proteomics Centers, which form part of a program focused on innovative technology development.

This report represents the results from a one-and-a-half-day meeting on May 8 and 9, 2003. For the purposes
of this report, clinical proteomics is defined as the systematic, comprehensive, large-scale identification of
protein patterns (“fingerprints”) of disease and the application of this knowledge to improve patient care
and public health through better assessment of disease susceptibility, prevention of disease, selection of
therapy for the individual, and monitoring of treatment response. (Circulation. 2004;109:1697-1703.)
Key Words: proteins; diagnosis; prognosis; genetics; plasma

Part V.  Overview: The Maturing of Proteomics in Cardiovascular Research

Jennifer E. Van Eyk
Circ Res. 2011;108:490-498  doi: 10.1161/CIRCRESAHA.110.226894

Abstract: Proteomic technologies are used to study the complexity of proteins, their roles, and biological functions.
It is based on the premise that the diversity of proteins, comprising their isoforms, and posttranslational modifications
(PTMs) underlies biology.

Based on an annotated human cardiac protein database, 62% of proteins have at least one PTM (phosphorylation currently dominating), whereas 25% have more than one type of modification.

The field of proteomics strives to observe and quantify this protein diversity. It represents a broad group of technologies
and methods arising from analytic protein biochemistry, analytic separation, mass spectrometry, and bioinformatics.
Since the 1990s, the application of proteomic analysis has been increasingly used in cardiovascular research.



Technology development and adaptation have been at the heart of this progress. Technology undergoes a maturation,
becoming routine and ultimately obsolete, being replaced by newer methods. Because of extensive methodological
improvements, many proteomic studies today observe 1000 to 5000 proteins.

Only 5 years ago, this was not feasible. Even so, there are still roadblocks. Nowadays, there is a focus on obtaining
better characterization of protein isoforms and specific PTMs. Consequently, new techniques for identification and
quantification of modified amino acid residues are required, as is the assessment of single-nucleotide polymorphisms
in addition to determination of the structural and functional consequences.

In this series, 4 articles provide concrete examples of how proteomics can be incorporated into cardiovascular
research and address specific biological questions. They also illustrate how novel discoveries can be made and
how proteomic technology has continued to evolve. (Circ Res. 2011;108:490-498.)
Key Words: proteomics; technology; protein isoform; posttranslational modification; polymorphism

Part VI.   The -omics era: Proteomics and lipidomics in vascular research

Athanasios Didangelos, Christin Stegemann, Manuel Mayr∗

King’s British Heart Foundation Centre, King’s College London, UK

Atherosclerosis 2012; 221: 12– 17

Abstract

A main limitation of the current approaches to atherosclerosis research is the focus on the investigation of individual
factors, which are presumed to be involved in the pathophysiology and whose biological functions are, at least in part, understood.

These molecules are investigated extensively while others are not studied at all. In comparison to our detailed
knowledge about the role of inflammation in atherosclerosis, little is known about extracellular matrix remodelling
and the retention of individual lipid species rather than lipid classes in early and advanced atherosclerotic lesions.

The recent development of mass spectrometry-based methods and advanced analytical tools are transforming
our ability to profile extracellular proteins and lipid species in animal models and clinical specimen with the goal
of illuminating pathological processes and discovering new biomarkers.

Fig. 1. ECM in atherosclerosis. The bulk of the vascular ECM is synthesised by smooth muscle cells and composed primarily of collagens, proteoglycans and glycoproteins. During the early stages of atherosclerosis, LDL binds to the proteoglycans of the vessel wall, becomes modified, i.e. by oxidation (ox-LDL), and sustains a proinflammatory cascade that is proatherogenic.

Lipidomics of atherosclerotic plaques

Fig. 2. Lipidomics of atherosclerotic plaques. Lipids were separated by ultra performance reverse phase liquid chromatography on a Waters® ACQUITY UPLC® (HSS T3 Column, 100 mm × 2.1 mm i.d., 1.8 µm particle size, 55 °C, flow rate 400 µL/min, Waters, Milford MA, USA) and analyzed on a quadrupole time-of-flight mass spectrometer (Waters® SYNAPT™ HDMS™ system) in both positive (A) and negative ion mode (C). In positive MS mode, lysophosphatidylcholines (lPCs) and lysophosphatidylethanolamines (lPEs) eluted first, followed by phosphatidylcholines (PCs), sphingomyelins (SMs), phosphatidylethanolamines (PEs) and cholesteryl esters (CEs); diacylglycerols (DAGs) and triacylglycerols (TAGs) had the longest retention times. In negative MS mode, fatty acids (FA) were followed by phosphatidylglycerols (PGs), phosphatidylinositols (PIs), phosphatidylserines (PS) and PEs. The chromatographic peaks corresponding to the different classes were detected as retention time–mass-to-charge ratio (m/z) pairs, and their areas were recorded. Principal component analyses on 629 variables from triplicate analysis (C1, 2, 3 = control 1, 2, 3; P1, 2, 3 = endarterectomy patient 1, 2, 3) demonstrated a clear separation of atherosclerotic plaques and control radial arteries in positive (B) and negative (D) ion mode. The clustering of the technical replicates and the central projection of the pooled sample within the scores plot confirm the reproducibility of the analyses, and the Goodness of Fit test returned a chi-squared of 0.4 and an R-squared value of 0.6.
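The principal component analysis described in the caption can be sketched with NumPy. The procedure is to center each variable, take a singular value decomposition, and project the samples onto the leading components. The data below are synthetic, not the 629 variables from the study. They consist of six hypothetical samples (three "control", three "plaque") measured over 50 hypothetical retention time–m/z features. Real lipidomics workflows would usually also scale or log-transform peak areas before this step.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto their leading principal components.

    Returns the score matrix and the fraction of total variance explained
    by each returned component.
    """
    Xc = X - X.mean(axis=0)  # center each feature (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # same as Xc @ Vt[:n_components].T
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()


# Synthetic peak-area matrix: rows = 3 controls then 3 plaques, columns = 50 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=0.1, size=(6, 50))
X[3:, :10] += 5.0  # hypothetical: plaques are richer in the first 10 lipid features

scores, ratio = pca_scores(X)
controls, plaques = scores[:3, 0], scores[3:, 0]
# The sign of PC1 is arbitrary, so check separation in either direction.
separated = controls.max() < plaques.min() or plaques.max() < controls.min()
print(separated)  # True
```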

Challenges in mass spectrometry

Mass spectrometry is an evolving technology and the technological advances facilitate the detection and quantification
of scarce proteins. Nonetheless, the enrichment of specific subproteomes using differential solubility or isolation of cellular
organelles will remain important to increase coverage and, at least partially, overcome the inhomogeneity of diseased tissue,
one of the major factors affecting sample-to-sample variation.

Proteomics is also the method of choice for the identification of post-translational modifications, which play an essential
role in protein function, e.g. enzymatic activation, binding ability and formation of ECM structures. Again, efficient enrichment
is essential to increase the likelihood of identifying modified peptides in complex mixtures. Lipidomics faces similar challenges.
While the extraction of lipids is more selective, new enrichment methods are needed for scarce lipids as well as labile lipid
metabolites that may have important bioactivity. Another pressing issue in lipidomics is data analysis, in particular the lack
of automated search engines that can analyze mass spectra obtained from instruments of different vendors. Efforts to
overcome this issue are currently underway.


Proteomics and lipidomics offer an unbiased platform for the investigation of ECM and lipids within atherosclerosis. In
combination, these innovative technologies will reveal key differences in proteolytic processes responsible for plaque rupture
and advance our understanding of ECM – lipoprotein interactions in atherosclerosis.


Virtualization in Proteomics: ‘Sakshat’ in India, at IIT Bombay

Proteome Portraits

A Protease for ‘Middle-down’ Proteomics

Intrinsic Disorder in the Human Spliceosomal Proteome



active site of eNOS (PDB_1P6L) and nNOS (PDB_1P6H).

Table – metabolic targets

HK-II Phosphorylation


Introduction to Translational Medicine (TM) – Part 1: Translational Medicine

Author and Curator: Larry H Bernstein, MD, FCAP


Curator: Aviva Lev-Ari, PhD, RN 


This document in the Series A: Cardiovascular Diseases e-Series, Volume 4: Translational and Regenerative Medicine, traces the translation of postgenomic and proteomic advances from the laboratory to the practice of clinical medicine. The chapters are preceded by several videos featuring prominent figures in the emergence of this transformative change. When I was a medical student, much of the current language and technology that has extended the practice of medicine did not exist. However, a new foundation, predicated on the principles of modern medical education set forth by Abraham Flexner, was sprouting. The highlights of this evolution were:

  • Requirement for premedical education in biology, organic chemistry, physics, and genetics.
  • Medical education included two years of basic science education in anatomy, physiology, pharmacology, and pathology prior to introduction into the clinical course sequence of the last two years.
  • Postgraduate medical education consisted of an internship year followed by residency in pediatrics, OB/GYN, internal medicine, general surgery, psychiatry, neurology, neurosurgery, pathology, radiology, anesthesiology, and emergency medicine.
  • Academic teaching centers were developing subspecialty centers in ophthalmology, ENT and head and neck surgery, cardiology and cardiothoracic surgery, and hematology, hematology/oncology, and neurology.
  • The expansion of postgraduate medical programs was supported by significant funding from the National Institutes of Health, and the NIH supported faculty development through a system of peer-reviewed research grant programs in the medical and allied sciences.

The period after the late 1980s saw a rapid expansion of research in genomics and in drug development to treat emerging infectious disease threats. The US had a large worldwide involvement after the end of the Vietnam War, and drug resistance was increasingly encountered (malaria, tick-borne diseases, salmonellosis, Pseudomonas aeruginosa, Staphylococcus aureus, etc.).

Moreover, the post-millennium years saw a large but dwindling population of veterans who had served in WWII and Vietnam, and cardiovascular and musculoskeletal diseases, dementias, and cancer had become more common. The Human Genome Project was undertaken to realign existing knowledge of gene structure and genetic regulation with the needs of drug development, which was languishing because of development failures due to unexpected toxicities.

A substantial disconnect existed between diagnostics and pharmaceutical development, which had been over-reliant on modifying known organic structures to increase potency and reduce toxicity. This was about to change with revised medical curricula, changes in residency programs, physicians cross-training in other disciplines, and the emergence of biopharma based on growing knowledge of cell function. At the same time, the medical profession was developing an evidence base for therapeutics, and more pressure was placed on informed decision-making.

The great improvement in proteomics came from GC/LC-MS/MS, as described in the video interview with Dr. Gyorgy Marko-Varga (Sweden), video 1 of 3 (Advancing Translational Medicine). The discussion focuses on the role of functional proteomics in future diagnostics and therapy, which relies on mass spectrometry (MS) achieving greater accuracy than can be obtained by antibody-ligand binding. It is illustrated below, the last item emphasizing the importance of information technology and predictive analytics.

Thermo Scientific

Immunoassays and LC–MS/MS have emerged as the two main approaches for quantifying peptides and proteins in biological samples. ELISA kits are available for quantification, but inherently lack the discriminative power to resolve isoforms and PTMs.

To address this issue we have developed and applied a mass spectrometry immunoassay–selected reaction monitoring (Thermo Scientific™ MSIA™ SRM technology) research method to quantify PCSK9 (and PTMs), a key player in the regulation of circulating low density lipoprotein cholesterol (LDL-C).

A Day in the (Future) Life of a Predictive Analytics Scientist


By Lars Rinnan, CEO, NextBridge   April 22, 2014

A look into a normal day in the near future, where predictive analytics is everywhere, incorporated in everything from household appliances to wearable computing devices.

During the test drive (of an automobile), the extreme acceleration makes your heart beat so fast that your personal health data sensor triggers an alarm. The health data sensor is integrated into the strap of your wrist watch. This data is transferred to your health insurance company, so you say a prayer that their data scientists are clever enough to exclude these abnormal values from your otherwise impressive health data. Based on such data, your health insurance company’s consulting unit regularly gives you advice about diet, exercise, and sleep. You have followed their advice in the past, and your performance has increased, which automatically reduced your insurance premiums. Win-win, you think to yourself, as you park the car, and decide to buy it.

In a clinical presentation at Harlan Krumholz’s Yale Symposium, Prof. Robert Califf, Director of the Duke University Translational Medicine Clinical Research Institute, defines translational medicine as the effective translation of science to clinical medicine in two segments:

  1. Adherence to current standards
  2. Improving the enterprise by translating knowledge

He argues that the translational gap between medical science and outcomes will be bridged by working through two parallel systems:

  1. Physician-health organization
  2. Personalized medicine

He emphasizes that the new basis for physician standards will be legitimized in the following:

  1. Comparative effectiveness (Krumholz)
  2. Accountability

Some of these points are repeated below:

Watch videos on YouTube: Harlan Krumholz; complexity; integration map; progression; informatics

An interesting sidebar to these scientific advances is the huge shift in pressure on the private insurance system. That system was initially introduced by the health insurance industry for worker benefits (Kaiser, IBM, Rockefeller) and has coexisted with the public Medicare and Medicaid programs. The ACA now represents a formidable change to this arrangement.

The current reality is that, actuarially, the twin system was unsustainable in the long term, because a very large population pool is needed to spread the costs. In addition, the cost of pharmaceutical development has driven consolidation in the industry, which has relied on the successes of publicly and privately funded research (Corbett Report, Nov 2013).

Sources: E. R. Brown, Rockefeller Medicine Men (UC Press, 1979); Liz Fowler, VP of WellPoint (designed the ACA).

I shall digress for a moment and insert a video history of DNA, which hits the high points very well and is quite explanatory of the genomic revolution in medical science, biology, infectious disease and microbial antibiotic resistance, virology, stem cell biology, and the undeniability of evolution.

DNA History

As I have noted above, genomics is necessary, but not sufficient. The story began with replication of the genetic code, which accounted for variation, but accounting for the regulation of the cell and for metabolic processes was, and remains, the domain of an essential library of proteins. Moreover, proteins, particularly but not only catalytic ones, show functional structural variants characterized by small differences in a few amino acids; these differences allow the variants to be separated by net charge and affect protein-protein and other interactions.

Protein chemistry is so different from DNA chemistry that it is quite safe to consider that the nucleotide sequence of DNA does no more than establish the order of amino acids in proteins. On the other hand, proteins, whose function and regulation we know so little about, do everything that matters, including determining what is read from the DNA and when.

Jose Eduardo de Salles Roselino

Chapters 2, 3, and 4 sequentially examine:

  • The causes and etiologies of cardiovascular diseases
  • The diagnosis, prognosis, and risks determined by biomarkers in serum, circulating cells, and solid tissue, and by contrast radiography
  • Treatment of cardiovascular diseases by translation of science from bench to bedside, including interventional cardiology and surgical repair

These are systematically examined within a framework of:

  • Genomics
  • Proteomics
  • Cardiac and Vascular Signaling
  • Platelet and Endothelial Signaling
  • Cell-protein interactions
  • Protein-protein interactions
  • Post-Translational Modifications (PTMs)
  • Epigenetics
  • Noncoding RNAs and regulatory considerations
  • Metabolomics (the metabolome)
  • Mitochondria and oxidative stress

