Rewriting the Mathematics of Tumor Growth[1]; Teams Use Math Models to Sort Drivers from Passengers[2]: Two JNCI Reviews by Mike Martin Regarding Genomics, Cancer, and Mutation
Curator: Stephen J. Williams, Ph.D.
WordCloud Image Produced by Adam Tubman
Word Cloud By Danielle Smolyar
Recently, there has been extensive interest in the cancer research and oncology communities in distinguishing the mutations responsible for the initiation and propagation of a neoplastic cell (driver mutations) from those acquired randomly (or under selective pressure) as a result of the genetic instability of the transformed cell (passenger mutations). The impact of either type of mutation has been a topic of debate, with a recent article showing that some passenger mutations may actually be responsible for tumor survival. In addition, many articles highlighted on this site (and referenced below) in recent years have described the importance of classifying driver and passenger mutations for the purpose of more effective personalized medicine strategies directed against tumors. Two review articles by Mike Martin in the Journal of the National Cancer Institute (JNCI) shed light on the current efforts, and successes, in discriminating between passenger and driver mutations and determining the impact of each type of mutation on tumor growth. However, as described in the associated articles, the picture is not as clear-cut as previously thought, and some of the findings are revolutionary. In Rewriting the Mathematics of Tumor Growth, researchers discovered that driver mutations may confer such a small growth advantage that multiple mutations, including the so-called passenger mutations, are necessary to sustain tumor growth. In fact, much experimental evidence has suggested that at least six defined genetic events may be necessary for the in-vitro transformation of human cells. The following table shows some of the genetic events required for in-vitro transformation in cell culture systems.
| Species | Cell type | No. of genetic events | Genetic elements | Reference (lab) | Priming events* |
|---|---|---|---|---|---|
|  |  |  |  |  | 3 for anchorage independence (cyclin D1, dnp53, EGFR); cyclin D1 + dnp53 for immortalization |
|  | HOSE | 6 | CDK4, cyclin D, hTERT plus a combination of either P53DD, myrAkt, and H-ras or P53DD, H-ras, c-myc, Bcl2 | (f) Sasaki (Kiyono) | 5 |
|  | HOSE | 3 | hTERT, SV40 early, H-ras or K-ras | (g) Liu (Bast) | 2: hTERT + SV40 early |
|  | HOSE | 3 | Large T, hTERT, H-ras or c-erbB-2 | (h) Kusakari (Fujii) | 2: hTERT + large T |
| Rat | Fibroblasts | 2 | Large T, H-ras | (i) Hirakawa | Did not analyze |
|  | Fibroblasts | 2 | Large T, H-ras | (d) Rangarajan (Weinberg) | Large T |
| Mouse | MOSE (in p53-/- background) | 3 | c-myc, K-ras, Akt | (j) Orsulic |  |
| Pig | Fibroblasts | 6 | p53DD, hTERT, CDK4, H-ras, c-myc, cyclin D1 | (k) Adam (Counter) | 5: need all but p53DD |
Note: "priming" means events required to immortalize cells but not to fully transform them. *Transformation was assayed both by the ability to form colonies in soft agarose and subsequently by tumor formation in immunocompromised mice. HOSE, human ovarian surface epithelial cells; MOSE, mouse ovarian surface epithelial cells.
a. Hahn, W. C., Counter, C. M., Lundberg, A. S., Beijersbergen, R. L., Brooks, M. W., and Weinberg, R. A. (1999) Creation of human tumour cells with defined genetic elements, Nature 400, 464-468.
b. Kendall, S. D., Linardic, C. M., Adam, S. J., and Counter, C. M. (2005) A network of genetic events sufficient to convert normal human cells to a tumorigenic state, Cancer Res 65, 9824-9828.
c. Sun, B., Chen, M., Hawks, C. L., Pereira-Smith, O. M., and Hornsby, P. J. (2005) The minimal set of genetic alterations required for conversion of primary human fibroblasts to cancer cells in the subrenal capsule assay, Neoplasia 7, 585-593.
d. Rangarajan, A., Hong, S. J., Gifford, A., and Weinberg, R. A. (2004) Species- and cell type-specific requirements for cellular transformation, Cancer Cell 6, 171-183.
e. Goessel, G., Quante, M., Hahn, W. C., Harada, H., Heeg, S., Suliman, Y., Doebele, M., von Werder, A., Fulda, C., Nakagawa, H., Rustgi, A. K., Blum, H. E., and Opitz, O. G. (2005) Creating oral squamous cancer cells: a cellular model of oral-esophageal carcinogenesis, Proc Natl Acad Sci U S A 102, 15599-15604.
f. Sasaki, R., Narisawa-Saito, M., Yugawa, T., Fujita, M., Tashiro, H., Katabuchi, H., and Kiyono, T. (2009) Oncogenic transformation of human ovarian surface epithelial cells with defined cellular oncogenes, Carcinogenesis 30, 423-431.
g. Liu, J., Yang, G., Thompson-Lanza, J. A., Glassman, A., Hayes, K., Patterson, A., Marquez, R. T., Auersperg, N., Yu, Y., Hahn, W. C., Mills, G. B., and Bast, R. C., Jr. (2004) A genetically defined model for human ovarian cancer, Cancer Res 64, 1655-1663.
h. Kusakari, T., Kariya, M., Mandai, M., Tsuruta, Y., Hamid, A. A., Fukuhara, K., Nanbu, K., Takakura, K., and Fujii, S. (2003) C-erbB-2 or mutant Ha-ras induced malignant transformation of immortalized human ovarian surface epithelial cells in vitro, Br J Cancer 89, 2293-2298.
i. Hirakawa, T., and Ruley, H. E. (1988) Rescue of cells from ras oncogene-induced growth arrest by a second, complementing, oncogene, Proc Natl Acad Sci U S A 85, 1519-1523.
j. Orsulic, S., Li, Y., Soslow, R. A., Vitale-Cross, L. A., Gutkind, J. S., and Varmus, H. E. (2002) Induction of ovarian cancer by defined multiple genetic changes in a mouse model system, Cancer Cell 1, 53-62.
k. Adam, S. J., Rund, L. A., Kuzmuk, K. N., Zachary, J. F., Schook, L. B., and Counter, C. M. (2007) Genetic induction of tumorigenesis in swine, Oncogene 26, 1038-1045.
However, it may be argued that the aforementioned experimental examples were produced in cell lines with more stable genomes than are seen in most tumors, and that they used traditional assays of transformation, such as growth in soft agarose and tumorigenicity in immunocompromised mice, as endpoints, and so are not representative of the tumor growth seen in the clinical setting.
Therefore, Bert Vogelstein, M.D., along with collaborators around the world, developed a model they termed the "sequential driver mutation theory," in which driver mutations multiply over time, each mutation "slightly increasing the tumor growth rate through a process that depends on three factors":
Driver mutation rate
The 0.4% selective growth advantage
Cell division time
This model was based on a combination of experimental data and computer simulations of glioblastoma multiforme and pancreatic adenocarcinoma. Most tumor models follow Gompertzian kinetics, in which tumor growth is initially exponential but eventually levels off over time.
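To make the arithmetic of such a small advantage concrete, here is a minimal Python sketch (not the authors' simulation code): it assumes a deterministic growth law in which each accumulated driver adds the article's 0.4% advantage per division, with an assumed 4-day division time. It illustrates why a single driver produces only a tiny lesion on a clinical timescale, while additional drivers change the picture dramatically.

```python
s = 0.004        # selective growth advantage per driver mutation (from text)
t_div = 4.0      # cell division time in days (assumed for illustration)

def tumor_size(n_drivers, t_days):
    """Cells after t_days, starting from one cell, when every division
    confers a net growth advantage of n_drivers * s."""
    generations = t_days / t_div
    return (1.0 + n_drivers * s) ** generations

for years in (5, 10, 20):
    t = years * 365.0
    print(f"{years:2d} y: 1 driver ~ {tumor_size(1, t):.0f} cells, "
          f"3 drivers ~ {tumor_size(3, t):.2e} cells")
```

Under these assumed parameters, one driver yields only a handful of cells even after a decade, whereas three concurrent drivers reach into the billions within 20 years, which is the qualitative point of the sequential driver mutation model.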
This new theory shows, though, that a tumor cell with only one driver mutation can grow only so much before a second driver mutation is required. Using data from the COSMIC database (Catalogue of Somatic Mutations in Cancer) together with the analysis software CHASM (Cancer-specific High-throughput Annotation of Somatic Mutations), the researchers analyzed 713 mutations sequenced from 14 glioma patients and 562 mutations from nine pancreatic adenocarcinomas, revealing at least 100 altered tumor suppressor genes and 100 altered oncogenes. The authors therefore suggested these may be possible driver mutations, or at least mutations required for the sustained growth of these tumors. Applied to data from Dr. Giardiello's publications on familial adenomatous polyposis (FAP) in the New England Journal of Medicine in 1993 and 2000, the sequential driver mutation model predicted the age distribution of FAP patients, the number and size of polyps, and the polyp growth rate better than previous models. This surprisingly large number of driver mutations required for full transformation was also verified in a study led by University of Texas Southwestern Medical Center biologist Jerry Shay, Ph.D., who noted that, to his team's surprise, nearly 45% of all colorectal candidate oncogenes (65 mutations) drove malignant proliferation [3].
However, some investigators do not believe the model is complex enough to account for other factors involved in oncogenesis, such as epigenetic factors like methylation and acetylation. In addition, the review discusses host and tissue factors that may complicate the models, such as the location where a tumor develops. Still, most of the investigators interviewed for the review agreed that focusing on this long-term progression of the disease may give us clues to other potential druggable targets.
Teams Use Math Models to Sort Drivers From Passengers
A related review by Mike Martin in JNCI [2] describes a statistical method, published in Cancer Informatics in 2009 [4], which distinguishes chromosomal abnormalities that can drive oncogenesis from passenger abnormalities. Chromosomal abnormalities such as deletions, additions, and translocations are common in cancer. For instance, the well-known Philadelphia chromosome, a translocation between chromosomes 9 and 22 that results in the BCR-ABL tyrosine kinase fusion protein, is the molecular basis of chronic myelogenous leukemia.
In the report, Eytan Domany, Ph.D., of the Weizmann Institute, and colleagues from the University of Lausanne, the University of Haifa, and the Broad Institute analyzed chromosomal aberrations in a subset of medulloblastomas that had more chromosomal gains and losses than had been attributed to the disease. Using a statistical method they termed a "volumetric sieve" (sketched below), the investigators were able to identify driver versus passenger aberrations based on three filters:
Fraction of patients with the abnormality
Length of DNA involved in the aberrant chromosome
Abnormality’s copy number
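The filter logic can be sketched in a few lines of Python. This is only an illustration of sieving on the three properties above; the threshold values and field names are assumptions, not the published cutoffs.

```python
from dataclasses import dataclass

@dataclass
class Aberration:
    region: str
    patient_fraction: float  # fraction of patients carrying the aberration
    length_mb: float         # length of DNA involved, in megabases
    copy_number: float       # observed copy number (2.0 = normal diploid)

# Illustrative thresholds (assumed, not the published values).
MIN_FRACTION = 0.25      # recurrent across patients
MAX_LENGTH_MB = 10.0     # focal rather than arm-length
MIN_COPY_SHIFT = 0.5     # clear gain or loss

def volumetric_sieve(aberrations):
    """Keep candidate drivers: recurrent, focal, high-amplitude events."""
    return [
        ab for ab in aberrations
        if ab.patient_fraction >= MIN_FRACTION
        and ab.length_mb <= MAX_LENGTH_MB
        and abs(ab.copy_number - 2.0) >= MIN_COPY_SHIFT
    ]

candidates = [
    Aberration("17p loss", 0.60, 8.0, 1.0),     # recurrent focal loss: kept
    Aberration("random gain", 0.05, 2.0, 3.0),  # rare passenger: filtered out
]
print([ab.region for ab in volumetric_sieve(candidates)])  # ['17p loss']
```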
Another method for sorting the most "important" chromosomal aberrations from less relevant alterations is GISTIC [5], described on the Broad Institute website (http://www.broadinstitute.org/software/cprg/?q=node/31) as a tool to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. The method allows comparison across multiple tumors, so noise is reduced and consistency of analysis improves. It has been used successfully to determine driver aberrations in mesotheliomas and leukemias, and to identify new oncogenes in adenocarcinomas of the lung and squamous cell carcinomas of the esophagus.
Main references for the two Mike Martin articles are as follows:
3. Eskiocak U, Kim SB, Ly P, Roig AI, Biglione S, Komurov K, Cornelius C, Wright WE, White MA, Shay JW: Functional parsing of driver mutations in the colorectal cancer genome reveals numerous suppressors of anchorage-independent growth. Cancer Research 2011, 71(13):4359-4365.
4. Shay T, Lambiv WL, Reiner-Benaim A, Hegi ME, Domany E: Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes. Cancer Informatics 2009, 7:91-104.
Article 1.1 Advances in the Understanding of the Human Genome The Initiation and Growth of Molecular Biology and Genomics- Part I
Introduction and purpose
This material covers the initiation phase of molecular biology (Part I), to be followed by the Human Genome Project (Part II), and concludes with Ubiquitin, its Role in Signaling and Regulatory Control (Part III). This article is a continuation of a previous discussion on the role of genomics in the discovery of therapeutic targets, titled Directions for genomics in personalized medicine http://pharmaceuticalintelligence.com/2013/01/27/directions-for-genomics-in-personalized-medicine/
The previous article focused on key drivers of cellular proliferation, stepwise mutational changes coinciding with cancer progression, and potential therapeutic targets for reversal of the process. It also covers the race to delineation of the Human Genome, discovery methods and fundamental genomic patterns that are ancient in both animal and plant speciation.
This article reviews the web-like connections between early and later discoveries, as each significant finding has led to novel hypotheses and many more findings over the last 75 years. This largely post-WWII revolution has driven our understanding of biological and medical processes at an exponential pace, owing to successive discoveries of chemical structure, the basic building blocks of DNA and proteins, nucleotide and protein-protein interactions, protein folding, allostery, genomic structure, DNA replication, nuclear polyribosome interaction, and metabolic control. In addition, the emergence of methods for the copying, removal, and insertion of genetic material, improvements in structural analysis, and developments in applied mathematics have transformed the research framework.
In the Beginning
During the Second World War came the discoveries of physics and, out of the Manhattan Project, the emergence of radioactive nuclear probes from E.O. Lawrence's laboratory at the University of California, Berkeley. The use of radioactive isotopes spurred the development of biochemistry: the isolation of nucleotides, nucleosides, and enzymes, and the filling in of the details of the pathways of photosynthesis, biosynthesis, and catabolism. Perhaps a good start for the journey is a student of Niels Bohr named Max Delbruck (September 4, 1906 - March 9, 1981), who won the Nobel Prize for discovering that bacteria become resistant to viruses (phages) as a result of genetic mutations, and who founded the discipline of molecular biology, lifting experimental work in physiology to systematic experimentation in biology with the rigor of physics, using radiation and virus probes on selected cells. In 1937 he turned to research on the genetics of Drosophila melanogaster at Caltech, and two years later he coauthored a paper, "The growth of bacteriophage," reporting that the viruses replicate in one step, not exponentially. In 1942, he and Salvador Luria of Indiana University demonstrated that bacterial resistance to virus infection is mediated by random mutation. This research, known as the Luria-Delbruck experiment, notably applied mathematics to make quantitative predictions and earned them the 1969 Nobel Prize in Physiology or Medicine, shared with Alfred Hershey. Delbruck's inferences on genes' susceptibility to mutation were relied on by the physicist Erwin Schrödinger in his 1944 book What Is Life?, which conjectured that genes were an "aperiodic crystal" storing code-script, and which influenced Francis Crick and James D. Watson in their 1953 identification of the molecular structure of cellular DNA as a double helix.
Watson-Crick Double Helix Model
A new understanding of heredity and hereditary disease became possible once it was determined that DNA consists of two chains twisted around each other in a double helix, each of alternating phosphate and sugar groups, and that the two chains are held together by hydrogen bonds between pairs of organic bases: adenine (A) with thymine (T), and guanine (G) with cytosine (C). Modern biotechnology also has its basis in the structural knowledge of DNA, in this case the scientist's ability to modify the DNA of host cells so that they produce a desired product, for example, insulin. The background for the work of the four scientists was formed by several scientific breakthroughs:
the progress made by X-ray crystallographers in studying organic macromolecules;
the growing evidence supplied by geneticists that it was DNA, not protein, in chromosomes that was responsible for heredity;
Erwin Chargaff’s experimental finding that there are equal numbers of A and T bases and of G and C bases in DNA;
and Linus Pauling’s discovery that the molecules of some proteins have helical shapes.
In 1962 James Watson (b. 1928), Francis Crick (1916-2004), and Maurice Wilkins (1916-2004) jointly received the Nobel Prize in Physiology or Medicine for their 1953 determination of the structure of deoxyribonucleic acid (DNA), performed with a knowledge of Chargaff's ratios of the bases in DNA and with access to the X-ray crystallography of Maurice Wilkins and Rosalind Franklin at King's College London. Because the Nobel Prize can be awarded only to the living, Wilkins's colleague Rosalind Franklin (1920-1958), who died of cancer at the age of 37, could not be honored. Of the four DNA researchers, only Franklin had a degree in chemistry. She completed her degree in 1941, in the middle of World War II, and undertook graduate work at Cambridge with Ronald Norrish, a future Nobel Prize winner. Returning to Cambridge after a year of war service, she presented her work and received a PhD in physical chemistry. Franklin then learned X-ray crystallography in Paris and rapidly became a respected authority in the field. Returning to England, to King's College London, in 1951, her charge was to upgrade the X-ray crystallographic laboratory there for work with DNA.
Cold Spring Harbor Laboratory
I digress to the beginnings of the Cold Spring Harbor Laboratory. A significant part of the Laboratory's life revolved around education, with its three-week-long Phage Course, taught first in 1945 by Max Delbruck, the German-born theoretical-physicist-turned-biologist. James D. Watson first came to Cold Spring Harbor Laboratory with his thesis advisor, Salvador Luria, in the summer of 1948. Over its more than 25-year history, the Phage Course was the training ground for many notable scientists. The Laboratory's annual scientific Symposium has provided a unique, highly interactive education in the exciting field of "molecular" biology. The 1953 Symposium featured Watson, coming from England to give the first public presentation of the DNA double helix. When Watson became the Laboratory's Director in 1968, he was determined to make it an important center for advancing molecular biology, and he focused his energy on bringing large donations to the enterprise. The Laboratory became a magnet for future discovery, and Watson later served as its Chancellor. This contribution is of as great an importance as his Nobel Prize discovery.
Biochemistry and Molecular Probes comes into View
Moreover, at the same time, the experience of Nathan Kaplan and Martin Kamen at Berkeley working with radioactive probes was the beginning of the establishment of the Lawrence laboratories' role in metabolic studies, as reported in the previous paper. A collaboration among Sidney Colowick, N. O. Kaplan, and Elizabeth Neufeld at the McCollum-Pratt Institute led to the discovery of the transferase reaction between the two main pyridine nucleotides. Neufeld received a PhD a few years later from the University of California, Berkeley, under William Zev Hassid, for research on nucleotides and complex carbohydrates, and did postdoctoral studies on non-protein sulfhydryl compounds in mitosis. Her later work, at the National Institutes of Health, was on the mucopolysaccharidoses. The lysosomal storage diseases opened a new chapter in human genetic disease when she found that the defects in Hurler and Hunter syndromes were due to decreased degradation of the mucopolysaccharides. When an assay became available for α-L-iduronidase in 1972, Neufeld was able to show that the corrective factor for Hurler syndrome that accelerates degradation of stored sulfated mucopolysaccharides was α-L-iduronidase.
The Hurler Corrective Factor: Purification and Some Properties (Barton, R. W., and Neufeld, E. F. (1971) J. Biol. Chem. 246, 7773-7779). The Sanfilippo A Corrective Factor: Purification and Mode of Action (Kresse, H., and Neufeld, E. F. (1972) J. Biol. Chem. 247, 2164-2170).
I mention this for two reasons: [1] we see a huge impetus for nucleic acid and nucleotide research growing in the 1950s, with the post-WWII emergence of work on biological structure; [2] at the same time, the importance of enzymes in cellular metabolic processes runs parallel to that of the genetic code.
In 1959 Arthur Kornberg received the Nobel Prize in Physiology or Medicine, shared with Dr. Severo Ochoa of New York University, for his discovery of "the mechanisms in the biological synthesis of deoxyribonucleic acid" (DNA polymerase). Over the next 20 years the Stanford University Department of Biochemistry became a top-rated graduate program in biochemistry. Today the Pfeffer Lab there is distinguished for research into how human cells put receptors in the right place, through Rab GTPases that regulate all aspects of receptor trafficking. Steve Elledge (1984-1989), now at Harvard University, is one of its graduates from the 1980s.
Transcription –RNA and the ribosome
In 2006, Roger Kornberg was awarded the Nobel Prize in Chemistry for identifying the role of RNA polymerase II and other proteins in transcribing DNA. He says that the process is something akin to a machine: "It has moving parts which function in synchrony, in appropriate sequence and in synchrony with one another." The Kornbergs were the tenth family with closely related Nobel laureates. The 2009 Nobel Prize in Chemistry was awarded to Venki Ramakrishnan, Tom Steitz, and Ada Yonath for crystallographic studies of the ribosome. The atomic-resolution structures of the ribosomal subunits provide an extraordinary context for understanding one of the most fundamental aspects of cellular function: protein synthesis. Research on protein synthesis began with studies of microsomes, and three papers on the atomic-resolution structures of the 50S and 30S ribosomal subunits were published in 2000. Perhaps the most remarkable and inexplicable feature of ribosome structure is that two-thirds of its mass is composed of large RNA molecules, the 5S, 16S, and 23S ribosomal RNAs, with the remaining third distributed among ~50 relatively small and innocuous proteins. The first step on the road to solving the ribosome structure was determining the primary structure of the 16S and 23S RNAs in Harry Noller's laboratory. The sequences were rapidly followed by secondary-structure models for the folding of the two ribosomal RNAs, in collaboration with Carl Woese, bringing the ribosome structure into two dimensions. The RNA secondary structures are characterized by an elaborate series of helices and loops of unknown structure, but, aside from the insights offered by the structure of transfer RNA (tRNA), there was no way to think about folding these structures into three dimensions. The first three-dimensional images of the ribosome emerged from Jim Lake's reconstructions from electron microscopy (EM) (Lake, 1976).
Ada Yonath reported the first crystals of the 50S ribosomal subunit in 1980, a crucial step that would require almost 20 years to bring to fruition (Yonath et al., 1980). Yonath's group introduced the innovative use of ribosomes from extremophilic organisms. At the same time, Peter Moore and Don Engelman applied neutron-scattering techniques to determine the relative positions of the ribosomal proteins in the 30S subunit. Elegant chemical footprinting studies from the Noller laboratory provided a basis for intertwining the RNA among the ribosomal proteins, but there was still insufficient information to produce a high-resolution structure; Venki Ramakrishnan, in Peter Moore's laboratory, addressed this with deuterated ribosome reconstitutions. The Yale group was then ramping up its work on the H. marismortui crystals of the 50S subunit. Peter Moore had recruited his long-time colleague Tom Steitz to work on the problem, and Steitz was about to complete the final event in the pentathlon of Crick's dogma, having solved critical structures of DNA polymerases, the glutaminyl-tRNA synthetase complex with tRNA, HIV reverse transcriptase, and T7 RNA polymerase. In 1999 Steitz, Ramakrishnan, and Yonath all presented electron density maps of subunits at approximately 5 Å resolution, and the Noller group presented 10 Å electron density maps of the Thermus 70S ribosome. Peter Moore aptly paraphrased Churchill, telling attendees that this was not the end, but the end of the beginning. Almost every nucleotide in the RNA is involved in multiple stabilizing interactions that form the monolithic tertiary structure at the heart of the ribosome. Williamson J. The ribosome at atomic resolution. Cell 2009; 139:1041-1043. http://dx.doi.org/10.1016/j.cell.2009.11.028 http://www.sciencedirect.com/science/article/pii/S0092867409014536
This opened the door to new therapies. For example, in 2010 it was reported that numerous human genes display dual coding within alternatively spliced regions, giving rise to distinct protein products that include segments translated in more than one reading frame. To resolve the ensuing protein structural puzzle, the authors identified human genes with alternative splice variants comprising a dual-coding region at least 75 nucleotides in length and analyzed the structural status of the protein segments they encode. Inspection of their amino acid composition, together with predictions by the IUPred and PONDR VSL2 algorithms, suggested a high propensity for structural disorder in dual-coding regions. Kovacs E, Tompa P, Liliom K, and Kalmar L. Dual coding in alternative reading frames correlates with intrinsic protein disorder. PNAS 2010. http://www.jstor.org/stable/25664997 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851785 http://www.pnas.org/content/107/12/5429.full.pdf
In 2012, it was shown that drug-bound ribosomes can synthesize a distinct subset of cellular polypeptides. The structure of a protein defines its ability to thread through the antibiotic-obstructed tunnel. Synthesis of certain polypeptides that initially bypass translational arrest can be stopped at later stages of elongation while translation of some proteins goes to completion. (Kannan K, Vasquez-Laslop N, and Mankin AS. Selective Protein Synthesis by Ribosomes with a Drug-Obstructed Exit Tunnel. Cell 2012; 151; 508-520.) http://dx.doi.org/10.1016/j.cell.2012.09.018 http://www.sciencedirect.com/science/article/pii/S0092867412011257
Mobility of genetic elements
Barbara McClintock received the Nobel Prize in Physiology or Medicine for the discovery of the mobility of genetic elements, work that had been done decades earlier. When transposons were demonstrated in bacteria, yeast, and other organisms, McClintock rose to a stratospheric level in the general esteem of the scientific world, but she was uncomfortable with the honors; it was sufficient to have her work understood and acknowledged. Prof. Howard Green said of her: "There are scientists whose discoveries greatly transcend their personalities and their humanity. But those in the future who will know of Barbara only her discoveries will know only her shadow." "In Memoriam - Barbara McClintock". Nobelprize.org. 5 Feb 2013 http://www.nobelprize.org/nobel_prizes/medicine/laureates/1983/mcclintock-article.html
She introduced her Nobel Lecture in 1983 with the following observation: “An experiment conducted in the mid-nineteen forties prepared me to expect unusual responses of a genome to challenges for which the genome is unprepared to meet in an orderly, programmed manner. In most known instances of this kind, the types of response were not predictable in advance of initial observations of them. It was necessary to subject the genome repeatedly to the same challenge in order to observe and appreciate the nature of the changes it induces…a highly programmed sequence of events within the cell that serves to cushion the effects of the shock. Some sensing mechanism must be present in these instances to alert the cell to imminent danger, and to set in motion the orderly sequence of events that will mitigate this danger”. She goes on to consider “early studies that revealed programmed responses to threats that are initiated within the genome itself, as well as others similarly initiated, that lead to new and irreversible genomic modifications. These latter responses, now known to occur in many organisms, are significant for appreciating how a genome may reorganize itself when faced with a difficulty for which it is unprepared”.
An experiment with Zea mays conducted in the summer of 1944 alerted her to the mobility of specific components of genomes; it involved the entrance of a newly ruptured end of a chromosome into a telophase nucleus. The experiment commenced with the growing of approximately 450 plants in the summer of 1944, each of which had started its development with a zygote that had received from each parent a chromosome with a newly ruptured end on one of its arms. The design of the experiment required that each plant be self-pollinated, in order to isolate from the self-pollinated progeny the new mutants that were expected to appear, confined to locations within the ruptured arm of a chromosome. Each mutant was expected to reveal the phenotype produced by a minute homozygous deficiency, and their modes of origin could be projected from the known behavior of broken ends of chromosomes in successive mitoses. Forty kernels from each self-pollinated ear were sown in a seedling bench in the greenhouse during the winter of 1944-45.
Some seedling mutants of the type expected were overshadowed by segregants exhibiting bizarre phenotypes. These were variegated for type and degree of expression of a gene. The variegated expressions given by genes associated with chlorophyll development were startlingly conspicuous. Within any one progeny, chlorophyll intensities and their pattern of distribution in the seedling leaves were alike; between progenies, however, both the type and the pattern differed widely.
The effect of X-rays on chromosomes
Initial studies of broken ends of chromosomes began in the summer of 1931. By then, a means of testing the beads-on-a-string hypothesis was provided by newly developed methods of examining the ten chromosomes of the maize complement in microsporocytes at meiosis. The ten bivalent chromosomes are elongated in comparison to their metaphase lengths. Each chromosome
is identifiable by its relative length,
by the location of its centromere, which is readily observed at the pachytene stage, and
by the individuality of the chromomeres strung along the length of each chromosome.
At that time maize provided the best material for locating known genes along a chromosome arm, and also for precisely determining the break points in chromosomes that had undergone various types of rearrangement, such as translocations, inversions, etc. The recessive phenotypes in the examined plants arose from loss of a segment of a chromosome that carried the wild-type allele, and X-rays were responsible for inducing these deficiencies. A conclusion of basic significance could be drawn from these observations:
broken ends of chromosomes will fuse, 2-by-2, and
any broken end with any other broken end.
This principle has been amply proved in a series of experiments conducted over the years. In all such instances the break must sever both strands of the DNA double helix. This is a “double-strand break” in modern terminology. That two such broken ends entering a telophase nucleus will find each other and fuse, regardless of the initial distance that separates them, soon became apparent.
During the summer of 1931 she had seen plants in the maize field that showed variegation patterns resembling the one described for Nicotiana. Dr. McClintock was interested in selecting the variegated plants to determine the presence of a ring chromosome in each, and in the summer of 1932, with Dr. Stadler's generous cooperation from Missouri, she had the opportunity to examine such plants. Each plant had a ring chromosome, but it was the behavior of this ring that proved to be significant, revealing several basic phenomena. The following was noted (a small simulation sketch follows this passage):
In the majority of mitoses, replication of the ring chromosome produced two chromatids that were completely free from each other and could separate without difficulty at the following anaphase.
Sister-strand exchanges do occur between replicated or replicating chromatids, and the frequency of such events increases with the size of the ring. These exchanges produce a double-size ring with two centromeres.
Mechanical rupture occurs in each of the two chromatid bridges formed at anaphase by passage of the two centromeres of the double-size ring to opposite poles of the mitotic spindle.
The location of a break can be at any position along either bridge.
The broken ends entering a telophase nucleus then fuse.
The size and content of each newly constructed ring depend on the position of the rupture that had occurred in each bridge.
The conclusion was that cells sense the presence in their nuclei of ruptured ends of chromosomes
then activate a mechanism that will bring together and then unite these ends
this will occur regardless of the initial distance in a telophase nucleus that separated the ruptured ends.
The ability of a cell to
sense these broken ends,
to direct them toward each other, and
then to unite them so that the union of the two DNA strands is correctly oriented,
is a particularly revealing example of the sensitivity of cells to all that is going on within them.
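To make the cycle concrete, here is a minimal simulation sketch in Python (arbitrary size units and an assumed uniform break position); it is a caricature of the chromatid-type breakage-fusion-bridge cycle described above, not a biological model.

```python
import random

def breakage_fusion_bridge(ring_size, cycles, rng):
    """Follow one ring chromosome through successive cell cycles:
    a sister-strand exchange yields a double-size, dicentric ring; the
    anaphase bridge ruptures at a random position; the broken ends fuse,
    so the daughter ring's size depends on where the break occurred."""
    sizes = [ring_size]
    for _ in range(cycles):
        dicentric = 2.0 * sizes[-1]              # fused sister chromatids
        break_at = rng.uniform(0.0, dicentric)   # rupture anywhere on bridge
        sizes.append(break_at)                   # one daughter's new ring
    return sizes

rng = random.Random(1944)  # seed nods to the summer the experiment began
print([round(s, 1) for s in breakage_fusion_bridge(100.0, 6, rng)])
```

Each pass doubles the ring, breaks the bridge at a random point, and fuses the ends, so ring size wanders from generation to generation, the kind of instability underlying the variegated phenotypes she observed.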
Evidence from these experiments gave unequivocal support for the conclusion that broken ends will find each other and fuse. The challenge is met by a programmed response. This may be necessary, as
both accidental breaks and
programmed breaks may be frequent.
If not repaired, such breaks could lead to genomic deficiencies having serious consequences.
A cell capable of repairing a ruptured end of a chromosome must sense the presence of this end in its nucleus. This sensing
activates a mechanism that is required for replacing the ruptured end with a functional telomere.
that such a mechanism must exist was revealed by a mutant that arose in the stocks.
this mutant would not allow the repair mechanism to operate in the cells of the plant.
Entrance of a newly ruptured end of a chromosome into the zygote is followed by the chromatid type of breakage-fusion-bridge cycle throughout mitoses in the developing plant. This suggested that the repair mechanism in the maize strains is repressed in cells producing
the male and female gametophytes and
also in the endosperm,
but is activated in the embryo.
The extent of trauma perceived by cells
whose nuclei receive a single newly ruptured end of a chromosome that the cell cannot repair,
and the speed with which this trauma is registered, was not appreciated until the winter of 1944-45.
By 1947 it was learned that the bizarre variegated phenotypes that segregated in many of the self-pollinated progenies grown on the seedling bench in the fall and winter of 1944-45 were due to the action of transposable elements. It seemed clear that
these elements must have been present in the genome,
and in a silent state previous to an event that activated one or another of them.
She concluded that some traumatic event was responsible for these activations. The unique event in the history of these plants relates to their origin: both parents of the plants grown in 1944 had contributed a chromosome with a newly ruptured end to the zygote that gave rise to each of these plants. Detection of silent elements is now made possible with the aid of DNA cloning methods. Silent Ac (Activator) elements, as well as modified derivatives of them, have already been detected in several strains of maize. When other transposable elements are cloned, it will be possible to compare their structural and numerical differences among various strains of maize. In any one strain of maize the number of silent but potentially transposable elements, as well as of other repetitious DNAs, may be observed to change, most probably in response to challenges not yet recognized. Telomeres are especially adapted to replicate the free ends of chromosomes. When no telomere is present, attempts to replicate this uncapped end may be responsible for the apparent "fusions" of the replicated chromatids at the position of the previous break, as well as for perpetuating the chromatid type of breakage-fusion-bridge cycle in successive mitoses. In conclusion, a genome may react to conditions for which it is unprepared, but to which it responds in a totally unexpected manner. Among these is
the extraordinary response of the maize genome to entrance of a single ruptured end of a chromosome into a telophase nucleus.
It was this event that was responsible for activations of potentially transposable elements that are carried in a silent state in the maize genome.
The mobility of these activated elements allows them to enter different gene loci and to take over control of action of the gene wherever one may enter.
Because the broken end of a chromosome entering a telophase nucleus can initiate activations of a number of different potentially transposable elements,
the modifications these elements induce in the genome may be explored readily.
In addition to
modifying gene action, these elements can
restructure the genome at various levels,
from small changes involving a few nucleotides,
to gross modifications involving large segments of chromosomes, such as
duplications,
deficiencies,
inversions,
and other reorganizations.
In the future attention undoubtedly will be centered on the genome, and with greater appreciation of its significance as a highly sensitive organ of the cell,
monitoring genomic activities and correcting common errors,
sensing the unusual and unexpected events,
and responding to them,
often by restructuring the genome.
We know about the elements available for such restructuring. We know nothing, however, about
how the cell senses danger and instigates responses to it that often are truly remarkable.
Source: 1983 Nobel Lecture. Barbara McClintock. THE SIGNIFICANCE OF RESPONSES OF THE GENOME TO CHALLENGE.
In 2009 the Nobel Prize in Physiology or Medicine was awarded to Elizabeth Blackburn, Carol Greider, and Jack Szostak for the discovery of telomerase. This recognition came less than a decade after the completion of the Human Genome Project previously discussed. Prof. Blackburn acknowledges a strong influence from the work of Barbara McClintock. The discovery is tied to the pond organism Tetrahymena thermophila and to studies of yeast cells. Blackburn was drawn to science after reading, as a child, the biography of Marie Curie written by her daughter Ève. She recalls that her Master's mentor, while she studied the metabolism of glutamine in the rat liver, thought that every experiment should have the beauty and simplicity of a Mozart sonata. She did her PhD at the distinguished Laboratory of Molecular Biology at Cambridge, the epicenter of molecular biology, sequencing regions of bacteriophage phiX 174, a single-stranded DNA bacteriophage. Using Fred Sanger's methods to piece together RNA sequences, she showed the first sequence of a 48-nucleotide fragment to her mathematically gifted Cambridge cousin, who pointed out repeats of DNA sequence patterns. She worked on sequencing the DNA at the terminal regions of the short "minichromosomes" of the ciliated protozoan Tetrahymena thermophila at Yale in 1975. She continued the research begun at Yale at UCSF, funded by the NIH, on the strength of an intriguing autoradiogram showing telomeric DNA in Tetrahymena. I describe the work as follows:
Prof. Blackburn incorporated 32P-labeled deoxynucleoside residues into the rDNA molecules in DNA-repair enzymatic reactions and found that
the end regions were selectively labeled by combinations of 32P-radiolabeled nucleoside triphosphates, and by mid-year she had an autoradiogram of the depurination products.
The autoradiogram showed sequences of four cytosine residues flanked by either an adenosine or a guanosine residue.
In 1976 she deduced a sequence consisting of a tandem array of CCCCAA repeats, and subsequently separated the products by denaturing gel electrophoresis, where they appeared as tiger stripes extending up the gel.
The size of each band was 6 bases more than the band below it.
The telomere must have a telomerase!
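The inference from the gel ladder can be captured in a toy calculation: if successive bands differ by a constant number of nucleotides, that constant is the length of the repeat being added. The band sizes below are made up for illustration; the Tetrahymena repeat CCCCAA is six nucleotides long.

```python
def repeat_unit_length(band_sizes):
    """Infer the repeat-unit length as the common difference between
    successive band sizes read off a denaturing gel."""
    diffs = {b - a for a, b in zip(band_sizes, band_sizes[1:])}
    return diffs.pop() if len(diffs) == 1 else None

bands = [36, 42, 48, 54, 60, 66]  # hypothetical ladder, in nucleotides
print(repeat_unit_length(bands), "nt per added repeat")  # 6, matching CCCCAA
```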
The discovery of the telomerase enzyme activity was made by the Prize co-awardee Carol Greider. They were trying to decipher the structure right at the termini of telomeres of both ciliated protozoans and yeast plasmids. The view that in mammalian telomeres there is a long protruding G-rich strand does not take into account the clear evidence for the short C-strand repeat oligonucleotides that she discovered. This was found for both the Tetrahymena rDNA minichromosome molecules and linear plasmids purified from yeast. In contrast to nucleosomal regions of chromosomes, special regions of DNA, for example
promoters that must bind transcription initiation factors that control transcription, have proteins other than the histones on them.
The telomeric repeat tract turned out to be such a non-nucleosomal region.
They found that by clipping up chromatin using an enzyme that cuts the linker between neighboring nucleosomes,
it cut up the bulk of the DNA into nucleosome-sized pieces
but left the telomeric DNA tract as a single protected chunk.
The resulting complex of the telomeric DNA tract plus its bound cargo of protective proteins behaved very differently from nucleosomal chromatin, and they concluded that it had no histones or nucleosomes.
Any evidence for a protein on the ends of the bulk of the rDNA molecules, such as their behavior in gel electrophoresis or the appearance of the rDNA molecules under the electron microscope, was conspicuously lacking. This was reassuring that there was no covalently attached protein at the very ends of this minichromosome. Despite considerable work, she was unable to determine what protein(s) would co-purify with the telomeric repeat-tract DNA of Tetrahymena. It was yeast genetics and approaches pursued by others that provided the next great leaps forward in understanding telomeric proteins. Her colleague Carol Greider noticed the need to scale up the telomerase activity preparations, and they used a very large glass column for preparative gel-filtration chromatography.
Jack W. Szostak, of the Howard Hughes Medical Institute at Harvard, shared in the 2009 Nobel Prize. He became interested in molecular biology while taking a course on the frontiers of molecular biology, reading about the Meselson-Stahl experiments of barely a decade earlier, and learning how the genetic code had been unraveled. The fact that one could deduce, from measurements of the radioactivity in fractions from a centrifuge tube, the molecular details of DNA replication, transcription, and translation was astonishing. A highlight of his time at McGill was the open-book, open-discussion final exam in this class, in which the questions required the intense collaboration of groups of students.
At Cornell, in Ithaca, he collaborated with John Stiles, and they came up with a specific idea: to chemically synthesize a DNA oligonucleotide of sufficient length that it would hybridize to a single sequence within the yeast genome, and then to use it as an mRNA- and gene-specific probe. At the time, there was only one short segment of the yeast genome for which the DNA sequence was known,
the region coding for the N-terminus of the iso-1 cytochrome c protein,
intensively studied by Fred Sherman. The Sherman lab, in a tour de force of genetics and protein chemistry, had isolated
double-frameshift mutants in which the N-terminal region of the protein was translated from out-of-frame codons.
Protein sequencing of the wild type and frame-shifted mutants allowed them to deduce 44 nucleotides of DNA sequence.
If they could prepare a synthetic oligonucleotide complementary to the coding sequence, they could use it to detect the cytochrome c mRNA and gene. At the time, essentially all experiments on mRNA were done on total cellular mRNA. Ray Wu was already well known for determining the sequence of the sticky ends of phage lambda, the first DNA ever to be sequenced, and his lab was deeply involved in the study of enzymes that could be used to manipulate and sequence DNA more effectively, but he would not take on a project from another laboratory. So John went to nearby Rochester to do postdoctoral work with Sherman, and Szostak was able to transfer to Ray Wu's laboratory. In order to carry out the work, Ray Wu sent him to Saran Narang's lab in Ottawa, where he received training under Keiichi Itakura, who synthesized the insulin gene. A few months later, he received several milligrams of the long-sought 15-mer. In collaboration with John Stiles and Fred Sherman, who sent RNA and DNA samples from appropriate yeast strains, they were able to use the labeled 15-mer as a probe to detect the cyc1 mRNA, and later the gene itself. He notes that one of the delights of the world of science is that it is filled with people of good will who are more than happy to assist a student or colleague by teaching a technique or discussing a problem. He remained in Ray's lab after completing his PhD, and upon the arrival of Rodney Rothstein from Sherman's lab in Rochester, who introduced him to yeast genetics, he was prepared for the next decade of work on yeast,
first in recombination studies, and
later in telomere studies and other aspects of yeast biology.
His studies of recombination in yeast were enabled by the discovery, in Gerry Fink’s lab at Cornell, of a way to introduce foreign DNA into yeast. These pioneering studies of yeast transformation showed that circular plasmid DNA molecules could on occasion become integrated into yeast chromosomal DNA by homologous recombination.
His studies of unequal sister-chromatid exchange in the rDNA locus resulted in his first publication in the field of recombination.
The idea that one could increase transformation frequency by cutting the input DNA was pleasingly counterintuitive and led him to continue exploring the phenomenon. He gained an appointment at the Sidney Farber Cancer Institute through the interest of Prof. Ruth Sager, who had gathered together a great group of young investigators. In work spearheaded by his first graduate student, Terry Orr-Weaver, on
double-strand breaks in DNA
and their repair by recombination (and continuing interaction with Rod Rothstein),
they were drawn to the question of what kinds of reactions occur at DNA ends.
It was at a Gordon Conference that he heard, with excitement, a talk by Elizabeth Blackburn on her work on telomeres in Tetrahymena.
This led to a collaboration testing the ability of Tetrahymena telomeres to function in yeast.
He performed the experiments himself, and experienced the thrill of being the first to know that their wild idea had worked.
It was clear from that point on that a door had been opened and that they were going to be able to learn a lot about telomere function from studies in yeast.
Within a short time he was able to clone bona fide yeast telomeres, and (in a continuation of the collaboration with Liz Blackburn’s lab)
they obtained the critical sequence information that led (them) to propose the existence of the key enzyme, telomerase.
A fanciful depiction evoking both telomere dynamics and telomere researchers, painted by the artist Julie Newdoll in 2008, presents the telomere as an ancient Sumerian temple-like hive, tended by a swarm of ancient Sumerian bee-goddesses, against a background of clay tablets inscribed with DNA sequencing gel-like bands. Dr. Blackburn recalls owing much to Barbara McClintock not only for her scientific findings; McClintock also gave her advice in a conversation in 1977, at a time when
she had unexpected findings with the rDNA end sequences.
Dr. McClintock urged her to trust her intuition about the scientific research results.
In this Part I of a series of 3, I have described the
emergence of Molecular Biology and
closely allied work on the mechanism of Cell Replication and
the dependence of metabolic processes on proteins and enzymatic conversions through a surge of
post-WWII research that gave birth to centers for basic science research in biology and medicine in both the US and England, preceded by work in prewar Germany. This is to be followed by further developments related to the Human Genome Project.
Meiosis plays a crucial role in generating haploid gametes for sexual reproduction. In most organisms, the presence of crossovers between homologous chromosomes, in combination with connections between sister chromatids, creates a physical connection that ensures regular segregation of homologs at the first of the two meiotic divisions.
Abnormality in generating crossovers is a leading cause of miscarriage and birth defects. Crossovers also create new combinations of alleles, thus contributing to genetic diversity and evolution. Recent linkage disequilibrium and pedigree studies have shown that the distribution of recombination is highly uneven across the human genome, as in all studied organisms. Many recombination-active regions are not conserved between humans and chimpanzees or among different human populations, suggesting that these regions are quickly evolving and might even be individual-specific. However, such variation in the human population would be masked by the population average, and resolving it would require comparing recombination genome-wide among many single genomes.
Whole-genome amplification (WGA) of single sperm cells was proposed decades ago to facilitate mapping recombination at the individual level. With the development of high-throughput genotyping technologies, whole-genome mapping of recombination events in single gametes of an individual became achievable, and was recently demonstrated by Wang et al., who performed WGA by multiple displacement amplification (MDA) on single sperm cells followed by genotyping with DNA microarrays. However, due to amplification bias and, consequently, insufficient marker density, the resolution of crossover locations has been limited to ~150 kb thus far. In addition, in their recent work, Wang et al. relied on prior knowledge of the chromosome-level haplotype information of the analyzed individual, which is experimentally inconvenient to obtain and is currently available for only a few individuals.
Meiotic recombination creates genetic diversity and ensures segregation of homologous chromosomes. Previous population analyses yielded results averaged among individuals and affected by evolutionary pressures. In this study, 99 sperm from an Asian male were sequenced using a newly developed amplification method, multiple annealing and looping-based amplification cycles, to phase the personal genome and map recombination events at high resolution; these events are non-uniformly distributed across the genome in the absence of selection pressure. The paucity of recombination near transcription start sites observed in individual sperm indicates that this phenomenon is intrinsic to the molecular mechanism of meiosis. Interestingly, a decreased crossover frequency combined with an increase in autosomal aneuploidy is observable on a global per-sperm basis. A sketch of how crossovers can be read out of single-sperm genotypes follows.
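The core computational step, once a sperm genome is phased against the donor's two haplotypes, is to find positions where the sperm switches from matching one parental haplotype to matching the other. This is a minimal Python sketch of that idea, with toy allele strings standing in for real marker data; it is not the authors' pipeline.

```python
def crossovers(hap_a, hap_b, sperm):
    """Report crossovers as switches in the parental haplotype that a
    sperm genotype matches, scanning informative markers along one
    chromosome. Inputs are strings of alleles at successive markers."""
    breakpoints, current = [], None
    for i, (a, b, s) in enumerate(zip(hap_a, hap_b, sperm)):
        if a == b or s not in (a, b):
            continue                      # marker uninformative or missing
        phase = "A" if s == a else "B"
        if current is not None and phase != current:
            breakpoints.append(i)         # haplotype switch = crossover
        current = phase
    return breakpoints

# Toy example: the sperm follows haplotype A, then switches to B at marker 4.
hap_a = "ACGTACGT"
hap_b = "GTACGTAC"
sperm = "ACGTGTAC"
print(crossovers(hap_a, hap_b, sperm))  # [4]
```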
Article 2.3 Genome-Wide Detection of Single-Nucleotide and Copy-Number Variation of a Single Human Cell
Most tumors exhibit a level of diversity at the cellular, histologic, and even genetic level (2). This genetic heterogeneity within a tumor has been a focus of recent research efforts to analyze the characteristics, expression patterns, and genetic differences between individual tumor cells. The genetic diversity is usually manifested as single-nucleotide variations (SNV) and copy-number variations (CNV), both of which provide material for selective pressures in both cancer and evolution.
As cancer research and personalized medicine focus on analyzing this tumor heterogeneity, it has become pertinent to view the tumor as a heterogeneous population of cells instead of as a homogeneous mass. In fact, studies have suggested that cancer cell lines growing on plastic in culture, even though thought of as clonal, can actually display a varying degree of expression differences between neighboring cells growing on the same dish. Indeed, cancer stem cells show asynchronous cell division; for example, a parent CD133-positive cell will divide into a CD133-positive and a CD133-negative cell (3). In addition, given the discovery that circulating tumor cells (a rare population of cells circulating in the blood) can be prognostic of outcome in cancers such as inflammatory breast cancer (4), it is ever more important to develop methods to analyze single-cell populations.
Harvard University researchers Dr. Chenghang Zong, Sijia Lu, Alec Chapman, and Sunney Xie developed a new amplification method utilizing multiple annealing and looping-based amplification cycles (MALBAC) (1). A quasilinear preamplification process is used on picograms of genomic DNA fragments (from 10 to 100 kb) isolated from a single cell. This is performed to reduce the bias associated with nonlinear DNA amplification. A series of random primers (which the authors termed MALBAC primers, constructed with common sequence tags) is annealed at low temperature (0 °C). Rounds of amplification produce semiamplicons; further rounds, after a step of looping the amplicons, result in full amplicons with complementary ends. When the two ends hybridize to form looped DNA, the loop structure cannot be used as a template, leading to close-to-linear amplification. The process allows for a higher fidelity of DNA replication and the ability to amplify a whole genome. The amplicons are then analyzed by whole-genome sequencing, with Sanger sequencing used to verify single-nucleotide polymorphisms. This MALBAC amplification procedure resulted in coverage of 85-93% of the genome of a single cell.
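Why quasilinear amplification matters can be shown with a small simulation sketch (assumed efficiency ranges, not the paper's parameters): when every product can serve as a template, locus-to-locus efficiency differences compound exponentially into large coverage bias; when only the original DNA serves as template, as with the looping trick, the differences accumulate only additively.

```python
import random

def exponential_wga(n_loci, cycles, rng):
    """Ordinary exponential amplification: every product serves as a
    template, so per-cycle efficiency differences between loci compound
    multiplicatively into large coverage bias."""
    copies = [1.0] * n_loci
    for _ in range(cycles):
        copies = [c * (1.0 + rng.uniform(0.7, 1.0)) for c in copies]
    return copies

def quasilinear_wga(n_loci, cycles, rng):
    """Caricature of MALBAC preamplification: looped full amplicons are
    excluded as templates, so only the original genomic DNA is copied in
    each cycle and per-locus coverage grows roughly linearly."""
    copies = [1.0] * n_loci
    for _ in range(cycles):
        copies = [c + rng.uniform(0.7, 1.0) for c in copies]
    return copies

def bias(copies):
    return max(copies) / min(copies)  # max/min coverage across loci

rng = random.Random(0)
print("exponential bias:", round(bias(exponential_wga(1000, 15, rng)), 1))
print("quasilinear bias:", round(bias(quasilinear_wga(1000, 15, rng)), 2))
```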
As proof of principle, the authors used MALBAC to amplify the DNA of single SW480 cancer cells, picked from a clonally expanded subpopulation of a heterogeneous population (the bulk DNA). Comparison of the MALBAC method with the MDA method revealed copy-number variations (CNV) among three individual cells that had been picked from the clonally expanded pool. The results were in agreement with karyotyping studies of the SW480 cell line. Meticulous quality controls were performed to limit contamination, high false-positive rates of SNV detection due to amplification bias, and false positives due to amplification or sequencing errors.
Interestingly, the authors found 35 unique single-nucleotide variations that had arisen over 20 cell divisions from a single SW480 cancer cell. From this they estimated 49 mutations over 20 generations, yielding a mutation rate of about 2.5 nucleotides per generation. In addition, the authors were able to map some of these mutations to various chromosomes and to perform next-generation (deep) sequencing to verify the nucleotide mutations; they found an unusually high purine-pyrimidine exchange rate.
In a subsequent paper, investigators from the same group at Harvard used this technology to sequence 99 sperm cells from a single individual to study genetic diversity created during meiotic recombination, a mechanism involved in evolution and development(5).
2. Cooke, S. L., Temple, J., Macarthur, S., Zahra, M. A., Tan, L. T., Crawford, R. A., Ng, C. K., Jimenez-Linan, M., Sala, E., and Brenton, J. D. (2011) British Journal of Cancer 104, 361-368
4. Giuliano, M., Giordano, A., Jackson, S., Hess, K. R., De Giorgi, U., Mego, M., Handy, B. C., Ueno, N. T., Alvarez, R. H., De Laurentiis, M., De Placido, S., Valero, V., Hortobagyi, G. N., Reuben, J. M., and Cristofanilli, M. (2011) Breast Cancer Research: BCR 13, R67
5. Lu, S., Zong, C., Fan, W., Yang, M., Li, J., Chapman, A. R., Zhu, P., Hu, X., Xu, L., Yan, L., Bai, F., Qiao, J., Tang, F., Li, R., and Xie, X. S. (2012) Science 338, 1627-1630
Other related posts on this website regarding Cancer and Genomics include:
Reporters: Aviva Lev-Ari, PhD, RN and Pnina G. Abir-Am, PhD
Putting Genome Interpretation to the Test
01/30/2013
Ashley Yeager
How well do methods for interpreting genome variation work? Ashley Yeager takes a look at a community experiment that is trying to assess just how useful genome interpretation tools are in real-world situations.
At the American Society of Human Genetics (ASHG) conference in November 2012 in San Francisco, CA, Steven Brenner, a computational geneticist from the University of California, Berkeley, stood up in front of an audience and argued that it was unlikely that a single genome interpretation tool could identify variants for an array of illnesses or phenotypic traits (1). Instead, interpretation methods would likely need to be gene-specific or tailored for precise applications.
[Figures (images not reproduced): the predictors, assessors, and observers who participated in CAGI 2011, held in San Francisco, CA (Source: CAGI); ROC curves for predicting Crohn’s disease patients against 1,000 random predictions, shown in gray (Source: CAGI); Steven Brenner, who helped develop CAGI to determine how well genome interpretation tools could translate to the clinic (Source: UC Berkeley); John Moult, one of the organizers of CAGI, who says the challenges are giving scientists a better sense of existing genome interpretation tools (Source: University of Maryland).]
Brenner came to that conclusion after looking over the results of the Critical Assessment of Genome Interpretation (CAGI), a community experiment, now in its third year, that challenges researchers to computationally predict the phenotypes of genetic variants. Teams then compare their results with unpublished experimental data, showing researchers and clinicians which tools can most accurately interpret large amounts of genomic sequence variation data and which might be reliable enough for the clinic. For Brenner, the results of the first two rounds of challenges have been clear: most genome interpretation tools are not yet reliable enough for clinical use.
After his talk at ASHG, several clinicians came up to him and expressed their concerns. Many had been using genome interpretation tools more generally, possibly making their conclusions less reliable. “General methods are limited in how well they will perform, which is not what people assumed before,” he says. “What that reaction showed me was that CAGI has a broad set of people that derive value from the experiment’s findings.”
Increasing Confidence
Brenner and John Moult, a computational biologist at the University of Maryland in Rockville, MD, organized the first CAGI experiment in 2010. It was a pilot project to get a better sense of the tools researchers in the community were using to study human genome variation and the phenotypic predictions coming from them. “Coming into CAGI, we had no understanding of how well methods for interpreting genome variation worked,” Brenner says. “Now, we’re starting to get a hint of what the big picture is.”
The goal was to provide a better sense of the correct level of confidence scientists and clinicians should have in the methods to predict the phenotype of sequence variants that are out there right now. “There’s a lot of uncertainty about how these methods work on real problems and so the challenges address the question of how can we test them in real-world situations,” Moult says.
In the beginning, Brenner and Moult had little idea of what to expect. The first year of the experiment was supposed to be very small, a pilot to see who would participate and what tools actually existed. In the end, the 2010 challenges drew more than 100 prediction submissions from eight countries, exceeding the organizers’ expectations.
Forty of the participants traveled to Berkeley in December 2010 to review the results. The top prize was awarded to Yana Bromberg, a bioinformatician at Rutgers University in New Jersey, for her work on interpretation software called screening for non-acceptable polymorphisms, or SNAP for short, which evaluates the effects of single amino acid substitutions on protein function (2). It was the first time Moult and Brenner had heard of SNAP.
In 2011, teams worked on 11 challenges, resulting in 117 predictions from 21 groups representing 18 countries. The challenges expanded, including exercises on exome variation and breast cancer gene variation. Again, SNAP was often one of the best interpretation tools, ranking high on several of the challenges.
One of the challenges in the second year of the experiment asked variation predictors to analyze exome sequence data from 42 Crohn’s disease patients and 6 healthy individuals. Researchers didn’t know how many of the exomes had variations associated with the disease, but many of the tools predicted the disease in patients significantly better than random. The best performing teams used an unexpected approach, looking at rare variants on a large panel of genes (1).
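The assessment against random baselines described here can be sketched in a few lines (a minimal illustration with made-up scores and scikit-learn's stock ROC routine, not CAGI's actual evaluation pipeline): compute the AUC of a submission, then compare it against an empirical null distribution built from 1,000 random predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical ground truth: 42 Crohn's patients (1) and 6 healthy (0)
y_true = np.array([1] * 42 + [0] * 6)

# Hypothetical submitted scores, loosely correlated with the truth
y_score = y_true * 0.5 + rng.normal(0, 0.4, size=y_true.size)

auc = roc_auc_score(y_true, y_score)

# Null distribution: AUCs of 1,000 random predictions
null_aucs = [roc_auc_score(y_true, rng.random(y_true.size))
             for _ in range(1000)]
p_value = np.mean([a >= auc for a in null_aucs])

print(f"submission AUC = {auc:.2f}, empirical p = {p_value:.3f}")
```

A submission is "significantly better than random" when its AUC lands in the far tail of that null distribution, which is exactly the comparison shown in the gray ROC curves above.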
“The Crohn’s results were so great, we wonder if they were an artifact,” Brenner says, explaining that the CAGI organizers have included the challenge again in this year’s experiment to verify the results. If the results hold, “it could be a huge breakthrough there in interpreting genetic variation under certain circumstances,” he says.
The first year results were significant in a statistical sense, but the second year, Brenner says, “really gave us a baseline for better understanding personal genome variations and also started to show which types of interpretation methods might be best for specific applications.”
Nowhere Near
The next step would be to explain why methods such as SNAP are so successful. But that requires more funding. Right now the experiment has no direct funding, although the CAGI organizing committee has a grant proposal awaiting review; the National Institutes of Health typically funds the year-end meeting where challenge participants present their results. “We’re doing this on a shoestring,” Moult says. Despite the financial pressure, Brenner and Moult feel they have invested too much time to give up on the CAGI experiments.
The 2012 challenge deadline is March 2013, with the meeting to present the results slated for July. The delay was largely due to funding issues, but Brenner and Moult hope the extension will allow more researchers to participate, and they are eager to see the results.
This year the experiment has 10 challenges, which include a test that focuses on genetic and phenotypic variation in breast cancer as well as the tried-and-true test to predict individuals’ phenotypic traits based on their genomes. The information for the personal genome analysis comes from the Personal Genome Project (PGP). “It acts as a valuable resource for diagnostics evaluations and standardization testing like CAGI,” Harvard molecular geneticist George Church said in an email, adding that the PGP has been providing data to CAGI since its first year.
But this year, there’s a change to the personal genome challenge. For the past two years, participants used the data to predict individual phenotypic traits based on a genome. But phenotypic profiles of all PGP participants are now public. “The availability of the complete profiles makes it impossible to have a valid assessment of individual trait predictions,” Brenner explains.
So instead of predicting the phenotype based on a single genome, in the 2012 challenge, the participants will develop tools that play a “matching game.” The goal will be to match 77 genomes with their corresponding phenotypic profiles, each of which includes 239 traits such as high cholesterol, diabetes, and astigmatism. And to spice things up, the organizers have included 214 phenotypic profiles that do not match any of the 77 genomes.
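One plausible way to attack the matching game (a sketch under assumed inputs, not any participant's actual method; the likelihood score model and all data below are invented for illustration) is to score every genome-profile pair by how likely the profile's 239 traits are given per-trait probabilities predicted from the genome, then solve the resulting assignment problem. With 214 decoys the score matrix is rectangular, and unmatched columns are the presumed decoys:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

N_GENOMES, N_PROFILES, N_TRAITS = 77, 77 + 214, 239

# Stand-in per-trait probabilities predicted from each genome,
# and binary trait profiles (1 = trait present)
pred = rng.random((N_GENOMES, N_TRAITS))
profiles = rng.integers(0, 2, (N_PROFILES, N_TRAITS))

# Log-likelihood of each profile under each genome's predictions
eps = 1e-9
logp = np.log(pred + eps)        # log P(trait present)
logq = np.log(1 - pred + eps)    # log P(trait absent)
loglik = logp @ profiles.T + logq @ (1 - profiles).T   # shape (77, 291)

# Maximize total log-likelihood over one-to-one assignments;
# columns left unchosen are the presumed decoy profiles.
rows, cols = linear_sum_assignment(-loglik)
print("genome 0 matched to profile", cols[0])
```

The Hungarian-style assignment enforces that each genome claims exactly one profile, which is what distinguishes the matching game from the earlier per-genome trait prediction task.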
Ultimately, the CAGI predictors will release the PGP challenge results to those who volunteered their genomes so the individuals can learn more about their genetic susceptibilities for disease. But the reliability of the results is not necessarily high yet, Brenner cautions, so it’s important that individuals, scientists, and clinicians take that into account if someone shows a predicted high risk for cancer or other serious illnesses.
“We are nowhere near having a method for genome interpretation where a doctor could use it and then go and give surgery based on what we are saying,” Moult says. He and Brenner hope CAGI is a first step toward getting there one day.
References
1. CAGI: The Critical Assessment of Genome Interpretation, a community experiment to evaluate phenotype prediction. (2012) American Society of Human Genetics Conference: Poster.
2. Bromberg, Y., and Rost, B. (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Research 35, 3823-3835.
A number of novel genes have been identified in association with a variety of endocrine phenotypes over the last few years. However, although mutations have been described in association with disorders such as
hypogonadotropic hypogonadism,
congenital hypopituitarism,
disorders of sex development, and
congenital hyperinsulinism,
these account for a minority of patients with these conditions, suggesting that many more genes remain to be identified.
How will these novel genes be identified? Monogenic disorders can arise as a result of genomic microdeletions or microduplications, or due to single point mutations that lead to a functional change in the relevant protein. Such disorders may also result from altered expression of a gene, and hence altered dosage of the protein. Candidate genes may be identified by utilizing naturally occurring or transgenic mouse models, and this approach has been particularly informative in the elucidation of the genetic basis of a number of disorders.
Other approaches include the identification of chromosomal rearrangements using conventional karyotyping techniques, as well as novel assays such as array comparative genomic hybridization (CGH) and single nucleotide polymorphism oligonucleotide arrays (SNP arrays). These molecular methods usually result in the identification of gross abnormalities as well as submicroscopic deletions and duplications, and eventually to the discovery of single gene defects that are associated with a particular phenotype.
However, there is no doubt that the major advances in novel gene identification will be made as a result of sequencing the genomes of affected individuals and comparison with control data that are already available. Chip techniques allow hybridization of DNA or RNA to hundreds of thousands of probes simultaneously, and microarrays are being used for mutational analysis of human disease genes.
Complete sequencing of genomes or sequencing of exons that encode proteins (exome sequencing) is now possible, and will lead to the elucidation of the etiology of a number of human diseases in the next few years. High-throughput, high-density sequencing using microarray technology potentially offers the option of obtaining rapid, accurate, and relatively inexpensive sequence of large portions of the genome. One such technique is oligo-hybridization sequencing, which relies on the differential hybridization of target DNA to an array of oligonucleotide probes. This technique is ideally suited to the analysis of DNA from patients with defined disorders, such as disorders of sex development and retinal disease, but suffers from a relatively high false positive rate and failure to detect insertions and deletions.
It is often difficult to perform studies in humans, and so the generation of animal models may be valuable in understanding the etiology and pathogenesis of disease. A number of naturally occurring mouse models have led to the identification of corresponding candidate genes in humans, with mutations subsequently detected in human patients. More frequently, genes of interest are deliberately deleted to generate disease models.
In general, mouse models correlate well with human disease; however species-specific defects need to be taken into account. Additionally, the transgenic models could be used to manipulate a condition, with the potential for new therapies. The advent of conditional transgenesis has led to an exponential increase in our understanding of how the mutation of a single gene impacts on a single organ. Using technology such as inducible gene expression systems, the effect of switching on or switching off a gene at a particular stage in development can be determined.
Advances in genomics will also have a major impact on therapeutics. MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by targeting mRNAs of protein-coding genes or non-coding RNA transcripts. MicroRNAs also have an important role in developmental and physiological processes and can act as tumor suppressors or oncogenes in the ontogenesis of cancers. The use of small interfering RNA (siRNA) offers the promise of novel therapies in a range of conditions, such as cystic fibrosis and Type II autosomal dominant IGHD. Elucidation of the genetic basis of disease also allows more direct targeting of therapy. For instance, children with permanent neonatal-onset diabetes mellitus (PNDM) due to mutations in SUR1 or KIR6.2 were previously treated with insulin but have now been shown to respond well to sulfonylureas, thereby allowing the cessation of insulin therapy.
Finally, we are now entering the era of pharmacogenetics when the response of an individual to various therapeutic agents may be determined by their genotype. For example, a polymorphism in the GH receptor that results in deletion of exon 3 may be associated with an improved response to GH. Thus the elucidation of the genetic basis of many disorders will aid their management, and permit the tailoring of therapy in individual patients.
Breast Cancer: Genomic profiling to predict Survival: Combination of Histopathology and Gene Expression Analysis
Reporter: Aviva Lev-Ari, PhD, RN
Molecular assays that gauge cancer-related signatures are challenged by their inability to factor in tissue architecture, and their results are confounded by genomic information from cell types inside the tumor other than cancer cells. Meanwhile, traditional histopathological assessments are good at gauging tissue architecture and differentiating cellular heterogeneity, but they mostly provide qualitative tumor data and are too time-consuming to apply in large-scale studies.
Recognizing these weaknesses, researchers led by Yinyin Yuan of Cancer Research UK decided to combine histopathological and gene expression analysis to show that quantitative image analysis of the cellular environment inside tumors can bolster the ability of genomic profiling to predict survival in breast cancer patients. “All technologies have some sort of weakness. That’s why when we combined two types of assays — image and microarray — we get a more reliable readout,” Yuan says.
As they report in Science Translational Medicine, Yuan and her colleagues gathered histopathological information from hematoxylin and eosin-stained images as well as gene expression and copy-number variation data on a discovery set of 323 samples and a validation set of 241 samples from patients with estrogen receptor-negative breast cancer. Using the discovery sample set, the investigators developed an image-processing method to classify the cells inside tumor samples as cancerous, lymphocytic, or stromal. They then tested this technique on the validation set.
Once Yuan and colleagues had an accurate picture of the types of cells in the tumor samples, they used the image analysis to correct copy-number data, which is influenced by cellular heterogeneity, and developed an algorithm that determines patients’ HER2 status better than copy-number analysis alone can.
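The heterogeneity correction can be illustrated with the standard purity adjustment (a generic sketch of the idea; the paper's exact algorithm is not reproduced here): if image analysis says a fraction p of cells in a sample are tumor cells and the remainder are diploid, the bulk measurement is a linear mixture that can be inverted:

```python
def corrected_copy_number(observed: float, purity: float,
                          normal_cn: int = 2) -> float:
    """Recover tumor copy number from a bulk measurement.

    Assumes the bulk signal is a linear mixture:
        observed = purity * tumor_cn + (1 - purity) * normal_cn
    where `purity` is the tumor-cell fraction estimated from image analysis.
    """
    return (observed - (1 - purity) * normal_cn) / purity

# Example: a bulk measurement of 3.2 copies in a sample that imaging
# says is only 60% cancer cells implies ~4 copies in the tumor cells.
print(corrected_copy_number(3.2, purity=0.6))  # -> 4.0
```

Without the imaging-derived purity estimate, the same bulk value of 3.2 would understate the amplification actually present in the cancer cells.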
Using the image-processing method, the researchers stratified the discovery and validation sample sets into lymphocytic infiltration-high and lymphocytic infiltration-low groups — as past studies have suggested that high lymphocytic infiltration is linked to better patient outcomes.
When the image analysis was compared to the pathological scores of the samples, the discovery set showed no difference in patient outcomes, but the assessments disagreed with regard to the outcomes of the lymphocytic infiltration-low group in the validation cohort.
Hypothesizing that integrating the gene expression signatures and quantitative image analysis would improve survival prediction, the study investigators combined them. “The gene expression classifier had 67 percent cross-validation accuracy in predicting disease-specific deaths, the image-based classifier had 75 percent, and the integrated classifier reached 86 percent,” the study authors write.
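Generically, that comparison looks like the following sketch (stand-in random feature matrices and toy labels, so the numbers will not reproduce the 67/75/86 percent figures; LogisticRegression is an arbitrary stand-in classifier, not the authors' model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 323                                # discovery-set size
y = rng.integers(0, 2, n)              # disease-specific death (toy labels)
X_expr = rng.normal(size=(n, 50))      # gene expression features (toy)
X_img = rng.normal(size=(n, 10))       # image-derived features (toy)

# Compare each feature set alone against their concatenation.
for name, X in [("expression", X_expr),
                ("image", X_img),
                ("integrated", np.hstack([X_expr, X_img]))]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name:11s} CV accuracy: {acc:.2f}")
```

The design point is simply that the integrated classifier sees both views of the same samples, so any complementary signal in tissue architecture and expression can be exploited jointly.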
Finally, Yuan and her colleagues applied the image analysis to develop a quantitative score that determines whether specific types of cells are tightly clustered — a high score — or are randomly scattered — a low score. In stromal cells, this approach could discern that breast cancer patients with a high or low score had a “significantly better outcome” than patients whose scores fell in the medium range.
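A score of that flavor can be computed directly from cell coordinates. One simple choice, assumed here for illustration rather than taken from the paper, is an inverse Clark-Evans ratio, which compares the nearest-neighbor distance expected under complete spatial randomness with the observed mean:

```python
import numpy as np
from scipy.spatial import cKDTree

def clustering_score(points: np.ndarray, area: float) -> float:
    """Higher score = cells more tightly clustered than random.

    Inverse Clark-Evans ratio: the expected nearest-neighbor distance
    under complete spatial randomness divided by the observed mean
    nearest-neighbor distance.
    """
    tree = cKDTree(points)
    # k=2 because each point's nearest neighbor is itself at distance 0
    dists, _ = tree.query(points, k=2)
    observed = dists[:, 1].mean()
    expected = 0.5 / np.sqrt(len(points) / area)
    return expected / observed

rng = np.random.default_rng(7)
random_cells = rng.random((500, 2)) * 100            # scattered cells
clustered = rng.normal([50, 50], 3, size=(500, 2))   # one tight clump
print(clustering_score(random_cells, area=100 * 100))  # ~1.0
print(clustering_score(clustered, area=100 * 100))     # >> 1
```

Scattered cells score near 1 while a tight clump scores well above it, mirroring the high/low stratification the authors applied to stromal cells.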
Ultimately, Yuan and her colleagues show that their image processing avoids the biases of manual pathological assessments and accurately quantifies cellular composition and tissue architecture not accounted for by molecular tests. The researchers’ computational approach is also faster than traditional pathological techniques. “These two sets of samples can be done in a day,” Yuan says. According to the study authors, the limitation of the image processing technique is, of course, that it requires matched molecular and image data.
With the completion of the mapping of the human genome, we now have access to all the DNA sequence information responsible for human biology. Together with microarray technology, we are ushering in a new era in reproductive medicine—the era of Reproductive Genomics.
Whole genome microarray analysis of the testis and ovary suggests that a substantial part of the genome is expressed in reproductive tissues and many of them are likely to be important for normal reproduction. Yet adequate expression and functional information is only available for less than 10% of them. Hence, one of the important questions in reproductive studies now is ‘how do we associate function with the genes expressed in reproductive tissues?’ The establishment of mutations in animal models such as the mouse represents one powerful approach to address this question.
Animal models have played critical roles in improving our understanding of the mechanisms and pathogenesis of disease. Mouse knockout models have often provided much-needed functional validation of genes implicated in human diseases. The rapid advance of human genetics now allows the identification of disease-associated single-nucleotide variation at a much faster pace. Functional examination of candidate genes is needed to determine whether those genes or variants are indeed involved in reproductive disease. Generating mutations in murine homologs of candidate genes is a direct way to determine their roles, and mouse models further allow dissection of the genetic pathways underlying a disease condition and provide systems in which to test possible drug treatments. Thus, how to generate mouse models efficiently becomes a priority issue in the Genomics era of Reproductive Medicine.
It is known that generating a mouse knockout is no small endeavor, even for a mouse research lab, often requiring specialized expertise and experience in
molecular biology,
embryonic stem (ES) cell biology and
mouse husbandry.
Therefore, it can be intimidating for people who have little experience in mouse research. Fortunately, technological developments in the mouse community have made the task of generating mouse mutations less intimidating for those unfamiliar with mouse genetics. One such development is the effort led by the International Gene Trap Consortium (IGTC) to generate a library of mouse mutant ES cells covering most genes in the mouse genome. This resource saves researchers the tedious and sometimes challenging tasks of making knockout vectors and screening ES cell colonies, directly providing an ES cell clone carrying a mutation in the gene of interest.
Because gene trapping generates mutations by mechanisms different from those of the traditional knockout method, and because its efficacy in targeting reproductive genes, which are often expressed only in later development or adulthood, has not been fully established, it is necessary to examine the benefits and limitations of this technology from the perspective of reproductive medicine, so that reproductive researchers and physicians interested in mouse models can become familiar with it.
With this in mind, we provide an overview of the gene trapping mutagenesis method and its possible applications in Reproductive Medicine. We evaluate gene trapping in terms of its efficiency in comparison with traditional knockout methods, and we use an in-house software program to screen the IGTC database for existing cell lines with possible mutations in genes expressed in various reproductive tissues. Among the more than seven thousand genes highly expressed in human ovaries, almost half have existing gene trap lines.
Additionally, of roughly 900 human seminal fluid proteins, 43% have gene trap hits in their mouse homologs. Our analysis suggests gene trapping is an effective mutagenesis method for probing the genetic basis of reproductive diseases, and that mutations in many important reproductive genes are already present in the database. Given the rapid growth in the number of gene trap lines, the continuing evolution of gene trap vectors, and their easy accessibility to the scientific community, gene trapping can provide a fast and efficient way of generating mouse mutations in a particular gene of interest, or in multiple genes of a pathway at the same time. Consequently, we recommend that gene trapping be considered when planning mouse models of human reproductive disease, and that the IGTC be the first stop for people interested in searching for and generating mouse mutations in genes of interest.
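The database screen itself boils down to a set intersection (a minimal sketch assuming plain gene-symbol sets; SpiderGene's actual matching logic is not published here, and the gene symbols below are only a toy example):

```python
def trap_coverage(expressed_genes: set[str], trapped_genes: set[str]) -> float:
    """Fraction of a tissue's expressed genes with an existing trap line."""
    if not expressed_genes:
        return 0.0
    return len(expressed_genes & trapped_genes) / len(expressed_genes)

# Toy example with a handful of gene symbols
ovary_genes = {"Gdf9", "Zp3", "Figla", "Nobox"}
igtc_trapped = {"Gdf9", "Zp3", "Ddx4"}
print(f"{trap_coverage(ovary_genes, igtc_trapped):.0%} of ovary genes trapped")
```

Run over the full expression lists, this is the computation behind figures like "almost half of ovary-expressed genes" and "43% of seminal fluid protein homologs."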
Gene trapping is a high-throughput approach for generating mutations in murine ES cells using vectors that simultaneously disrupt and report the expression of the endogenous gene at the point of insertion. First-generation vectors trapped genes that were actively transcribed in undifferentiated ES cells. Depending on the region into which they integrate, these vectors can be roughly divided into two classes:
promoter trap vectors and
gene trap vectors.
Promoter trap vectors contain promoterless reporter regions, usually βgeo (a fusion of neomycin phosphotransferase and β-galactosidase), and thus must integrate into an exon of a transcriptionally active locus for the cell to be selected by neomycin resistance or by LacZ staining. Gene trap vectors demonstrate more utility through their added ability to integrate into an intron. These vectors contain a splice acceptor (SA) site positioned at the 5′ end of the reporter gene, allowing the vector to be spliced to the endogenous gene to form a fusion transcript. Later improvements added an internal ribosome entry site (IRES) between the SA site and the reporter gene sequence; as a result, the reporter gene can be translated even when it is not fused to the trapped gene. Second-generation vectors seek to trap genes that are transcriptionally silent in ES cells. Although these vectors still contain a promoterless reporter gene with a 5′ SA sequence, the antibiotic resistance gene is under the control of a constitutive promoter. Consequently, antibiotic selection is independent of the expression of the trapped gene, whereas expression of the reporter gene is still regulated by the endogenous promoter.
A disadvantage of these vectors is that all integration events give rise to resistant ES cells, regardless of whether the vector has integrated into a gene locus. To increase trapping efficiency, a new class of polyA gene trap vectors was developed in which the polyadenylation signal of the neo gene was replaced by a splice donor sequence, requiring the vector to trap an endogenous polyA signal for expression of neo. These vectors were recently shown to have a bias toward insertion near the 3′ end of a gene, owing to nonsense-mediated mRNA decay of the fusion transcript. An improved polyA trap vector, UPATrap, was developed to overcome this bias using an IRES sequence placed downstream of a marker containing a termination codon. Gene trap vectors are usually introduced by retroviral infection or by electroporation of plasmid DNA, each approach having its own advantages and disadvantages.
While relatively difficult to manipulate, retroviral gene traps display a preference for insertion at the 5′ end of genes, which is advantageous for generating null alleles. Moreover, the multiplicity of infection with retroviruses can be tightly controlled to give a single trap event or simultaneous disruption of many genes. However, there may be an integration bias toward certain ‘hotspots’ in the genome.
In contrast, plasmid-based gene trap vectors integrate more randomly into the genome. Such random integration can, however, potentially result in a functional partial protein and a hypomorphic phenotype. Additionally, plasmid vectors usually result in multiple integrations in 20–50% of cell lines. The most common approach for identifying the gene trap integration site is to use 5′ or 3′ rapid amplification of cDNA ends (RACE) to amplify the fusion transcript. The sequence provides a DNA tag for identification of the disrupted gene and can be used for genotypic screens. Mutagenesis screens can also be performed on the basis of gene function or expression, and expression data combined with sequence tag information can reveal novel expression patterns of known genes or suggest gene function.
Gene trapping has proven to be an efficacious mutagenesis technique compared with other methods such as
spontaneous mutations,
fortuitous transgene integration and
N-ethyl-N-nitrosourea (ENU) mutagenesis.
We have been able to use our SpiderGene program to identify genes in reproductive tissues that are present in the IGTC database and, moreover, to narrow down those with expression restricted to the testis and ovary. Gene trapping holds enormous potential for researchers in the reproductive field seeking to create mouse models of gene mutations. The improving versatility of gene trap vectors has enabled groups to trap an increasing number of genes in various organisms, including Arabidopsis, zebrafish and Drosophila.
The gene trap effort has perhaps been most extensive in the mouse, with over 57,000 cell lines representing more than 40% of known genes. These large-scale screens will likely achieve trapping of the entire mouse genome in the coming years, but the power of gene trapping will only be fully demonstrated by its usefulness in investigator-driven, focused functional analyses.
In our laboratory, future work will focus on generating knockout mice in order to investigate gene function and to identify gene products that might have therapeutic value in reproduction. As screening efforts continue, gene trapping will continue to be a valuable tool in mouse genomics and will undoubtedly yield new discoveries in Reproductive Physiology and Pathology.
Targeted therapies are proven approaches in cancer and other complicated diseases. The degree of activation of EGFR and ERBB2/HER2 in cancer cells is considered one way to gauge the aggressiveness of a tumor, and there are drugs, mostly for breast cancer, that target these receptors. Lapatinib (Tykerb, GSK; see Source for other targeted drugs), the first drug to inhibit both EGFR and ERBB2/HER2, gave hope to cancer patients, especially those with advanced ERBB2-positive or metastatic breast cancer. Despite its proven efficacy, however, Lapatinib has not delivered durable clinical responses, owing to acquired resistance.
Komurov et al. (Mol. Syst. Biol., 2012) used network analysis along with experimental findings in a cultured human breast cancer cell line (SKBR3) and showed that a large part of acquired resistance to Lapatinib is due to increased activation of the glucose deprivation signaling network. The authors cultured ERBB2-positive SKBR3 cells with increasing doses of Lapatinib to derive resistant cells (SKBR3-R), which they analyzed alongside the parental control cells. Western blot analysis showed that Lapatinib successfully inhibited signaling downstream of ERBB2 and EGFR in both control and resistant cells, but failed to induce apoptotic pathways in the resistant cells compared with the controls.
To identify other factors that could explain the differential effects of Lapatinib on control and resistant cell lines, Komurov et al. used a data-biased random walk network analysis method called NetWalk (Komurov et al., PLoS Comput. Biol., 2010). Their method is data-driven and based on comparative network analysis of gene expression across conditions rather than analysis at the single-gene level. The network analysis identified elevated levels of genes that act as compensatory mechanisms for glucose deprivation (Figure 2 of Komurov et al., 2012), and the authors validated these findings by Western blot analysis (Figure 3 of the same paper).
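The flavor of a data-biased random walk can be conveyed with a generic random-walk-with-restart on an expression-weighted network (a sketch of the general idea, not NetWalk's exact formulation; the toy network and expression values below are invented):

```python
import numpy as np

def biased_walk_scores(adj: np.ndarray, expr: np.ndarray,
                       restart: float = 0.15, iters: int = 200) -> np.ndarray:
    """Random walk with restart in which edges are weighted by the
    expression level of the target node (the 'data bias')."""
    W = adj * expr[None, :]                 # bias edges toward high expression
    W = W / W.sum(axis=1, keepdims=True)    # row-normalize to probabilities
    p = np.full(len(expr), 1 / len(expr))   # start from a uniform distribution
    for _ in range(iters):
        p = restart / len(expr) + (1 - restart) * p @ W
    return p

# Toy 4-node network: node 3 is highly expressed under glucose deprivation
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
expr = np.array([1.0, 1.0, 1.0, 5.0])
print(biased_walk_scores(adj, expr))   # node 3 gets the highest score
```

The stationary visitation frequencies rank network regions by how strongly the expression data pull the walker toward them, which is how condition-specific subnetworks, such as the glucose deprivation module here, rise to the top.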
The authors’ results not only show an elegant way of extracting new information by combining network analysis with experimental techniques, but also point to a concept that could shape the future of cancer therapy. Along with targeting mutated oncogenes such as EGFR and ERBB2/HER2, as Lapatinib does, additionally targeting the glucose deprivation response pathway may achieve better clinical responses in patients with aggressive cancers. Targeting glucose or glucose-related pathways can be tricky, because of glucose’s ubiquitous links to many physiological functions, including metabolism. However, systems biology methods can help determine the levels at which these pathways must be targeted to achieve positive responses in vitro, to be followed by informative in vivo studies. Moreover, modestly targeting many parts of the network at once, alongside targeted cancer drugs, may produce interesting results.
Early in September 2012, Nature published 30 research papers reporting results from the ambitious, and at one time considered risky, project named ENCODE (Encyclopedia of DNA Elements). The ENCODE results revealed that some 80% of the human genome is not “junk,” as previously thought, but rather shows biochemical activity, much of it in regulatory domains that direct further signaling events.
When the human genome was first sequenced, more than a decade ago, scientists were surprised by how small a fraction of its bases code for genes. Out of 3 billion bases in human DNA, scientists found only about 21,000 genes. This unexpected finding led to a few basic questions:
Why do humans have so many base pairs?
How can the highly regulated, complex behaviors of biochemical, cellular, and physiological processes be traced back to regulation at the genetic level?
The ENCODE results reveal how limited our knowledge of the human genome has been until now. They open up new ways of thinking about human DNA and its functional domains, and they bring huge challenges, for both experimental development and data-driven computational approaches, to understanding and applying these new findings.
To gain insight from such large-scale data and identify key players in a large pool of results, bioinformatics approaches will probably be the only way forward. This also means new algorithms must be developed that can incorporate regulatory functions into models of gene regulation. Presently, most algorithms are targeted toward identifying genes and their connections in a linear fashion; however, regulatory domains and their functional activities may be nonlinear, something that will be revealed as more experimental results accumulate in the coming years.
The functional characterization of the human genome will also lead to a better understanding of the genetic differences between normal and disease states. Moreover, with proper identification of the functional characteristics of a particular gene’s regulation, drugs can be targeted with much more precision in the future. However, success on such a complicated problem will require visionary design and execution by experimental and computational biology teams working together.
It is already well recognized that bioinformatics approaches can greatly help in identifying key players in gene regulation. However, it is often not easy to translate information at the genetic level directly to the cellular or physiological level. Some of the main reasons are (a) the complex cross-talk between proteins that underlies intracellular signaling events and (b) the highly nonlinear information sharing among receptors and ligands in extracellular signaling processes. To efficiently understand the functional characteristics of the non-coding regions of DNA in the context of gene regulation, an effort should be made to map the functional network of gene regulation onto the signaling pathways of protein networks. This will require developing experimental as well as computational approaches that capture genetic and proteomic analyses together. Furthermore, for a better understanding of cellular and physiological decisions, the mapping between gene regulation and intracellular signaling pathways should be extended to dynamic analysis over time.
The extraordinary findings of the ENCODE project pose many challenges, with answers to many unknowns still a decade or more away, but they also provide solutions to some basic questions that have haunted the scientific world for almost a decade.