ENCODE data reveals important information from Genome Wide Association Studies relevant to understanding complex genetic diseases
Author: Ritu Saxena, Ph.D.
Introduction
“The depth, quality, and diversity of the ENCODE data are unprecedented,” said John Stamatoyannopoulos, professor of genomic sciences at the University of Washington and one of the many principal investigators of the ENCODE project. ENCODE (Encyclopedia of DNA Elements) was indeed an ambitious project: launched as a pilot in 2003, it was expanded in 2007 to analyze the whole genome and identify all the functional elements of the human genome. The findings were striking because they challenged the definition of a “gene” and the central dogma of genetics (gene → mRNA → protein). In fact, ENCODE assigned biochemical function to about 80% of the genome, much of it in the non-coding portion once dismissed as “junk DNA”, which was found to contain elements crucial for gene regulation. These elements, in large part, include RNA transcripts that are not translated into proteins but might have a regulatory role. For detailed reading, refer to the findings published in Nature: The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature 489, 57–74 (2012).
Key features of the data, as explained on the National Human Genome Research Institute website (National Human Genome Research Institute News feature), include comprehensive mapping of:
- Protein-coding genes: Proteins are molecules made of amino acids linked together in a specific sequence; the amino acid sequence is encoded by the sequence of DNA subunits called nucleotides that make up genes.
- Non-coding genes: Stretches of DNA that are read by the cell as if they were genes but do not encode proteins. These appear to help regulate the activity of the genome.
- Chromatin structure features: Complex physical structures made from a combination of DNA and binding proteins that make up the contents of the nucleus and affect genome function.
- Histone modifications: Histones are the proteins that make up the chromatin structures that help shape and control the genome. In addition, histone proteins can be physically modified by adding chemical groups, such as a methyl group, that further regulate genomic activity.
- DNA methylation: Just like histones, DNA itself can have methyl groups added to it, in a process called DNA methylation. Chemically attaching methyl groups to DNA physically changes the ability of enzymes to reach the DNA and thus alters the gene expression pattern in cells. Methylation helps cells “remember what they are doing” or alter levels of gene expression, and it is a crucial part of normal development and cellular differentiation in higher organisms.
- Transcription factor binding sites: Transcription factors are proteins that bind to specific DNA sequences, controlling the flow (or transcription) of genetic information from DNA to mRNA. Mapping the binding sites can help researchers understand how genomic activity is controlled.
How could ENCODE be helpful in the study of complex human diseases?
Complex diseases and Genome wide association studies (GWAS)
Coronary artery disease, type 2 diabetes and many forms of cancer are complex human diseases that have a significant genetic component. Unlike Mendelian disorders, which map to defined loci, complex disorders owe their genetic component to many variations spread across the genome that together make an individual susceptible to the disease.
Researchers have performed genome-wide association studies (GWAS) of the human genome, leading to the identification of thousands of DNA variants that could be linked with complex traits and diseases. However, identifying which of these variants, referred to as SNPs (single nucleotide polymorphisms), actually contribute to a disease, and understanding how they exert their influence, has remained more of a mystery.
How would ENCODE solve the puzzle?
The puzzle lies in interpreting how the SNPs found in the genome affect a person’s susceptibility to a particular trait or disease, and through what mechanism. As identified in GWAS, most variants associated with the phenotype of a trait or disease lie in the non-coding region of the genome. In fact, across more than 400 studies compiled in the GWAS catalog, only a small minority of the trait- or disease-associated SNPs occur in protein-coding regions; the large majority (89%) are in noncoding regions. These variants fall in gene deserts that lie far from protein-coding regions, similar to the places where cis-regulatory modules (CRMs) are found. CRMs such as promoters and enhancers are clusters of binding sites for transcription factors, and the presence of transcription factors bound to these sites is a good indicator of potential regulatory regions.
The integrative analysis of ENCODE data has given important insights into the results of GWAS. Investigators have employed ENCODE data as an initial guide to discover regulatory regions in which genetic variation affects a complex trait. Additionally, when the ENCODE study examined GWAS SNPs associated with the phenotype of a trait, it found that they are enriched in DNase-sensitive regions, i.e., they lie in function-associated DNA that can be bound by transcription factors affecting the regulation of gene expression. Thus, the project demonstrates that non-coding regions must be considered when interpreting GWAS results, and it provides a strong motivation for reinterpreting previous GWAS findings.
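To make the idea of "enrichment in DNase-sensitive regions" concrete, here is a minimal Python sketch of the kind of overlap check involved. It is an illustration only, not the ENCODE consortium's actual pipeline: the coordinates, the region length, and the function names (build_index, overlaps, fold_enrichment) are all invented for this example, and a real analysis would work per chromosome on real annotation files and use a proper statistical test.

```python
from bisect import bisect_right

def build_index(intervals):
    """Sort and merge half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends

def overlaps(index, pos):
    """Return True if position pos falls inside any merged interval."""
    starts, ends = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def fold_enrichment(snp_positions, intervals, region_length):
    """Count SNPs inside the intervals and compare with the fraction
    expected if SNPs were spread uniformly over region_length bases."""
    index = build_index(intervals)
    hits = sum(overlaps(index, p) for p in snp_positions)
    covered = sum(e - s for s, e in zip(*index))
    fold = hits * region_length / (len(snp_positions) * covered)
    return hits, fold

# Hypothetical toy data: three DNase-sensitive intervals and five SNPs
# in a made-up 10,000-base region. None of these are real coordinates.
dhs_sites = [(100, 200), (150, 300), (1000, 1100)]
snps = [120, 250, 500, 1050, 5000]

hits, fold = fold_enrichment(snps, dhs_sites, region_length=10_000)
print(f"{hits} of {len(snps)} SNPs fall in DNase-sensitive intervals")
print(f"fold enrichment over uniform expectation: {fold:.1f}")
```

A fold value well above 1 means the SNPs land in the annotated intervals more often than chance placement would predict, which is the pattern reported for GWAS variants and DNase-sensitive regions.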
Using ENCODE Data to Interpret GWAS Results
ENCODE and predisposition to CANCER:
c-Myc, a proto-oncogene, codes for a transcription factor that, when expressed constitutively, leads to uninhibited cell proliferation resulting in cancer. Common variants within a ~1 Mb region upstream of the c-Myc gene have been associated with cancers of the colon, prostate, and breast. Several SNPs have been reported in this region that affect the phenotype even though they lie in the distal cis-region of the MYC gene. Alignment of the ENCODE data in this region with the significant variants from GWAS also reveals that key variants are found in the transcription factor-occupied DNA segments mapped by the consortium. One variant, rs6983267, lies within a DNase hypersensitive site that is bound by several transcription factors and by the enhancer-associated protein p300, and that carries histone modifications characteristic of enhancers (high H3K4me1, low H3K4me3). ENCODE data indicate that non-coding regions at the human chromosome 8q24 locus are associated with cancer, and, as observed for the c-Myc gene, similar studies of other cancer-related genes could help explain predisposition to cancer.
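The enhancer signature described above (open chromatin, high H3K4me1, low H3K4me3) can be written down as a simple rule. The Python sketch below is only an illustration of that reasoning: the SiteAnnotation class, the signal values, and the thresholds are hypothetical and chosen for the example, not taken from ENCODE tracks.

```python
from dataclasses import dataclass

@dataclass
class SiteAnnotation:
    """Annotations overlapping one variant (all fields are illustrative)."""
    dnase_hypersensitive: bool
    p300_bound: bool
    h3k4me1: float  # hypothetical normalized signal
    h3k4me3: float  # hypothetical normalized signal

def looks_like_enhancer(site, high=2.0, low=1.0):
    """Apply the rule of thumb from the text: open chromatin with high
    H3K4me1 and low H3K4me3. The thresholds are arbitrary assumptions."""
    return (
        site.dnase_hypersensitive
        and site.h3k4me1 >= high
        and site.h3k4me3 <= low
    )

# Made-up values standing in for a variant in an enhancer-like site
# and a variant in a promoter-like site (high H3K4me3).
enhancer_like = SiteAnnotation(True, True, h3k4me1=3.5, h3k4me3=0.4)
promoter_like = SiteAnnotation(True, False, h3k4me1=0.8, h3k4me3=4.2)

print(looks_like_enhancer(enhancer_like))
print(looks_like_enhancer(promoter_like))
```

Binding by p300 is kept as a separate field here because it is supporting evidence rather than part of the histone-mark rule.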
ENCODE and fetal hemoglobin expression:
Another example of the use of ENCODE data is the regulation of fetal hemoglobin. ENCODE data predicted several regions involved in the regulation of fetal hemoglobin, and these predicted regions were found to lie close to SNPs in the BCL11A gene that are associated with persistent expression of fetal hemoglobin.
Future perspective
As evident from the above examples, the ENCODE data show that genetic variants do affect the regulated expression of a target gene. Recently, several research groups in the UK performed a large-scale GWAS to determine the genetic predisposition to fracture risk. The collaborative effort, published in a recent issue of PLoS Genetics, set out to identify genetic variants associated with cortical bone thickness (CBT) and bone mineral density (BMD) using data from more than 10,000 subjects. http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002745 The study generated a wealth of data, including the finding that SNPs in WNT16 and its adjacent gene, FAM3C, are relevant to CBT and BMD. ENCODE data, in this case, could help extract more detailed information, such as additional SNPs and the regulatory context of the genes involved. Thus, ENCODE data could be immensely useful in interpreting associations between disease and DNA sequences that vary from person to person.
Sources:
Research articles–
An integrated encyclopedia of DNA elements in the human genome
A User’s Guide to the Encyclopedia of DNA Elements (ENCODE)
Genome-wide Epigenetic Data Facilitate Understanding of Disease Susceptibility Association Studies
ENCODE Project Writes Eulogy For Junk DNA
ENCODE project: In massive genome analysis new data suggests ‘gene’ redefinition
National Human Genome Research Institute News feature
Related posts–
Expanding the Genetic Alphabet and linking the genome to the metabolome
Junk DNA codes for valuable miRNAs: non-coding DNA controls Diabetes
Ritu, very nicely done. Check the bold face type in the paragraph on c-myc – it repeats info in 2nd sentence of same paragraph.
Quite an informative discussion. No small feat.
As always, Dr. Ritu’s writing style is edifying and makes an enjoyable read.
I like in particular the bold font for the critical sentence in the paragraph on Cancer.
I believe that a sentence like that should appear in each post on ENCODE addressing a specific complex disease domain.
Meg, thanks for the comment. I have made changes in the paragraph to eliminate redundancy in information.
Larry, indeed, it is no small feat. The combined analysis of ENCODE data and the GWAS could lead to some valuable information. Although the GWAS study mentioned in the PLoS paper revealed information on the Wnt16 gene, important insights could be achieved by studying the non-coding regions with respect to osteoporosis and fracture risk as emphasized by ENCODE findings. (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002745)
Aviva, I am glad you liked the post and enjoyed reading it.
I am a college student writing a research paper on this fascinating subject. I found this article very informative and fascinating. Dr. Ritu Saxena, do you mind if I cite your article as a source in my research paper? I thoroughly enjoyed reading it.
Dear ‘kguest’, I am glad you found the ENCODE post informative and enjoyed reading it. The results of the ENCODE project are leading, and will continue to lead, to valuable interpretations that are useful for complex genetic diseases.
Feel free to cite my article in the reference section of your research paper.
Thanks
Ritu
PUT IT IN CONTEXT OF CANCER CELL MOVEMENT
The contraction of skeletal muscle is triggered by nerve impulses, which stimulate the release of Ca2+ from the sarcoplasmic reticulum, a specialized network of internal membranes, similar to the endoplasmic reticulum, that stores high concentrations of Ca2+ ions. The release of Ca2+ from the sarcoplasmic reticulum increases the concentration of Ca2+ in the cytosol from approximately 10^-7 to 10^-5 M. The increased Ca2+ concentration signals muscle contraction via the action of two accessory proteins bound to the actin filaments: tropomyosin and troponin (Figure 11.25). Tropomyosin is a fibrous protein that binds lengthwise along the groove of actin filaments. In striated muscle, each tropomyosin molecule is bound to troponin, which is a complex of three polypeptides: troponin C (Ca2+-binding), troponin I (inhibitory), and troponin T (tropomyosin-binding). When the concentration of Ca2+ is low, the complex of the troponins with tropomyosin blocks the interaction of actin and myosin, so the muscle does not contract. At high concentrations, Ca2+ binding to troponin C shifts the position of the complex, relieving this inhibition and allowing contraction to proceed.
Figure 11.25
Association of tropomyosin and troponins with actin filaments. (A) Tropomyosin binds lengthwise along actin filaments and, in striated muscle, is associated with a complex of three troponins: troponin I (TnI), troponin C (TnC), and troponin T (TnT).
Contractile Assemblies of Actin and Myosin in Nonmuscle Cells
Contractile assemblies of actin and myosin, resembling small-scale versions of muscle fibers, are present also in nonmuscle cells. As in muscle, the actin filaments in these contractile assemblies are interdigitated with bipolar filaments of myosin II, consisting of 15 to 20 myosin II molecules, which produce contraction by sliding the actin filaments relative to one another (Figure 11.26). The actin filaments in contractile bundles in nonmuscle cells are also associated with tropomyosin, which facilitates their interaction with myosin II, probably by competing with filamin for binding sites on actin.
Figure 11.26
Contractile assemblies in nonmuscle cells. Bipolar filaments of myosin II produce contraction by sliding actin filaments in opposite directions.
Two examples of contractile assemblies in nonmuscle cells, stress fibers and adhesion belts, were discussed earlier with respect to attachment of the actin cytoskeleton to regions of cell-substrate and cell-cell contacts (see Figures 11.13 and 11.14). The contraction of stress fibers produces tension across the cell, allowing the cell to pull on a substrate (e.g., the extracellular matrix) to which it is anchored. The contraction of adhesion belts alters the shape of epithelial cell sheets: a process that is particularly important during embryonic development, when sheets of epithelial cells fold into structures such as tubes.
The most dramatic example of actin-myosin contraction in nonmuscle cells, however, is provided by cytokinesis, the division of a cell into two following mitosis (Figure 11.27). Toward the end of mitosis in animal cells, a contractile ring consisting of actin filaments and myosin II assembles just underneath the plasma membrane. Its contraction pulls the plasma membrane progressively inward, constricting the center of the cell and pinching it in two. Interestingly, the thickness of the contractile ring remains constant as it contracts, implying that actin filaments disassemble as contraction proceeds. The ring then disperses completely following cell division.
Figure 11.27
Cytokinesis. Following completion of mitosis (nuclear division), a contractile ring consisting of actin filaments and myosin II divides the cell in two.
http://www.ncbi.nlm.nih.gov/books/NBK9961/
This is good. I don’t recall seeing it in the original comment. I am very aware of the actin myosin troponin connection in heart and in skeletal muscle, and I did know about the nonmuscle work. I won’t deal with it now, and I have been working with Aviral now online for 2 hours.
I have had a considerable background from way back in atomic orbital theory, physical chemistry, organic chemistry, and the equilibrium necessary for cations and anions. Despite the calcium role in contraction, I would not discount hypomagnesemia in having a disease role because of the intracellular-extracellular connection. The description you pasted reminds me also of a lecture given a few years ago by the Nobel Laureate that year on the mechanism of cell division.
I actually consider this amazing blog, “SAME SCIENTIFIC IMPACT: Scientific Publishing – Open Journals vs. Subscription-based « Pharmaceutical Intelligence”, very compelling plus the blog post ended up being a good read.
Many thanks, Annette