Funding, Deals & Partnerships: BIOLOGICS & MEDICAL DEVICES; BioMed e-Series; Medicine and Life Sciences Scientific Journal – http://PharmaceuticalIntelligence.com
brown adipocyte protein CIDEA promotes lipid droplet fusion
Larry H. Bernstein, MD, FCAP, Curator
LPBI
The brown adipocyte protein CIDEA promotes lipid droplet fusion via a phosphatidic acid-binding amphipathic helix
Parker, Nicholas T Ktistakis, Ann M Dixon, Judith Klein-Seetharaman, Susan Henry, Mark Christian, Dirk Dormann, Gil-Soo Han, Stephen A Jesch, George M Carman, Valerian Kagan, et al.
Maintenance of energy homeostasis depends on the highly regulated storage and release of triacylglycerol primarily in adipose tissue and excessive storage is a feature of common metabolic disorders. CIDEA is a lipid droplet (LD)-protein enriched in brown adipocytes promoting the enlargement of LDs which are dynamic, ubiquitous organelles specialized for storing neutral lipids. We demonstrate an essential role in this process for an amphipathic helix in CIDEA, which facilitates embedding in the LD phospholipid monolayer and binds phosphatidic acid (PA). LD pairs are docked by CIDEA trans-complexes through contributions of the N-terminal domain and a C-terminal dimerization region. These complexes, enriched at the LD-LD contact site, interact with the cone-shaped phospholipid PA and likely increase phospholipid barrier permeability, promoting LD fusion by transference of lipids. This physiological process is essential in adipocyte differentiation as well as serving to facilitate the tight coupling of lipolysis and lipogenesis in activated brown fat.
Evolutionary pressures for survival in fluctuating environments that expose organisms to times of both feast and famine have selected for the ability to efficiently store and release energy in the form of triacylglycerol (TAG). However, excessive or defective lipid storage is a key feature of common diseases such as diabetes, atherosclerosis and the metabolic syndrome (1). The organelles that are essential for storing and mobilizing intracellular fat are lipid droplets (LDs) (2). They constitute a unique cellular structure where a core of neutral lipids is stabilized in the hydrophilic cytosol by a phospholipid monolayer embedding LD-proteins. While most mammalian cells present small LDs (<1 µm) (3), white (unilocular) adipocytes contain a single giant LD occupying most of their cell volume. In contrast, brown (multilocular) adipocytes hold multiple LDs of lesser size, increasing the LD surface/volume ratio, which facilitates the rapid consumption of lipids for adaptive thermogenesis (4).
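The surface/volume argument for multilocular adipocytes can be made concrete with a little geometry. The sketch below is only an illustration: it treats LDs as ideal spheres, and the radius and droplet count are hypothetical values not taken from the paper. For a sphere the surface/volume ratio is 3/r, so splitting a fixed lipid volume into n equal droplets raises the ratio by a factor of n^(1/3).

```python
import math


def sphere_surface_to_volume(radius_um: float) -> float:
    """Surface/volume ratio of a sphere in 1/µm (algebraically 3 / r)."""
    surface = 4.0 * math.pi * radius_um ** 2
    volume = (4.0 / 3.0) * math.pi * radius_um ** 3
    return surface / volume


def split_radius(radius_um: float, n_droplets: int) -> float:
    """Radius of each of n equal droplets that together hold the same volume."""
    return radius_um / n_droplets ** (1.0 / 3.0)


# Hypothetical numbers, chosen only for illustration.
unilocular_radius = 50.0  # one giant LD, in µm
n_droplets = 125          # the same lipid volume split into 125 equal LDs
multilocular_radius = split_radius(unilocular_radius, n_droplets)

ratio_single = sphere_surface_to_volume(unilocular_radius)
ratio_multi = sphere_surface_to_volume(multilocular_radius)
fold_increase = ratio_multi / ratio_single

print(f"single LD:   r = {unilocular_radius:.1f} µm, S/V = {ratio_single:.3f} per µm")
print(f"{n_droplets} small LDs: r = {multilocular_radius:.1f} µm, S/V = {ratio_multi:.3f} per µm")
print(f"fold increase in S/V: {fold_increase:.2f}")
```

Because the gain scales only with the cube root of the droplet count, a large increase in accessible LD surface requires many small droplets, which is consistent with the multilocular morphology described above.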
The exploration of new approaches for the treatment of metabolic disorders has been stimulated by the rediscovery of active brown adipose tissue (BAT) in adult humans (5, 6) and by the induction of multilocular brown-like cells in white adipose tissue (WAT) (7). The multilocular morphology of brown adipocytes is a defining characteristic of these cells along with expression of genes such as Ucp1. The acquisition of a unilocular or multilocular phenotype is likely to be controlled by the regulation of LD growth. Two related proteins, CIDEA and CIDEC promote LD enlargement in adipocytes (8-10), with CIDEA being specifically found in BAT. Together with CIDEB, they form the CIDE (cell death-inducing DFF45-like effector) family of LD-proteins, which have emerged as important metabolic regulators (11).
Different mechanisms have been proposed for LD enlargement, including in situ neutral lipid synthesis, lipid uptake and LD-LD coalescence (12-14). The study of CIDE proteins has revealed a critical role in the LD fusion process in which a donor LD progressively transfers its content to an acceptor LD until it is completely absorbed (15). However, the underlying mechanism by which CIDEC and CIDEA facilitate the interchange of triacylglycerol (TAG) molecules between LDs is not understood. In the present study, we have obtained a detailed picture of the different steps driving this LD enlargement process, which involves the stabilization of LD pairs, phospholipid binding, and the permeabilization of the LD monolayer to allow the transference of lipids.
CIDEA expression mimics the LD dynamics observed during the differentiation of brown adipocytes
Phases of CIDEA activity: LD targeting, LD-LD docking and LD growth
A cationic amphipathic helix in C-term drives LD targeting
The amphipathic helix is essential for LD enlargement
LD-LD docking is induced by the formation of CIDEA complexes
CIDEC differs from CIDEA in its dependence on the N-term domain
CIDEA interacts with Phosphatidic Acid
PA is required for LD enlargement
The Cidea gene is highly expressed in BAT, induced in WAT following cold exposure (46), and is widely used by researchers as a defining marker to discriminate brown or brite adipocytes from white adipocytes (7, 28). As evidence indicated a key role in LD biology (47), we have characterized the mechanism by which CIDEA promotes LD enlargement, which involves the targeting of LDs, the docking of LD pairs and the transference of lipids between them. The lipid transfer step requires the interaction of CIDEA and PA through a cationic amphipathic helix. Independently of PA-binding, this helix is also responsible for anchoring CIDEA in the LD membrane. Finally, we demonstrate that the docking of LD pairs is driven by the formation of CIDEA complexes involving the N-term domain and a C-term interaction site.
CIDE proteins appeared during vertebrate evolution by the combination of an ancestor N-term domain and a LD-binding C-term domain (35). In spite of this, the full process of LD enlargement can be induced in yeast by the sole exogenous expression of CIDEA, indicating that in contrast to SNARE-triggered vesicle fusion, LD fusion by lipid transference does not require the coordination of multiple specific proteins (48). Whereas vesicle fusion implicates an intricate restructuring of the phospholipid bilayers, LD fusion is a spontaneous process that the cell has to prevent by tightly controlling their phospholipid composition (23). However, although phospholipid-modifying enzymes have been linked with the biogenesis of LDs (49, 50), the implication of phospholipids in physiologic LD fusion processes has not been previously described.
Complete LD fusion by lipid transfer can last several hours, during which the participating LDs remain in contact. Our results indicate that both the N-term domain and a C-term dimerization site (aa 126-155) independently participate in the docking of LD pairs by forming trans interactions (Fig. 7). Certain mutations in the dimerization sites that do not eliminate the interaction result in a decrease in the TAG transference efficiency, reflected in the presence of small LDs docked to enlarged LDs. This suggests that in addition to stabilizing the LD-LD interaction, the correct conformation of the CIDEA complexes is necessary for optimal TAG transfer. Furthermore, the formation of stable LD pairs is not sufficient to trigger LD fusion by lipid transfer. In fact, although LDs can be tightly packed in cultured adipocytes, no TAG transference across neighbour LDs is observed in the absence of CIDE proteins (15), showing that the phospholipid monolayer acts as a barrier impermeable to TAG. Our CG-MD simulations indicate that certain TAG molecules can escape the neutral lipid core of the LD and be integrated within the aliphatic chains of the phospholipid monolayer. This could be a transition state prior to the TAG transference, and our data indicate that the docking of the amphipathic helix in the LD membrane could facilitate this process. However, the presence of infiltrated TAGs in LD membranes with mutant helices, or even in the absence of docking, suggests that this is not enough to complete the TAG transference.
To be transferred to the adjacent LD, the TAGs integrated in the hydrophobic region of the LD membrane should cross the energy barrier defined by the phospholipid polar heads, and the interaction of CIDEA with PA could play a role in this process, as suggested by the disruption of LD enlargement by the mutations preventing PA-binding (K167E/R171E/R175E) and the inhibition of CIDEA after PA depletion. The minor effects observed with more conservative substitutions in the helix suggest that the presence of positive charges is sufficient to induce TAG transference by attracting anionic phospholipids present in the LD membrane. PA, whose requirement is indicated by our PA-depletion experiments, is a cone-shaped anionic phospholipid which could locally destabilize the LD monolayer by favoring a negative membrane curvature incompatible with the spherical LD morphology (51). Interestingly, while the zwitterionic PC, the main component of the monolayer, stabilizes the LD structure (23), the negatively charged PA promotes their coalescence (29). This is supported by our CG-MD results, in which the addition of PA deformed the LD shape. We propose a model in which the C-term amphipathic helix positions itself in the LD monolayer and interacts with PA molecules in its vicinity, which might include trans interactions with PA in the adjacent LD. The interaction with PA disturbs the integrity of the phospholipid barrier at the LD-LD interface, allowing the LD to LD transference of TAG molecules integrated in the LD membrane (Fig. 7). Additional alterations in the LD composition could be facilitating TAG transference, as differentiating adipocytes experience a reduction in saturated fatty acids in the LD phospholipids (52) and in their PC/PE ratio (53), which could increase the permeability of the LD membranes, and we previously observed that a change in the molecular structures of TAG results in an altered migration pattern to the LD surface (32).
During LD fusion by lipid transfer, the pressure gradient experienced by LDs favors TAG flux from small to large LDs (15). However, the implication of PA, a minor component of the LD membrane, could represent a control mechanism, as it is plausible that the cell could actively influence the TAG flux direction by differently regulating the levels of PA in large and small LDs, which could be controlled by the activity of enzymes such as AGPAT3 and LIPIN-1γ (13, 30). This is a remarkable possibility, as a switch in the favored TAG flux direction could promote the acquisition of a multilocular phenotype and facilitate the browning of WAT (24). Interestingly, Cidea mRNA is the LD protein-encoding transcript that experiences the greatest increase during the cold-induced process by which multilocular BAT-like cells appear in WAT (24). Furthermore, in BAT, cold exposure instigates a profound increase in CIDEA protein levels that is independent of transcriptional regulation (54). The profound increase in CIDEA is coincident with elevated lipolysis and de novo lipogenesis that occurs in both brown and white adipose tissues after β-adrenergic receptor activation (55). It is likely that CIDEA has a central role in coupling these processes to package newly synthesized TAG in LDs for subsequent lipolysis and fatty acid oxidation. Importantly, BAT displays high levels of glycerol kinase activity (56, 57) that facilitates glycerol recycling rather than release into the blood stream, following induction of lipolysis (58), which occurs in WAT. Hence, the reported elevated glycerol released from cells depleted of CIDEA (28) is likely to be a result of decoupling lipolysis from the ability to efficiently store the products of lipogenesis in LDs and therefore producing a net increase in detected extracellular glycerol.
This important role of CIDEA is supported by the marked depletion of TAG in the BAT of Cidea null mice following overnight exposure to 4 °C (28) and our findings that CIDEA-dependent LD enlargement is maintained in a lipase negative yeast strain.
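The statement above that a pressure gradient favors TAG flux from small to large LDs can be illustrated with the Young-Laplace relation for a spherical droplet, ΔP = 2γ/r. This is a rough sketch and not a calculation from the paper: applying Young-Laplace to LDs is an assumption made here for illustration, and the surface tension and radii are hypothetical placeholders.

```python
def laplace_pressure_pa(surface_tension_mn_per_m: float, radius_um: float) -> float:
    """Young-Laplace excess pressure inside a spherical droplet, in pascals."""
    gamma = surface_tension_mn_per_m * 1e-3  # convert mN/m to N/m
    radius = radius_um * 1e-6                # convert µm to m
    return 2.0 * gamma / radius


# Hypothetical values: the same surface tension is assumed for both droplets.
GAMMA_MN_PER_M = 1.0
small_radius_um = 1.0
large_radius_um = 10.0

p_small = laplace_pressure_pa(GAMMA_MN_PER_M, small_radius_um)
p_large = laplace_pressure_pa(GAMMA_MN_PER_M, large_radius_um)

print(f"excess pressure, small LD: {p_small:.0f} Pa")
print(f"excess pressure, large LD: {p_large:.0f} Pa")
print("net flux direction:", "small -> large" if p_small > p_large else "large -> small")
```

Under these assumptions the smaller droplet always has the higher internal pressure, so reversing the flux would require changing something other than geometry, for example the local membrane composition, which is the kind of PA-dependent control the text proposes.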
Cidea and the genes that are required to facilitate high rates of lipolysis and lipogenesis are associated with the “browning” of white fat either following cold exposure (46) or in genetic models such as RIP140 knockout WAT (59). The induction of a brown-like phenotype in WAT has potential benefits in the treatment and prevention of metabolic disorders (60). Differences in the activity and regulation of CIDEC and CIDEA could also be responsible for the adoption of unilocular or multilocular phenotypes. In addition to their differential interaction with PLIN1 and 5, we have observed that CIDEC is more resilient to the deletion of the N-term than CIDEA, indicating that it may be less sensitive to regulatory posttranslational modifications of this domain. This robustness of CIDEC activity together with its potentiation by PLIN1, could facilitate the continuity of the LD enlargement in white adipocytes until the unilocular phenotype is achieved. In contrast, in brown adipocytes expressing CIDEA the process would be stopped at the multilocular stage for example due to post-translational modifications that modulate the function or stability of the protein or alteration of the PA levels in LDs.
CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease – Part IIC
Author: Larry H. Bernstein, MD, FCAP, Triplex Medical Science
Article 1.4 CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomics Analysis and Disease – Part IIC
Part I: The Initiation and Growth of Molecular Biology and Genomics – Part I From Molecular Biology to Translational Medicine: How Far Have We Come, and Where Does It Lead Us?
Part IIB. “CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics” lays out the manifold multivariate systems-analysis tools that have moved the science forward to a ground that ensures clinical application.
Part IIC. “CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease” will extend the discussion to advances in the management of patients as well as providing a roadmap for pharmaceutical drug targeting.
This final paper of Part II concludes a thorough review of the scientific events leading to the discovery of the human genome, the purification and identification of the components of the chromosome and the DNA structure and role in regulation of embryogenesis, and potential targets for cancer.
The first two articles, Part IIA and Part IIB, go into some depth to elucidate the problems and breakthroughs encountered in the Human Genome Project, and the construction of a 3-D model necessary to explain interactions at a distance.
Part IIC, the final article, is entirely concerned with clinical application of this treasure trove of knowledge to resolving diseases of epigenetic nature in the young and the old, chronic inflammatory diseases, autoimmune diseases, infectious disease, gastrointestinal disorders, neurological and neurodegenerative diseases, and cancer.
Recently, large studies have identified some of the genetic basis for important common diseases such as heart disease and diabetes, but most of the genetic contribution to them remains undiscovered. Now researchers at the University of Massachusetts Amherst led by biostatistician Andrea Foulkes have applied sophisticated statistical tools to existing large databases to reveal substantial new information about genes that cause such conditions as high cholesterol linked to heart disease.
Foulkes says, “This new approach to data analysis provides opportunities for developing new treatments. It also advances approaches to identifying people at greatest risk for heart disease. Another important point is that our method is straightforward to use with freely available computer software and can be applied broadly to advance genetic knowledge of many diseases.”
The new analytical approach she developed with cardiologist Dr. Muredach Reilly at the University of Pennsylvania and others is called “Mixed modeling of Meta-Analysis P-values” or MixMAP. Because it makes use of existing public databases, the powerful new method represents a low-cost tool for investigators.
MixMAP draws on a principled statistical modeling framework and the vast array of summary data now available from genetic association studies to formally test for association at a new level, that of the locus.
While the traditional statistical method looks for one unusual “needle in a haystack” as a possible disease signal, Foulkes and colleagues’ new method uses knowledge of DNA regions in the genome that are likely to contain several genetic signals for disease variation clumped together in one region. Thus, it is able to detect groups of unusual variants rather than just single SNPs, offering a way to “call out” gene regions that have a consistent signal above normal variation.
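The contrast between single-SNP and locus-level testing can be sketched with a toy example. This is not the MixMAP model itself, which fits a mixed-effects model to meta-analysis p-values; it is a simplified stand-in that probit-transforms each p-value and averages the scores within a locus. The locus names and p-values are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, mean


def locus_scores(snp_pvalues):
    """Average probit-transformed p-values within each locus.

    snp_pvalues: iterable of (locus_name, p_value) pairs with 0 < p < 1.
    Returns {locus_name: mean z-score}; a larger value means a stronger
    and more consistent signal across the SNPs of that locus.
    """
    std_normal = NormalDist()
    by_locus = defaultdict(list)
    for locus, p in snp_pvalues:
        # Small p-values map to large positive z-scores.
        by_locus[locus].append(std_normal.inv_cdf(1.0 - p))
    return {locus: mean(z_scores) for locus, z_scores in by_locus.items()}


# Hypothetical summary data: locus_A has several moderate signals,
# locus_B has one extreme SNP surrounded by noise.
toy_pvalues = [
    ("locus_A", 0.01), ("locus_A", 0.02), ("locus_A", 0.015), ("locus_A", 0.03),
    ("locus_B", 0.0001), ("locus_B", 0.6), ("locus_B", 0.8), ("locus_B", 0.5),
]

scores = locus_scores(toy_pvalues)
best_locus = max(scores, key=scores.get)
best_single_snp_locus = min(toy_pvalues, key=lambda pair: pair[1])[0]

print("locus-level scores:", {k: round(v, 2) for k, v in scores.items()})
print("top locus by locus-level score:", best_locus)
print("locus holding the single smallest p-value:", best_single_snp_locus)
```

In this toy data the single most extreme SNP sits in one locus while the consistent, clumped signal sits in the other, which is the kind of region a locus-level test is meant to call out.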
The LPA gene codes for apolipoprotein(a), which, when linked with low-density lipoprotein particles, forms lipoprotein(a) [Lp(a)], a well-studied molecule associated with coronary artery disease (CAD). The Lp(a) molecule has both atherogenic and thrombogenic effects in vitro, but the extent to which these translate to differences in how atherothrombotic disease presents is unknown.
LPA contains many single-nucleotide polymorphisms, and 2 have been identified by previous groups as being strongly associated with levels of Lp(a) and, as a consequence, strongly associated with CAD.
However, because atherosclerosis is thought to be a systemic disease, it is unclear to what extent Lp(a) leads to atherosclerosis in other arterial beds (e.g., carotid, abdominal aorta, and lower extremity), as well as to other thrombotic disorders (e.g., ischemic/cardioembolic stroke and venous thromboembolism).
Such distinctions are important, because therapies that might lower Lp(a) could potentially reduce forms of atherosclerosis beyond the coronary tree.
To answer this question, Helgadottir and colleagues compiled clinical and genetic data on the LPA gene from thousands of previous participants in genetic research studies from across the world. They did not have access to Lp(a) levels, but by knowing the genotypes for 2 LPA variants, they inferred the levels of Lp(a) on the basis of prior associations between these variants and Lp(a) levels. [1] Their studies included not only individuals of white European descent but also a significant proportion of black persons, in order to widen the generalizability of their results.
Their main findings are that LPA variants (and, by proxy, Lp(a) levels) are associated with
CAD,
peripheral arterial disease,
abdominal aortic aneurysm,
number of CAD vessels,
age at onset of CAD diagnosis, and
large-artery atherosclerosis-type stroke.
They did not find an association with
cardioembolic or small-vessel disease-type stroke;
intracranial aneurysm;
venous thrombosis;
carotid intima thickness; or,
in a small subset of individuals, myocardial infarction.
Scientists at the Gladstone Institutes have revealed the precise order and timing of hundreds of genetic “switches” required to construct a fully functional heart from embryonic heart cells, providing new clues into the genetic basis for some forms of congenital heart disease. In a study being published online today in the journal Cell, researchers in the laboratory of Gladstone Senior Investigator Benoit Bruneau, PhD, employed stem cell technology, next-generation DNA sequencing and computing tools to piece together the instruction manual, or “genomic blueprint,” for how a heart becomes a heart. These findings offer renewed hope for combating life-threatening heart defects such as arrhythmias (irregular heart beat) and ventricular septal defects (“holes in the heart”).
They approach heart formation with a wide-angle lens by looking at the entirety of the genetic material that gives heart cells their unique identity.
The news comes at a time of emerging importance for the biological process called “epigenetics,” in which a non-genetic factor impacts a cell’s genetic makeup early during development, but sometimes with longer-term consequences. All of the cells in an organism contain the same DNA, but the epigenetic instructions encoded in specific DNA sequences give the cell its identity. Epigenetics is of particular interest in heart formation, as the incorrect on-and-off switching of genes during fetal development can lead to congenital heart disease, some forms of which may not be apparent until adulthood.
The scientists took embryonic stem cells from mice and reprogrammed them into beating heart cells by mimicking embryonic development in a petri dish. Next, they extracted the DNA from developing and mature heart cells, using an advanced gene-sequencing technique called ChIP-seq that lets scientists “see” the epigenetic signatures written in the DNA.
“Simply finding these signatures was only half the battle; we next had to decipher which aspects of heart formation they encoded. To do that, we harnessed the computing power of the Gladstone Bioinformatics Core. This allowed us to take the mountains of data collected from gene sequencing and organize it into a readable, meaningful blueprint for how a heart becomes a heart.”
For each of the above datasets, an upstream analysis from the identified transcription factors correctly identified the stimulus. IPA’s tools were very easy to use and the analysis time for the above experiments was less than one minute.
The performance, speed, and ease of use can only be characterized as very good, perhaps leading to breakthroughs when extended and used creatively. Ingenuity’s new transcription factor analysis tool in IPA, coupled with Ingenuity’s established upstream grow tools, should be strongly considered for every lab analyzing differential expression data.
NF-E2-related factor 2 (Nrf2) is an important transcription factor that activates the expression of cellular detoxifying enzymes. Nrf2 expression is largely regulated through the association of Nrf2 with Kelch-like ECH-associated protein 1 (Keap1), which results in cytoplasmic Nrf2 degradation.
Conversely, little is known concerning the regulation of Keap1 expression. Until now, a regulatory role for microRNAs (miRs) in controlling Keap1 gene expression had not been characterized. By using miR array-based screening, we observed miR-200a silencing in breast cancer cells and demonstrated that upon re-expression, miR-200a targets the Keap1 3′-untranslated region (3′-UTR), leading to Keap1 mRNA degradation. Loss of this regulatory mechanism may contribute to the dysregulation of Nrf2 activity in breast cancer. Previously, we have identified epigenetic repression of miR-200a in breast cancer cells. Here, we find that treatment with epigenetic therapy, the histone deacetylase inhibitor suberoylanilide hydroxamic acid, restored miR-200a expression and reduced Keap1 levels. This reduction in Keap1 levels corresponded with Nrf2 nuclear translocation and activation of Nrf2-dependent NAD(P)H-quinone oxidoreductase 1 (NQO1) gene transcription. Moreover, we found that Nrf2 activation inhibited the anchorage-independent growth of breast cancer cells. Finally, our in vitro observations were confirmed in a model of carcinogen-induced mammary hyperplasia in vivo. In conclusion, our study demonstrates that miR-200a regulates the Keap1/Nrf2 pathway in mammary epithelium, and we find that epigenetic therapy can restore miR-200a regulation of Keap1 expression, reactivating the Nrf2-dependent antioxidant pathway in breast cancer.
Nuclear factor (erythroid-derived 2)-like 2, also known as NFE2L2 or Nrf2, is a transcription factor that in humans is encoded by the NFE2L2 gene [1]. NFE2L2 induces the expression of various genes, including those that encode several antioxidant enzymes, and it may play a physiological role in the regulation of oxidative stress. Investigational drugs that target NFE2L2 are of interest as potential therapeutic interventions for oxidative-stress-related pathologies.
4. Highly active zinc finger nucleases by extended modular assembly
Zinc finger nucleases (ZFNs) are important tools for genome engineering. Despite intense interest by many academic groups, the lack of robust non-commercial methods has hindered their widespread use. The modular assembly (MA) of ZFNs from publicly-available one-finger archives provides a rapid method to create proteins that can recognize a very broad spectrum of DNA sequences. However, three- and four-finger arrays often fail to produce active nucleases. Efforts to improve the specificity of the one-finger archives have not increased the success rate above 25%, suggesting that the MA method might be inherently inefficient due to its insensitivity to context-dependent effects.
Here we present the first systematic study on the effect of array length on ZFN activity. ZFNs composed of six-finger MA arrays produced mutations at 15 of 21 (71%) targeted loci in human and mouse cells. A novel Drop-Out Linker scheme was used to rapidly assess three- to six-finger combinations, demonstrating that shorter arrays could improve activity in some cases. Analysis of 268 array variants revealed that half of MA ZFNs of any array composition that exceed an ab initio B-score cut-off of 15 were active. MA ZFNs are able to target more DNA sequences with higher success rates than other methods.
These insightful reviews are based on the strategic data and insights from Thomson Reuters Cortellis™ for Competitive Intelligence. (A Review of April-June 2012).
The majority of diseases are complex and multi-factorial, involving multiple genes interacting with environmental factors. At the genetic level, information from genome-wide association studies that elucidate common patterns of genetic variation across various human populations, in addition to profiling technologies, can be utilized in discovery research to provide snapshots of genes and expression profiles that are controlled by the same regulatory mechanism and are altered between healthy and diseased states.
The characterization of genes that are abnormally expressed in disease tissues could further be employed as
diagnostic markers,
prognostic indicators of efficacy and/or toxicity, or as
targets for therapeutic intervention.
As the defining catalyst that paved the way for personalized medicine, information from the published genome sequence revealed that much of the genetic variation in humans is concentrated in about 0.1 percent of the over 3 billion base pairs in the haploid DNA. Most of these variations involve substitution of a single nucleotide for another at a given location in the genetic sequence, known as a single nucleotide polymorphism (SNP).
Combinations of linked SNPs aggregate together to form haplotypes, and together these serve as markers for locating genetic variations in DNA sequences. SNPs located within the protein-coding region of a gene or within the control regions of DNA that regulate a gene’s activity could have a substantial effect on the encoded protein and thus influence phenotypic outcomes.
Analyzing SNPs between patient population cohorts could highlight specific genotypic variations which can be correlated with specific phenotypic variations in disease predisposition and drug responses.
Prior to the genomic revolution, many of the established therapies were directed against fewer than 500 drug targets, with many of the top-selling drugs acting on well-defined protein pathways. However, the sequencing of the human genome has massively expanded the pool of molecular targets that could be exploited in unmet medical needs, and currently, of the approximately 22,300 protein-coding genes in the human code, it has been estimated that up to 3000 are druggable. Furthermore, genomic technologies such as
high-throughput sequencing
and transcription profiling,
can be used to identify and validate biologically relevant target molecules, or can be applied to cell-based and mouse disease models or directly to in vivo human tissues,
helping to correlate gene targets with phenotypic traits of complex diseases.
This is particularly important, as insufficient validation of target genes/proteins in complex diseases may be a contributing factor in the decline in R&D productivity.
Personalized medicine no doubt is already having a tremendous impact on drug development pipelines. According to a study conducted by the Tufts Center for the Study of Drug Development, more than 90 percent of biopharmaceutical companies now utilize at least some genomics-derived targets in their drug discovery programs.
However, pipeline analysis from Cortellis for Competitive Intelligence suggests that there is still a scientific gap that has resulted in difficulty optimizing these novel genomic targets into the clinical R&D portfolios of major pharmaceutical companies, particularly outside the oncology field. Selected examples of personalized medicine product candidates in clinical development include (see TABLE 4).
Mutations in melanoma are in regions that control genes, not in the genes themselves. The mutations are exactly the type caused by exposure to ultraviolet light. The findings are reported in two papers in http://Science.com/ScienceExpress/
The findings do not suggest new treatments, but they help explain how melanomas, and possibly other cancers, develop and what drives their growth. The modification is found in the “dark matter” of the genome, according to Dr. Levi A. Garraway: the 99 percent of DNA that lies outside the genes themselves, here in a region that regulates genes. A small control region was mutated in 7 out of 10 of the tumors, commonly by one or two tiny changes. A German team led by Rajiv Kumar (Heidelberg) and Dirk Schadendorf (Essen) looked at a family whose members tended to get melanomas. Their findings indicate that those who inherit the mutations might be born with cells that have taken the first step toward cancer.
The mutations spur cells to make telomerase, which keeps the cells immortal by preventing them from losing the ends of their chromosomes, the telomeres. Abundant telomerase occurs in 90 percent of cancers, according to Immaculata De Vivo at Harvard Medical School.
The importance of the findings is that the mechanism of telomerase involvement in cancer is now within view. But it is not clear how to block the telomerase production in cancer cells.
Comment
This discussion addresses the issues raised about the direction to follow in personalized medicine. Despite the amount of work necessary to bring the clarity that is sought after, the experiments and experimental design are most essential.
The arrest of ciliogenesis in ovarian cancer cell lines compared to wild type (WT) ovarian epithelial cells, and
The link to suppressing ciliogenesis by AURA protein and CHFR at the base of the cilium, which disappears at mitosis or with proliferation.
There is no accumulation by upregulation of PDGF under starvation by the cancer cells compared to the effect in WT OSE.
Here we have a systematic combination of signaling events tied to changes in putative biomarkers that occur synchronously in Ov cancer cell lines.
These changes are identified with changes in
proliferation,
loss of ciliary structure, and
proliferation.
In this described scenario,
WT OSE cells would be arrested, and
it appears that they would take the path to apoptosis (under starvation).
Even without more information, this cluster is what one wants to have in a “syndromic classification”. The information used to form the classification entails the identification of strong ‘signaling-related’ biomarkers. The Gli2 peptide has to be part of this.
In principle, a syndromic classification would be ideally expected to have no fewer than 64 classes. If the classification is “weak”, then the class frequencies would be close to what one would expect in the WT OSE. In this case, in reality,
several combinatorial classes would have low frequency, and
others would be quite high.
This obeys the classification rules established by feature identification, and the information gain described by Solomon Kullback and extended by Akaike.
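The information gain mentioned here can be illustrated numerically. The following is a minimal sketch, not the analysis behind this discussion: it assumes the 64 classes arise from six binary biomarkers (2^6 = 64), and both the uniform reference distribution and the skewed "observed" frequencies are made-up placeholders. It computes the Kullback-Leibler divergence, which is zero for a "weak" classification whose frequencies match the reference and grows as a few combinatorial classes become frequent and the rest rare.

```python
import math
from itertools import product

# Assumption for illustration: six binary biomarkers give 2**6 = 64 classes.
N_MARKERS = 6
classes = list(product((0, 1), repeat=N_MARKERS))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Placeholder reference distribution (e.g. what one might expect in WT cells):
# every class equally likely.
reference = [1 / len(classes)] * len(classes)

# "Weak" classification: observed frequencies equal the reference.
weak = list(reference)

# "Strong" classification (made-up numbers): 4 classes hold 80% of the
# cases and the remaining 60 classes share the other 20%.
strong = [0.8 / 4] * 4 + [0.2 / 60] * 60

print("classes:", len(classes))
print("information gain, weak:  ", kl_divergence(weak, reference))
print("information gain, strong:", kl_divergence(strong, reference))
```

With real data, the reference would be the class frequencies measured in the comparison population rather than a uniform placeholder.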
Does this have to be the case for all different cancer types? I don’t think so. The cells are different in ontogenesis. In this case, even the WT OSE have mesenchymal features and so, are not fully directed to epithelial expression. This happens to be the case in actual anatomic expression of the ovary. On the other hand, one would expect shared features of the
ovary,
testes,
thyroid,
adrenals, and
pituitary.
There is biochemical expression in terms of their synthetic function – TPN organs. I would have to put the liver into that broad class. Other organs – skeletal muscle & heart – transform substrate into energy or work (where you might also put intestinal smooth muscle).
They have to have different biomarker expressions, even though they much less often form neoplasms. (Bone is not just a bioenergetic force. It is maintained by muscle action. It forms sarcomas. But there has to be a balance between bone removal by osteoclasts and refill by osteoblasts.)
Viewpoint: What we have learned
The Watson-Crick model proposed in 1953 is limited in fully explaining genome effects
The Pauling triplex model may have been prescient because of a fuller anticipation of molecular bonding variants
A more adequate triple-helix model has been proposed and is consistent with a compact genome in the nucleus
The structure of the genome is not as we assumed, based on the application of fractal geometry. A body of evidence is building that can reveal a more complete view of genome function:
transcription
cell regulation
mutations
Summary
I have just completed a most comprehensive review of the Human Genome Project. There are key research collaborations, problems in deciphering the underlying structure of the genome, and there are also both obstacles and insights to elucidating the complexity of the final model.
This is because of frequent observations of molecular problems in folding and other interactions between nucleotides that challenge the sufficiency of the original DNA model proposed by Watson and Crick. This has come about because of breakthrough innovation in technology and in computational methods.
Radoslav Bozov •
Molecular biology and growth was primarily initiated on biochemical structural paradigms aiming to define functional spatial dynamics of molecules via assignation of various types of bondings – covalent and non-covalent – hydrogen, ionic , dipole-dipole, hydrophobic interactions.
Lab techniques based on the m/z paradigm allowed separation, isolation and identification of bio substances with a general marker identity finding correlation between physiological/cellular states.
The development of electronic/x-ray technologies allowed zooming in nano space without capturing time.
NMR technology identified the existence of space topology of initial and final atomic states giving a highly limited light on time – energy axis of atomic interactions.
Sequence technology and genomic perturbations shed light on uncertainty of genomic dynamics and regulators of functional ever expanding networks.
Transition state theory coupled to structural complexity identification and enzymatic mechanisms ran up parallel to work on various phenomena of strings of nucleotides (oligomers and polymers) – illusion/observation of constructing models on the dynamics of protein-dna-rna interference.
The physical energetic constraints of biochemistry were inapplicable in open biological systems. Biologists have accepted observation as a sole driver towards re-evaluating models.
The separation of matter and time constraints emerged as deviation of energy and space constraints transforming into the full acceptance of code theory of life. One simple thing was left unnoticed over time –
the amount of information of quantum matter within a single codon is larger than that of a single amino acid. This violated all physical laws/principles known to work with a limited degree of certainty.
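The codon versus amino acid comparison can at least be put in numbers. This is a minimal sketch under assumptions the comment does not state: the standard genetic code (4 nucleotides, 3 positions per codon, 20 amino acids) and equally likely symbols, with information measured in bits.

```python
import math

# Assumptions: standard genetic code, all symbols equally likely.
NUCLEOTIDES = 4
CODON_LENGTH = 3
AMINO_ACIDS = 20

codon_count = NUCLEOTIDES ** CODON_LENGTH         # number of distinct codons
bits_per_codon = math.log2(codon_count)           # information capacity of one codon
bits_per_amino_acid = math.log2(AMINO_ACIDS)      # information needed to name one amino acid
redundancy_bits = bits_per_codon - bits_per_amino_acid

print(f"codons: {codon_count}")
print(f"bits per codon: {bits_per_codon:.2f}")
print(f"bits per amino acid: {bits_per_amino_acid:.2f}")
print(f"redundancy: {redundancy_bits:.2f} bits")
```

Under these assumptions a codon carries 6 bits against roughly 4.3 bits for an amino acid. The gap of about 1.7 bits corresponds to the degeneracy of the genetic code, in which several codons specify the same amino acid.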
The limited amount of information analyzed by conventional sequence identity led to the notion of applicability of statistical measures and of PCR technology. Mutations were identified over larger scale of data.
Quantum chemistry itself is being limited due to discrete space/energy constraints, thus it transformed into concepts/principles in biology that possess highly limited physical values whatsoever.
The central dogma is partially broken as a result of
regulatory constraints
epigenetic phenomena and
iRNA.
Large scale code computational data run into uncertainty of the processes of evolution and its consequence of signaling transformation. All drugs were ‘lucky based’ applicability and/or discovery with largely unpredictable side effect over time.
Other Related Articles on this Open Access Online Scientific Journal include the following:
In a three part series: Part IIA. CRACKING THE CODE OF HUMAN LIFE: Milestones along the Way Part IIB. CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics Part IIC. CRACKING THE CODE OF HUMAN LIFE: Recent Advances in Genomic Analysis and Disease
Part III will conclude with Ubiquitin, its Role in Signaling and Regulatory Control. Part I reviewed the huge expansion of the biological research enterprise after the Second World War. It concentrated on the
discovery of cellular structures,
metabolic function, and
creation of a new science of Molecular Biology.
Part II follows the race to delineate the Human Genome, discovery methods, and fundamental genomic patterns that are ancient in both animal and plant speciation. It also explores both the complexity and the systems view of the architecture that underlies an understanding of the genome.
These articles review a web-like connectivity between inter-connected scientific discoveries, as significant findings have led to novel hypotheses and many expectations over the last 75 years. This largely post WWII revolution has driven our understanding of biological and medical processes at an exponential pace owing to successive discoveries of
chemical structure,
the basic building blocks of DNA and proteins,
nucleotide and protein-protein interactions,
protein folding, allostericity,
genomic structure,
DNA replication,
nuclear polyribosome interaction, and
metabolic control.
In addition, the research framework has been transformed by the emergence of methods for
copying,
removal, and
insertion, together with
improvements in structural analysis and
developments in applied mathematics.
Part IIA:
CRACKING THE CODE OF HUMAN LIFE:
Milestones along the Way
A NOVA interview with Francis Collins (NHGRI) (FC), J. Craig Venter (CELERA) (JCV), and Eric Lander (EL).
RK: For the past ten years, scientists all over the world have been painstakingly trying to read the tiny instructions buried inside our DNA. And now, finally, the “Human Genome” has been decoded.
EL: The genome is a storybook that’s been edited for a couple billion years.
The following will address the odd similarity of genes between man and yeast.
EL: In the nucleus of your cell resides the DNA molecule, about 10 angstroms wide and curled up. The amount of curling is limited by the negative charges that repel one another, but there are folds upon folds. If the DNA were stretched out, its length would be thousands of feet.
EL: We have known for 2000 years that your kids look a lot like you. Well it’s because you must pass them instructions that give them the eyes, the hair color, and the nose shape they have.
RK: Cracking the code of those minuscule differences in DNA that influence health and illness is what the Human Genome Project is all about. Since 1990, scientists all over the world have been involved in the effort to read all three billion As, Ts, Gs, and Cs of human DNA. It took 10 years to find the one genetic mistake that causes cystic fibrosis. Another 10 years to find the gene for Huntington’s disease. Fifteen years to find one of the genes that increase the risk for breast cancer. One letter at a time, painfully slowly… And then came the revolution. In the last ten years the entire process has been computerized. The computers can do a thousand every second, and that has made all the difference.
EL: This is basically a parts list with a lot of parts. If you take an airplane, a Boeing 777, I think it has like 100,000 parts. If I gave you a parts list for the Boeing 777 in one sense you’d know 100,000 components, screws and wires and rudders and things like that. But you wouldn’t know how to put it together, or why it flies. We now have a parts list, and that’s not enough to understand why it flies.
A Quest For Clarity
Tracy Vence is a senior editor of Genome Technology (@GenomeTechMag).
Projects supported by the US National Institutes of Health will have produced 68,000 total human genomes, around 18,000 of those whole human genomes, through the end of this year, National Human Genome Research Institute estimates indicate. And in his book, The Creative Destruction of Medicine, the Scripps Research Institute’s Eric Topol projects that 1 million human genomes will have been sequenced by 2013 and 5 million by 2014. Daniel MacArthur, a group leader in Massachusetts General Hospital’s Analytic and Translational Genetics Unit, estimates that “From a capacity perspective … millions of genomes are not that far off. If you look at the rate that we’re scaling, we can certainly achieve that.”
The prospect of so many genomes has brought clinical interpretation into focus. But there is an important distinction to be made between the interpretation of an apparently healthy person’s genome and that of an individual who is already affected by a disease. In an April Science Translational Medicine paper, Johns Hopkins University School of Medicine’s Nicholas Roberts and his colleagues reported that personal genome sequences for healthy monozygotic twin pairs are not predictive of significant risk for 24 different diseases in those individuals. The researchers concluded that whole-genome sequencing was not likely to be clinically useful.
Ambiguities have clouded even the most targeted interpretation efforts.
Technological challenges,
meager sample sizes,
a need for increased, fail-safe automation and, most important,
a lack of community-wide standards for the task
have hampered researchers’ attempts to reliably interpret the clinical significance of genomic variation.
How signals from the cell surface affect transcription of genes in the nucleus.
James Darnell, Jr., MD, Astor Professor, Rockefeller University. After graduation from Washington University School of Medicine, he worked with Francois Jacob at the Pasteur Institute in Paris; he served as Vice President for Academic Affairs at Rockefeller in 1990-91. He is the coauthor with S.E. Luria of General Virology and the founding author with Harvey Lodish and David Baltimore of Molecular Cell Biology, now in its sixth edition. His book RNA, Life’s Indispensable Molecule was published in July 2011 by Cold Spring Harbor Laboratory Press. He has been a member of the National Academy of Sciences since 1973 and is the recipient of numerous awards, including the 2003 National Medal of Science and the 2002 Albert Lasker Award.
Using interferon as a model cytokine, the Darnell group discovered that cell transcription was quickly changed by binding of cytokines to the cell surface. The bound interferon led to the tyrosine phosphorylation of latent cytoplasmic proteins now called STATs (signal transducers and activators of transcription) that dimerize by
reciprocal phosphotyrosine-SH2 interchange,
accumulate in the nucleus,
bind DNA and drive transcription.
This pathway has proved to be of wide importance with seven STATs now known in mammals that take part in a wide variety of developmental and homeostatic events in all multicellular animals. Crystallographic analysis defined functional domains in the STATs, and current attention is focused on two areas:
how the STATs complete their cycle of activation and inactivation, which requires regulated tyrosine dephosphorylation; and how
persistent activation of STAT3 that occurs in a high proportion of many human cancers contributes to blocking apoptosis in cancer cells.
Current efforts are devoted to inhibiting STAT3 with modified peptides that can enter cells.
Cell cycle regulation and the cellular response to genotoxic stress
Stephen J Elledge, PhD, Gregor Mendel Professor of Genetics and Medicine, Investigator, Howard Hughes Medical Institute, Harvard Medical School. As a postdoctoral fellow at Stanford working on eukaryotic homologous recombination, he serendipitously found a family of genes known as ribonucleotide reductases. He subsequently showed that
these genes are activated by DNA damage and
could serve as tools to help scientists dissect the signaling pathways
through which cells sense and respond to DNA damage and replication stress.
At Baylor College of Medicine he made a second major breakthrough with the discovery of the cyclin-dependent kinase 2 gene (Cdk2), which
controls the G1-to-S cell cycle transition,
an entry checkpoint for the cell proliferation cycle and
a critical regulatory step in tumorigenesis.
From there, using a novel “two-hybrid” cloning method he developed, Elledge and Wade Harper, PhD, proceeded to
isolate several members of the Cdk2-inhibitory family.
Their discoveries included the p21 and p57 genes; mutations in the latter are responsible for Beckwith-Wiedemann syndrome, which is characterized by somatic overgrowth and increased cancer risk. Elledge is also recognized for his work in understanding
proteome remodeling through ubiquitin-mediated proteolysis.
They identified F-box proteins that regulate protein degradation in the cell by
binding to specific target protein sequences and then
marking them with ubiquitin for destruction by the cell’s proteasome machinery.
This breakthrough resulted in
the elucidation of the cullin ubiquitin ligase family,
which controls regulated protein stability in eukaryotes.
Elledge’s recent research has focused on the cellular mechanisms underlying DNA damage detection and cancer using genetic technologies. In collaboration with Cold Spring Harbor Laboratory researcher Gregory Hannon, PhD, Elledge has generated complete human and mouse short hairpin RNA (shRNA) libraries for genome-wide loss-of-function studies. Their efforts have led to
the identification of a number of tumor suppressor proteins and
genes upon which cancer cells uniquely depend for survival.
This work led to the development of the “non-oncogene addiction” concept.
Playing the dual roles of inventor and investigator, Elledge developed original techniques to define
what drives the cell cycle and
how cells respond to DNA damage.
By using these tools, he and his colleagues have identified multiple genes involved in cell-cycle regulation.
Elledge’s work has earned him many awards, including a 2001 Paul Marks Prize for Cancer Research and a 2003 election to the National Academy of Sciences. In his Inaugural Article (1), published in this issue of PNAS, Elledge and his colleagues describe the function of Fbw7, a protein involved in controlling cell proliferation (see below). For his PhD thesis at MIT, Elledge studied the error-prone DNA repair mechanism in Escherichia coli (E. coli) called SOS mutagenesis. His work identified and described
the regulation of a group of enzymes now known as error-prone polymerases,
the first members of which were the umuCD genes in E. coli.
It was then that he developed a new cloning tool. Elledge invented a technique that allowed him to approach future cloning problems of this type with great rapidity. With the new technique, “you could make large libraries in lambda that behave like plasmids. We called them ‘phasmid’ vectors, like plasmid and phage together.” The phasmid cloning method was an early cornerstone for molecular biology research.
Elledge began working on homologous recombination, an important niche in the field of eukaryotic genetics, during a postdoctoral fellowship at Stanford University. Working with the yeast genome, Elledge searched for rec A, a gene that allows DNA to recombine homologously. Although he never located rec A, he discovered a family of genes known as ribonucleotide reductases (RNRs), which are involved in DNA production. Rec A and RNRs share the same last 4 amino acids, which caused an antibody crossreaction in one of Elledge’s experiments. Initially disappointed with the false positives in his hunt for rec A, Elledge was later delighted with his luck. He found that
RNRs are turned on by DNA damage, and
these genes are regulated by the cell cycle.
Prior to leaving Stanford, Elledge attended a talk at the University of California, San Francisco, by Paul Nurse, a leader in cell-cycle research who would later win the 2001 Nobel Prize in medicine. Nurse described his success in isolating the homolog of a key human cell-cycle kinase gene, Cdc2, by using a mutant strain of yeast (8). Although Nurse’s methods were primitive, Elledge was struck by the message he carried: that
cell-cycle regulation was functionally conserved, and
many human genes could be isolated by looking for complementary genes in yeast.
Elledge then took advantage of his past successes in building phasmid vectors to build a versatile human cDNA library that could be expressed in yeast. After setting up a laboratory at Baylor, he introduced this library into yeast, screening for complementary cell-cycle genes. He quickly identified the same Cdc2 gene isolated by Nurse. However, Elledge also discovered a related gene known as Cdk2. Elledge subsequently found that
Cdk2 controlled the G1 to S cell-cycle transition, a step that often goes awry in cancer. These results were published in the EMBO Journal in 1991.
He then continued to use
RNRs to perform genetic screens to
identify genes involved in sensing and responding to DNA damage.
He subsequently worked out the
signal transduction pathways in both yeast and humans that recognize damaged DNA and replication problems.
These “checkpoint” pathways are central to the
prevention of genomic instability and a key to understanding tumorigenesis.
This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected on April 29, 2003.
Defective cardiovascular development and elevated cyclin E and Notch proteins in mice lacking the Fbw7 F-box protein.
The mammalian F-box protein Fbw7 and its Caenorhabditis elegans counterpart Sel-10 have been implicated in
the ubiquitin-mediated turnover of cyclin E
as well as the Notch/Lin-12 family of transcriptional activators. Both unregulated
Notch and cyclin E
promote tumorigenesis, and
inactivating mutations in human
Fbw7 suggest that it may be a tumor suppressor. To generate an in vivo system to assess the consequences of such unregulated signaling, we generated mice deficient for Fbw7. Fbw7-null mice die around 10.5 days post coitus because of a combination of deficiencies in hematopoietic and vascular development and heart chamber maturation. The absence of Fbw7 results in elevated levels of cyclin E, concurrent with inappropriate DNA replication in placental giant trophoblast cells. Moreover, the levels of both Notch 1 and Notch 4 intracellular domains were elevated, leading to stimulation of downstream transcriptional pathways involving Hes1, Herp1, and Herp2. These data suggest essential functions for Fbw7 in controlling cyclin E and Notch signaling pathways in the mouse.
Science as an Adventure
Ubiquitins
Prof. Avram Hershko shared the 2004 Nobel Prize in Chemistry with Aaron Ciechanover and Irwin Rose “for the discovery of ubiquitin-mediated protein degradation.”
Nipam Patel is a professor in the Departments of Molecular and Cell Biology and Integrative Biology at UC Berkeley and runs a research laboratory that studies the role, during embryonic development, of homeotic genes (the genetic switches described in this feature). “Ghost in Your Genes” focuses on epigenetic “switches” that turn genes “on” or “off.” But not all switches are epigenetic; some are genetic. That is, other genes within the chromosome turn genes on or off. In an animal’s embryonic stage, these gene switches play a predominant role in laying out the animal’s basic body plan and perform other early functions;
the epigenome begins to take over during the later stages of embryogenesis.
Beginning as a single fertilized egg, an organism develops many different kinds of cells. Altogether, multicellular organisms like humans have thousands of differentiated cells. Each is optimized for use in the brain, the liver, the skin, and so on. Remarkably, the DNA inside all these cells is exactly the same. What makes the cells differ from one another is that different genes in that DNA are either turned on or off in each type of cell.
Take a typical cell, such as a red blood cell. Each gene within that cell has a coding region that encodes the information used to make a particular protein. (Hemoglobin shuttles oxygen to the tissues and carbon dioxide back out to the lungs—or gills, if you’re a fish.) But another region of the gene, called “regulatory DNA,” determines whether and when the gene will be expressed, or turned on, in a particular kind of cell. This precise transcribing of genes is handled by proteins known as transcription factors, which bind to the regulatory DNA, thereby generating instructions for the coding region.
One important class of transcription factors is encoded by the so called homeotic, or Hox, genes. Found in all animals, Hox genes act to “regionalize” the body along the embryo’s anterior-to-posterior (head-to-tail) axis. In a fruit fly, for example, Hox genes lay out the various main body segments—the head, thorax, and abdomen. Amazingly, all animals, from fruit flies to mice to people, rely on the same basic Hox-gene complex. Using different-colored antibody stains, we can see exactly where and to what degree Hox genes are expressed. Each Hox gene is expressed in a specific region along the anterior-to-posterior axis of the embryo.
A fly’s body has three main divisions: head, thorax, and abdomen. We’ll focus on the thorax, which itself has three main segments. In a normal adult fly, the second thoracic segment features a pair of wings, while the third thoracic segment has a pair of small, balloon-shaped structures called halteres. A modified second wing, the haltere serves as a flight stabilizer. In order for the pair of wings and the pair of halteres (as well as all other parts of the fly) to develop properly, the fly’s suite of
Hox genes must be expressed in a precise way and at precise times.
During development, the fly’s two wings grow from a structure in the larva known as the wing imaginal disk. (An imago is an insect in its final, adult state.) The haltere grows from the larval haltere imaginal disk. Remember the Ubx Hox gene? Using staining again, we can detect the gene product of Ubx. This reveals that
the Ubx gene is naturally “off” in the wing disk—
and is “on” in the haltere disk.
Now you’ll see what happens when the Ubx gene, just one of a large number of Hox genes, is turned off in the haltere disk. What if a genetic mutation caused the Ubx gene to be turned off, during the larval stage, in the third thoracic segment, the segment that normally produces the haltere? Instead of a pair of halteres, the fly has a second set of wings. With the switch of that single Hox gene, Ubx, from on to off, the third thoracic segment becomes an additional second thoracic segment and the pair of halteres becomes a second pair of wings. This illustrates the remarkable ability of transcription factors like Ubx to control patterning as well as cell type during development.
ENCODE
A. Data Suggests “Gene” Redefinition
As part of a huge collaborative effort called ENCODE (Encyclopedia of DNA Elements), a research team led by Cold Spring Harbor Laboratory (CSHL) Professor Thomas Gingeras, PhD, publishes a genome-wide analysis of RNA messages, called transcripts, produced within human cells. Their analysis—one component of a massive release of research results by ENCODE teams from 32 institutes in 5 countries, with 30 papers appearing in 3 different high-level scientific journals—shows that three-quarters of the genome is capable of being transcribed. This indicates that nearly all of our genome is dynamic and active. It stands in marked contrast to consensus views prior to ENCODE’s comprehensive research efforts, which suggested that
only the small protein-encoding fraction of the genome was transcribed.
The vast amount of data generated with advanced technologies by Gingeras’ group and others in the ENCODE project changes the prevailing understanding of what defines a gene. The current outstanding question concerns
the nature and range of those functions. It is thought that these
“non-coding” RNA transcripts act something like components of a giant, complex switchboard, controlling a network of many events in the cell by
regulating the processes of
replication,
transcription
and translation
– that is, the copying of DNA and the making of proteins based on information carried by messenger RNAs. With the understanding that so much of our DNA can be transcribed into RNA comes the realization that there is much less space between what we previously thought of as genes, Gingeras points out.
The full ENCODE Consortium data sets can be freely accessed through
the ENCODE project portal as well as at the University of California at Santa Cruz genome browser,
the National Center for Biotechnology Information, and
the European Bioinformatics Institute.
Topic threads that run through several different papers can be explored via the ENCODE microsite page at http://Nature.com/encode.
Date: September 5, 2012. Source: Cold Spring Harbor Laboratory
1000 Genomes Project Team Reports on Variation Patterns
(from Phase I Data) October 31, 2012 GenomeWeb
In a study appearing online today in Nature, members of the 1000 Genomes Project Consortium presented an integrated haplotype map representing the genomic variation present in more than 1,000 individuals from 14 human populations. Using data on 1,092 individuals tested by
low-coverage whole-genome sequencing,
deep exome sequencing, and/or
dense genotyping,
the team looked at the nature and extent of the rare and common variation present in the genomes of individuals within these populations. In addition to population-specific differences in common variant profiles, for example, the researchers found distinct rare variant patterns within populations from different parts of the world — information that is expected to be important in interpreting future disease studies. They also encountered a surprising number of the variants that are expected to impact gene function, such as
non-synonymous changes,
loss-of-function variants, and, in some cases,
potentially damaging mutations.
ENCODE was designed to pick up where the Human Genome Project left off. Although that massive effort revealed the blueprint of human biology, it quickly became clear that the instruction manual for reading the blueprint was sketchy at best. Researchers could identify in its 3 billion letters many of the regions that code for proteins, but they make up little more than 1% of the genome, contained in around 20,000 genes. ENCODE, which started in 2003, is a massive data-collection effort designed to catalogue the
‘functional’ DNA sequences,
learn when and in which cells they are active and
trace their effects on how the genome is
packaged,
regulated and
read.
After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. That phase came to a close with the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including
more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression —
and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes (see page 57)1. But the job is far from done.
proteins interact with the DNA to control gene expression.
Overall, the ENCODE data define regulatory switches that are scattered all over the three billion nucleotides of the genome. In fact, the data suggest,
the regions that lie between gene-coding sequences contain a wealth of previously unrecognized functional elements, including
nonprotein-coding RNA transcribed sequences,
transcription factor binding sites,
chromatin structural elements, and
DNA methylation sites.
The combined results suggest that 95% of the genome lies within 8 kb of a DNA-protein interaction, and 99% lies within 1.7 kb of at least one of the biochemical events, the researchers say. Importantly, given the complex three-dimensional nature of DNA, it’s also apparent that
a regulatory element for one gene may be located quite some ‘linear’ distance from the gene itself.
“The information processing and the intelligence of the genome reside in the regulatory elements,” explains Jim Kent, director of the University of California, Santa Cruz Genome Browser project and head of the Encode Data Coordination Center. “With this project, we probably went from understanding less than 5% to now around 75% of them.” The ENCODE results also identified SNPs within regulatory regions that are associated with a range of diseases, providing new insights into the roles that
noncoding DNA plays in disease development.
“As much as nine out of 10 times, disease-linked genetic variants are not in protein-coding regions,” comments Mike Pazin, Encode program director at the National Human Genome Research Institute. “Far from being junk DNA, this regulatory DNA clearly makes important contributions to human disease.”
Other Related Articles on this Open Access Online Scientific Journal include the following:
Impact of evolutionary selection on functional regions: The imprint of evolutionary selection on ENCODE regulatory elements is manifested between species and within human populations s Saha