The Binding of Oligonucleotides in DNA and 3-D Lattice Structures
Larry H Bernstein, MD, FCAP
This article revisits a previous discussion of the role of genomics in the discovery of therapeutic targets, which focused on:
- key drivers of cellular proliferation,
- stepwise mutational changes coinciding with cancer progression, and
- potential therapeutic targets for reversal of the process.
“The Birth of BioInformatics & Computational Genomics” lays out the manifold multivariate systems-analysis tools that have moved the science forward to a point where clinical application is possible. There is a web-like connectivity among scientific discoveries: significant findings have led to novel hypotheses and have driven our understanding of biological and medical processes forward at an exponential pace, owing to insights into the chemical structure of DNA,
- the basic building blocks of DNA and proteins,
- nucleotide and protein-protein interactions,
- protein folding, allostericity, genomic structure,
- DNA replication,
- nuclear polyribosome interaction, and
- metabolic control.
In addition,
- the emergence of methods for gene removal and insertion,
- improvements in structural analysis, and
- developments in applied mathematics
have transformed the research framework.
Three-Dimensional Folding and Functional Organization Principles of the Drosophila Genome
Sexton T, Yaffe E, Kenigsberg E, Bantignies F,…Cavalli G. Institut de Génétique Humaine, Montpellier GenomiX, and Weizmann Institute, France and Israel. Cell 2012; 148(3): 458-472. http://dx.doi.org/10.1016/j.cell.2012.01.010 http://www.ncbi.nlm.nih.gov/pubmed/22265598
Chromosomes are the physical realization of genetic information and thus form the basis for its
- readout and propagation.
Here we present a high-resolution chromosomal contact map derived from a modified genome-wide chromosome conformation capture approach applied to Drosophila embryonic nuclei. The entire genome is linearly partitioned into well-demarcated physical domains that overlap extensively with
- active and repressive epigenetic marks.
Chromosomal contacts are hierarchically organized between domains. Global modeling of contact density and clustering of domains show that
- inactive domains are condensed and confined to their chromosomal territories, whereas
- active domains reach out of the territory to form remote intra- and interchromosomal contacts.
Moreover, we systematically identify specific long-range intrachromosomal contacts between Polycomb-repressed domains.
Together, these observations allow for quantitative prediction of the Drosophila chromosomal contact map, laying the foundation for detailed studies of
- chromosome structure and function in a genetically tractable system.
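How a linear partition into physical domains can be read off a contact map is easiest to see on a toy matrix. The sketch below uses an insulation-score scan, a swapped-in technique of the kind later popularized for Hi-C data, not the probabilistic model Sexton et al. actually used; the toy block-diagonal matrix, the window size, and all function names are hypothetical. Bins where few contacts cross the diagonal score low and are reported as candidate domain boundaries.

```python
def toy_domains(sizes, inside=10.0, outside=1.0):
    """Hypothetical block-diagonal contact map: dense within domains, sparse between them."""
    n = sum(sizes)
    m = [[outside] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s):
            for j in range(start, start + s):
                m[i][j] = inside
        start += s
    return m

def insulation_score(contacts, window):
    """Mean contact count between the `window` bins upstream and downstream of each bin.

    Bins too close to the matrix edge get None.
    """
    n = len(contacts)
    scores = [None] * n
    for i in range(window, n - window):
        cells = [contacts[r][c]
                 for r in range(i - window, i)
                 for c in range(i + 1, i + window + 1)]
        scores[i] = sum(cells) / len(cells)
    return scores

def call_boundaries(scores):
    """Bins at the right edge of a local minimum (ties allowed on the left).

    A boundary lying between bins i-1 and i is therefore reported once, as i.
    """
    hits = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid <= left and mid < right:
            hits.append(i)
    return hits
```

For three toy domains of ten bins each, `call_boundaries(insulation_score(toy_domains([10, 10, 10]), window=3))` reports the first bin of the second and third domains. Real contact maps are noisy and need normalization first; this only illustrates the idea of a domain boundary as a dip in cross-diagonal contacts.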
“Mr. President, the Genome is Fractal!” Eric Lander (Science Adviser to the President and Director of the Broad Institute) et al. delivered this message on the cover of Science (Oct. 9, 2009), which generated interest at a September meeting of the International HoloGenomics Society. First, it may seem trivial to correct the statement in the “About the Cover” note of Science (AAAS). The statement
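- “the Hilbert curve is a one-dimensional fractal trajectory” needs mathematical clarification: each finite stage of the Hilbert curve is a one-dimensional path, but its limit fills the square and has fractal (Hausdorff) dimension 2.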
While the paper itself does not make this statement, the new editors of the AAAS magazine might be even further ahead if the previous editors had not rejected (without review)
- a manuscript by 20+ founders of the (former) International PostGenetics Society in December 2006.
Second, it may not be sufficiently clear to the reader that the reasonable requirement for
- DNA polymerase to crawl along a “knot-free” (or “low knot”) structure does not, by itself, require fractals.
A “knot-free” structure could be spooled by an ordinary “knitting globule” (such that the DNA polymerase does not bump into a “knot” when duplicating the strand, just as someone knitting can go through the entire thread without encountering an annoying knot):
- Just to be “knot-free” you don’t need fractals.
Note, however, that the “strand” can be accessed only at its beginning; it is impossible, e.g.,
- to pluck a segment from deep inside the “globule”.
This is where certain fractals provide a major advantage, and that could be the “Eureka” moment. For instance, the Hilbert curve mentioned above is not only “knot-free” but also provides easy access to
- “linearly remote” segments of the strand.
If the Hilbert curve starts at the lower right corner and ends at the lower left corner, for instance,
- the path shows how easily one can reach what would be the midpoint of the curve
- when its length is measured along the zig-zag path.
Likewise, the midpoint is about equally easy to reach from the beginning of the Hilbert curve, and easier to reach from the origin than a point about 2/3 of the way down the path. The Hilbert curve provides easy access between two points within the “spooled thread”: a point about 1/5 of the way along the overall length and a point about 3/5 of the way along are also in a “close neighborhood”. This marvellous fractal structure is illustrated by a 3D rendering of the Hilbert curve. Once you observe such a fractal structure,
- you’ll never again think of a chromosome as a “Brillo mess”, will you?
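The locality argument can be checked numerically. A minimal sketch, assuming the standard iterative index-to-coordinate conversion for a 2D Hilbert curve on an n x n grid (n a power of 2); the function names are ours. It compares how far apart two positions lie along the curve with how far apart they sit in the plane.

```python
import math

def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant so that consecutive sub-curves join end to end."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y

def hilbert_d2xy(n, d):
    """Map position d (0 <= d < n*n) along the Hilbert curve to grid cell (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def path_vs_space(n, frac_a, frac_b):
    """Separation along the curve (in steps) versus straight-line separation in the plane."""
    last = n * n - 1
    da, db = round(frac_a * last), round(frac_b * last)
    return abs(db - da), math.dist(hilbert_d2xy(n, da), hilbert_d2xy(n, db))
```

Consecutive indices always land on neighbouring cells, so the "thread" never jumps, yet `path_vs_space(64, 0.2, 0.6)` shows that two points thousands of steps apart along the curve can still lie only a modest distance apart in the plane. Which fractions come out "close" depends on the curve's orientation.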
It will dawn on you that the genome is orders of magnitude more finessed than we ever thought. Those embarking on a somewhat complex review of historical aspects of the power of fractals may wish to consult the oeuvre of Mandelbrot (also to celebrate his 85th birthday). For more sophisticated readers, even the fairly simple Hilbert curve (a representative of the Peano class) becomes far more stunningly brilliant than just some “see-through density”.
Those who are familiar with the classic “Traveling Salesman Problem” know that finding “the shortest path along which every given n locations can be visited once, and only once” requires fairly sophisticated algorithms (and a tremendous amount of computation if n > 10, or much larger). Some readers will therefore be amazed that for n = 9 the underlying Hilbert curve helps to provide an empirical solution.
Briefly, the significance of the above realization, that the (recursive) fractal Hilbert curve is intimately connected to the (recursive) solution of the Traveling Salesman Problem, a core concept of artificial neural networks, can be summarized as follows. Accomplished physicist John Hopfield (already a member of the National Academy of Sciences) aroused great excitement in 1982 with his (recursive) design of artificial neural networks and learning algorithms, which were able to find solutions to combinatorial problems such as the Traveling Salesman Problem (book review by Clark Jeffries, 1991; see J Anderson, Rosenfeld, and A Pellionisz (eds.), Neurocomputing 2: Directions for Research, MIT Press, Cambridge, MA, 1990): “Perceptions were modeled chiefly with neural connections in a “forward” direction: A -> B -> C -> D. The analysis of networks with strong backward coupling proved intractable. All our interesting results arise as consequences of the strong back-coupling” (Hopfield, 1982).
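The Hilbert-curve link to the Traveling Salesman Problem can be made concrete with a space-filling-curve heuristic: visit the points in the order in which a Hilbert curve passes them. The sketch below compares that tour with an exhaustive search for nine points. The grid size, the example points, and the function names are our assumptions; the curve ordering is a heuristic and is not guaranteed to be optimal.

```python
import itertools
import math

def hilbert_xy2d(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip so the next level is read in curve order
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    k = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % k]]) for i in range(k))

def hilbert_tour(points, n):
    """Heuristic tour: sort the points by their Hilbert index."""
    return sorted(range(len(points)), key=lambda i: hilbert_xy2d(n, *points[i]))

def exact_tour(points):
    """Brute-force optimum, fixing point 0 as the start (8! = 40320 tours for nine points)."""
    best = min(itertools.permutations(range(1, len(points))),
               key=lambda p: tour_length(points, (0,) + p))
    return [0, *best]
```

With `pts = [(0, 0), (3, 1), (5, 4), (1, 6), (7, 7), (2, 3), (6, 1), (4, 6), (0, 4)]` on an 8 x 8 grid, comparing `tour_length(pts, hilbert_tour(pts, 8))` against `tour_length(pts, exact_tour(pts))` shows how close the curve ordering gets; the sort is O(n log n), while the exact search grows factorially.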
The Principle of Recursive Genome Function overcame obsolete axioms that, for half a century, blocked the entry of recursive algorithms into the interpretation of the structure and function of the (Holo)Genome. This breakthrough,
- by uniting the two largely separate fields of Neural Networks and Genome Informatics,
is particularly important for those who focus on biological (actually occurring) neural networks, rather than on abstract algorithms that may not, or because of their core axioms simply could not, represent neural networks under the governance of DNA information. If biophysicist Andras Pellionisz is correct, genetic science may be on the verge of yielding its third, and by far biggest, surprise. With a doctorate in physics, Pellionisz holds Ph.D.s in computer science and experimental biology from the prestigious Budapest Technical University and the Hungarian National Academy of Sciences. A biophysicist by training, the 59-year-old is a former research associate professor of physiology and biophysics at New York University, author of numerous papers in respected scientific journals and textbooks, a past winner of the prestigious Humboldt Prize for scientific research, a former consultant to NASA, and holder of a patent on the world’s first artificial cerebellum, a technology that has already been integrated into research on advanced avionics systems. Because of his background, the Hungarian-born brain researcher might also become one of the first people to successfully launch a new company by
- using the Internet to gather momentum for a novel scientific idea.
The genes we know about today, Pellionisz says, can be thought of as something similar to machines that make bricks (proteins, in the case of genes), with certain junk-DNA sections providing a blueprint for the different ways those proteins are assembled. Reflecting the notion that at least certain parts of junk DNA might have a purpose, many researchers, for example,
- now refer to such sequences with a far less derogatory term: introns.
In a provisional patent application filed July 31, Pellionisz claims to have
- unlocked a key to the hidden role junk DNA plays in growth, and in life itself.
His patent application covers all attempts to
- measure the fractal properties of introns for diagnostic and therapeutic purposes.
The FractoGene Decade from Inception in 2002: Proofs of Concept and Impending Clinical Applications by 2012
- Junk DNA Revisited (SF Gate, 2002)
- The Future of Life, 50th Anniversary of DNA (Monterey, 2003)
- Mandelbrot and Pellionisz (Stanford, 2004)
- Morphogenesis, Physiology and Biophysics (Simons, Pellionisz 2005)
- PostGenetics; Genetics beyond Genes (Budapest, 2006)
- ENCODE conclusion (Collins, 2007)
- The Principle of Recursive Genome Function (paper, YouTube, 2008)
- YouTube Cold Spring Harbor presentation of FractoGene (Cold Spring Harbor, 2009)
- Mr. President, the Genome is Fractal! (2009)
- HolGenTech, Inc. founded (2010)
- Pellionisz on the Board of Advisers in the USA and India (2011)
- ENCODE, final admission (2012)
- Recursive Genome Function is Clogged by Fractal Defects in Hilbert-Curve (2012)
- Geometric Unification of Neuroscience and Genomics (2012)
- US Patent Office issues FractoGene 8,280,641 to Pellionisz (2012)
http://www.junkdna.com/the_fractogene_decade.pdf
The Hidden Fractal Language of Intron DNA
To fully understand Pellionisz’ idea, one must first know what a fractal is. Fractals are a way that nature organizes matter. Fractal patterns can be found in anything that has a nonsmooth surface (unlike a billiard ball), such as
- coastlines,
- the branches of a tree or
- the contours of a neuron (a nerve cell in the brain).
Some, but not all, fractals are self-similar, and natural fractals stop repeating their patterns at some stage;
- the branches of a tree, for example, can get only so small.
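One standard way to put a number on such rough shapes (coastlines, branching trees, neurons) is the box-counting dimension: cover the shape with boxes of shrinking size and see how fast the number of occupied boxes grows. A minimal sketch, assuming a deterministic Sierpinski pattern on a 2^k grid (cell filled when x & y == 0) as a stand-in for a natural shape; the function names are ours. For this pattern the estimate is log 3 / log 2 ≈ 1.585, between a line (1) and a filled plane (2).

```python
import math

def sierpinski_cells(k):
    """Filled cells of a 2^k x 2^k Sierpinski pattern (cell filled when x & y == 0)."""
    size = 2 ** k
    return {(x, y) for x in range(size) for y in range(size) if x & y == 0}

def box_count(cells, box):
    """Number of box x box squares containing at least one filled cell."""
    return len({(x // box, y // box) for x, y in cells})

def box_counting_dimension(cells, k):
    """Least-squares slope of log N(box) against log(1/box) for box = 1, 2, 4, ..., 2^(k-1)."""
    xs, ys = [], []
    for m in range(k):
        box = 2 ** m
        xs.append(math.log(1 / box))
        ys.append(math.log(box_count(cells, box)))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var = sum((a - mean_x) ** 2 for a in xs)
    return cov / var
```

Applied to real data (a digitized coastline or a traced neuron), the log-log plot is straight only over a limited range of box sizes, which is the practical face of the remark that natural fractals stop repeating at some scale.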
Because they are geometric, meaning they have a shape, fractals can be described in mathematical terms. It’s similar to the way a circle can be described by using a number to represent its radius (the distance from its center to its outer edge). When that number is known, it’s possible to draw the circle it represents without ever having seen it before. Although the math is much more complicated, the same is true of fractals. If one has the formula for a given fractal, it’s possible to use that formula to construct, or reconstruct, an image of whatever structure it represents, no matter how complicated. According to Pellionisz, the mysteriously repetitive but not identical strands of genetic material are in reality
- building instructions organized in a special type of pattern known as a fractal.
It’s this pattern of fractal instructions, he says, that tells genes what they must do in order to form living tissue, everything from the wings of a fly to the entire body of a full-grown human. In a move sure to alienate some scientists, Pellionisz chose the unorthodox route of making his initial disclosures online on his own Web site. He picked that strategy, he says, because it is the fastest way he can document his claims and find scientific collaborators and investors. Most mainstream scientists blanch at such approaches, preferring more traditionally credible methods, such as publishing articles in peer-reviewed journals. Pellionisz’ idea is that a fractal set of building instructions in the DNA plays a role in organizing life itself. Decode the language, and in theory it could be reverse engineered. Just as knowing the radius of a circle lets one create that circle, the fractal-based formula
- would allow us to understand how a heart or a disease-fighting antibody is created.
The idea is to encourage new collaborations across the boundaries that separate the intertwined
- disciplines of biology, mathematics and computer sciences.
Hal Plotkin, Special to SF Gate. Thursday, November 21, 2002. http://www.junkdna.com/ http://www.junkdna.com/the_fractogene_decade.pdf http://www.sciencentral.com/articles/view.php3?article_id=218392305 http://www.news-medical.net/health/Junk-DNA-What-is-Junk-DNA.aspx http://www.kurzweilai.net/junk-dna-plays-active-role-in-cancer-progression-researchers-find http://marginalrevolution.com/marginalrevolution/2013/05/the-battle-over-junk-dna http://profiles.nlm.nih.gov/SC/B/B/F/T/_/scbbft.pdf
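The circle analogy above, in which a compact rule regenerates a whole shape, can be illustrated with a Lindenmayer-system (L-system) rewrite rule for the Koch curve. This is a sketch of the general idea only; it is not Pellionisz' FractoGene method, and the rule, the turtle conventions, and the function names are our assumptions.

```python
import math

KOCH_RULE = {"F": "F+F--F+F"}  # each segment becomes four, joined by 60-degree turns

def rewrite(axiom, rules, generations):
    """Apply the L-system rewrite rules to every symbol, `generations` times."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def trace(commands, step, angle_deg=60.0):
    """Turtle-trace a command string: F = move forward, + = turn left, - = turn right."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for ch in commands:
        if ch == "F":
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            points.append((x, y))
        elif ch == "+":
            heading += angle_deg
        elif ch == "-":
            heading -= angle_deg
    return points
```

`trace(rewrite("F", KOCH_RULE, 4), step=3 ** -4)` yields a 256-segment curve running from (0, 0) to (1, 0), all reconstructed from an eight-character rule, much as a single radius reconstructs a circle.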
Human Genome is Multifractal
The human genome: a multifractal analysis. Moreno PA, Vélez PE, Martínez E, et al. BMC Genomics 2011, 12:506. http://www.biomedcentral.com/1471-2164/12/506
Several studies have shown that genomes can be studied via a multifractal formalism. These researchers had previously used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here they investigated the possibility that the human genome shows a behavior similar to that observed in the nematode. They report
- multifractality in the human genome sequence.
This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In addition,
- gene function, clusters of orthologous genes, and metabolic pathways tended to increase their frequencies with ranges of multifractality,
- large gene families were located in genomic regions with varied multifractality, and
- a multifractal map and classification for human chromosomes are proposed.
They propose a descriptive non-linear model for the structure of the human genome. This model reveals a multifractal regionalization in which many coexisting regions are
- far from equilibrium, and this non-linear organization has significant
- molecular and medical genetic implications for understanding the role of Alu elements in the stability and structure of the human genome.
Given the role of Alu sequences in
- gene regulation
- genetic diseases
- human genetic diversity
- adaptation and phylogenetic analyses
these quantifications are especially useful.
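The multifractal formalism referred to above can be sketched with the standard partition-function method for generalized (Rényi) dimensions D_q. This is a generic illustration, not the authors' pipeline: using the GC count per window as the measure, the non-overlapping window scheme, and the function names are our assumptions. A binomial multiplicative cascade serves as a check because its D_q is known in closed form.

```python
import math

def gc_measure(seq, window):
    """Normalised GC count per non-overlapping window: a probability measure along the sequence."""
    counts = [sum(base in "GCgc" for base in seq[i:i + window])
              for i in range(0, len(seq) - window + 1, window)]
    total = sum(counts)  # assumes the sequence contains at least one G or C
    return [c / total for c in counts]

def coarse_grain(mu):
    """Merge neighbouring boxes pairwise, doubling the box size."""
    return [mu[i] + mu[i + 1] for i in range(0, len(mu) - 1, 2)]

def generalized_dimension(mu, q, levels):
    """Estimate D_q (q != 1) by the partition-function method.

    Z(q, eps) = sum of mu_i ** q over boxes of relative size eps scales as
    eps ** tau(q); tau is the least-squares slope of log Z against log eps,
    and D_q = tau / (q - 1).  len(mu) should be a power of 2.
    """
    if q == 1:
        raise ValueError("q = 1 needs the information-dimension limit")
    xs, ys = [], []
    for _ in range(levels):
        xs.append(math.log(1 / len(mu)))  # relative box size eps
        ys.append(math.log(sum(m ** q for m in mu if m > 0)))
        mu = coarse_grain(mu)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    tau = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    return tau / (q - 1)

def binomial_cascade(p, depth):
    """Test measure: each box passes a fraction p of its mass to its left half, 1 - p to its right."""
    mu = [1.0]
    for _ in range(depth):
        mu = [m * w for m in mu for w in (p, 1 - p)]
    return mu
```

A uniform measure gives D_q = 1 for every q (monofractal), while the cascade gives D_q = -log2(p**q + (1 - p)**q) / (q - 1), which varies with q; that spread of D_q across q is what "multifractality" means. A real sequence would be converted with `gc_measure` and trimmed to a power-of-two number of windows.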
MiIP: The Monomer Identification and Isolation Program
Bun C, Ziccardi W, Doering J and Putonti C. Evolutionary Bioinformatics 2012:8 293-300. http://dx.doi.org/10.4137/EBO.S9248
Repetitive elements within genomic DNA are both functionally and evolutionarily informative. Discovering these sequences ab initio is computationally challenging, compounded by the fact that sequence identity between repetitive elements can vary significantly. These investigators present a new application, the Monomer Identification and Isolation Program (MiIP),
- which provides functionality to both search for a particular repeat as well as
- discover repetitive elements within a larger genomic sequence.
To compare MiIP’s performance with other repeat detection tools, analysis was conducted on synthetic sequences as well as several α21-II clones and HC21 BAC sequences. The main benefit of MiIP is that it is a single tool capable of both
- searching for known monomeric sequences and
- discovering the occurrence of repeats ab initio.
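A toy version of the two tasks (discovering a tandem monomer ab initio and searching for a known one) can be sketched in a few functions. This is not MiIP's algorithm: the shift-and-compare period score, the 0.8 threshold, the majority-vote consensus, and the Hamming-distance search are assumptions chosen for illustration.

```python
from collections import Counter

def period_score(seq, p):
    """Fraction of positions i where seq[i] == seq[i + p]."""
    pairs = len(seq) - p
    return sum(seq[i] == seq[i + p] for i in range(pairs)) / pairs

def discover_monomer(seq, threshold=0.8):
    """Smallest period passing `threshold`, with a majority-vote consensus monomer, or None."""
    for p in range(1, len(seq) // 2 + 1):
        if period_score(seq, p) >= threshold:
            consensus = "".join(Counter(seq[j::p]).most_common(1)[0][0] for j in range(p))
            return p, consensus
    return None

def find_monomer(seq, monomer, max_mismatches=0):
    """Start positions where `monomer` occurs with at most `max_mismatches` substitutions."""
    k = len(monomer)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], monomer))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits
```

Because copies of a repeat diverge, the search for a known monomer has to tolerate mismatches; with `max_mismatches=0` a single point substitution hides a copy. Real alpha-satellite monomers (about 171 bp) also contain insertions and deletions, which a Hamming-distance search cannot handle.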
Triplex DNA: A third strand for DNA
The DNA double helix can under certain conditions accommodate
- a third strand in its major groove.
Researchers in the UK presented a complete set of four variant nucleotides that make it
- possible to use this phenomenon in gene regulation and mutagenesis.
Natural DNA only forms a triplex if the targeted strand is rich in purines, guanine (G) and adenine (A), which in addition to the bonds of Watson-Crick base pairing
- can form two further hydrogen bonds, and if
- the ‘third strand’ oligonucleotide has the matching sequence of pyrimidines, cytosine (C) and thymine (T).
Any Cs or Ts in the target strand of the duplex will only bind very weakly, as
- they contribute just one hydrogen bond.
Moreover, the recognition of G requires the C in the probe strand to be protonated, so
- triplex formation will only work at low pH.
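The pairing rules for the natural-base pyrimidine motif (T·A-T and C+·G-C triplets, with the third strand parallel to the purine strand) can be sketched as a small design helper. The function names and the minimum tract length are hypothetical; the sketch handles only natural bases and simply counts the cytosines that must be protonated, which is the source of the low-pH requirement. It does not model the Southampton modified nucleotides.

```python
import re

HOOGSTEEN_PYRIMIDINE = {"A": "T", "G": "C"}  # T.A-T and C+.G-C triplets

def polypurine_tracts(top, min_len=10):
    """Homopurine runs on either strand of a duplex, given its top strand (5'->3').

    A run of A/G on the top strand is reported as '+'; a run of C/T on the top
    strand means the bottom strand is purine-rich there, reported as '-'.
    """
    tracts = []
    for pattern, strand in (("[AG]", "+"), ("[CT]", "-")):
        for m in re.finditer(f"{pattern}{{{min_len},}}", top):
            tracts.append((m.start(), m.end(), strand))
    return sorted(tracts)

def pyrimidine_tfo(purine_strand):
    """Third strand for the pyrimidine motif, written 5'->3' parallel to the purine strand.

    Returns the TFO and the number of cytosines that must be protonated.
    """
    if set(purine_strand) - set("AG"):
        raise ValueError("target strand must be homopurine for this motif")
    tfo = "".join(HOOGSTEEN_PYRIMIDINE[b] for b in purine_strand)
    return tfo, tfo.count("C")
```

For a '-' tract, the purine strand is the reverse complement of that top-strand segment and should be converted before calling `pyrimidine_tfo`. A pyrimidine interruption in the target raises an error here, mirroring the weak binding of C and T described above.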
To overcome all these problems, the groups of Tom Brown and Keith Fox at the University of Southampton have developed modified building blocks, and have now completed
- a set of four new nucleotides, each of which will bind to one DNA nucleotide from the major groove of the double helix.
They tested the binding of a 19-mer of these designer nucleotides to a double helix target sequence in comparison with the corresponding triplex-forming oligonucleotide made from natural DNA bases. Using fluorescence-monitored thermal melting and DNase I footprinting, the researchers showed that
- their construct forms a stable triplex even at neutral pH.
Tests with mutated versions of the target sequence showed that
- three of the novel nucleotides are highly selective for their target base pair,
- while the ‘S’ nucleotide, designed to bind to T, also tolerates C.
Rusling DA et al. Nucleic Acids Res. 2005; 33:3025. http://nucleicacidsres.com/Rusling_DA
Vasquez KM et al. Science 2000; 290:530. http://Science.org/2000/290.530/Vazquez_KM/
Frank-Kamenetskii MD, Mirkin SM. Annual Rev Biochem 1995; 64:69-95. http://www.annualreviews.org/aronline/1995/Frank-Kamenetski_MD/64.69/
Since the pioneering work of Felsenfeld, Davies, & Rich, double-stranded polynucleotides containing purines in one strand and pyrimidines in the other strand [such as poly(A)/poly(U), poly(dA)/poly(dT), or poly(dAG)/poly(dCT)] have been known to undergo a stoichiometric transition forming a triple-stranded structure containing one polypurine and two polypyrimidine strands. Early on, it was assumed that the third strand was located in the major groove and associated with the duplex via non-Watson-Crick interactions now
- known as Hoogsteen pairing.
Triple helices consisting of one pyrimidine and two purine strands were also proposed. However, notwithstanding the fact that single-base triads in tRNA structures were well documented, triple-helical DNA escaped wide attention before the mid-1980s. Interest in DNA triplexes arose from two partially independent developments.
- First, homopurine-homopyrimidine stretches in supercoiled plasmids were found to adopt an unusual DNA structure, called H-DNA, which includes a triplex.
- Second, several groups demonstrated that homopyrimidine and some purine-rich oligonucleotides can form stable and sequence-specific complexes with corresponding homopurine-homopyrimidine sites on duplex DNA.
These complexes were shown to be triplex structures rather than D-loops, where
- the oligonucleotide invades the double helix and displaces one strand.
A characteristic feature of all these triplexes is that the two
- chemically homologous strands (both pyrimidine or both purine) are antiparallel.
These findings led to explosive growth in triplex studies. One can easily imagine numerous “geometrical” ways to form a triplex, and several of them have been studied experimentally. The canonical intermolecular triplex consists of either
- three independent oligonucleotide chains, or
- a long DNA duplex carrying a homopurine-homopyrimidine insert and the corresponding oligonucleotide.
Triplex formation depends strongly on the concentration of the oligonucleotide(s). A single DNA
- chain may also fold into a triplex connected by two loops.
To comply with the sequence and polarity requirements for triplex formation, such a DNA strand must have a peculiar sequence: It contains a mirror repeat
- (homopyrimidine for YR*Y triplexes and homopurine for YR*R triplexes)
- flanked by a sequence complementary to one half of this repeat.
Such DNA sequences fold into triplex configuration much more readily than do the corresponding intermolecular triplexes, because all triplex forming segments are brought together within the same molecule. It has become clear that both
- sequence requirements and chain polarity rules for triplex formation
can be met by DNA target sequences built of clusters of purines and pyrimidines. The third strand consists of adjacent homopurine and homopyrimidine blocks forming Hoogsteen hydrogen bonds with purines on alternate strands of the target duplex, and
- this strand switch preserves the proper chain polarity.
These structures, called alternate-strand triplexes, have been experimentally observed as both intra- and inter-molecular triplexes. These results increase the number of potential targets for triplex formation in natural DNAs somewhat by adding sequences composed of purine and pyrimidine clusters, although arbitrary sequences are still not targetable because
- strand switching is energetically unfavorable.
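The mirror-repeat requirement for intramolecular triplexes described above can be checked with a short scan. A sketch under our own assumptions: it looks only for even-length homopurine or homopyrimidine stretches that read identically 5'->3' and 3'->5' on the same strand (a mirror, not a palindrome), with a hypothetical minimum arm length; loops, flanking complementary sequence, and imperfect repeats are ignored.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def mirror_repeats(seq, min_arm=5):
    """Even-length mirror repeats within a single base class (all A/G or all C/T).

    For each centre between positions c-1 and c, extend outwards while the two
    flanking bases are identical and stay in the class of seq[c].
    Returns (start, end, arm) with seq[start:end] equal to its own reverse.
    """
    hits = []
    for c in range(1, len(seq)):
        cls = PURINES if seq[c] in PURINES else PYRIMIDINES
        arm = 0
        while (c - arm - 1 >= 0 and c + arm < len(seq)
               and seq[c - arm - 1] == seq[c + arm] and seq[c + arm] in cls):
            arm += 1
        if arm >= min_arm:
            hits.append((c - arm, c + arm, arm))
    return hits
```

In a duplex, a homopurine mirror repeat on one strand implies a homopyrimidine mirror repeat on the other, so scanning one strand suffices for locating candidate H-DNA sites in this simplified model.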
References:
Lyamichev VI, Mirkin SM, Frank-Kamenetskii MD. J. Biomol. Struct. Dyn. 1986; 3:667-69. http://JbiomolStractDyn.com/1986/Lyamichev_VI/3.667/
Filippov SA, Frank-Kamenetskii MD. Nature 1987; 330:495-97. http://Nature.com/1987/Fillipov_SA/330.495/
Demidov V, Frank-Kamenetskii MD, Egholm M, Buchardt O, Nielsen PE. Nucleic Acids Res. 1993; 21:2103-7. http://NucleicAcidsResearch.com/1993/Demidov_V/21.2103/
Mirkin SM, Frank-Kamenetskii MD. Annu. Rev. Biophys. Biomol. Struct. 1994; 23:541-76. http://AnnRevBiophysBiomolecStructure.com/1994/Mirkin_SM/23.541/
Hoogsteen K. Acta Crystallogr. 1963; 16:907-16. http://ActaCrystallogr.com/1963/Hoogsteen_K/16.907/
Malkov VA, Voloshin ON, Veselkov AG, Rostapshov VM, Jansen I, et al. Nucleic Acids Res. 1993; 21:105-11. http://NucleicAcidsResearch.com/1993/Malkov_VA/21.105
Malkov VA, Voloshin ON, Soyfer VN, Frank-Kamenetskii MD. Nucleic Acids Res. 1993; 21:585-91. http://NucleicAcidsRes.com/1993/Malkov_VA/21.585/
Cherny DY, Belotserkovskii BP, Frank-Kamenetskii MD, Egholm M, Buchardt O, et al. Proc. Natl. Acad. Sci. USA 1993; 90:1667-70. http://PNAS.org/1993/Chemy_DY/90.1667/
Triplex forming oligonucleotides
Triplex forming oligonucleotides: sequence-specific tools for genetic targeting. Knauert MP, Glazer PM. Human Molec Genetics 2001; 10(20):2243-2251. http://HumanMolecGenetics.com/2001/Knauert_MP/10.2243/
Triplex forming oligonucleotides (TFOs) bind in the major groove of duplex DNA with a
- high specificity and affinity.
Because of these characteristics, TFOs have been proposed as
- homing devices for genetic manipulation in vivo.
These investigators review work demonstrating the ability of TFOs and related molecules
- to alter gene expression and mediate gene modification in mammalian cells.
TFOs can mediate targeted gene knockout in mice, providing a foundation for potential
- application of these molecules in human gene therapy.
The Triplex Genetic Code
Novagon DNA
John Allen Berger, founder of Novagon DNA and The Triplex Genetic Code
Over the past 12+ years, Novagon DNA has amassed a vast array of empirical findings which
- challenge the “validity” of the “central dogma theory”, especially the current five-nucleotide
- Watson-Crick DNA and RNA genetic codes. DNA = A1T1G1C1, RNA = A2U1G2C2.
We propose that our new Novagon DNA 6 nucleotide Triplex Genetic Code has more validity than
- the existing 5 nucleotide (A1T1U1G1C1) Watson-Crick genetic codes.
Our goal is to conduct a “world class” validation study to replicate and extend our findings.
Methods for Examining Genomic and Proteomic Interactions.
An Integrated Statistical Approach to Compare Transcriptomics Data Across Experiments: A Case Study on the Identification of Candidate Target Genes of the Transcription Factor PPARα
Ullah MO, Müller M and Hooiveld GJEJ. Bioinformatics and Biology Insights 2012;6: 145–154. http://dx.doi.org/10.4137/BBI.S9529 http://www.ncbi.nlm.nih.gov/pubmed/22783064 http://edepot.wur.nl/213859
An effective strategy to elucidate the signal transduction cascades activated by a transcription factor
- is to compare the transcriptional profiles of wild type and transcription factor knockout models.
Many statistical tests have been proposed for analyzing gene expression data, but
- most tests are based on pair-wise comparisons.
Since the analysis of microarrays involves the testing of multiple hypotheses within one study,
- it is generally accepted to control for false positives by the false discovery rate (FDR).
However, this may be an inappropriate metric for
- comparing data across different experiments.
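For reference, FDR control within a single experiment is usually done with the Benjamini-Hochberg step-up procedure, sketched here in plain Python (function names ours). The sketch computes per-experiment adjusted p-values only; it does not implement the cross-experiment integration the authors propose.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at FDR level alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]
```

Because the adjustment depends on how many tests share one list, the same gene can pass in one experiment and fail in another of different size, one reason a per-list FDR is an awkward basis for cross-experiment comparison.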
Here we propose the simultaneous testing and integration of
- the three hypotheses (contrasts) using the cell means ANOVA model.
These three contrasts test for the effect of a treatment in
- wild type,
- gene knockout, and
- globally over all experimental groups
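The three contrasts can be written as weight vectors over the four cell means of a 2 x 2 design (genotype x treatment). A minimal sketch for a single gene, assuming an equal-variance cell-means ANOVA; the cell ordering, the equal weights used for the "global" contrast, and the function name are our assumptions, and only t-statistics (not p-values) are returned to stay within the standard library.

```python
import math

CELLS = ("wt_control", "wt_treated", "ko_control", "ko_treated")
CONTRASTS = {
    "treatment_in_wt": (-1.0, 1.0, 0.0, 0.0),
    "treatment_in_ko": (0.0, 0.0, -1.0, 1.0),
    "treatment_global": (-0.5, 0.5, -0.5, 0.5),  # assumed: average of the two effects
}

def cell_means_contrasts(groups):
    """Estimate, t-statistic and residual df for each contrast.

    `groups` maps each name in CELLS to replicate expression values for one gene.
    The residual variance is pooled over all four cells (df = N - 4).
    """
    data = [groups[c] for c in CELLS]
    means = [sum(v) / len(v) for v in data]
    sse = sum((x - mu) ** 2 for v, mu in zip(data, means) for x in v)
    df = sum(len(v) for v in data) - len(data)
    mse = sse / df
    results = {}
    for name, weights in CONTRASTS.items():
        estimate = sum(w * mu for w, mu in zip(weights, means))
        se = math.sqrt(mse * sum(w * w / len(v) for w, v in zip(weights, data)))
        results[name] = (estimate, estimate / se, df)
    return results
```

Fitting all four cells in one model lets the three contrasts share a single pooled variance estimate, which is the practical advantage of testing them simultaneously instead of running separate pairwise tests.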
We compare differential expression of genes across experiments while
- controlling for multiple hypothesis testing.
Managing biological complexity across orthologs with a visual knowledgebase of documented biomolecular interactions
Vincent Van Buren & Hailin Chen. Scientific Reports 2012; 2, Article number: 1011 http://dx.doi.org/10.1038/srep01011
The complexity of biomolecular interactions and influences is a major obstacle
- to their comprehension and elucidation.
Visualizing knowledge of biomolecular interactions increases
- comprehension and facilitates the development of new hypotheses.
The rapidly changing landscape of high-content experimental results also presents a challenge
- for the maintenance of comprehensive knowledgebases.
Distributing the responsibility for maintenance of a knowledgebase to a community of
- experts is an effective strategy for large, complex and rapidly changing knowledgebases.
Cognoscente serves these needs
- by building visualizations for queries of biomolecular interactions on demand,
- by managing the complexity of those visualizations, and
- by crowdsourcing to promote the incorporation of current knowledge from the literature.
Imputing functional associations
- between biomolecules and imputing directionality of regulation
- for those predictions each require a corpus of existing knowledge as a framework.
Comprehension of the complexity of this corpus of knowledge will be facilitated by effective
- visualizations of the corresponding biomolecular interaction networks.
Cognoscente (http://vanburenlab.medicine.tamhsc.edu/cognoscente.html) was designed and implemented to serve these roles as a knowledgebase and as
- an effective visualization tool for systems biology research and education.
Cognoscente currently contains over 413,000 documented interactions, with coverage across multiple species. Perl, HTML, GraphViz, and a MySQL database were used in the development of Cognoscente. Cognoscente was motivated by the need to update the knowledgebase of
- biomolecular interactions at the user level, and
- flexibly visualize multi-molecule query results for
- heterogeneous interaction types across different orthologs.
Satisfying these needs provides a strong foundation for developing new hypotheses about
- regulatory and metabolic pathway topologies.
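The on-demand visualization idea (query a store of documented interactions, then render the matching subnetwork with GraphViz) can be sketched in a few lines. This is a hypothetical miniature, not Cognoscente's Perl/MySQL implementation: the record format, the edge styles, the case-insensitive name matching used as a crude ortholog bridge (e.g. mouse Ppara and human PPARA), and the function names are all assumptions. The output is GraphViz DOT text.

```python
from collections import namedtuple

Interaction = namedtuple("Interaction", "source kind target species")

# hypothetical edge styles per interaction type
STYLES = {"activates": "arrowhead=normal", "represses": "arrowhead=tee", "binds": "dir=none"}

def query(knowledgebase, molecules, species=None):
    """Interactions touching any queried molecule, optionally limited to one species."""
    wanted = {m.upper() for m in molecules}  # case-insensitive: crude ortholog matching
    return [i for i in knowledgebase
            if (i.source.upper() in wanted or i.target.upper() in wanted)
            and (species is None or i.species == species)]

def to_dot(interactions, name="query"):
    """Render interactions as a GraphViz digraph in DOT syntax."""
    lines = [f'digraph "{name}" {{']
    for i in interactions:
        style = STYLES.get(i.kind, "style=dashed")
        lines.append(f'  "{i.source}" -> "{i.target}" [label="{i.kind}", {style}];')
    lines.append("}")
    return "\n".join(lines)
```

With a few illustrative entries such as `Interaction("NCOR1", "represses", "PPARA", "human")`, `to_dot(query(kb, ["PPARA"], species="human"))` produces text that the GraphViz `dot` program can lay out. Building the graph per query, rather than drawing one global network, is what keeps the visualization manageable as the knowledgebase grows.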
Several existing tools provide functions that are similar to Cognoscente.