
Archive for the ‘Cancer Informatics’ Category


AACR and Dr. Margaret Foti Announce Free Virtual Annual Meeting for April 27, 28 2020 and other Free Resources

Reporter: Stephen J. Williams, PhD

Please see the following email from Dr. Foti and the AACR on the VIRTUAL MEETING to be conducted April 27 and 28, 2020.

This is truly a wonderful job by AACR.  In a previous posting I considered the need to move international scientific meetings to an online format, which would make the information available to a wider audience, including those who do not have the opportunity to travel to a meeting site.  At @pharma_BI we will curate and live-tweet the talks to enhance meeting engagement, as part of our usual eConference Proceedings.

Again, great job by the AACR!

Dear Colleagues,

We hope you are staying safe and well and are adjusting to the challenges of the COVID-19 global pandemic. During this crisis, we remain steadfast in supporting our members and our mission.

I am pleased to announce a number of actions that we are taking to disseminate innovative cancer science and medicine to the global cancer research community:

  • AACR Virtual Annual Meeting 2020: Selected Presentations. We were excited to receive more than 225 clinical trials for presentation at the Annual Meeting. Due to the time-sensitive nature of these trials—many of which are practice-changing—we are making them available to the community at the time of the original April meeting. Therefore, as per our recent announcement, the AACR will host a slate of selected sessions online featuring these cutting-edge data.
This Virtual Annual Meeting will be held on April 27 and 28, 2020, and will include more than 30 oral presentations in several clinical trial plenary sessions along with commentaries from expert discussants, as well as clinical trial poster sessions consisting of short videos providing the authors’ perspectives. The Virtual Meeting will feature a New Drugs on the Horizon session as well as nine minisymposia that will showcase a broad sample of basic and translational science. Topics will include genomics, tumor microenvironment, novel targets, drug discovery, therapeutics, immunotherapy, biomarkers, and cancer prevention. A special minisymposium titled “Advancing Cancer Research Through an International Cancer Registry” will feature use cases of data available through AACR Project GENIE.

This Virtual Meeting will be available free to everyone, although attendees will be asked to register to participate. The session and presentation titles for the Virtual Meeting, as well as a link to the registration site, will be posted to the AACR website by Monday, April 13.

  • Release of Abstracts. All of the abstracts scheduled for presentation in the Virtual Meeting—and any other clinical trial abstracts that are scheduled for presentation at the rescheduled meeting—will be posted online on Monday, April 27. All other abstracts that have been accepted for presentation at the rescheduled meeting will be posted online on Friday, May 15.
  • AACR Annual Meeting 2019: Free Webcast Presentations. The complete webcasts of the AACR Annual Meeting are typically made freely available 15 months after the conclusion of the meeting. However, we have made these webcast presentations available free effective immediately, so that you can review the most compelling science from the Annual Meeting 2019 which was held in Atlanta.
  • Free Access to AACR Journals. To ensure that all members of the cancer research community have access to the information they need during this challenging time, we have opened access to our nine highly esteemed journals effective today through the end of the virtual meeting. Please be sure to visit the AACR journals webpage for journal highlights, and to sign-up for eTOC alerts.
  • Rescheduled AACR Annual Meeting. We are planning to reschedule the Annual Meeting for late August while at the same time closely monitoring the developments surrounding COVID-19. An official announcement of the rescheduled meeting will be made in the near future.

We hope that these plans will enable you to continue your important work during this global health crisis. Thank you for all you do to accelerate progress against cancer, and thank you for your loyalty to the AACR.

Sincerely,
Margaret Foti, PhD, MD (hc)
Chief Executive Officer
American Association for Cancer Research

 

For more information on Virtual Meetings please see

Is It Time for the Virtual Scientific Conference?: Coronavirus, Travel Restrictions, Conferences Cancelled

and  REAL TIME conference coverage at https://pharmaceuticalintelligence.com/press-coverage/

and other articles and e-conference proceedings on this Online Open Access Journal



Live Notes from Town Hall for Patients with Leading Oncologists on Lung Cancer and COVID19 3_28_20

Reporter: Stephen J. Williams, PhD

UPDATED 3/31/2020

Leading thoracic oncologists from the United States and Milan, Italy shared their opinions and views on treating lung cancer patients during the COVID-19 pandemic.  The panel included a thoracic oncologist from Milan, Italy, who gave special insights into the difficulties they face, the procedures they are using to help control the spread of infection within this high-risk patient population, and the changes to current treatment strategy in light of the outbreak.  Please see the live notes below; you can also follow on Twitter at #LungCancerandCOVID19.  Included below is the recording of the Zoom session.

 

UPDATED 3/29/2020

Leading lung cancer oncologists from around the world are meeting to discuss concerns for lung cancer patients and oncologists during the novel coronavirus (SARS-CoV-2; COVID-19) pandemic.  The town hall “COVID-19 and the Impact on Thoracic Oncology” will be held on Zoom on Saturday, March 28, 2020, at 10:00 – 11:30 AM EST, sponsored by Axiom Healthcare Strategies. You can register at

Please join this virtual Town Hall

Zoom link: https://us04web.zoom.us/j/846752048

Zoom Webinar ID: 846-752-048

Speakers

Anne Chiang, MD, PhD, Associate Professor; Chief Network Officer and Deputy Chief Medical Officer, Smilow Cancer Network

Roy S. Herbst, MD, PhD, Ensign Professor of Medicine (Medical Oncology) and Professor of Pharmacology; Chief of Medical Oncology, Yale Cancer Center and Smilow Cancer Hospital; Associate Cancer Center Director for Translational Research, Yale Cancer Center

Kurt Schalper, MD, PhD, Assistant Professor of Pathology; Director, Translational Immuno-oncology Laboratory

Martin J. Edelman, MD, Chair, Department of Hematology/Oncology, Fox Chase Cancer Center

Corey J. Langer, MD, Professor of Medicine, University of Pennsylvania

Hossain Borghaei, DO, MS, Chief of Thoracic Medical Oncology and Director of Lung Cancer Risk Assessment, Fox Chase Cancer Center

Marina Garassino, MD, Fondazione IRCCS Istituto Nazionale dei Tumori

Kristen Ashley Marrone, MD, Thoracic Medical Oncologist, Johns Hopkins Bayview Medical Center

Taofeek Owonikoko, MD, PhD, MSCR, Medical Oncologist, Emory University School of Medicine

Jeffrey D. Bradley, MD, FACR, FASTRO, Emory University School of Medicine

Brendon Stiles, MD, Weill Cornell

@pharma_BI will be live tweeting this Town Hall in real time

Please follow along using the following hashtags:

#LungCancerandCOVID19

#Livingwithcancer

#LungCancer

#NoOneAlone


UPDATED 3/29/2020

Below is a collection of live tweets from this meeting as well as some notes and comments from each of the speakers and panelists.  The recording of this Town Hall will be posted on this site when available.  The Town Hall was well attended, with over 250 participants.

Town Hall Notes

The following represent some notes taken at this Town Hall.

Dr. Owonikoko: Lethality in China was 1-2%. For patients newly diagnosed with lung cancer: 1) limit contact between patient, physician, and healthcare facility (telemedicine and oral chemotherapy are suggested); 2) for immunotherapy, if given i.v., the patient's health must be monitored carefully.

Dr. Kurt Schalper, on COVID-19 testing: there are three types of tests, each with pros and cons.

  • Viral culture: not always practical, as a large amount of specimen is needed
  • ELISA: looks for circulating antibodies but is not always specific for the type of coronavirus
  • RT-PCR: the most sensitive, but right now there is not much clarity on the best primers to use; he noted a 15% variance in test results when using different primers against different target genes of the virus

Dr. Marina Garassino: The Lombardy outbreak was the first in Italy and took them by surprise.  She admits they were about one month behind in preparation and did not have enough masks as late as January 31.  It was impractical to socially distance given Italian customs in greeting each other.  In addition, they had to determine which facilities would be COVID negative and which COVID positive, and this required access to testing.  Right now they are only testing symptomatic patients, and healthcare workers have to test negative multiple times.  Concerning therapy for lung cancer patients, they have been delaying the initiation of therapy as much as possible.  Patients on immunotherapy and immunosuppressive drugs are being monitored by CT scan more often during this pandemic, so when instances of pneumonitis began increasing, they were unsure whether these patients are at increased risk of COVID-19 infection or whether this reflects a bias from more frequent screening; their risk from COVID-19 is therefore unclear.  Dr. Garassino also felt we need to move from hospital-based to community-based measures of prevention against COVID infection (social distancing, more vigilant citizens).  She noted that cancer patients are usually more careful about preventative measures than the general populace.  Healthcare workers have to test negative twice in three days if they have been in close contact with a COVID-positive patient.  However, her hospital is still running at 80% capacity, so patients are getting treated.  There are nonetheless ethical issues as to who gets treated, who gets respirators, and other questions related to the unfortunate rationing of care.

Dr. Anne Chiang: Scheduled visits have notably decreased.  They have seen patient visits decrease from 4500 to 2300 in two weeks, but telemedicine or virtual visits have increased to 1000 and are replacing the on-site visits.  She also said they are trying to reduce or eliminate the extremely immunosuppressive drugs from chemotherapy regimens.  For example, they are removing pemetrexed from standard regimens and are also considering neoadjuvant chemotherapy.  As for biopsies, liquid biopsies can be obtained in the home and so are preferred, as patients do not have to come in for a biopsy.

Dr. Edelman: Fox Chase is somewhat unique in being an NCI center that only does oncology, so they rely on neighboring Jeanes Hospital of the Temple University Health System for many of their outpatient, surgical, and general medicine needs.  Patients who will be transferred back to Fox Chase are screened for COVID-19.

Dr. Brendon Stiles: Lung cancer surgeries have ground to a halt.  He did only one last week.  The hospital wants to conserve resources and considers lung cancer surgery too great a COVID risk.  They have shut down elective surgeries and there are no clinical trials being conducted.  He said that lung cancer research will be negatively impacted by the pandemic as resources are shuttled to COVID research efforts.

 Live Tweets

 

For other articles of note on coronavirus (COVID-19), please see our Coronavirus Portal at

https://pharmaceuticalintelligence.com/coronavirus-portal/

 

 

 

 



Cancer Genomics: Multiomic Analysis of Single Cells and Tumor Heterogeneity

Curator: Stephen J. Williams, PhD

 

scTrio-seq identifies colon cancer lineages

Single-cell multiomics sequencing and analyses of human colorectal cancer. Shuhui Bian et al. Science  30 Nov 2018:Vol. 362, Issue 6418, pp. 1060-1063

To better design treatments for cancer, it is important to understand the heterogeneity in tumors and how this contributes to metastasis. To examine this process, Bian et al. used a single-cell triple omics sequencing (scTrio-seq) technique to examine the mutations, transcriptome, and methylome within colorectal cancer tumors and metastases from 10 individual patients. The analysis provided insights into tumor evolution, linked DNA methylation to genetic lineages, and showed that DNA methylation levels are consistent within lineages but can differ substantially among clones.

Science, this issue p. 1060

Abstract

Although genomic instability, epigenetic abnormality, and gene expression dysregulation are hallmarks of colorectal cancer, these features have not been simultaneously analyzed at single-cell resolution. Using optimized single-cell multiomics sequencing together with multiregional sampling of the primary tumor and lymphatic and distant metastases, we developed insights beyond intratumoral heterogeneity. Genome-wide DNA methylation levels were relatively consistent within a single genetic sublineage. The genome-wide DNA demethylation patterns of cancer cells were consistent in all 10 patients whose DNA we sequenced. The cancer cells’ DNA demethylation degrees clearly correlated with the densities of the heterochromatin-associated histone modification H3K9me3 of normal tissue and those of repetitive element long interspersed nuclear element 1. Our work demonstrates the feasibility of reconstructing genetic lineages and tracing their epigenomic and transcriptomic dynamics with single-cell multiomics sequencing.

Fig. 1 Reconstruction of genetic lineages with scTrio-seq2.

Global SCNA patterns (250-kb resolution) of CRC01. Each row represents an individual cell. The subclonal SCNAs used for identifying genetic sublineages were marked and indexed; for details, see fig. S6B. On the top of the heatmap, the amplification or deletion frequency of each genomic bin (250 kb) of the non-hypermutated CRC samples from the TCGA Project and patient CRC01’s cancer cells are shown.


 

 



Single-cell RNA-seq helps in finding intra-tumoral heterogeneity in pancreatic cancer

Reporter and Curator: Dr. Sudipta Saha, Ph.D.

 

Pancreatic cancer is a significant cause of cancer mortality; therefore, the development of early diagnostic strategies and effective treatment is essential. Improvements in imaging technology, as well as use of biomarkers are changing the way that pancreas cancer is diagnosed and staged. Although progress in treatment for pancreas cancer has been incremental, development of combination therapies involving both chemotherapeutic and biologic agents is ongoing.

 

Cancer is an evolutionary disease, containing the hallmarks of an asexually reproducing unicellular organism subject to evolutionary paradigms. Pancreatic ductal adenocarcinoma (PDAC) is a particularly robust example of this phenomenon. Genomic features indicate that pancreatic cancer cells are selected for fitness advantages when encountering the geographic and resource-depleted constraints of the microenvironment. Phenotypic adaptations to these pressures help disseminated cells to survive in secondary sites, a major clinical problem for patients with this disease.

 

The immune system varies in cell types, states, and locations. The complex networks, interactions, and responses of immune cells produce diverse cellular ecosystems composed of multiple cell types, accompanied by genetic diversity in antigen receptors. Within this ecosystem, innate and adaptive immune cells maintain and protect tissue function, integrity, and homeostasis upon changes in functional demands and diverse insults. Characterizing this inherent complexity requires studies at single-cell resolution. Recent advances such as massively parallel single-cell RNA sequencing and sophisticated computational methods are catalyzing a revolution in our understanding of immunology.

 

PDAC is the most common type of pancreatic cancer and is characterized by high intra-tumoral heterogeneity and poor prognosis. In the present study, to comprehensively delineate PDAC intra-tumoral heterogeneity and the mechanism underlying PDAC progression, single-cell RNA-seq (scRNA-seq) was employed to acquire a transcriptomic atlas of 57,530 individual pancreatic cells from primary PDAC tumors and control pancreases. Diverse malignant and stromal cell types, including two ductal subtypes with abnormal and malignant gene expression profiles, respectively, were identified in PDAC.

 

The researchers found that the heterogeneous malignant subtype was composed of several subpopulations with differential proliferative and migratory potentials. Cell trajectory analysis revealed that components of multiple tumor-related pathways and transcription factors (TFs) were differentially expressed along PDAC progression. Furthermore, a subset of ductal cells with unique proliferative features was found to be associated with an inactivation state in tumor-infiltrating T cells, providing novel markers for the prediction of antitumor immune response. Together, the findings provide a valuable resource for deciphering the intra-tumoral heterogeneity in PDAC and uncover a connection between tumor-intrinsic transcriptional state and T cell activation, suggesting potential biomarkers for anticancer treatment such as targeted therapy and immunotherapy.

 

References:

https://www.ncbi.nlm.nih.gov/pubmed/31273297

https://www.ncbi.nlm.nih.gov/pubmed/21491194

https://www.ncbi.nlm.nih.gov/pubmed/27444064

https://www.ncbi.nlm.nih.gov/pubmed/28983043

https://www.ncbi.nlm.nih.gov/pubmed/24976721

https://www.ncbi.nlm.nih.gov/pubmed/27693023

 



First Cost-Effectiveness Study of Multi-Gene Panel Sequencing in Advanced Non-Small Cell Lung Cancer Shows Moderate Cost-Effectiveness, Exposes Crucial Practice Gap

WASHINGTON (June 27, 2019) — The results of the first economic modeling study to estimate the cost-effectiveness of “multi-gene panel sequencing” (MGPS) as compared to standard-of-care, single-gene tests for patients with advanced non-small cell lung cancer (aNSCLC) show that the MGPS tests are moderately cost-effective but could deliver more value if patients with test results identifying actionable genetic mutations consistently received genetically guided treatments. The results of the study, which was commissioned by the Personalized Medicine Coalition (PMC), underline the need to align clinical practices with an era of personalized medicine in which physicians can use diagnostic tests to identify specific biological markers that inform targeted prevention and treatment plans.

The study, which was published yesterday in JCO Clinical Cancer Informatics, analyzed the clinical and economic value of using MGPS testing to identify patients with tumors that over-express genetic mutations that could be targeted by available therapies designed to inhibit the function of those genes — a mainstay of modern care for aNSCLC patients. Using data provided by Flatiron Health, researchers examined clinical and cost information associated with the care of 5,688 patients with aNSCLC treated between 2011 – 2016, separating them into cohorts who received MGPS tests that assess at least 30 genetic mutations at once and those who received only “single-marker genetic testing” (SMGT) of less than 30 genes.

Compared to SMGT, the MGPS testing strategy, including downstream treatment and monitoring of disease, incurred costs equal to $148,478 for each year of life that it facilitated, a level suggesting that MGPS is moderately cost-effective compared to commonly cited thresholds in the U.S., which range from $50,000 to $200,000 per life year (LY) gained.

The authors of the study point out, however, that physicians only prescribed a targeted therapy to some of the patients whose MGPS test results revealed actionable mutations. MGPS tests can only improve downstream patient outcomes if actionable results are used to put the patient on a targeted treatment regimen that is more effective than the therapy they would otherwise have been prescribed. It is therefore impossible for the cost of an MGPS test to translate into additional LYs if actionable results do not result in the selection of a targeted treatment regimen.

Although MGPS testing revealed actionable mutations in 30.1 percent of the patients in the study cohort, only 21.4 percent of patients who underwent MGPS testing received a targeted treatment.

The study’s authors calculated that if all MGPS-tested patients with actionable mutations had received a targeted therapy, MGPS testing would deliver measurably better value ($110,000 per LY gained).
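The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not the study's economic model: the function names are hypothetical, the thresholds and percentages are the ones quoted in this release, and the final ratio assumes that the 30.1 percent and 21.4 percent figures share the same denominator and that every targeted treatment went to a patient with an actionable mutation.

```python
def cost_per_life_year(incremental_cost, incremental_life_years):
    """Incremental cost divided by incremental life years (LYs) gained."""
    if incremental_life_years <= 0:
        raise ValueError("incremental life years must be positive")
    return incremental_cost / incremental_life_years


def within_cited_thresholds(ratio, low=50_000, high=200_000):
    """Check a cost-per-LY ratio against the U.S. range quoted above."""
    return low <= ratio <= high


# Figures quoted in the release.
REPORTED_COST_PER_LY = 148_478   # dollars per LY gained, MGPS vs. SMGT
ACTIONABLE_SHARE = 0.301         # share with actionable mutations
TARGETED_SHARE = 0.214           # share who received a targeted treatment

# Assumption: both shares refer to the same group of MGPS-tested patients.
share_of_actionable_treated = TARGETED_SHARE / ACTIONABLE_SHARE

print(within_cited_thresholds(REPORTED_COST_PER_LY))
print(round(share_of_actionable_treated, 3))
```

Under those assumptions, roughly seven in ten patients with an actionable result went on to a targeted therapy, which is the practice gap the authors highlight.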

“This research underlines the importance of ensuring that clinical practices keep pace with scientific progress in personalized medicine so that we can maximize the benefits of diagnostic tests that can improve patient care and make the health system more efficient by ensuring that safe and effective targeted therapies are prescribed to those patients who will benefit,” said PMC President Edward Abrahams.

The study’s authors include Dr. Lotte Steuten, Vice President and Head of Consulting, The Office of Health Economics, London, U.K., and Affiliate Associate Faculty Member, Hutchinson Institute for Cancer Outcomes Research, Fred Hutchinson Cancer Research Center; Dr. Bernardo Goulart, Associate Faculty Member, Hutchinson Institute for Cancer Outcomes Research, Fred Hutchinson Cancer Research Center; Dr. Neal Meropol, Vice President, Research Oncology, Flatiron Health; Dr. Daryl Pritchard, Senior Vice President, Science Policy, Personalized Medicine Coalition; and Dr. Scott Ramsey, Director, Hutchinson Institute for Cancer Outcomes Research, Fred Hutchinson Cancer Research Center.

###

About the Personalized Medicine Coalition:

The Personalized Medicine Coalition, representing innovators, scientists, patients, providers and payers, promotes the understanding and adoption of personalized medicine concepts, services and products to benefit patients and the health system. For more information, please visit www.personalizedmedicinecoalition.org.

SOURCE

From: Personalized Medicine Coalition <pmc@personalizedmedicinecoalition.org>

Reply-To: “Christopher Wells (PMC)” <cwells@personalizedmedicinecoalition.org>

Date: Thursday, June 27, 2019 at 9:32 AM

To: Aviva Lev-Ari <AvivaLev-Ari@alum.berkeley.edu>

Subject: First Cost-Effectiveness Study of MGPS in aNSCLC Shows Moderate Cost-Effectiveness, Exposes Crucial Practice Gap



A Nonlinear Methodology to Explain Complexity of the Genome and Bioinformatic Information

Reporter: Stephen J. Williams, Ph.D.

Multifractal bioinformatics: A proposal to the nonlinear interpretation of genome

The following is an open access article by Pedro Moreno on a methodology for analyzing genetic information across species, and in particular the evolutionary trends of complex genomes, by a nonlinear analytic approach utilizing fractal geometry, coined “Nonlinear Bioinformatics”.  This fractal approach stems from the complex nature of higher eukaryotic genomes, including mosaicism and multiple interspersed genomic elements such as intronic regions, noncoding regions, and mobile elements such as transposable elements.  Although seemingly random, these elements have a repetitive nature. Such complexity of DNA regulation, structure, and genomic variation is thought to be best understood by developing algorithms based on fractal analysis, which can model the regionalized and repetitive variability and structure within complex genomes by elucidating the individual components that contribute to an overall complex structure. This contrasts with a “linear” or “reductionist” approach that looks at individual coding regions and does not take into consideration the aforementioned factors leading to genetic complexity and diversity.

Indeed, many other attempts to describe the complexities of DNA as a fractal geometric pattern have been reported.  In the paper “Fractals and Hidden Symmetries in DNA”, Carlo Cattani uses fractal analysis to construct a simple geometric pattern of the influenza A virus by modeling the primary sequence of this viral DNA, namely the bases A, G, C, and T. The main conclusions that

fractal shapes and symmetries in DNA sequences and DNA walks have been shown and compared with random and deterministic complex series. DNA sequences are structured in such a way that there exists some fractal behavior which can be observed both on the correlation matrix and on the DNA walks. Wavelet analysis confirms by a symmetrical clustering of wavelet coefficients the existence of scale symmetries.

suggested that, at least, the viral influenza genome structure could be analyzed into its basic components by fractal geometry.
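To make the “DNA walk” mentioned in the quoted conclusions concrete, here is a minimal Python sketch. It assumes one common convention, in which purines (A, G) step up by one and pyrimidines (C, T) step down by one; the function name `dna_walk` and the toy sequence are hypothetical, and this is not necessarily the exact mapping used in Cattani's paper.

```python
from itertools import accumulate

# Assumed step rule (one common convention, not taken from the paper):
# purines (A, G) step +1, pyrimidines (C, T) step -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(sequence):
    """Return the cumulative purine/pyrimidine walk for a DNA string."""
    # Any symbol outside A, C, G, T (for example N) raises KeyError.
    steps = [STEP[base] for base in sequence.upper()]
    return list(accumulate(steps))


walk = dna_walk("AGCTTTCA")  # hypothetical toy sequence
print(walk)
```

Plotting such a walk against position gives the one-dimensional landscape whose scaling behavior is then examined for fractal structure.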
This approach has been used to model the complex nature of cancer, as discussed in a 2011 Seminars in Oncology paper:
Abstract: Cancer is a highly complex disease due to the disruption of tissue architecture. Thus, tissues, and not individual cells, are the proper level of observation for the study of carcinogenesis. This paradigm shift from a reductionist approach to a systems biology approach is long overdue. Indeed, cell phenotypes are emergent modes arising through collective non-linear interactions among different cellular and microenvironmental components, generally described by “phase space diagrams”, where stable states (attractors) are embedded into a landscape model. Within this framework, cell states and cell transitions are generally conceived as mainly specified by gene-regulatory networks. However, the system’s dynamics is not reducible to the integrated functioning of the genome-proteome network alone; the epithelia-stroma interacting system must be taken into consideration in order to give a more comprehensive picture. Given that cell shape represents the spatial geometric configuration acquired as a result of the integrated set of cellular and environmental cues, we posit that fractal-shape parameters represent “omics” descriptors of the epithelium-stroma system. Within this framework, function appears to follow form, and not the other way around.

As the authors conclude:

” Transitions from one phenotype to another are reminiscent of phase transitions observed in physical systems. The description of such transitions could be obtained by a set of morphological, quantitative parameters, like fractal measures. These parameters provide reliable information about system complexity. “

Gene expression also displays a fractal nature. In a Frontiers in Physiology paper, “Gene Expression Is Not Random: Scaling, Long-Range Cross-Dependence, and Fractal Characteristics of Gene Regulatory Networks”, Mahboobeh Ghorbani, Edmond A. Jonckheere, and Paul Bogdan report that gene expression time series display fractal and long-range dependence characteristics.

Abstract: Gene expression is a vital process through which cells react to the environment and express functional behavior. Understanding the dynamics of gene expression could prove crucial in unraveling the physical complexities involved in this process. Specifically, understanding the coherent complex structure of transcriptional dynamics is the goal of numerous computational studies aiming to study and finally control cellular processes. Here, we report the scaling properties of gene expression time series in Escherichia coli and Saccharomyces cerevisiae. Unlike previous studies, which report the fractal and long-range dependency of DNA structure, we investigate the individual gene expression dynamics as well as the cross-dependency between them in the context of gene regulatory network. Our results demonstrate that the gene expression time series display fractal and long-range dependence characteristics. In addition, the dynamics between genes and linked transcription factors in gene regulatory networks are also fractal and long-range cross-correlated. The cross-correlation exponents in gene regulatory networks are not unique. The distribution of the cross-correlation exponents of gene regulatory networks for several types of cells can be interpreted as a measure of the complexity of their functional behavior.

 

Given that a multitude of complex biomolecular networks and biomolecules can be described by fractal patterns, the development of bioinformatic algorithms based on these patterns would enhance our understanding of the interdependence and cross-functionality of these multiple biological networks, particularly in disease and drug resistance.  The article below by Pedro Moreno describes the development of such bioinformatic algorithms.

Pedro A. Moreno
Escuela de Ingeniería de Sistemas y Computación, Facultad de Ingeniería, Universidad del Valle, Cali, Colombia
E-mail: pedro.moreno@correounivalle.edu.co

Subject area: Systems engineering
Received: September 19, 2012
Accepted: December 16, 2013


 

 


Abstract

The first draft of the human genome (HG) sequence was published in 2001 by two competing consortia. Since then, several structural and functional characteristics of HG organization have been revealed. Today, more than 2,000 human genomes have been sequenced, and these findings are having a strong impact on academia and public health. Despite all this, a major bottleneck, called genome interpretation, persists: the lack of a theory that explains the complex puzzles of coding and non-coding features that compose the HG as a whole. Ten years after the HG was sequenced, two recent studies, discussed within the multifractal formalism, make it possible to propose a nonlinear theory that helps interpret the structural and functional variation of the genetic information of genomes. The present review article discusses this new approach, called “Multifractal bioinformatics”.

Keywords: Omics sciences, bioinformatics, human genome, multifractal analysis.


1. Introduction

Omic Sciences and Bioinformatics

In order to study genomes, their life properties, and the pathological consequences of their impairment, the Human Genome Project (HGP) was created in 1990. Since then, about 500 Gbp (EMBL), represented in thousands of prokaryotic genomes and tens of different eukaryotic genomes, have been sequenced (NCBI, 1000 Genomes, ENCODE). Today, genomics is defined as the set of sciences and technologies dedicated to the comprehensive study of the structure, function, and origin of genomes. Several types of genomics have arisen as a result of the expansion and application of genomics to the study of the Central Dogma of Molecular Biology (CDMB), Figure 1 (above). The catalog of different types of genomics uses the Latin suffix “-omic”, meaning “set of”, to denote the new massive approaches of the new omics sciences (Moreno et al, 2009). Given the large amount of genomic information available in the databases and the urgency of its actual interpretation, the balance has begun to lean heavily toward the bioinformatics infrastructure requirements of research laboratories, Figure 1 (below).

Bioinformatics, or computational biology, is defined as the application of computer and information technology to the analysis of biological data (Mount, 2004). It is an interdisciplinary science that requires the use of computing, applied mathematics, statistics, computer science, artificial intelligence, biophysical information, biochemistry, genetics, and molecular biology. Bioinformatics was born from the need to understand the sequences of nucleotide or amino acid symbols that make up DNA and proteins, respectively. These analyses are made possible by the development of powerful algorithms that predict and reveal a vast number of structural and functional features in genomic sequences, such as gene location, the discovery of homologies between macromolecules in databases (BLAST), phylogenetic analysis, regulatory analysis, and the prediction of protein folding, among others. This great development has created a multiplicity of approaches, giving rise to new types of bioinformatics, such as the Multifractal Bioinformatics (MFB) proposed here.

1.1 Multifractal Bioinformatics and Theoretical Background

MFB is a proposal to analyze the information content of genomes and their life properties in a nonlinear way. It is part of a specialized sub-discipline called “nonlinear Bioinformatics”, which applies a number of related techniques for the study of nonlinearity (fractal geometry, Hurst exponents, power laws, and wavelets, among others) to biological problems (https://pharmaceuticalintelligence.com/tag/fractal-geometry/). Its application requires detailed knowledge of the structure of the genome to be analyzed and an adequate knowledge of multifractal analysis.

1.2 From the Worm Genome toward Human Genome

Before exploring a complex genome such as the human genome (HG), it is useful to apply multifractal analysis (MFA) to a simpler genome in order to show its practical utility. For example, the genome of the small nematode Caenorhabditis elegans is an excellent model from which many lessons can be extrapolated to complex organisms. Thus, if MFA explains some of the structural properties of that genome, the same analysis is expected to reveal similar properties in the HG.

The C. elegans nuclear genome is composed of about 100 Mbp, with six chromosomes distributed into five autosomes and one sex chromosome. The molecular structure of the genome is particularly homogeneous along the chromosome sequences, owing to several regular features, including a high content of genes and introns of similar sizes. The C. elegans genome also has a regional organization of the chromosomes, mainly because most of the repeated sequences are located in the chromosome arms, Figure 2 (left) (C. elegans Sequencing Consortium, 1998). Given these regular and irregular features, MFA could be an appropriate approach to analyze such distributions.

Meanwhile, the sequencing of the HG revealed a surprising mosaicism of coding (genes) and noncoding (repetitive DNA) sequences, Figure 2 (right) (Venter et al., 2001). This structure of 6 Gbp is divided into 23 pairs of chromosomes (diploid cells), and its highly regionalized sequences introduce complex patterns of regularity and irregularity into the study of gene structure, of the composition of repetitive DNA sequences, and of their role in the life sciences. The coding regions of the genome are estimated at ~25,000 genes, which constitute 1.4% of the HG. These genes are immersed in a giant sea of various types of non-coding sequences, which compose 98.6% of the HG (popularly misnamed “junk DNA”). The non-coding regions are characterized by many types of repeated DNA sequences: 10.6% consists of Alu sequences, a type of SINE (short interspersed nuclear element) preferentially located near genes. LINES, MIR, MER, LTR, DNA transposons, and introns are other types of non-coding sequences, which together form about 86% of the genome. Some of these sequences overlap with each other, as with CpG islands, which complicates the analysis of the genomic landscape. This standard genomic landscape was recently revised: the latest studies show that 80.4% of the HG is functional, owing to the discovery of more than five million “switches” that operate and regulate gene activity, which calls for a re-evaluation of the concept of “junk DNA” (The ENCODE Project Consortium, 2012).

Given that all these genomic variations, in both worm and human, produce regionalized genomic landscapes, it is proposed that Fractal Geometry (FG) allows measuring how the genetic information content is fragmented. This paper reviews the methodology and the nonlinear descriptive models for each of these genomes.

1.3 The MFA and its Application to Genome Studies

Most problems in physics are implicitly nonlinear in nature and generate phenomena such as chaos. Chaos theory deals with certain nonlinear dynamic systems that are very sensitive to initial conditions yet strictly deterministic, that is, their behavior can be completely determined by knowing the initial conditions (Peitgen et al, 1992). In turn, FG is an appropriate tool to study chaotic dynamic systems (CDS). FG and chaos are closely related because the region of space toward which a chaotic orbit tends asymptotically has a fractal structure (a strange attractor). Therefore, FG allows studying the framework on which CDS are defined (Moon, 1992), and this is how the structure and function of the genome are expected to be organized.

MFA is an extension of FG and is related to (Shannon) information theory; both disciplines have been very useful for studying the information content of a sequence of symbols. Mandelbrot established FG in the 1980s as a geometry capable of measuring the irregularity of nature by calculating the fractal dimension (D), an exponent derived from a power law (Mandelbrot, 1982). The value of D gives a measure of the level of fragmentation, or the information content, of a complex phenomenon, because D measures the degree of scaling of the system's fragmented self-similarity. Thus, FG looks for self-similar properties in structures and processes at different scales of resolution, and these self-similarities are organized following scaling or power laws.
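As a worked illustration of a fractal dimension derived from a power law (a standard textbook case, not a result of this study), consider the Koch curve. Assume N(E) denotes the number of self-similar pieces needed to cover the figure when a scaling factor E is applied. Each construction step replaces a segment with N = 4 copies, each scaled by 1/3, so E = 3:

```latex
N(E) = E^{D}
\quad\Longrightarrow\quad
D = \frac{\ln N(E)}{\ln E} = \frac{\ln 4}{\ln 3} \approx 1.262
```

The non-integer value, between 1 (a line) and 2 (a filled plane), is what expresses the fragmentation of the figure.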

Sometimes a single exponent is not sufficient to characterize a complex phenomenon, so more exponents are required. The multifractal formalism allows this; it applies when many fractal subsets with different scaling properties, and hence a large number of exponents or fractal dimensions, coexist simultaneously. As a result, when a multifractal singularity spectrum is generated, the scaling behavior of the frequency of symbols in a sequence can be quantified (Vélez et al, 2010).

MFA has been implemented to study the spatial heterogeneity of theoretical and experimental fractal patterns in different disciplines. In post-genomic times, MFA has been used to study multiple biological problems (Vélez et al, 2010). Nonetheless, very little attention has been given to the use of MFA to characterize the structural genetic information content of genomes from images obtained with the Chaos Game Representation (CGR). The first studies at this level were made recently on the C. elegans genome (Vélez et al, 2010) and the human genome (Moreno et al, 2011). The MFA methodology applied to these genomes is developed below.

2. Methodology

The Multifractal Formalism from the CGR

2.1 Data Acquisition and Molecular Parameters

Sequence data for C. elegans and for the 36.2 Hs_ refseq version of the HG were downloaded from the NCBI FTP server. Then, several strategies were designed to fragment the genomic DNA sequences into different length ranges. For example, the C. elegans genome was divided into 18 fragments, Figure 2 (left), and the human genome into 9,379 fragments. According to their annotation systems, the molecular parameters of coding sequences (genes, exons, and introns), noncoding sequences (repetitive DNA, Alu, LINES, MIR, MER, LTR, promoters, etc.), and coding/non-coding DNA (TTAGGC, AAAAT, AAATT, TTTTC, TTTTT, CpG islands, etc.) are counted for each sequence.
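The fragmentation and counting step can be sketched as follows. This is only a minimal illustration: the fixed fragment length, the function names, and the overlapping-match counting rule are assumptions made here, not the strategy actually used in the cited studies, which relied on genome annotation systems.

```python
from typing import Dict, Iterator, List


def fragment_sequence(seq: str, fragment_len: int) -> Iterator[str]:
    """Yield consecutive, non-overlapping fragments of at most fragment_len bases."""
    for start in range(0, len(seq), fragment_len):
        yield seq[start:start + fragment_len]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def motif_table(seq: str, fragment_len: int, motifs: List[str]) -> List[Dict[str, int]]:
    """Return one {motif: count} dictionary per fragment of the sequence."""
    seq = seq.upper()
    return [
        {motif: count_motif(fragment, motif) for motif in motifs}
        for fragment in fragment_sequence(seq, fragment_len)
    ]
```

Note that motifs spanning a fragment boundary are not counted in this sketch; a real pipeline would need to decide how to handle them.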

2.2 Construction of the CGR

Subsequently, the CGR, a recursive algorithm (Jeffrey, 1990; Restrepo et al, 2009), is applied to each selected DNA sequence, Figure 3 (above, left), producing an image that is then quantified by the box-counting algorithm. For example, Figure 3 (above, left) shows a CGR image for a human DNA sequence of 80,000 bp in length. Here, dark regions represent sub-quadrants with a high number of points (or nucleotides), and clear regions represent sections with a low number of points.

2.3 Fractal Measurement by the Box Counting Method

The calculation of D for the Koch curve by the box-counting method is illustrated by a progression of changes in the grid size and its Cartesian graph, Table 1.
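As a concrete illustration of the CGR construction of Section 2.2, the recursive rule can be sketched in a few lines. The corner assignment (A, C, G, T on the corners of the unit square), the starting point at the center, and the function names are assumptions chosen for this sketch; the cited implementations may differ in these details.

```python
import numpy as np

# Assumed corner assignment on the unit square (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> np.ndarray:
    """Return an (n, 2) array with one CGR point per valid nucleotide."""
    x, y = 0.5, 0.5  # start at the center of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return np.array(points, dtype=float).reshape(-1, 2)


def cgr_image(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Count the points falling in each cell of a resolution x resolution grid."""
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=resolution, range=[[0, 1], [0, 1]]
    )
    return counts
```

Cells of `cgr_image` with high counts correspond to the dark regions of the CGR picture, and cells with low counts to the clear regions.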

The CGR image for a given DNA sequence is quantified by a standard fractal analysis. A fractal is a fragmented geometric figure whose parts are approximate copies of the whole figure at a reduced scale, that is, the figure has self-similarity. D is basically a scaling rule that the figure obeys. Generally, a power law is given by the following expression:

$$N(E) = E^{D} \qquad (1)$$

where N(E) is the number of parts required to cover the figure when a scaling factor E is applied. The power law allows the fractal dimension to be calculated as:

$$D = \frac{\ln N(E)}{\ln E} \qquad (2)$$

To obtain D, the box-counting algorithm covers the figure with disjoint boxes of size ɛ = 1/E and counts the number of boxes required. Figure 4 (above, left) shows the multifractal measure at momentum q=1.
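The box-counting estimate of D can be sketched as below, assuming the figure is given as a set of points in the unit square (for example, CGR points). The function names and the choice of grid sizes are illustrative assumptions; D is taken as the slope of ln N(E) against ln E over several grids.

```python
import numpy as np


def box_counts(points: np.ndarray, scales) -> np.ndarray:
    """For each E, count the occupied boxes of an E x E grid (box side 1/E)."""
    counts = []
    for E in scales:
        # Integer box index of each point; clip so that x == 1.0 stays in range.
        idx = np.minimum((points * E).astype(int), E - 1)
        counts.append(len(np.unique(idx, axis=0)))
    return np.array(counts)


def box_counting_dimension(points: np.ndarray, scales) -> float:
    """Estimate D as the least-squares slope of ln N(E) versus ln E."""
    n_boxes = box_counts(points, scales)
    slope, _ = np.polyfit(np.log(scales), np.log(n_boxes), 1)
    return float(slope)
```

As a sanity check, points that fill the square evenly should give D close to 2, and points along a straight line should give D close to 1.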

2.4 Multifractal Measurement

When generalizing the box-counting algorithm to the multifractal case, and according to the method of moments q, we obtain equation (3) (Gutiérrez et al, 1998; Yu et al, 2001):

$$\sum_{i=1}^{N(\varepsilon)} \left(\frac{M_i}{M}\right)^{q} \sim \varepsilon^{(q-1)D_q} \qquad (3)$$

where M_i is the number of points falling in the i-th grid box, M is the total number of points, and ɛ is the box size. Thus, MFA is used when multiple scaling rules apply. Figure 4 (above, right) shows the calculation of the multifractal measures at different momentum q (partition function). Here, the linear regressions must have a coefficient of determination equal or close to 1. From each linear regression a D_q is obtained, and together these generate a spectrum of generalized fractal dimensions D_q for all integers q, Figure 4 (below, left). So, the multifractal spectrum is obtained as the limit:

$$D_q = \frac{1}{q-1} \lim_{\varepsilon \to 0} \frac{\ln \sum_{i=1}^{N(\varepsilon)} (M_i/M)^{q}}{\ln \varepsilon} \qquad (4)$$

The variation of the integer q allows emphasizing different regions and discriminating their fractal behavior. Positive values emphasize the dense regions; a high D_q is synonymous with the structural richness and the properties of these regions. Negative values emphasize the scarce regions; a high D_q indicates a lot of structure and properties in these regions. In real-world applications, the limit D_q is readily approximated from the data using a linear fit; the transformation of equation (3) yields:

$$\ln \sum_{i=1}^{N(\varepsilon)} M_i^{q} = D_q\,(q-1)\ln \varepsilon + q \ln M \qquad (5)$$

which shows that ln Σ M_i^q, for a set q, is a linear function of ln(ɛ). D_q can therefore be evaluated as the slope of the linear relationship between ln Σ M_i^q and (q-1) ln(ɛ). The methodologies and approaches for the box-counting method and MFA are detailed in Moreno et al, 2000; Yu et al, 2001; Moreno, 2005. For a rigorous mathematical development of MFA from images, see the Wikipedia article “Multifractal system”.
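The linear-fit estimate of the generalized dimension can be sketched as follows: for a fixed q, the logarithm of the sum of the q-th powers of the box masses M_i is regressed against (q - 1) ln(ɛ) over several box sizes, and the slope is taken as D_q. This is a minimal sketch assuming the data are points in the unit square; the function names and grid sizes are illustrative. The case q = 1 is excluded because the regressor vanishes there, and it would need the separate limit form.

```python
import numpy as np


def box_masses(points: np.ndarray, E: int) -> np.ndarray:
    """Return M_i, the number of points in each occupied box of an E x E grid."""
    idx = np.minimum((points * E).astype(int), E - 1)
    _, masses = np.unique(idx, axis=0, return_counts=True)
    return masses.astype(float)


def generalized_dimension(points: np.ndarray, q: float, scales) -> float:
    """Estimate D_q as the slope of ln(sum M_i^q) versus (q - 1) ln(eps)."""
    if q == 1:
        raise ValueError("q = 1 needs the limit form; this sketch excludes it")
    x, y = [], []
    for E in scales:
        eps = 1.0 / E  # box size
        masses = box_masses(points, E)
        x.append((q - 1) * np.log(eps))
        y.append(np.log(np.sum(masses ** q)))
    slope, _ = np.polyfit(x, y, 1)
    return float(slope)


def dimension_spectrum(points: np.ndarray, qs, scales) -> np.ndarray:
    """Return D_q for every q in qs (q = 1 must not be included)."""
    return np.array([generalized_dimension(points, q, scales) for q in qs])
```

For a set that fills the square evenly, every D_q is close to 2 and the spectrum is flat; a spread of D_q values across q is the signature of multifractality.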

2.5 Measurement of Information Content

Subsequently, from the spectrum of generalized dimensions Dq, the degree of multifractality ΔDq (MD) is calculated as the difference between the maximum and minimum values of Dq: ΔDq = Dqmax − Dqmin (Ivanov et al, 1999). When ΔDq is high, the multifractal spectrum is rich in information and highly aperiodic; when ΔDq is small, the resulting dimension spectrum is poor in information and highly periodic. It is expected, then, that aperiodicity in the genome is related to highly polymorphic genomic structures, and periodic regions to highly repetitive and not very polymorphic genomic structures. The correlation exponent τ(q) = (q − 1)Dq, Figure 4 (below, right), can also be obtained from the multifractal dimension Dq. The generalized dimension also provides specific information at particular values of q: D(q = 0) is equal to the capacity dimension, which in this analysis is the box-counting dimension; D(q = 1) is equal to the information dimension; and D(q = 2) is equal to the correlation dimension. Based on these multifractal parameters, many of the structural genomic properties can be quantified, related, and interpreted.
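Given a spectrum of generalized dimensions, the two derived quantities described here are simple to compute. The sketch below assumes the spectrum is already available as arrays of q values and Dq values; the function names and the numbers in the usage example are made up for illustration.

```python
import numpy as np


def multifractality_degree(dq) -> float:
    """Degree of multifractality: maximum Dq minus minimum Dq."""
    dq = np.asarray(dq, dtype=float)
    return float(dq.max() - dq.min())


def correlation_exponent(q, dq) -> np.ndarray:
    """Correlation exponent tau(q) = (q - 1) * Dq, element by element."""
    return (np.asarray(q, dtype=float) - 1.0) * np.asarray(dq, dtype=float)


# Illustrative (made-up) spectrum, not data from the study.
q_values = [-2, 0, 2]
dq_values = [2.5, 2.0, 1.5]
md = multifractality_degree(dq_values)
tau = correlation_exponent(q_values, dq_values)
```

A flat spectrum gives a degree of multifractality of zero, which corresponds to the periodic, information-poor case described in the text.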

2.6 Multifractal Parameters and Statistical and Discrimination Analyses

Once the multifractal parameters are calculated (Dq for q from −20 to 20, ΔDq, τq, etc.), correlations with the molecular parameters are sought. These relations are established by plotting the counts of the genome molecular parameters against MD, by discriminant analysis with Cartesian graphs in 2-D, Figure 5 (below, left), and 3-D, and by combining multifractal and molecular parameters. Finally, simple linear regression, multivariate analysis, and analyses by ranges and clusters are performed to establish statistical significance.
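The simple linear regression step can be sketched as follows, relating a molecular parameter (for example, a repeat count per fragment) to the degree of multifractality. The function name is an assumption of this sketch, and any input values would have to come from the analysis itself; the sketch covers only the fit and its coefficient of determination, not the discriminant or multivariate analyses.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    r2 = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r2)
```

An R^2 close to 1 indicates that the molecular parameter explains most of the variation in the multifractal parameter across fragments; it does not by itself establish statistical significance.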

3 Results and Discussion

3.1 Non-linear Descriptive Model for the C. elegans Genome

When the C. elegans genome was analyzed with the multifractal formalism, it revealed what the symmetry and asymmetry of the genome nucleotide composition had suggested. Thus, the multifractal scaling of the C. elegans genome is of interest because it indicates that the molecular structure of the chromosome may be organized as a system operating far from equilibrium following nonlinear laws (Ivanov et al, 1999; Burgos and Moreno-Tovar, 1996). This can be discussed from two points of view:

1) When comparing C. elegans chromosomes with each other, the X chromosome showed the lowest multifractality, Figure 5 (above). This means that the X chromosome is operating close to equilibrium, which results in increased genetic instability. This instability of the X could selectively contribute to the molecular mechanism that determines sex (XX or X0) during meiosis. Thus, the X chromosome would be operating closer to equilibrium in order to maintain its particular sexual dimorphism.

2) When comparing different chromosome regions of the C. elegans genome, changes in multifractality were found in relation to the regional organization (at the center and arms) exhibited by the chromosomes, Figure 5 (below, left). These behaviors are associated with changes in the content of repetitive DNA, Figure 5 (below, right). The results indicated that the chromosome arms are even more complex than previously anticipated. Thus, TTAGGC telomere sequences would be operating far from equilibrium to protect the genetic information encoded by the entire chromosome.

All these biological arguments may explain why the C. elegans genome is organized in a nonlinear way. These findings provide insight to quantify and understand the organization of the nonlinear structure of the C. elegans genome, which may be extended to other genomes, including the HG (Vélez et al, 2010).

3.2 Nonlinear Descriptive Model for the Human Genome

Once the multifractal approach was validated in the C. elegans genome, the HG was analyzed exhaustively. This allowed us to propose a nonlinear model for the HG structure, which is discussed from three points of view.

1) It was found that the high multifractality of the HG depends strongly on the content of Alu sequences and, to a lesser extent, on the content of CpG islands. These contents would be located primarily in highly aperiodic regions, thus taking the chromosome far from equilibrium and giving it greater genetic stability, protection and attraction of mutations, Figure 6 (A-C). Thus, hundreds of regions in the HG may have high genetic stability, and the most important genetic information of the HG, the genes, would be safeguarded from environmental fluctuations. Other repeated elements (LINES, MIR, MER, LTRs) showed no significant relationship, Figure 6 (D). Consequently, the human multifractal map developed in Moreno et al, 2011 constitutes a good tool to identify those regions rich in genetic information and genomic stability.

2) The multifractal context seems to be a significant requirement for the structural and functional organization of thousands of genes and gene families. Thus, a high multifractal (aperiodic) context appears to be a “genomic attractor” for many genes (KOGs, KEGGs), Figure 6 (E), and some gene families, Figure 6 (F), that are involved in genetic and deterministic processes, in order to maintain deterministic regulatory control in the genome, although most HG sequences may be subject to complex epigenetic control.

3) The classification of human chromosomes and the analysis of chromosome regions may have some medical implications (Moreno et al, 2002; Moreno et al, 2009). That is, the low nonlinearity exhibited by some chromosomes (or chromosome regions) implies an environmental predisposition, making them potential targets for structural or numerical chromosomal alterations, Figure 6 (G). Additionally, sex chromosomes should have low multifractality to maintain sexual dimorphism and probably X chromosome inactivation.

All these fractal and biological arguments could explain why Alu elements are shaping the HG in a nonlinear manner (Moreno et al, 2011). Finally, the multifractal modeling of the HG serves as a theoretical framework to examine new discoveries made by the ENCODE project and new approaches to human epigenomes. That is, the nonlinear organization of the HG might help to explain why most of the HG is expected to be functional.

4. Conclusions

All these results show that the multifractal formalism is appropriate to quantify and evaluate the genetic information content of genomes and to relate it to the known molecular anatomy of the genome and some of its expected properties. Thus, MFB allows the structural nature and variation of the genome to be interpreted in a logical manner.

MFB helps to explain why a number of chromosomal diseases are likely to occur in the genome, thus opening a new perspective toward personalized medicine for studying and interpreting the HG and its diseases.

The entire genome contains nonlinear information that organizes it and supposedly makes it function, which leads to the conclusion that virtually 100% of the HG is functional. Bioinformatics in general is enriched with a novel approach (MFB) that makes it possible to quantify the genetic information content of any DNA sequence, with practical applications in different disciplines of biology, medicine, and agriculture. This novel breakthrough in the computational analysis of genomes and diseases contributes to defining Biology as a “hard” science.

MFB opens a door to a research program aimed at establishing an integrative discipline that contributes to “breaking” the code of human life (http://pharmaceuticalintelligence.com/page/3/).

5. Acknowledgements

Thanks to the directors of the EISC, the Universidad del Valle, and the School of Engineering for offering an academic, scientific, and administrative space for conducting this research. Likewise, thanks to the co-authors (professors and students) who participated in carrying out some of the works cited here. Finally, thanks to Colciencias for biotechnology project grant # 1103-12-16765.


6. References

Blanco, S., & Moreno, P.A. (2007). Representación del juego del caos para el análisis de secuencias de ADN y proteínas mediante el análisis multifractal (método “box-counting”). In The Second International Seminar on Genomics and Proteomics, Bioinformatics and Systems Biology (pp. 17-25). Popayán, Colombia.

Burgos, J.D., & Moreno-Tovar, P. (1996). Zipf scaling behavior in the immune system. BioSystems, 39, 227-232.

C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012-2018.

Gutiérrez, J.M., Iglesias, A., Rodríguez, M.A., Burgos, J.D., & Moreno, P.A. (1998). Analyzing the multifractal structure of DNA nucleotide sequences. In M. Barbi & S. Chillemi (Eds.), Chaos and Noise in Biology and Medicine (chap. 4). Hackensack (NJ): World Scientific Publishing Co.

Ivanov, P.Ch., Amaral, L.A.N., Goldberger, A.L., Havlin, S., Rosenblum, M.G., Struzik, Z.R., & Stanley, H.E. (1999). Multifractality in human heartbeat dynamics. Nature, 399, 461-465.

Jeffrey, H.J. (1990). Chaos game representation of gene structure. Nucleic Acids Research, 18, 2163-2175.

Mandelbrot, B. (1982). La geometría fractal de la naturaleza. Barcelona, España: Tusquets editores.

Moon, F.C. (1992). Chaotic and fractal dynamics. New York: John Wiley.

Moreno, P.A. (2005). Large scale and small scale bioinformatics studies on the Caenorhabditis elegans genome. Doctoral thesis. Department of Biology and Biochemistry, University of Houston, Houston, USA.

Moreno, P.A., Burgos, J.D., Vélez, P.E., Gutiérrez, J.M., et al. (2000). Multifractal analysis of complete genomes. In Proceedings of the 12th International Genome Sequencing and Analysis Conference (pp. 80-81). Miami Beach (FL).

Moreno, P.A., Rodríguez, J.G., Vélez, P.E., Cubillos, J.R., & Del Portillo, P. (2002). La genómica aplicada en salud humana. Colombia Ciencia y Tecnología. Colciencias, 20, 14-21.

Moreno, P.A., Vélez, P.E., & Burgos, J.D. (2009). Biología molecular, genómica y post-genómica. Pioneros, principios y tecnologías. Popayán, Colombia: Editorial Universidad del Cauca.

Moreno, P.A., Vélez, P.E., Martínez, E., Garreta, L., Díaz, D., Amador, S., Gutiérrez, J.M., et al. (2011). The human genome: a multifractal analysis. BMC Genomics, 12, 506.

Mount, D.W. (2004). Bioinformatics. Sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press.

Peitgen, H.O., Jürgens, H., & Saupe, D. (1992). Chaos and Fractals. New Frontiers of Science. New York: Springer-Verlag.

Restrepo, S., Pinzón, A., Rodríguez, L.M., Sierra, R., Grajales, A., Bernal, A., Barreto, E., et al. (2009). Computational biology in Colombia. PLoS Computational Biology, 5 (10), e1000535.

The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74.

Vélez, P.E., Garreta, L.E., Martínez, E., Díaz, N., Amador, S., Gutiérrez, J.M., Tischer, I., & Moreno, P.A. (2010). The Caenorhabditis elegans genome: a multifractal analysis. Genetics and Molecular Research, 9, 949-965.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., et al. (2001). The sequence of the human genome. Science, 291, 1304-1351.

Yu, Z.G., Anh, V., & Lau, K.S. (2001). Measure representation and multifractal analysis of complete genomes. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 64, 031903.

 

Other articles on Bioinformatics on this Open Access Journal include:

Bioinformatics Tool Review: Genome Variant Analysis Tools

2017 Agenda – BioInformatics: Track 6: BioIT World Conference & Expo ’17, May 23-35, 2017, Seaport World Trade Center, Boston, MA

Better bioinformatics

Broad Institute, Google Genomics combine bioinformatics and computing expertise

Autophagy-Modulating Proteins and Small Molecules Candidate Targets for Cancer Therapy: Commentary of Bioinformatics Approaches

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics



Lesson 10 on Cancer, Oncogenes, and Aberrant Cell Signal Termination in Disease for #TUBiol3373

Curator: Stephen J. Williams

Please click on the following file to get the Powerpoint Presentation for this lecture

cell signaling 10 lesson_SJW 2019

There is a good reference to read on The Hallmarks of Cancer, first published in 2000 and then updated with two new hallmarks in 2011 (namely, 1. the ability of cancer cells to reprogram their metabolism and 2. the ability of cancer cells to evade the immune system).

Links to the PDFs are given here:

hallmarks2000

hallmarks2011

Please also go to other articles on this site that are relevant to this lecture. You can use the search box in the upper right-hand corner of the Home Page, or follow these links, which you might find interesting:

Development of Chemoresistance to Targeted Therapies: Alterations of Cell Signaling & the Kinome

Proteomics, Metabolomics, Signaling Pathways, and Cell Regulation: a Compilation of Articles in the Journal http://pharmaceuticalintelligence.com

Feeling the Heat – the Link between Inflammation and Cancer

Lesson 4 Cell Signaling And Motility: G Proteins, Signal Transduction: Curations and Articles of reference as supplemental information: #TUBiol3373

Immunotherapy Resistance Rears Its Ugly Head: PD-1 Resistant Metastatic Melanoma and More

Novel Mechanisms of Resistance to Novel Agents

 
