
Archive for the ‘BioIT: BioInformatics, NGS, Clinical & Translational, Pharmaceutical R&D Informatics, Clinical Genomics, Cancer Informatics’ Category


eProceedings for BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA, Real Time Coverage by Stephen J. Williams, PhD @StephenJWillia2

 

CONFERENCE OVERVIEW

Real Time Coverage of BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/05/31/real-time-coverage-of-bio-international-convention-june-3-6-2019-philadelphia-convention-center-philadelphia-pa/

 

LECTURES & PANELS

Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-machine-learning-and-artificial-intelligence-realizing-precision-medicine-one-patient-at-a-time/

 

Real Time Coverage @BIOConvention #BIO2019: Genome Editing and Regulatory Harmonization: Progress and Challenges, 6/5/2019. Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-genome-editing-and-regulatory-harmonization-progress-and-challenges/

 

Real Time Coverage @BIOConvention #BIO2019: Precision Medicine Beyond Oncology June 5, 2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-precision-medicine-beyond-oncology-june-5-philadelphia-pa/

 

Real Time @BIOConvention #BIO2019:#Bitcoin Your Data! From Trusted Pharma Silos to Trustless Community-Owned Blockchain-Based Precision Medicine Data Trials, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-bioconvention-bio2019bitcoin-your-data-from-trusted-pharma-silos-to-trustless-community-owned-blockchain-based-precision-medicine-data-trials/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote Address Jamie Dimon CEO @jpmorgan June 5, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-keynote-address-jamie-dimon-ceo-jpmorgan-june-5-philadelphia/

 

Real Time Coverage @BIOConvention #BIO2019: Chat with @FDA Commissioner, & Challenges in Biotech & Gene Therapy June 4, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-chat-with-fda-commissioner-challenges-in-biotech-gene-therapy-june-4-philadelphia/

 

Falling in Love with Science: Championing Science for Everyone, Everywhere June 4 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-falling-in-love-with-science-championing-science-for-everyone-everywhere/

 

Real Time Coverage @BIOConvention #BIO2019: June 4 Morning Sessions; Global Biotech Investment & Public-Private Partnerships, 6/4/2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-june-4-morning-sessions-global-biotech-investment-public-private-partnerships/

 

Real Time Coverage @BIOConvention #BIO2019: Understanding the Voices of Patients: Unique Perspectives on Healthcare; June 4, 2019, 11:00 AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-understanding-the-voices-of-patients-unique-perspectives-on-healthcare-june-4/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote: Siddhartha Mukherjee, Oncologist and Pulitzer Author; June 4 2019, 9AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD. @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-keynote-siddhartha-mukherjee-oncologist-and-pulitzer-author-june-4-9am-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019:  Issues of Risk and Reproduceability in Translational and Academic Collaboration; 2:30-4:00 June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-issues-of-risk-and-reproduceability-in-translational-and-academic-collaboration-230-400-june-3-philadelphia-pareal-time-coverage-bioconvention-bi/

 

Real Time Coverage @BIOConvention #BIO2019: What’s Next: The Landscape of Innovation in 2019 and Beyond. 3-4 PM June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-whats-next-the-landscape-of-innovation-in-2019-and-beyond-3-4-pm-june-3-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: After Trump’s Drug Pricing Blueprint: What Happens Next? A View from Washington; June 3, 2019 1:00 PM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-after-trumps-drug-pricing-blueprint-what-happens-next-a-view-from-washington-june-3-2019-100-pm-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: International Cancer Clusters Showcase June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-international-cancer-clusters-showcase-june-3-philadelphia-pa/



Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time

Reporter: Stephen J Williams, PhD @StephenJWillia2

The impact of Machine Learning (ML) and Artificial Intelligence (AI) during the last decade has been tremendous. With the rise of infobesity, ML/AI is evolving into an essential capability for mining the sheer volume of patient genomics, omics, sensor/wearables and real-world data, and for unraveling the knot of healthcare’s most complex questions.

Despite the advancements in technology, organizations struggle to prioritize and implement ML/AI to achieve the anticipated value, whilst managing the disruption that comes with it. In this session, panelists will discuss ML/AI implementation and adoption strategies that work. Panelists will draw upon their experiences as they share their success stories, discuss how to implement digital diagnostics, track disease progression and treatment, and increase commercial value and ROI compared against traditional approaches.

  • Most trials conducted so far are still training AI/ML algorithms on training data sets. The best results reported have been about 80% accuracy on training sets, which needs to improve.
  • All data sets can be biased. For example, a professor was measuring heart rate with an IR detector on a wearable, but it turned out that different skin types generated different signals at the detector, so training sets may carry population biases (you are getting data from one group).
  • Clinical-grade equipment has actually not been trained on data sets as large as those used for commercial wearables; commercial-grade devices are tested on a larger study population. This can affect the AI/ML algorithms.
  • Regulations: which regulatory body is responsible is still up for debate. Whether the FDA or the FTC is responsible for AI/ML in healthcare, healthcare tech and IT is not fully decided yet. We don’t have guidances for these new technologies.
  • Some rules: never use your own encryption; always use industry standards, especially when collecting personal data from wearables. One hospital corrupted its system because its computer system was not up to date and could not protect against a virus transmitted by a wearable.
  • Pharma companies understand they need to increase the value of their products, so they are very interested in how AI/ML can be used.

Please follow LIVE on TWITTER using the following @ handles and # hashtags:

@Handles

@pharma_BI

@AVIVA1950

@BIOConvention

# Hashtags

#BIO2019 (official meeting hashtag)



Simulation Tools of Genomic Next Generation Sequencing Data: Comparative Analysis & Genetic Simulation Resources

Reporting: Aviva Lev-Ari, PhD, RN

 

INTRODUCTION

What is next generation sequencing?

Behjati S, Tarpey PS.

Arch Dis Child Educ Pract Ed. 2013 Dec;98(6):236-8. doi: 10.1136/archdischild-2013-304340. Epub 2013 Aug 28. Review.

Computational pan-genomics: status, promises and challenges.

Computational Pan-Genomics Consortium.

Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089. Review.

Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

Dahlö M, Scofield DG, Schaal W, Spjuth O.

Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy028.

NGS IN THE CLINIC

[Clinical Applications of Next-Generation Sequencing].

Rebollar-Vega RG, Arriaga-Canon C, de la Rosa-Velázquez IA.

Rev Invest Clin. 2018;70(4):153-157. doi: 10.24875/RIC.18002544.

PMID: 30067721

Free Article

 

Clinical Genomics: Challenges and Opportunities.

Vijay P, McIntyre AB, Mason CE, Greenfield JP, Li S.

Crit Rev Eukaryot Gene Expr. 2016;26(2):97-113. doi: 10.1615/CritRevEukaryotGeneExpr.2016015724. Review.

Next-generation sequencing in the clinic: promises and challenges.

Xuan J, Yu Y, Qing T, Guo L, Shi L.

Cancer Lett. 2013 Nov 1;340(2):284-95. doi: 10.1016/j.canlet.2012.11.025. Epub 2012 Nov 19. Review.

The Future of Whole-Genome Sequencing for Public Health and the Clinic.

Allard MW.

J Clin Microbiol. 2016 Aug;54(8):1946-8. doi: 10.1128/JCM.01082-16. Epub 2016 Jun 15.

PMID: 27307454

Free PMC Article

 

Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists.

Roy S, Coldren C, Karunamurthy A, Kip NS, Klee EW, Lincoln SE, Leon A, Pullambhatla M, Temple-Smolkin RL, Voelkerding KV, Wang C, Carter AB.

J Mol Diagn. 2018 Jan;20(1):4-27. doi: 10.1016/j.jmoldx.2017.11.003. Epub 2017 Nov 21. Review.

PMID: 29154853

MUTATION ANALYSIS – GENE ENCODING

Next-Generation Sequencing and Mutational Analysis: Implications for Genes Encoding LINC Complex Proteins.

Nagy PL, Worman HJ.

Methods Mol Biol. 2018;1840:321-336. doi: 10.1007/978-1-4939-8691-0_22.

PMID: 30141054

Genome-wide genetic marker discovery and genotyping using next-generation sequencing.

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML.

Nat Rev Genet. 2011 Jun 17;12(7):499-510. doi: 10.1038/nrg3012. Review.

PMID: 21681211

 

Best practices for evaluating mutation prediction methods.

Rogan PK, Zou GY.

Hum Mutat. 2013 Nov;34(11):1581-2. doi: 10.1002/humu.22401. Epub 2013 Sep 10. No abstract available.

PMID: 23955774

MITOCHONDRIAL VARIATIONS

mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

Vellarikkal SK, Dhiman H, Joshi K, Hasija Y, Sivasubbu S, Scaria V.

Hum Mutat. 2015 Apr;36(4):419-24. doi: 10.1002/humu.22767.

PMID: 25677119

VARIANT ANALYSIS

A survey of tools for variant analysis of next-generation genome sequencing data.

Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z.

Brief Bioinform. 2014 Mar;15(2):256-78. doi: 10.1093/bib/bbs086. Epub 2013 Jan 21.

PMID: 23341494

Free PMC Article

 

Variant callers for next-generation sequencing data: a comparison study.

Liu X, Han S, Wang Z, Gelernter J, Yang BZ.

PLoS One. 2013 Sep 27;8(9):e75619. doi: 10.1371/journal.pone.0075619. eCollection 2013.

VARIANT DETECTION IN HEREDITARY CANCER GENES

ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

Lopez-Doriga A, Feliubadaló L, Menéndez M, Lopez-Doriga S, Morón-Duran FD, del Valle J, Tornero E, Montes E, Cuesta R, Campos O, Gómez C, Pineda M, González S, Moreno V, Capellá G, Lázaro C.

Hum Mutat. 2014 Mar;35(3):271-7.

PMID: 24227591

 

Development and analytical validation of a 25-gene next generation sequencing panel that includes the BRCA1 and BRCA2 genes to assess hereditary cancer risk.

Judkins T, Leclair B, Bowles K, Gutin N, Trost J, McCulloch J, Bhatnagar S, Murray A, Craft J, Wardell B, Bastian M, Mitchell J, Chen J, Tran T, Williams D, Potter J, Jammulapati S, Perry M, Morris B, Roa B, Timms K.

BMC Cancer. 2015 Apr 2;15:215. doi: 10.1186/s12885-015-1224-y.

Clinical Applications of Next-Generation Sequencing in Cancer Diagnosis.

Sabour L, Sabour M, Ghorbian S.

Pathol Oncol Res. 2017 Apr;23(2):225-234. doi: 10.1007/s12253-016-0124-z. Epub 2016 Oct 8. Review.

PMID: 27722982

 

Studying cancer genomics through next-generation DNA sequencing and bioinformatics.

Doyle MA, Li J, Doig K, Fellowes A, Wong SQ.

Methods Mol Biol. 2014;1168:83-98. doi: 10.1007/978-1-4939-0847-9_6. Review.

PMID: 24870132

IMMUNOINFORMATICS

Immunoinformatics and epitope prediction in the age of genomic medicine.

Backert L, Kohlbacher O.

Genome Med. 2015 Nov 20;7:119. doi: 10.1186/s13073-015-0245-0. Review.

IgSimulator: a versatile immunosequencing simulator.

Safonova Y, Lapidus A, Lill J.

Bioinformatics. 2015 Oct 1;31(19):3213-5. doi: 10.1093/bioinformatics/btv326. Epub 2015 May 25.

PMID: 26007226

 

Computational genomics tools for dissecting tumour-immune cell interactions.

Hackl H, Charoentong P, Finotello F, Trajanoski Z.

Nat Rev Genet. 2016 Jul 4;17(8):441-58. doi: 10.1038/nrg.2016.67. Review.

PMID: 27376489

RNA SEQUENCING

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines.

Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N.

BMC Bioinformatics. 2017 Sep 29;18(1):428. doi: 10.1186/s12859-017-1831-5.

PMID: 28969586

Free PMC Article

COMPLEX INSERTIONS AND DELETIONS

INDELseek: detection of complex insertions and deletions from next-generation sequencing data.

Au CH, Leung AY, Kwong A, Chan TL, Ma ES.

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.

PMID: 28056804

Free PMC Article

EVOLUTIONARY BIOLOGY

The State of Software for Evolutionary Biology.

Darriba D, Flouri T, Stamatakis A.

Mol Biol Evol. 2018 May 1;35(5):1037-1046. doi: 10.1093/molbev/msy014. Review.

SIMULATION PROGRAMS

PMCID: PMC5224698
EMSID: EMS70941
PMID: 27320129

Systematic review of next-generation sequencing simulators: computational tools, features and perspectives.

Zhao M, Liu D, Qu H.

Brief Funct Genomics. 2017 May 1;16(3):121-128. doi: 10.1093/bfgp/elw012. Review.

PMID: 27069250

 

A comparison of tools for the simulation of genomic next-generation sequencing data

Online Summary

  1. There is a large number of tools for the simulation of genomic data for all currently available NGS platforms, with partially overlapped functionality. Here we review 23 of these tools, highlighting their distinct functionalities, requirements and potential applications.

  2. The parameterization of these simulators is often complex. The user may decide between using existing sets of parameter values, called profiles, or re-estimating them from their own data.

  3. Parameters that can be modulated in these simulations include the effects of the PCR amplification of the libraries, read features and quality scores, base call errors, variation of sequencing depth across the genomes and the introduction of genomic variants.

  4. Several types of genomic variants can be introduced in the simulated reads, such as SNPs, indels, inversions, translocations, copy-number variants and short-tandem repeats.

  5. Reads can be generated from single or multiple genomes, and with distinct ploidy levels. NGS data from metagenomic communities can be simulated given an “abundance profile” that reflects the proportion of taxa in a given sample.

  6. Many of the simulators have not been formally described and/or tested in dedicated publications. We encourage the formal publication of these tools and the realization of comprehensive, comparative benchmarkings.

  7. Choosing among the different genomic NGS simulators is not easy. Here we provide a guidance tree to help users choose a suitable tool for their specific interests.
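Point 5 of the summary, simulating a metagenomic sample from an "abundance profile", can be illustrated with a small sketch. It is not taken from any of the reviewed simulators: the function name, its arguments and the toy genomes are hypothetical, and the abundance values are treated here as the expected fraction of reads per taxon (a real tool might instead weight by genome length or cell counts).

```python
import random

def sample_metagenome_reads(genomes, abundance, n_reads, read_len, seed=0):
    """Draw error-free reads from several genomes following an abundance profile.

    genomes:   dict mapping a taxon name to its sequence (toy input, assumed)
    abundance: dict mapping a taxon name to a relative abundance; the values
               are used as sampling weights and need not sum to 1
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundance[name] for name in names]
    reads = []
    for _ in range(n_reads):
        # Pick the source taxon in proportion to its abundance.
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Pick a uniformly random start position for the read.
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads
```

Calling `sample_metagenome_reads({"taxonA": seq_a, "taxonB": seq_b}, {"taxonA": 0.9, "taxonB": 0.1}, 1000, 100)` would return about nine reads from taxonA for every read from taxonB.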

Abstract

Computer simulation of genomic data has become increasingly popular for assessing and validating biological models or to gain understanding about specific datasets. Multiple computational tools for the simulation of next-generation sequencing (NGS) data have been developed in recent years, which could be used to compare existing and new NGS analytical pipelines. Here we review 23 of these tools, highlighting their distinct functionality, requirements and potential applications. We also provide a decision tree for the informed selection of an appropriate NGS simulation tool for the specific question at hand.

Image source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224698/

An overview of current NGS technologies

The most popular NGS technologies on the market are Illumina’s sequencing by synthesis, which is probably the most widely used platform at present, Roche’s 454 pyrosequencing (454), SOLiD sequencing-by-ligation (SOLiD), IonTorrent semiconductor sequencing (IonTorrent), Pacific Biosciences’s (PacBio) single molecule real-time sequencing, and Oxford Nanopore Technologies (Nanopore) single-cell DNA template strand sequencing. These strategies can differ, for example, regarding the type of reads they produce or the kind of sequencing errors they introduce (Table 1). Only two of the current technologies (Illumina and SOLiD) are capable of producing all three sequencing read types: single end, paired end and mate pair. Read length is also dependent on the machine and the kit used; in platforms like Illumina, SOLiD, or IonTorrent it is possible to specify the number of desired base pairs per read. According to the sequencing run type selected it is possible to obtain reads with maximum lengths of 75 bp (SOLiD), 300 bp (Illumina) or 400 bp (IonTorrent). On the other hand, in platforms like 454, Nanopore or PacBio, information is only given about the mean and maximum read length that can be obtained, with average lengths of 700 bp, 10 kb and 15 kb and maximum lengths of 1 kb, 10 kb and 15 kb, respectively. Error rates vary depending on the platform from ≤1% in Illumina to ~30% in Nanopore. Further overviews and comparisons of NGS strategies can be found in the references cited in the source article.

Table 1

Main characteristics of current NGS technologies. Run types: X = supported, - = not supported.

Technology   Single-read   Paired-end   Mate-pair   Maximum read length     Quality scores   Error rates
Illumina     X             X            X           300 bp                  > Q30            0.0034 – 1%
SOLiD        X             X            X           75 bp                   > Q30            0.01 – 1%
IonTorrent   X             X            -           400 bp                  ~ Q20            1.78%
454          X             X            -           ~700 bp (up to 1 Kb)    > Q20            1.07 – 1.7%
Nanopore     X             -            -           5.4 – 10 Kb             NA               10 – 40%
PacBio       X             -            -           ~15 Kb (up to 40 Kb)    < Q10            5 – 10%
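The quality scores in Table 1 are on the Phred scale, where a score Q corresponds to a base-call error probability p = 10^(-Q/10). The short sketch below shows the conversion in both directions; the function names are illustrative only. Note that the table reports quality scores and observed error rates as separate columns from the source, so they need not agree exactly with this formula.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability p (0 < p <= 1) to a Phred score Q = -10*log10(p)."""
    return -10 * math.log10(p)
```

Under this scale, Q30 means about one expected error per 1,000 base calls, Q20 about one per 100, and Q10 about one per 10.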

Simulation parameters

The existing sequencing platforms use distinct protocols that result in datasets with different characteristics. Many of these attributes can be taken into account by the simulators (Fig. 2), although there is not a single tool that incorporates all possible variations. The main characteristics of the 23 simulators considered here are summarized in Tables 2 and 3. These tools differ in multiple aspects, such as sequencing technology, input requirements or output format, but maintain several common aspects. With some exceptions, all programs need a reference sequence, multiple parameter values indicating the characteristics of the sequencing experiment to be simulated (read length, error distribution, type of variation to be generated, if any, etc.) and/or a profile (a set of parameter values, conditions and/or data used for controlling the simulation), which can be provided by the simulator or estimated de novo from empirical data. The outcome will be aligned or unaligned reads in different standard file formats, such as FASTQ, FASTA or BAM. An overview of the NGS data simulation process is represented in Fig. 3. In the following sections we delve into the different steps involved.
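The common inputs described above (a reference sequence, a read length, an error model) and the FASTQ output can be made concrete with a deliberately simple sketch. It does not reproduce any of the 23 reviewed tools: the function names are hypothetical, errors are limited to uniform random substitutions, and every base receives the same Phred+33 quality character derived from the error rate (capped at Q40), which is a strong simplification of real error profiles.

```python
import math
import random

BASES = "ACGT"

def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Sample single-end reads from a reference with uniform substitution errors."""
    rng = random.Random(seed)
    # One constant Phred score for all bases, derived from the error rate.
    q = 40 if error_rate <= 0 else min(40, round(-10 * math.log10(error_rate)))
    qual = chr(q + 33) * read_len  # Phred+33 encoding, as used in FASTQ
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                bases[j] = rng.choice([b for b in BASES if b != base])
        records.append((f"read{i}_pos{start}", "".join(bases), qual))
    return records

def to_fastq(records):
    """Format (name, sequence, quality) tuples as four-line FASTQ records."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)
```

Recording the true start position in the read name, as done here, is one way a simulated dataset can later be used to check where a mapper placed each read.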

Figure 2. General overview of the sequencing process and steps that can be parameterized in the simulations.

NGS simulators try to imitate the real sequencing process as closely as possible by considering all the steps that could influence the characteristics of the reads. a | NGS simulators do not take into account the effect of the different DNA extraction protocols in the resulting data. However, they can consider whether the sample we want to sequence includes one or more individuals, from the same or different organisms (e.g., pool-sequencing, metagenomics). Pools of related genomes can be simulated by replicating the reference sequence and introducing variants on the resulting genomes. Some tools can also simulate metagenomes with distinct taxa abundance. b | Simulators can try to mimic the length range of DNA fragmentation (empirically obtained by sonication or digestion protocols) or assume a fixed amplicon length. c | Library preparation involves ligating sequencing-platform dependent adaptors and/or barcodes to the selected DNA fragments (inserts). Some simulators can control the insert size, and produce reads with adaptors/barcodes. d | Most NGS techniques include an amplification step for the preparation of libraries. Several simulators can take this step into account (for example, by introducing errors and/or chimaeras), with the possibility of specifying the number of reads per amplicon. e | Sequencing runs imply a decision about coverage, read length, read type (single-end, paired-end, mate-pair) and a given platform (with their specific errors and biases). Simulators exist for the different platforms, and they can use particular parameter profiles, often estimated from real data.
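Steps b, c and e above (fragmentation, insert size and read type) can be sketched for the paired-end case as follows. This is an illustration under stated assumptions rather than the method of any particular simulator: fragment lengths are drawn from a Gaussian and clamped, reads are error-free, adaptors and amplification are ignored, and all names and parameters are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_paired_end(reference, n_pairs, read_len, mean_insert, sd_insert, seed=0):
    """Sample fragments from the reference and read both ends of each fragment."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Fragment (insert) length: Gaussian, clamped to [read_len, len(reference)].
        frag_len = int(round(rng.gauss(mean_insert, sd_insert)))
        frag_len = max(read_len, min(frag_len, len(reference)))
        start = rng.randrange(len(reference) - frag_len + 1)
        fragment = reference[start:start + frag_len]
        # Read 1 comes from the forward strand at the fragment's 5' end;
        # read 2 is read from the opposite strand at the other end.
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((read1, read2))
    return pairs
```

A mate-pair layout would differ mainly in read orientation and in using much longer inserts; that variation is not covered by this sketch.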

Figure 3. General overview of NGS simulation.

The simulation process begins with the input of a reference sequence (most cases) and simulation parameters. Some of the parameters can be given via a profile, that is estimated (by the simulator or other tools) from other reads or alignments. The outcome of this process may be reads (with or without quality information) or genome alignments in different formats.
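The idea of a profile estimated from existing reads can be illustrated with per-position quality scores, one of the simplest things a simulator could learn from empirical data. The sketch below is an assumption-laden example, not the profile format of any reviewed tool: it counts Phred scores at each read position from FASTQ quality strings (Phred+33 assumed) and then samples new quality strings from those counts, treating positions as independent.

```python
import random
from collections import Counter

def estimate_quality_profile(quality_strings, offset=33):
    """Count the Phred scores observed at each read position."""
    profile = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(profile):
                profile.append(Counter())  # first read reaching this position
            profile[pos][ord(char) - offset] += 1
    return profile

def sample_quality_string(profile, read_len, seed=0, offset=33):
    """Draw one quality string, position by position, from the estimated profile."""
    rng = random.Random(seed)
    chars = []
    for pos in range(read_len):
        counts = profile[pos]
        scores = list(counts)
        q = rng.choices(scores, weights=[counts[s] for s in scores], k=1)[0]
        chars.append(chr(q + offset))
    return "".join(chars)
```

Because each position is sampled independently, correlations between neighbouring qualities in real reads are lost; a more faithful profile would have to model them.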

CONCLUSIONS

NGS is having a big impact in a broad range of areas that benefit from genetic information, from medical genomics, phylogenetic and population genomics, to the reconstruction of ancient genomes, epigenomics and environmental barcoding. These applications include approaches such as de novo sequencing, resequencing, target sequencing or genome reduction methods. In all cases, caution is necessary in choosing a proper sequencing design and/or a reliable analytical approach for the specific biological question of interest. The simulation of NGS data can be extremely useful for planning experiments, testing hypotheses, benchmarking tools and evaluating particular results. Given a reference genome or dataset, for instance, one can play with an array of sequencing technologies to choose the best-suited technology and parameters for the particular goal, possibly optimizing time and costs. Yet, this is still not the standard practice and researchers often base their choices on practical considerations like technology and money availability. As shown throughout this Review, simulation of NGS data from known genomes or transcriptomes can be extremely useful when evaluating assembly, mapping, phasing or genotyping algorithms, exposing their advantages and drawbacks under different circumstances.
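When planning an experiment or a simulation run, one of the first quantities to fix is the number of reads needed for a target depth. A minimal sketch, using the standard relation mean coverage C = N × L / G (N reads of length L over a genome of size G); the function names are illustrative and the calculation ignores unmappable regions, duplicates and uneven depth.

```python
def reads_for_coverage(genome_size, coverage, read_len):
    """Smallest number of reads N such that N * read_len / genome_size >= coverage.

    Assumes integer inputs; uses integer ceiling division.
    """
    return -(-coverage * genome_size // read_len)

def expected_coverage(n_reads, read_len, genome_size):
    """Mean depth of coverage C = N * L / G."""
    return n_reads * read_len / genome_size
```

For example, 30x coverage of a 3 Mb genome with 150 bp reads requires 600,000 reads under this relation; for paired-end data, the same total corresponds to half as many pairs.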

Altogether, current NGS simulators consider most, if not all, of the important features regarding the generation of NGS data. However, they are not problem-free. The different simulators are largely redundant, implementing the same or very similar procedures. In our opinion, many are poorly documented and can be difficult to use for non-experts, and some of them are no longer maintained. Most importantly, for the most part they have not been benchmarked or validated. Remarkably, among the 23 tools considered here, only 13 have been described in dedicated application notes, 3 have been mentioned as add-ons in the methods section of bigger articles, and 5 have never been referenced in a journal. Indeed, peer-reviewed publication of these tools in dedicated articles would be highly desirable. While this would not definitively guarantee quality, at least it would encourage authors to reach minimum standards in terms of validation, benchmarking, and documentation. Collaborative efforts like the Assemblathon or iEvo (http://www.ievobio.org/) might also be a source of inspiration. Meanwhile, we hope that the decision tree presented in Fig. 1 helps users make appropriate choices.

SOURCE
REFERENCES
Serghei Mangul, Lana S. Martin, Brian L. Hill, Angela Ka-Mei Lam, Margaret G. Distler, Alex Zelikovsky, Eleazar Eskin, Jonathan Flint
Nat Commun. 2019; 10: 1393. Published online 2019 Mar 27. doi: 10.1038/s41467-019-09406-4
PMCID: PMC6437167
Ge Tan, Lennart Opitz, Ralph Schlapbach, Hubert Rehrauer
Sci Rep. 2019; 9: 2856. Published online 2019 Feb 27. doi: 10.1038/s41598-019-39076-7
PMCID: PMC6393434
Apostolos Dimitromanolakis, Jingxiong Xu, Agnieszka Krol, Laurent Briollais
BMC Bioinformatics. 2019; 20: 26. Published online 2019 Jan 15. doi: 10.1186/s12859-019-2611-1
PMCID: PMC6332552
Kathleen E. Lotterhos, Jason H. Moore, Ann E. Stapleton
PLoS Biol. 2018 Dec; 16(12): e3000070. Published online 2018 Dec 10. doi: 10.1371/journal.pbio.3000070
PMCID: PMC6301703
Hayley Cassidy, Randy Poelman, Marjolein Knoester, Coretta C. Van Leer-Buter, Hubert G. M. Niesters
Front Microbiol. 2018; 9: 2677. Published online 2018 Nov 13. doi: 10.3389/fmicb.2018.02677
PMCID: PMC6243117
Genetic Simulation Resources and the GSR Certification Program
Bo Peng, Man Chong Leong, Huann-Sheng Chen, Melissa Rotunno, Katy R Brignole, John Clarke, Leah E Mechanic
Bioinformatics. 2019 Feb 15; 35(4): 709–710. Published online 2018 Aug 7. doi: 10.1093/bioinformatics/bty666
PMCID: PMC6378936
Hadrien Gourlé, Oskar Karlsson-Lindsjö, Juliette Hayer, Erik Bongcam-Rudloff
Bioinformatics. 2019 Feb 1; 35(3): 521–522. Published online 2018 Jul 19. doi: 10.1093/bioinformatics/bty630
PMCID: PMC6361232
Ze-Gang Wei, Shao-Wu Zhang
BMC Bioinformatics. 2018; 19: 177. Published online 2018 May 22. doi: 10.1186/s12859-018-2208-0
PMCID: PMC5964698
Yu Li, Renmin Han, Chongwei Bi, Mo Li, Sheng Wang, Xin Gao
Bioinformatics. 2018 Sep 1; 34(17): 2899–2908. Published online 2018 Apr 6. doi: 10.1093/bioinformatics/bty223
PMCID: PMC6129308
Roberto Semeraro, Valerio Orlandini, Alberto Magi
PLoS One. 2018; 13(4): e0194472. Published online 2018 Apr 5. doi: 10.1371/journal.pone.0194472
PMCID: PMC5886411
Soroush Samadian, Jeff P. Bruce, Trevor J. Pugh
PLoS Comput Biol. 2018 Mar; 14(3): e1006080. Published online 2018 Mar 28. doi: 10.1371/journal.pcbi.1006080
PMCID: PMC5891060
Brandon J. Varela, David Lesbarrères, Roberto Ibáñez, David M. Green
Front Microbiol. 2018; 9: 298. Published online 2018 Feb 22. doi: 10.3389/fmicb.2018.00298
PMCID: PMC5826957
Fedor M. Naumenko, Irina I. Abnizova, Nathan Beka, Mikhail A. Genaev, Yuriy L. Orlov
BMC Genomics. 2018; 19(Suppl 3): 92. Published online 2018 Feb 9. doi: 10.1186/s12864-018-4475-6
PMCID: PMC5836841
Weizhi Song, Kerrin Steensen, Torsten Thomas
PeerJ. 2017; 5: e4015. Published online 2017 Nov 8. doi: 10.7717/peerj.4015
PMCID: PMC5681852
Haibao Tang, Ewen F. Kirkness, Christoph Lippert, William H. Biggs, Martin Fabani, Ernesto Guzman, Smriti Ramakrishnan, Victor Lavrenko, Boyko Kakaradov, Claire Hou, Barry Hicks, David Heckerman, Franz J. Och, C. Thomas Caskey, J. Craig Venter, Amalio Telenti
Am J Hum Genet. 2017 Nov 2; 101(5): 700–715. Published online 2017 Nov 2. doi: 10.1016/j.ajhg.2017.09.013
PMCID: PMC5673627
Minh Duc Cao, Devika Ganesamoorthy, Chenxi Zhou, Lachlan J M Coin
Bioinformatics. 2018 Mar 1; 34(5): 873–874. Published online 2017 Oct 28. doi: 10.1093/bioinformatics/btx691
PMCID: PMC6192212
Yair Motro, Jacob Moran-Gilad
Biomol Detect Quantif. 2017 Dec; 14: 1–6. Published online 2017 Oct 23. doi: 10.1016/j.bdq.2017.10.002
PMCID: PMC5727008
Jacquiline W Mugo, Ephifania Geza, Joel Defo, Samar S M Elsheikh, Gaston K Mazandu, Nicola J Mulder, Emile R Chimusa
Bioinformatics. 2017 Oct 1; 33(19): 2995–3002. Published online 2017 Jun 24. doi: 10.1093/bioinformatics/btx369
PMCID: PMC5870573
Ryan R. Wick, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt
PLoS Comput Biol. 2017 Jun; 13(6): e1005595. Published online 2017 Jun 8. doi: 10.1371/journal.pcbi.1005595
PMCID: PMC5481147
Chen Yang, Justin Chu, René L Warren, Inanç Birol
Gigascience. 2017 Apr; 6(4): 1–6. Published online 2017 Feb 24. doi: 10.1093/gigascience/gix010
PMCID: PMC5530317



Seven Alternative Designs to Quantum Computing Platform – The Race by IBM, Google, Microsoft, and Others

 

Reporter: Aviva Lev-Ari, PhD, RN

 

Business Bets on a Quantum Leap

Quantum computing could help companies address problems as huge as supply chains and climate change. Here’s how IBM, Google, Microsoft, and others are racing to bring the tech from theory to practice.
May 21, 2019

quantum computer at IonQ, an Alphabet-backed startup

A version of this article appears in the June 2019 issue of Fortune with the headline “The Race for Quantum Domination.”

Medicine

One day, your health may depend on a quantum leap.

  • Pharmaceutical giant Biogen teamed up with consultancy Accenture and startup 1QBit on a quantum computing experiment in 2017 aimed at molecular modeling, one of the more complex disciplines in medicine. The goal: finding candidate drugs to treat neurodegenerative diseases.
  • Microsoft is collaborating with Case Western Reserve University to improve the accuracy of MRI machines, which help detect cancer, using so-called quantum-inspired algorithms.

 

7 ways to win the quantum race

There are multiple ways that quantum computing could work.

Here’s a guide to which companies are backing which tech.

Superconducting uses an electrical current, flowing through special semiconductor chips cooled to near absolute zero, to produce computational “qubits.” Google, IBM, and Intel are pursuing this approach, which has so far been the front-runner.

Ion trap relies on charged atoms that are manipulated by lasers in a vacuum, which helps to reduce noisy interference that can contribute to errors. Industrial giant Honeywell is betting on this technique. So is IonQ, a startup with backing from Alphabet.

Neutral atom is similar to the ion-trap method, except it uses, you guessed it, neutral atoms. Physicist Mikhail Lukin’s lab at Harvard is a pioneer.

Annealing is designed to find the lowest-energy (and therefore speediest) solutions to math problems. Canadian firm D-Wave has sold multimillion-dollar machines based on the idea to Google and NASA. They’re fast, but skeptics question whether they qualify as “quantum.”

Silicon spin uses single electrons trapped in transistors. Intel is hedging its bets between the more mature superconducting qubits and this younger, equally semiconductor-friendly method.

Topological uses exotic, highly stable quasi-particles called “anyons.” Microsoft deems this unproven moonshot as the best candidate in the long run, though the company has yet to produce a single one.

Photonics uses light particles sent through special silicon chips. The particles interact with one another very little (good), but can scatter and disappear (bad). Three-year-old stealth startup Psi Quantum is tinkering away on this idea.

SOURCE

http://fortune.com/longform/business-quantum-computing/

 

Other related articles published in this Open Access Online Scientific Journal include the following:

 

  • R&D for Artificial Intelligence Tools & Applications: Google’s Research Efforts in 2018

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/01/16/rd-for-artificial-intelligence-tools-applications-googles-research-efforts-in-2018/

 

  • LIVE Day Two – World Medical Innovation Forum ARTIFICIAL INTELLIGENCE, Boston, MA USA, Monday, April 9, 2019

www.worldmedicalinnovation.org

https://pharmaceuticalintelligence.com/2019/04/09/live-day-two-world-medical-innovation-forum-artificial-intelligence-boston-ma-usa-monday-april-9-2019/

 

  • Research and Development (R&D) Expenditure by Country represent time, capital, and effort being put into researching and designing the products of the future – Data from the UNESCO Institute for Statistics adjusted for purchasing-power parity (PPP).

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/05/26/research-and-development-rd-expenditure-by-country-represent-time-capital-and-effort-being-put-into-researching-and-designing-the-products-of-the-future-data-from-the-unesco-institute-for-s/

 

  • Resources on Artificial Intelligence in Health Care and in Medicine: Articles of Note at PharmaceuticalIntelligence.com @AVIVA1950 @pharma_BI

https://www.linkedin.com/pulse/resources-artificial-intelligence-health-care-note-lev-ari-phd-rn/

 

  • IBM’s Watson Health division – How will the Future look like?

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/04/24/ibms-watson-health-division-how-will-the-future-look-like/



Reporter and Curator: Dr. Sudipta Saha, Ph.D.

 

RNA plays various roles in determining how the information in our genes drives cell behavior. One of its roles is to carry information encoded by our genes from the cell nucleus to the rest of the cell, where it can be acted on by other cell components. Researchers have now defined how RNA also participates in transmitting information outside cells, in a form known as extracellular RNA, or exRNA. This new role of RNA in cell-to-cell communication has led to new discoveries of potential disease biomarkers and therapeutic targets. The finding that cells use RNA to talk to each other is a significant shift in the general thinking about RNA biology.

 

Researchers explored basic exRNA biology, including how exRNA molecules and their transport packages (or carriers) were made, how they were expelled by producer cells and taken up by target cells, and what the exRNA molecules did when they got to their destination. They encountered surprising complexity both in the types of carriers that transport exRNA molecules between cells and in the different types of exRNA molecules associated with the carriers. The researchers had to be exceptionally creative in developing molecular and data-centric tools to begin making sense of the complexity, and found that the type of carrier affected how exRNA messages were sent and received.

 

As couriers of information between cells, exRNA molecules and their carriers give researchers an opportunity to intercept exRNA messages to see if they are associated with disease. If scientists could change or engineer designer exRNA messages, it may be a new way to treat disease. The researchers identified potential exRNA biomarkers for nearly 30 diseases including cardiovascular disease, diseases of the brain and central nervous system, pregnancy complications, glaucoma, diabetes, autoimmune diseases and multiple types of cancer.

 

For example, some researchers found that exRNA in urine showed promise as a biomarker of muscular dystrophy, where current studies rely on markers obtained through painful muscle biopsies. Other researchers laid the groundwork for exRNA as therapeutics with preliminary studies demonstrating how researchers might load exRNA molecules into suitable carriers and target carriers to intended recipient cells, and determining whether engineered carriers could have adverse side effects. Scientists engineered carriers with designer RNA messages to target lab-grown breast cancer cells displaying a certain protein on their surface. In an animal model of breast cancer with the cell surface protein, the researchers showed a reduction in tumor growth after engineered carriers deposited their RNA cargo.

 

Beyond this work, the scientists also created a catalog of exRNA molecules found in human biofluids like plasma, saliva and urine. They analyzed over 50,000 samples from over 2,000 donors, generating exRNA profiles for 13 biofluids. This included over 1,000 exRNA profiles from healthy volunteers. The researchers found that exRNA profiles varied greatly among healthy individuals depending on characteristics like age and environmental factors like exercise. This means that exRNA profiles can give important and detailed information about health and disease, but careful comparisons need to be made with exRNA data generated from people with similar characteristics.

 

Next, the researchers will develop tools to efficiently and reproducibly isolate, identify and analyze different carrier types and their exRNA cargos, and to allow analysis of one carrier and its cargo at a time. These tools will be shared with the research community to fill the gaps in knowledge that remain and to continue to move this field forward.

 

References:

 

https://www.nih.gov/news-events/news-releases/scientists-explore-new-roles-rna

 

https://www.cell.com/consortium/exRNA

 

https://www.sciencedaily.com/releases/2016/06/160606120230.htm

 

https://www.pasteur.fr/en/multiple-roles-rnas

 

https://www.nature.com/scitable/topicpage/rna-functions-352

 

https://www.umassmed.edu/rti/biology/role-of-rna-in-biology/

 



A Nonlinear Methodology to Explain Complexity of the Genome and Bioinformatic Information

Reporter: Stephen J. Williams, Ph.D.

Multifractal bioinformatics: A proposal to the nonlinear interpretation of genome

The following is an open access article by Pedro Moreno on a methodology to analyze genetic information across species and, in particular, the evolutionary trends of complex genomes, by a nonlinear analytic approach utilizing fractal geometry, coined “Nonlinear Bioinformatics”. This fractal approach stems from the complex nature of higher eukaryotic genomes, including mosaicism and multiple interspersed genomic elements such as intronic regions, noncoding regions, and mobile elements such as transposable elements. Although seemingly random, these elements have a repetitive nature. The author argues that such complexity of DNA regulation, structure and genomic variation is best understood by developing algorithms based on fractal analysis. These can model the regionalized and repetitive variability and structure within complex genomes by elucidating the individual components that contribute to an overall complex structure, rather than using a “linear” or “reductionist” approach that looks at individual coding regions and does not take into consideration the aforementioned factors leading to genetic complexity and diversity.

Indeed, many other attempts to describe the complexities of DNA as a fractal geometric pattern have been made. In the paper “Fractals and Hidden Symmetries in DNA“, Carlo Cattani uses fractal analysis to construct a simple geometric pattern of the influenza A virus by modeling the primary sequence of the viral genome, namely the bases A, G, C, and T. The main conclusions that

fractal shapes and symmetries in DNA sequences and DNA walks have been shown and compared with random and deterministic complex series. DNA sequences are structured in such a way that there exists some fractal behavior which can be observed both on the correlation matrix and on the DNA walks. Wavelet analysis confirms by a symmetrical clustering of wavelet coefficients the existence of scale symmetries.

suggested that, at least, the viral influenza genome structure could be analyzed into its basic components by fractal geometry.
This approach has been used to model the complex nature of cancer as discussed in a 2011 Seminars in Oncology paper
Abstract: Cancer is a highly complex disease due to the disruption of tissue architecture. Thus, tissues, and not individual cells, are the proper level of observation for the study of carcinogenesis. This paradigm shift from a reductionist approach to a systems biology approach is long overdue. Indeed, cell phenotypes are emergent modes arising through collective non-linear interactions among different cellular and microenvironmental components, generally described by “phase space diagrams”, where stable states (attractors) are embedded into a landscape model. Within this framework, cell states and cell transitions are generally conceived as mainly specified by gene-regulatory networks. However, the system’s dynamics is not reducible to the integrated functioning of the genome-proteome network alone; the epithelia-stroma interacting system must be taken into consideration in order to give a more comprehensive picture. Given that cell shape represents the spatial geometric configuration acquired as a result of the integrated set of cellular and environmental cues, we posit that fractal-shape parameters represent “omics” descriptors of the epithelium-stroma system. Within this framework, function appears to follow form, and not the other way around.

As the authors conclude

” Transitions from one phenotype to another are reminiscent of phase transitions observed in physical systems. The description of such transitions could be obtained by a set of morphological, quantitative parameters, like fractal measures. These parameters provide reliable information about system complexity. “

Gene expression also displays a fractal nature. In a Frontiers in Physiology paper by Mahboobeh Ghorbani, Edmond A. Jonckheere and Paul Bogdan, “Gene Expression Is Not Random: Scaling, Long-Range Cross-Dependence, and Fractal Characteristics of Gene Regulatory Networks“,

the authors report that gene expression time series display fractal and long-range dependence characteristics.

Abstract: Gene expression is a vital process through which cells react to the environment and express functional behavior. Understanding the dynamics of gene expression could prove crucial in unraveling the physical complexities involved in this process. Specifically, understanding the coherent complex structure of transcriptional dynamics is the goal of numerous computational studies aiming to study and finally control cellular processes. Here, we report the scaling properties of gene expression time series in Escherichia coli and Saccharomyces cerevisiae. Unlike previous studies, which report the fractal and long-range dependency of DNA structure, we investigate the individual gene expression dynamics as well as the cross-dependency between them in the context of gene regulatory network. Our results demonstrate that the gene expression time series display fractal and long-range dependence characteristics. In addition, the dynamics between genes and linked transcription factors in gene regulatory networks are also fractal and long-range cross-correlated. The cross-correlation exponents in gene regulatory networks are not unique. The distribution of the cross-correlation exponents of gene regulatory networks for several types of cells can be interpreted as a measure of the complexity of their functional behavior.

 

Given that a multitude of complex biomolecular networks and biomolecules can be described by fractal patterns, the development of bioinformatic algorithms based on them would enhance our understanding of the interdependence and cross-functionality of these multiple biological networks, particularly in disease and drug resistance. The article below by Pedro Moreno describes the development of such bioinformatic algorithms.

Pedro A. Moreno
Escuela de Ingeniería de Sistemas y Computación, Facultad de Ingeniería, Universidad del Valle, Cali, Colombia
E-mail: pedro.moreno@correounivalle.edu.co

Subject area: Systems engineering
Received: September 19, 2012
Accepted: December 16, 2013


 

 


Abstract

The first draft of the human genome (HG) sequence was published in 2001 by two competing consortia. Since then, several structural and functional characteristics of the HG organization have been revealed. Today, more than 2,000 HGs have been sequenced, and these findings are having a strong impact on academia and public health. Despite all this, a major bottleneck, called genome interpretation, persists: the lack of a theory that explains the complex puzzles of coding and non-coding features that compose the HG as a whole. Ten years after the HG was sequenced, two recent studies, discussed within the multifractal formalism, allow proposing a nonlinear theory that helps interpret the structural and functional variation of the genetic information of genomes. The present review article discusses this new approach, called “Multifractal bioinformatics”.

Keywords: Omics sciences, bioinformatics, human genome, multifractal analysis.


1. Introduction

Omic Sciences and Bioinformatics

In order to study the genomes, their life properties and the pathological consequences of impairment, the Human Genome Project (HGP) was created in 1990. Since then, about 500 Gbp (EMBL) represented in thousands of prokaryotic genomes and tens of different eukaryotic genomes have been sequenced (NCBI, 1000 Genomes, ENCODE). Today, Genomics is defined as the set of sciences and technologies dedicated to the comprehensive study of the structure, function and origin of genomes. Several types of genomics have arisen as a result of the expansion and implementation of genomics to the study of the Central Dogma of Molecular Biology (CDMB), Figure 1 (above). The catalog of different types of genomics uses the Latin suffix “-omic”, meaning “set of”, to denote the massive approaches of the new omics sciences (Moreno et al, 2009). Given the large amount of genomic information available in the databases and the urgency of its actual interpretation, the balance has begun to lean heavily toward the bioinformatics infrastructure requirements of research laboratories, Figure 1 (below).

Bioinformatics, or Computational Biology, is defined as the application of computer and information technology to the analysis of biological data (Mount, 2004). It is an interdisciplinary science that requires the use of computing, applied mathematics, statistics, computer science, artificial intelligence, biophysical information, biochemistry, genetics, and molecular biology. Bioinformatics was born from the need to understand the sequences of nucleotide or amino acid symbols that make up DNA and proteins, respectively. These analyses are made possible by the development of powerful algorithms that predict and reveal a multitude of structural and functional features in genomic sequences, such as gene location, discovery of homologies between macromolecule databases (BLAST), and algorithms for phylogenetic analysis, regulatory analysis or the prediction of protein folding, among others. This great development has created a multiplicity of approaches, giving rise to new types of Bioinformatics, such as the Multifractal Bioinformatics (MFB) proposed here.

1.1 Multifractal Bioinformatics and Theoretical Background

MFB is a proposal to analyze the information content of genomes and their life properties in a non-linear way. It is part of a specialized sub-discipline called “nonlinear Bioinformatics”, which uses a number of related techniques for the study of nonlinearity (fractal geometry, Hurst exponents, power laws, wavelets, among others) and applies them to the study of biological problems (https://pharmaceuticalintelligence.com/tag/fractal-geometry/). Its application requires detailed knowledge of the structure of the genome to be analyzed and appropriate knowledge of multifractal analysis.

1.2 From the Worm Genome toward Human Genome

To explore a complex genome such as the HG, it is relevant to first implement multifractal analysis (MFA) in a simpler genome in order to show its practical utility. For example, the genome of the small nematode Caenorhabditis elegans is an excellent model from which many lessons can be extrapolated to complex organisms. Thus, if the MFA explains some of the structural properties in that genome, it is expected that the same analysis will reveal similar properties in the HG.

The C. elegans nuclear genome is composed of about 100 Mbp, with six chromosomes distributed into five autosomes and one sex chromosome. The molecular structure of the genome is particularly homogeneous along with the chromosome sequences, due to the presence of several regular features, including large contents of genes and introns of similar sizes. The C. elegans genome has also a regional organization of the chromosomes, mainly because the majority of the repeated sequences are located in the chromosome arms, Figure 2 (left) (C. elegans Sequencing Consortium, 1998). Given these regular and irregular features, the MFA could be an appropriate approach to analyze such distributions.

Meanwhile, the HG sequencing revealed a surprising mosaicism in coding (genes) and noncoding (repetitive DNA) sequences, Figure 2 (right) (Venter et al., 2001). This structure of 6 Gbp is divided into 23 pairs of chromosomes (diploid cells), and these highly regionalized sequences introduce complex patterns of regularity and irregularity into the understanding of gene structure, the composition of repetitive DNA sequences and their role in the study and application of life sciences. The coding regions of the genome are estimated at ~25,000 genes, which constitute 1.4% of the HG. These genes are immersed in a giant sea of various types of non-coding sequences which compose 98.6% of the HG (popularly misnamed “junk DNA”). The non-coding regions are characterized by many types of repeated DNA sequences, where 10.6% consists of Alu sequences, a type of SINE (short interspersed nuclear element) sequence preferentially located towards the genes. LINES, MIR, MER, LTR, DNA transposons and introns are other types of non-coding sequences, which form about 86% of the genome. Some of these sequences overlap with each other, as with CpG islands, which complicates the analysis of the genomic landscape. This standard genomic landscape was recently revised: the latest studies show that 80.4% of the HG is functional due to the discovery of more than five million “switches” that operate and regulate gene activity, re-evaluating the concept of “junk DNA” (The ENCODE Project Consortium, 2012).

Given that all these genomic variations both in worm and human produce regionalized genomic landscapes it is proposed that Fractal Geometry (FG) would allow measuring how the genetic information content is fragmented. In this paper the methodology and the nonlinear descriptive models for each of these genomes will be reviewed.

1.3 The MFA and its Application to Genome Studies

Most problems in physics are implicitly non-linear in nature, generating phenomena such as chaos. Chaos theory is a science that deals with certain types of (non-linear) dynamic systems that are very sensitive to initial conditions but nonetheless deterministic, that is, their behavior can be completely determined by knowing the initial conditions (Peitgen et al, 1992). In turn, the FG is an appropriate tool to study chaotic dynamic systems (CDS). In other words, the FG and chaos are closely related because the region of space toward which a chaotic orbit tends asymptotically has a fractal structure (strange attractors). Therefore, the FG allows studying the framework on which CDS are defined (Moon, 1992). This is how the genome structure and function are expected to be organized.

The MFA is an extension of the FG and it is related to (Shannon) information theory, disciplines that have been very useful to study the information content over a sequence of symbols. Mandelbrot established the FG in the 1980s as a geometry capable of measuring the irregularity of nature by calculating the fractal dimension (D), an exponent derived from a power law (Mandelbrot, 1982). The value of D gives a measure of the level of fragmentation, or the information content, of a complex phenomenon, because D measures the degree of scaling of the system's fragmented self-similarity. Thus, the FG looks for self-similar properties in structures and processes at different scales of resolution, and these self-similarities are organized following scaling or power laws.
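As a worked illustration of how D follows from a power law, consider the Koch curve, a standard textbook fractal. The symbols N (number of self-similar pieces) and E (scaling factor) are notation assumed here for the example. Each refinement step replaces a segment with N = 4 copies, each reduced by a factor E = 3:

```latex
N = E^{D}
\quad\Longrightarrow\quad
D = \frac{\ln N}{\ln E} = \frac{\ln 4}{\ln 3} \approx 1.26
```

A value between 1 and 2 reflects that the curve is more fragmented than a straight line but does not fill the plane.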

Sometimes, a single exponent is not sufficient to characterize a complex phenomenon, so more exponents are required. The multifractal formalism allows this, and applies when many subsets of fractals with different scaling properties, and therefore a large number of exponents or fractal dimensions, coexist simultaneously. As a result, when a spectrum of multifractal singularity measurement is generated, the scaling behavior of the frequency of symbols of a sequence can be quantified (Vélez et al, 2010).

The MFA has been implemented to study the spatial heterogeneity of theoretical and experimental fractal patterns in different disciplines. In post-genomic times, the MFA has been used to study multiple biological problems (Vélez et al, 2010). Nonetheless, very little attention has been given to the use of MFA to characterize the structural genetic information content of genomes obtained from the images of the Chaos Game Representation (CGR). The first studies at this level were made recently for the analysis of the C. elegans genome (Vélez et al, 2010) and human genomes (Moreno et al, 2011). The MFA methodology applied for the study of these genomes will be developed below.

2. Methodology

The Multifractal Formalism from the CGR

2.1 Data Acquisition and Molecular Parameters

Databases for the C. elegans genome and the 36.2 Hs_ refseq HG version were downloaded from the NCBI FTP server. Then, several strategies were designed to fragment the genomic DNA sequences into different length ranges. For example, the C. elegans genome was divided into 18 fragments, Figure 2 (left), and the human genome into 9,379 fragments. According to their annotation systems, the contents of molecular parameters of coding sequences (genes, exons and introns), noncoding sequences (repetitive DNA, Alu, LINES, MIR, MER, LTR, promoters, etc.) and coding/non-coding DNA (TTAGGC, AAAAT, AAATT, TTTTC, TTTTT, CpG islands, etc.) are counted for each sequence.
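The fragmentation and counting step can be sketched as follows. This is only a minimal illustration, assuming fixed-length, non-overlapping fragments and counting of overlapping motif matches on one strand; the function names and the fragment length are hypothetical, and the article's actual fragmentation strategies and annotation-based counts are not reproduced here.

```python
def fragment_sequence(seq, fragment_length):
    """Split a DNA string into consecutive, non-overlapping fragments.

    The last fragment may be shorter than fragment_length.
    """
    return [seq[i:i + fragment_length]
            for i in range(0, len(seq), fragment_length)]


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also found.
        start = seq.find(motif, start + 1)
    return count


def motif_table(seq, fragment_length, motifs):
    """Return one dict of motif counts per fragment (hypothetical helper)."""
    return [{m: count_motif(frag, m) for m in motifs}
            for frag in fragment_sequence(seq, fragment_length)]
```

For instance, `motif_table(seq, 12, ["TTAGGC", "TTTTT"])` returns a list with one dictionary of counts per 12-bp fragment of `seq`.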

2.2 Construction of the CGR

Subsequently, the CGR, a recursive algorithm (Jeffrey, 1990; Restrepo et al, 2009), is applied to each selected DNA sequence, Figure 3 (above, left), and the resulting image is quantified by the box-counting algorithm. For example, Figure 3 (above, left) shows a CGR image for a human DNA sequence of 80,000 bp in length. Here, dark regions represent sub-quadrants with a high number of points (or nucleotides); clear regions represent sections with a low number of points.

2.3 Fractal Measurement by the Box-Counting Method

The CGR image for a given DNA sequence is quantified by a standard fractal analysis. A fractal is a fragmented geometric figure whose parts are approximate, reduced-scale copies of the whole; that is, the figure has self-similarity. The D is basically a scaling rule that the figure obeys. Generally, a power law is given by the following expression:

$$N(E) = E^{D} \qquad (1)$$

where N(E) is the number of parts required to cover the figure when a scaling factor E is applied. The power law allows the fractal dimension to be calculated as:

$$D = \frac{\ln N(E)}{\ln E} \qquad (2)$$

The box-counting algorithm obtains D by covering the figure with disjoint boxes of size ɛ = 1/E and counting the number of boxes required. The calculation of D for the Koch curve by the box-counting method is illustrated by a progression of changes in the grid size, and its Cartesian graph, Table 1. Figure 4 (above, left) shows the multifractal measure at momentum q=1.
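The CGR construction and the box-counting estimate described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the corner assignment (A, C, G, T on the corners of the unit square) and the starting point at the center are one common convention, the grid sizes are powers of two, and all function names are hypothetical.

```python
import math

# Assumed corner convention for the unit square (one common choice).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: each base moves the point halfway toward its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def occupied_boxes(points, k):
    """Number of grid boxes of side 2**-k that contain at least one point."""
    n = 2 ** k
    boxes = {(min(int(x * n), n - 1), min(int(y * n), n - 1))
             for x, y in points}
    return len(boxes)


def box_counting_dimension(points, ks=(1, 2, 3, 4, 5)):
    """Least-squares slope of ln N(eps) against ln(1/eps), with eps = 2**-k."""
    xs = [k * math.log(2.0) for k in ks]
    ys = [math.log(occupied_boxes(points, k)) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den
```

In this construction the box occupied at grid level k depends only on the last k bases read, which is why oligonucleotide frequencies show up as dense and sparse sub-quadrants in the image. The range of grid levels should be chosen so that the sequence is long enough to populate the finest grid.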

2.4 Multifractal Measurement

When generalizing the box-counting algorithm for the multifractal case and according to the method of moments q, we obtain the equation (3) (Gutiérrez et al, 1998; Yu et al, 2001):

$$\sum_{i} \left( \frac{M_i}{M} \right)^{q} \simeq \varepsilon^{(q-1) D_q} \qquad (3)$$

where Mi is the number of points falling in the i-th grid box, M is the total number of points and ɛ is the box size. Thus, the MFA is used when multiple scaling rules are applied. Figure 4 (above, right) shows the calculation of the multifractal measures at different momentum q (partition function). Here, linear regressions must have a coefficient of determination equal or close to 1. From each linear regression a Dq is obtained, and together these generate a spectrum of generalized fractal dimensions Dq for all integers q, Figure 4 (below, left). So, the multifractal spectrum is obtained as the limit:

$$D_q = \frac{1}{q-1} \lim_{\varepsilon \to 0} \frac{\ln \sum_{i} (M_i/M)^{q}}{\ln \varepsilon} \qquad (4)$$

Varying the integer q emphasizes different regions and discriminates their fractal behavior: positive values emphasize the dense regions, where a high Dq is synonymous with richness of structure and properties. Negative values emphasize the scarce regions; there, a high Dq likewise indicates a lot of structure and properties. In real-world applications, the limit Dq is readily approximated from the data using a linear fit; the transformation of equation (3) yields:

$$\ln \sum_{i} M_i^{q} \simeq D_q \, (q-1) \ln \varepsilon + q \ln M \qquad (5)$$

which shows that, for a fixed q, ln Σ Mi^q is a linear function of ln(ɛ); Dq can therefore be evaluated as the slope of the linear relationship between ln Σ Mi^q and (q-1) ln(ɛ). The methodologies and approaches for the method of box-counting and MFA are detailed in Moreno et al, 2000; Yu et al, 2001; Moreno, 2005. For a rigorous mathematical development of MFA from images, consult Multifractal system, Wikipedia.
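The method of moments described above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: it assumes points in the unit square (for example, from a CGR), grids of side ɛ = 2^-k, the standard partition sum of (Mi/M)^q over occupied boxes, and the usual special case for q = 1, where the sum of p ln p replaces the logarithm of the partition sum. All function names are hypothetical.

```python
import math


def box_probabilities(points, k):
    """Fraction of points (M_i / M) in each occupied box of side 2**-k."""
    n = 2 ** k
    counts = {}
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last box.
        key = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        counts[key] = counts.get(key, 0) + 1
    total = float(len(points))
    return [c / total for c in counts.values()]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def generalized_dimension(points, q, ks=(1, 2, 3, 4)):
    """Estimate D_q as a regression slope over several box sizes."""
    xs, ys = [], []
    for k in ks:
        probs = box_probabilities(points, k)
        log_eps = -k * math.log(2.0)  # ln(eps) for eps = 2**-k
        if q == 1:
            # Information dimension: slope of sum(p ln p) against ln(eps).
            xs.append(log_eps)
            ys.append(sum(p * math.log(p) for p in probs))
        else:
            # General case: slope of ln sum(p**q) against (q - 1) ln(eps).
            xs.append((q - 1) * log_eps)
            ys.append(math.log(sum(p ** q for p in probs)))
    return least_squares_slope(xs, ys)
```

For q = 0 the partition sum simply counts occupied boxes, so the estimate reduces to the box-counting dimension. A set of points spread uniformly over the square gives Dq close to 2 for every q, that is, a flat spectrum.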

2.5 Measurement of Information Content

Subsequently, from the spectrum of generalized dimensions Dq, the degree of multifractality ΔDq (MD) is calculated as the difference between the maximum and minimum values of Dq: ΔDq = Dq max – Dq min (Ivanov et al, 1999). When ΔDq is high, the multifractal spectrum is rich in information and highly aperiodic; when ΔDq is small, the resulting dimension spectrum is poor in information and highly periodic. It is expected, then, that aperiodicity in the genome would be related to highly polymorphic genomic structures, and periodic regions to highly repetitive and not very polymorphic genomic structures. The correlation exponent τ(q) = (q – 1)Dq, Figure 4 (below, right), can also be obtained from the multifractal dimension Dq. The generalized dimension also provides significant specific information. D(q = 0) is equal to the Capacity dimension, which in this analysis is the box-counting dimension. D(q = 1) is equal to the Information dimension and D(q = 2) to the Correlation dimension. Based on these multifractal parameters, many of the structural genomic properties can be quantified, related, and interpreted.
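Given a spectrum of generalized dimensions, the two derived quantities in this paragraph are simple to compute. The sketch below assumes the spectrum is stored as a dictionary mapping each q to its Dq; the function names are hypothetical, and any numbers passed in are the caller's own estimates, not values from the article.

```python
def degree_of_multifractality(dq):
    """Difference between the largest and smallest D_q in the spectrum."""
    values = list(dq.values())
    return max(values) - min(values)


def correlation_exponents(dq):
    """tau(q) = (q - 1) * D_q for every q in the spectrum."""
    return {q: (q - 1) * d for q, d in dq.items()}
```

A flat spectrum (the same Dq for every q) gives a degree of multifractality of zero, which corresponds to the periodic, information-poor case described above.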

2.6 Multifractal Parameters and Statistical and Discrimination Analyses

Once the multifractal parameters are calculated (Dq for q in (-20, 20), ΔDq, πq, etc.), correlations with the molecular parameters are sought. These relations are established by plotting the number of genome molecular parameters versus MD, by discriminant analysis with Cartesian graphs in 2-D, Figure 5 (below, left), and 3-D, combining multifractal and molecular parameters. Finally, simple linear regression analysis, multivariate analysis, and analyses by ranges and clusterings are made to establish statistical significance.
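The simple linear regression step can be sketched as below, for example with the degree of multifractality of each fragment as x and the count of a molecular parameter as y. This is a generic least-squares sketch, not the statistical pipeline used by the authors; the function names are hypothetical and no data from the article are reproduced.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = a * x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The square of `pearson_r` is the coefficient of determination of the simple regression, which can be reported alongside the slope.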

3 Results and Discussion

3.1 Non-linear Descriptive Model for the C. elegans Genome

Analyzing the C. elegans genome with the multifractal formalism revealed what the symmetry and asymmetry of the genome nucleotide composition had suggested. Thus, the multifractal scaling of the C. elegans genome is of interest because it indicates that the molecular structure of the chromosome may be organized as a system operating far from equilibrium following nonlinear laws (Ivanov et al, 1999; Burgos and Moreno-Tovar, 1996). This can be discussed from two points of view:

1) When comparing C. elegans chromosomes with each other, the X chromosome showed the lowest multifractality, Figure 5 (above). This means that the X chromosome is operating close to equilibrium, which results in increased genetic instability. Thus, the instability of the X could selectively contribute to the molecular mechanism that determines sex (XX or X0) during meiosis, and the X chromosome would be operating closer to equilibrium in order to maintain its particular sexual dimorphism.

2) When comparing different chromosome regions of the C. elegans genome, changes in multifractality were found in relation to the regional organization (at the center and arms) exhibited by the chromosomes, Figure 5 (below, left). These behaviors are associated with changes in the content of repetitive DNA, Figure 5 (below, right). The results indicated that the chromosome arms are even more complex than previously anticipated. Thus, TTAGGC telomere sequences would be operating far from equilibrium to protect the genetic information encoded by the entire chromosome.

All these biological arguments may explain why the C. elegans genome is organized in a nonlinear way. These findings provide insight to quantify and understand the organization of the non-linear structure of the C. elegans genome, which may be extended to other genomes, including the HG (Vélez et al, 2010).

3.2 Nonlinear Descriptive Model for the Human Genome

Once the multifractal approach was validated in the C. elegans genome, the HG was analyzed exhaustively. This allowed us to propose a nonlinear model for the HG structure, which will be discussed from three points of view.

1) It was found that the high multifractality of the HG depends strongly on the content of Alu sequences and, to a lesser extent, on the content of CpG islands. These contents would be located primarily in highly aperiodic regions, thus taking the chromosome far from equilibrium and giving it greater genetic stability, protection and attraction of mutations, Figure 6 (A-C). Thus, hundreds of regions in the HG may have high genetic stability, and the most important genetic information of the HG, the genes, would be safeguarded from environmental fluctuations. Other repeated elements (LINES, MIR, MER, LTRs) showed no significant relationship, Figure 6 (D). Consequently, the human multifractal map developed in Moreno et al, 2011 constitutes a good tool to identify those regions rich in genetic information and genomic stability.

2) The multifractal context seems to be a significant requirement for the structural and functional organization of thousands of genes and gene families. Thus, a high multifractal context (aperiodic) appears to be a “genomic attractor” for many genes (KOGs, KEGGs), Figure 6 (E), and some gene families, Figure 6 (F), that are involved in genetic and deterministic processes, in order to maintain deterministic regulatory control in the genome, although most HG sequences may be subject to complex epigenetic control.

3) The classification of human chromosomes and the analysis of chromosome regions may have some medical implications (Moreno et al, 2002; Moreno et al, 2009). This means that the structure of low nonlinearity exhibited by some chromosomes (or chromosome regions) involves an environmental predisposition, making them potential targets to undergo structural or numerical chromosomal alterations, Figure 6 (G). Additionally, sex chromosomes should have low multifractality to maintain sexual dimorphism and probably X chromosome inactivation.

All these fractal and biological arguments could explain why Alu elements are shaping the HG in a nonlinear manner (Moreno et al, 2011). Finally, the multifractal modeling of the HG serves as a theoretical framework to examine new discoveries made by the ENCODE project and new approaches to human epigenomes. That is, the nonlinear organization of the HG might help to explain why most of the HG is expected to be functional.
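The multifractal quantities discussed above can be illustrated with a small sketch. It assumes the chaos game representation (Jeffrey, 1990) and the box-counting approach named in the reference list. The corner layout, grid scales, q values and all function names are illustrative assumptions, not the settings used in Moreno et al. (2011).

```python
import math
from collections import Counter
from itertools import product

# Corner assignment is a convention; this particular layout is an assumption.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: move halfway toward the corner of each base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:  # skip N and other ambiguity codes
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_probabilities(points, k):
    """Fraction of points in each occupied box of a 2^k x 2^k grid."""
    n = 2 ** k
    counts = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    return [c / len(points) for c in counts.values()]


def generalized_dimension(points, q, ks=(2, 3, 4, 5)):
    """Estimate D_q as the slope of log(sum p^q)/(q-1) against log(box size)."""
    xs, ys = [], []
    for k in ks:
        probs = box_probabilities(points, k)
        if abs(q - 1.0) < 1e-9:
            y = sum(p * math.log(p) for p in probs)  # q -> 1 limit
        else:
            y = math.log(sum(p ** q for p in probs)) / (q - 1.0)
        xs.append(math.log(2.0 ** -k))
        ys.append(y)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den  # least-squares slope


def degree_of_multifractality(points, q=5):
    """Spread of the D_q spectrum, taken here as D_(-q) - D_q (an assumed definition)."""
    return generalized_dimension(points, -q) - generalized_dimension(points, q)


# Toy input: every 6-mer once, so all boxes down to a 32 x 32 grid are occupied.
demo_seq = "".join("".join(kmer) for kmer in product("ACGT", repeat=6))
demo_points = cgr_points(demo_seq)
for q in (-5, 0, 1, 2, 5):
    print(q, round(generalized_dimension(demo_points, q), 4))
print("delta:", round(degree_of_multifractality(demo_points), 4))
```

A sequence that fills the chaos game square evenly gives a nearly flat spectrum close to 2, whereas a highly repetitive sequence concentrates its points in few boxes and yields much lower values.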

4. Conclusions

All these results show that the multifractal formalism is appropriate to quantify and evaluate the genetic information content of genomes and to relate it to the known molecular anatomy of the genome and some of its expected properties. Thus, the MFB allows the structural nature and variation of the genome to be interpreted in a logical manner.

The MFB helps explain why a number of chromosomal diseases are likely to occur in the genome, thus opening a new perspective toward personalized medicine to study and interpret the HG and its diseases.

The entire genome contains nonlinear information that organizes it and supposedly makes it function, leading to the conclusion that virtually 100% of the HG is functional. Bioinformatics in general is enriched with a novel approach (MFB) that makes it possible to quantify the genetic information content of any DNA sequence, with practical applications in different disciplines in biology, medicine and agriculture. This novel breakthrough in the computational analysis of genomes and diseases contributes to defining Biology as a “hard” science.

MFB opens a door to developing a research program towards the establishment of an integrative discipline that contributes to “breaking” the code of human life (http://pharmaceuticalintelligence.com/page/3/).

5. Acknowledgements

Thanks to the directors of the EISC, the Universidad del Valle and the School of Engineering for offering an academic, scientific and administrative space for conducting this research. Likewise, thanks to the co-authors (professors and students) who participated in the implementation of excerpts from some of the works cited here. Finally, thanks to Colciencias for the biotechnology project grant # 1103-12-16765.


6. References

Blanco, S., & Moreno, P.A. (2007). Representación del juego del caos para el análisis de secuencias de ADN y proteínas mediante el análisis multifractal (método “box-counting”). In The Second International Seminar on Genomics and Proteomics, Bioinformatics and Systems Biology (pp. 17-25). Popayán, Colombia.

Burgos, J.D., & Moreno-Tovar, P. (1996). Zipf scaling behavior in the immune system. BioSystems, 39, 227-232.

C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012-2018.

Gutiérrez, J.M., Iglesias, A., Rodríguez, M.A., Burgos, J.D., & Moreno, P.A. (1998). Analyzing the multifractal structure of DNA nucleotide sequences. In M. Barbie & S. Chillemi (Eds.), Chaos and Noise in Biology and Medicine (chap. 4). Hackensack (NJ): World Scientific Publishing Co.

Ivanov, P.Ch., Nunes, L.A., Goldberger, A.L., Havlin, S., Rosenblum, M.G., Struzik, Z.R., & Stanley, H.E. (1999). Multifractality in human heartbeat dynamics. Nature, 399, 461-465.

Jeffrey, H.J. (1990). Chaos game representation of gene structure. Nucleic Acids Research, 18, 2163-2175.

Mandelbrot, B. (1982). La geometría fractal de la naturaleza. Barcelona, España: Tusquets editores.

Moon, F.C. (1992). Chaotic and fractal dynamics. New York: John Wiley.

Moreno, P.A. (2005). Large scale and small scale bioinformatics studies on the Caenorhabditis elegans genome. Doctoral thesis. Department of Biology and Biochemistry, University of Houston, Houston, USA.

Moreno, P.A., Burgos, J.D., Vélez, P.E., Gutiérrez, J.M., et al. (2000). Multifractal analysis of complete genomes. In Proceedings of the 12th International Genome Sequencing and Analysis Conference (pp. 80-81). Miami Beach (FL).

Moreno, P.A., Rodríguez, J.G., Vélez, P.E., Cubillos, J.R., & Del Portillo, P. (2002). La genómica aplicada en salud humana. Colombia Ciencia y Tecnología. Colciencias, 20, 14-21.

Moreno, P.A., Vélez, P.E., & Burgos, J.D. (2009). Biología molecular, genómica y post-genómica. Pioneros, principios y tecnologías. Popayán, Colombia: Editorial Universidad del Cauca.

Moreno, P.A., Vélez, P.E., Martínez, E., Garreta, L., Díaz, D., Amador, S., Gutiérrez, J.M., et al. (2011). The human genome: a multifractal analysis. BMC Genomics, 12, 506.

Mount, D.W. (2004). Bioinformatics. Sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press.

Peitgen, H.O., Jürgen, H., & Saupe, D. (1992). Chaos and Fractals. New Frontiers of Science. New York: Springer-Verlag.

Restrepo, S., Pinzón, A., Rodríguez, L.M., Sierra, R., Grajales, A., Bernal, A., Barreto, E., et al. (2009). Computational biology in Colombia. PLoS Computational Biology, 5 (10), e1000535.

The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74.

Vélez, P.E., Garreta, L.E., Martínez, E., Díaz, N., Amador, S., Gutiérrez, J.M., Tischer, I., & Moreno, P.A. (2010). The Caenorhabditis elegans genome: a multifractal analysis. Genet and Mol Res, 9, 949-965.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., et al. (2001). The sequence of the human genome. Science, 291, 1304-1351.

Yu, Z.G., Anh, V., & Lau, K.S. (2001). Measure representation and multifractal analysis of complete genomes. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 64, 031903.

 

Other articles on Bioinformatics on this Open Access Journal include:

Bioinformatics Tool Review: Genome Variant Analysis Tools

2017 Agenda – BioInformatics: Track 6: BioIT World Conference & Expo ’17, May 23-25, 2017, Seaport World Trade Center, Boston, MA

Better bioinformatics

Broad Institute, Google Genomics combine bioinformatics and computing expertise

Autophagy-Modulating Proteins and Small Molecules Candidate Targets for Cancer Therapy: Commentary of Bioinformatics Approaches

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics


18th Annual 2019 BioIT, Conference & Expo, April 16-18, 2019, Boston, Seaport World Trade Center, Track 5 Next-Gen Sequencing Informatics – Advances in Large-Scale Computing

 

https://www.bio-itworldexpo.com/programs

https://www.bio-itworldexpo.com/next-gen-sequencing-informatics

 

 

Leaders in Pharmaceutical Business Intelligence (LPBI) Group

represented by Founder & Director, Aviva Lev-Ari, PhD, RN will cover this event in REAL TIME using Social Media

@pharma_BI

@AVIVA1950

@evanKristel 

TUESDAY, APRIL 16

2:00 – 6:30 Main Conference Registration Open

 

4:00 PLENARY KEYNOTE SESSION
Amphitheater

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

 

WEDNESDAY, APRIL 17

7:30 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION
Amphitheater

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

 

CURRENT AND EMERGING TECHNOLOGIES
Waterfront 3

10:50 Chairperson’s Remarks

David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp

11:00 Long Read Sequencing

Justin Zook, PhD, Researcher, National Institute of Standards and Technology

11:20 NovoGraph: Loading 7 Human Genomes into Graphs

Evan Biederstedt, Computational Biologist, Memorial Sloan Kettering Cancer Center

11:40 Building a Usable Human Pangenome: A Human Pangenomics Hackathon Run by NCBI at UCSC

Ben Busby, PhD, Scientific Lead, NCBI Hackathons Group, National Center for Biotechnology Information (NCBI)

12:00 pm Co-Presentation: Faster Genomic Data

Michael Hultner, PhD, Senior Vice President, Strategy; General Manager, US Operations, PetaGene

David LaBrosse, Director, Genomics, Research, Life Sciences & Healthcare, NetApp

Genetic testing demand is driving up the volume of genomic data that must be processed, analyzed, and stored. Gigabyte-scale genome sample files and terabyte- to petabyte-scale cohort data sets must be moved from data generation to processing to analysis sites, historically a slow, arduous process. NetApp and PetaGene will describe compression and data transfer technologies that overcome I/O bottlenecks to accelerate the movement of genomic data and reduce the time to process and analyze it.

12:30 Session Break

12:40 Luncheon Presentation I: Deep Phenotypic and Genomic Analysis of UK Biobank Data on the WuXi NextCODE Platform

Saliha Yilmaz, PhD, Research Geneticist, WuXi NextCODE

The increasing size and complexity of genetic and phenotypic data to include hundreds of thousands of participants poses a significant challenge for data storage and analysis. We demonstrate use of the GOR database and query language underlying our platform to mine UK Biobank and other datasets for efficient phenotype selection, GWAS and PheWAS, and to archive and query the results.

1:10 NEW: Luncheon Co-Presentation II: Optimizing Drug Discovery and Development with Data-Driven Insights

Christian Frech, PhD, Associate Director, Scientific Operations, Seven Bridges

Serhat Tetikol, Research & Development Engineer, Seven Bridges

1:40 Session Break

DATA VISUALIZATION, EXPLORATION & ANALYSIS
Waterfront 3

1:50 Chairperson’s Remarks

Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ

1:55 AbbVie’s Target and Genomics Compilation (ATGC): A Target Knowledge Platform

Rishi Gupta, PhD, Senior Research Scientist, Information Research, AbbVie, Inc.

Author: Anne-Sophie Barthelet, Scientific Developer, Discngine

ATGC is a web-based platform that allows AbbVie scientists to gather relevant information to make accurate decisions on target ID, target validation, biomarker selection and drug discovery. For each target, the platform provides in-depth information such as gene expression, RNA expression, protein expression and mouse knockout studies. This talk focuses on key aspects of this application, including application architecture, currently available tool sets and how various pieces of information are provided to the user.

2:25 Self Service Data Visualization and Exploration at Genentech Research

Kiran Mukhyala, Senior Software Engineer, Bioinformatics and Computational Biology, Genentech Research and Early Development

Genomic data requires specialized infrastructure to enable data exploration and analysis at scale. We built an integrated, modular, end-to-end gene expression analysis platform implementing data import, storage, processing, analysis and visualization. The multi-layered architecture of the platform supports general, high-level applications for self-service analytics, as well as infrastructure for prototyping, incubating and integrating scientist-driven innovations. The platform coexists with other in-house and commercial software to provide a wide range of genomic data analysis and visualization options for Research scientists.

2:55 Exploring and Visualizing Single-cell RNA Sequencing Data

Michael DeRan, PhD, Scientific Consultant, Diamond Age Data Science

Recent advances in single-cell RNA sequencing (scRNA-seq) technology have made this powerful method accessible to many researchers, but have not brought with them a clear, simple workflow for data analysis. As the number of scRNA-seq datasets has increased, so too has the number of analysis tools available; for those looking to perform their first scRNA-seq analysis the range of options can seem daunting. In working with our clients, I have had the opportunity to apply many different tools to scRNA-seq data from a variety of tissues and organisms. I have used this experience to select a set of tools that are flexible and suitable to many common scRNA-seq analysis tasks. In this talk I will introduce popular tools and methods for identifying cell populations, assessing differential expression and visualizing biological processes. I will discuss common pitfalls encountered in analyzing this data and make recommendations that anyone can use in their own analysis.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing, Meet the Experts: Bio-IT World Editorial Team, and Book Signing with Joseph Kvedar, MD, Author, The Internet of Healthy Things℠ (Book will be available for purchase onsite)

NGS APPROACHES FOR CANCER
Waterfront 3

4:00 Comparison of Different Approaches for Clinical Cancer Sequencing

Jeffrey Rosenfeld, PhD, Manager of the Biomedical Informatics Shared Resource and Assistant Professor of Pathology, Rutgers Cancer Institute of NJ

The sequencing of tumors is important for guiding the treatment of cancer patients. While it is agreed that there is a need to perform sequencing of the tumor, there are a wide variety of approaches ranging from paired whole genome tumor-normal sequencing to tumor-only small panel sequencing with many intermediate possibilities. Each of the approaches has a different cost and associated benefit. I will present a comparison of different methods and their efficacy for guiding cancer treatment.

4:30 Integrated NGS Analysis to Accelerate Disease Understanding for Drug Discovery

Helen Li, Director- Research IT – Biologics & Informatics, Eli Lilly and Company

5:00 Identification of Cancer Biomarker Genes

Maryam Nazarieh, PhD, Postdoctoral Researcher, Center for Bioinformatics, Universität des Saarlandes, Saarbrücken, Germany

Identification of biomarker genes plays a crucial role in disease detection and treatment. Computational approaches enhance the insights derived from experiments and reduce the effort required of biologists and experimentalists to identify biomarker genes that play key roles in complex diseases. This is essentially achieved by prioritizing a set of genes with certain attributes (1). Here, I propose a set of transcription factors that form the largest strongly connected component of the pluripotency network in embryonic stem cells as the global regulators that control the differentiation process determining cell fate. This component can be controlled by a set of master regulatory genes.

The regulatory mechanisms underlying stem cells inspired us to formulate the identification of a set of master regulatory genes in regulatory networks as two combinatorial optimization problems, namely minimum dominating set and minimum connected dominating set, in weakly and strongly connected components. The developed methods were applied to regulatory cancer networks to identify disease-associated genes and anti-cancer drug targets in breast cancer and hepatocellular carcinoma. As not all the nodes in the solutions are critical, a prioritization method named TopControl was developed to rank a set of candidate genes related to a certain disease, based on systematic analysis of the genes that are differentially expressed between tumor and normal conditions. For this purpose, NGS data were taken from The Cancer Genome Atlas for matched tumor and normal samples of the liver hepatocellular carcinoma (LIHC) and breast invasive carcinoma (BRCA) datasets. Moreover, topological features in the regulatory networks surrounding differentially expressed genes were shown to be highly consistent across the output of several analysis tools.

We present several web servers and software packages that are publicly available at no cost. The Cytoscape plugin for minimum connected dominating set identifies a set of key regulatory genes in a user-provided regulatory network based on a heuristic approach. The ILP formulations of minimum dominating set and minimum connected dominating set return the optimal solutions for the aforementioned problems. Our source code is publicly available. The web servers TFmiR and TFmiR2 construct disease-, tissue- and process-specific networks for the sets of deregulated genes and miRNAs provided by a user. They highlight topological hotspots and offer detection of three- and four-node FFL motifs as a separate web service for both mouse and human.

1) Maryam Nazarieh, Understanding regulatory mechanisms underlying stem cells helps to identify cancer biomarkers. Ph.D. thesis, Saarland University, Saarbrücken, Germany (2018).
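To make the dominating-set idea in this abstract concrete, here is a minimal sketch of a generic greedy heuristic on a directed regulatory network. It is not the speaker's Cytoscape plugin, ILP formulation or TopControl method. The function names and the toy network (TF1, G1 and so on) are hypothetical and chosen only for illustration.

```python
def greedy_dominating_set(network):
    """Greedy heuristic for a small dominating set in a directed network.

    `network` maps each regulator to the targets it regulates. A node counts
    as dominated if it is selected or is a target of a selected node.
    """
    nodes = set(network)
    for targets in network.values():
        nodes.update(targets)

    undominated = set(nodes)
    chosen = []
    while undominated:
        # Pick the node that newly dominates the most nodes; sorting first
        # makes ties resolve alphabetically, so the result is reproducible.
        best = max(
            sorted(nodes),
            key=lambda n: len(({n} | set(network.get(n, ()))) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | set(network.get(best, ()))
    return chosen


def is_dominating(network, selected):
    """Check that every node is selected or regulated by a selected node."""
    nodes = set(network)
    covered = set(selected)
    for regulator, targets in network.items():
        nodes.update(targets)
        if regulator in selected:
            covered.update(targets)
    return nodes <= covered


# Hypothetical toy network: regulator -> list of regulated genes.
toy_network = {
    "TF1": ["G1", "G2", "TF2"],
    "TF2": ["G3", "G4"],
    "TF3": ["G4"],
}
masters = greedy_dominating_set(toy_network)
print(masters, is_dominating(toy_network, masters))
```

A greedy pass like this gives no optimality guarantee. That is the usual reason to pair a heuristic with an exact ILP formulation, as the abstract describes.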

5:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
