

MinneBOS 2019, Field Guide to Data Science & Emerging Tech in the Boston Community

August 22, 2019, 8AM to 5PM at Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston, MA

 

 

MinneBOS – Boston’s Field Guide to Data Science & Emerging Tech

Announcement

Leaders in Pharmaceutical Business Intelligence (LPBI) Group

 

REAL TIME Press Coverage for

 http://pharmaceuticalintelligence.com 

by

 Aviva Lev-Ari, PhD, RN

Director & Founder, Leaders in Pharmaceutical Business Intelligence (LPBI) Group, Boston

Editor-in-Chief, Open Access Online Scientific Journal, http://pharmaceuticalintelligence.com

Editor-in-Chief, BioMed e-Series, 16 Volumes in Medicine, https://pharmaceuticalintelligence.com/biomed-e-books/

@pharma_BI

@AVIVA1950

 

Logo, Leaders in Pharmaceutical Business Intelligence (LPBI) Group, Boston

Our BioMed e-series

WE ARE ON AMAZON.COM

 

https://lnkd.in/ekWGNqA

 

Thursday, August 22

TBA

 Senior Leadership Panel: Future Directions of Analytics

This panel brings together senior leaders from industry, academia, and government to discuss the challenges they are tackling, the needs they anticipate, and the goals they aim to achieve.

Moderators

Bonnie Holub, PhD

Industry & Business Data Science, Teradata
Bonnie has a PhD in Artificial Intelligence and specializes in correlating disparate sets of Big Data for actionable results.

 

Thursday August 22, 2019 TBA

TBA

 Creating value with AI: Building and managing data science teams for maximum business impact

We live in an era in which data is unquestionably the new currency. All sectors of industry and disciplines in academia are searching for creative ideas to leverage the ever-expanding information space. As with any new trend, we see some justified skepticism as well as a lot of opportunistic hype around AI. There is no generic rule for hitting the right level of investment in AI, or for how to structure business operations in tandem with machine learning, analytics, and informatics efforts. We already see a small number of companies taking minimal risk with smart choices and thereby making headway in revolutionizing how they wield information. Creating value with AI is often not just a matter of building fancier ML/DL algorithms; a holistic overhaul that creates a synergistic ecosystem has proven critical to success. I will talk about what I’ve learned over two decades of successes and failures about how to execute relevant strategies so we can more effectively create value with AI.

Speakers

Bülent Kiziltan, PhD

Head of Data Science & Analytics, Stealth Mode Elite Consultancy Startup
Dr. Bülent Kiziltan is an accomplished scientist and an AI executive who uses artificial intelligence to create value in many business verticals. He has worked at Harvard, NASA and MIT in close collaboration with pioneers of their respective fields.

Thursday August 22, 2019 TBA

TBA

 Health and Healthcare Data Visualization – See how you’re doing

Health and healthcare organizations are swimming in data, but few have the skills to show and see the story in their data using the best practices of data visualization. This presentation raises awareness of the research that informs these best practices, along with stories from the front lines of groups that are embracing them and re-imagining how they display their data and information. These groups include the NYC Dept of Health & Mental Hygiene, the Centers for Medicare & Medicaid Services (CMS), and leading medical centers and providers across the country.

Speakers

Katherine Rowell

Co-Founder & Principal, Health Data Viz
Katherine Rowell is a health, healthcare, and data visualization expert. She is Co-founder and Principal of HealthDataViz, a Boston firm that specializes in helping healthcare organizations organize, design, and present visual displays of data to inform their decisions and stimulate…

Thursday August 22, 2019 TBA

TBA

 Navigating a Data Driven Career

In 2017 I made the decision to pursue my passion for analytics and move out of my traditional finance role. Navigating the wealth of areas and emphases that come along with analytics and data science, I found my desired destination, and along the way certain lessons and resources proved very valuable. I’m looking to pass on this field-operative view and experience to those looking to navigate the same space.

Speakers

David Barton

Associate Director of Business Analytics, Optum / Post Acute
David Barton has 11+ years with Optum/UHG in FP&A, with previous work with Target and Wells Fargo in analytics, portfolio management, and strategic planning. David recently made the career transition from Finance into Analytics with the help of communities like MinneAnalytics.

Thursday August 22, 2019 TBA

TBA

 The Ethics Of Analytics

As more and more data is being collected, concerns are constantly being raised about what data is appropriate to collect and how (or if) it should be analyzed. There are many ethical, privacy, and legal issues to consider, and no clear standards exist in many cases as to what is fair and what is foul. This means that organizations must consider their own principles and risk tolerance in order to implement the right policies.

This talk explores a range of ethical, privacy, and legal issues that surround analytics today, framing the big questions to consider and detailing some of the trade-offs and ambiguities that must be addressed to answer them.


Speakers

Bill Franks

Chief Analytics Officer, International Institute for Analytics
Internationally recognized analytics, data science, AI, & big data thought leader, speaker, executive, and author

Thursday August 22, 2019 TBA

TBA

 Where are enterprises on becoming truly data-driven, and what are “power-users” doing differently to get ahead?

This session will highlight findings from a research study on the emerging use of data and analytics as strategic business assets. Drawing on insights from 262 enterprise decision makers across the Global 2000, we will present a nuanced view of the strategic imperatives around data, advanced analytics, and machine learning; the current and planned use of smart analytics; the main challenges along each step of the evolution; and the emerging best practices of high-performing enterprises in developing truly data-driven businesses.

Speakers

Reetika Fleming

Research Vice President, HFS Research LLC
Reetika Fleming leads research coverage for the broad use of data and analytics within enterprises, with a focus on emerging strategies to institutionalize machine learning and other AI techniques.

Thursday August 22, 2019 TBA

TBA

 You Built It, But They Didn’t Come: How Human-Centered Design Increases the Value of Decision Support Tools

In 2019, Gartner predicted 80%+ of analytics insights won’t deliver outcomes through 2022—despite ongoing and sizable investments in technology and data. Executives are worried about having an AI strategy. Data scientists worry about getting their models to be as accurate as possible. IOT teams stay busy juggling telemetry, alerts, and APIs. Report developers do their best to visualize the data, and engineers try to glue it all together and ship it. However, if business value is dependent on specific users engaging successfully with a decision support application or data product, then teams must design these solutions around the people using them—not the data or technology. Human-centered design provides a process to help teams discover, define, and fall in love with customer problems and needs so that solutions encourage meaningful engagement and outcomes, and the business realizes value from its investment in analytics. In this mini-workshop, Brian will share some common causes of low engagement with data products, introduce the design process, and teach attendees to apply one design technique in a small group setting.

Speakers

Brian O’Neill

Founder and Principal, Designing for Analytics
Brian helps enterprise companies turn data into indispensable data products and services. He is a product designer and host of the podcast, “Experiencing Data.”

Thursday August 22, 2019 TBA

TBA

 10x Your Mindset. Because Tomorrow is Today.

The pace of change and the confluence of exponentially accelerating technologies are fundamentally changing business, our customers, and us. Marjan Mohsenin introduces frameworks for understanding what exponential growth really means and how leaders can make the most of these disruptive times by, first, thinking exponentially. Learn the different mindset, skill sets, and leadership required to successfully navigate this incredible time of change.

In this session, you will:

  1. Gain a macro view of exponentials and convergences
  2. Create a toolset to train your brain to think exponentially; we are working with a 2 million-year-old brain!
  3. Uncover business opportunities in disruption, plus the risks of not changing

Speakers

Marjan Mohsenin

Senior Director, Strategic Relations, Singularity University
Marjan prescribes strategies on how to think more imaginatively about the future and acquire the mindsets, skills, and behaviors to bring that future to life, adding the unique perspective of having been both the disruptor and the disrupted.

Thursday August 22, 2019 TBA

TBA

 AI in Healthcare

The benefits, challenges, and impact of AI and cybersecurity on medicine.

Speakers

Vinit Nijhawan

Lecturer, Boston University
Vinit Nijhawan is an Entrepreneur, Academic, and Board Member with a track record of success, including 4 startups in 20 years.

Thursday August 22, 2019 TBA

TBA

 Machine Learning in Practice: Anomaly Detection for Army ERP Data

Machine learning and artificial intelligence (AI) are two areas that show tremendous potential for a wide variety of use cases, helping to augment, inform, and supplement human processes.  In practice, most companies have fallen short in actually implementing these strategies, since most organizations are still trying to make sense of their data.

However, this doesn’t mean machine learning solutions should be scrapped until your organization’s data is flawless. Instead, machine learning and AI can be used to better understand operational data, uncover the root cause of data issues, resolve existing data errors, and prevent future errors by addressing the source of each anomaly.

In this session, we’ll review a machine learning case study for the Department of Defense. During this project, the team set out to evaluate the potential of machine learning and AI to improve operational data quality and thereby increase Army readiness. The team launched a pilot analysis and proof of principle to demonstrate how machine learning and AI algorithms, protocols, and methodologies can address known flaws in existing datasets and capitalize on pattern recognition to produce data-cleansing models. A progression of analysis and machine learning approaches was leveraged to better understand problematic datasets within the Army’s ERP environment, identify and classify anomalies, and ultimately provide a path to resolution.

During this session, we’ll highlight the machine learning technologies and approaches, both supervised and unsupervised, that were used to best uncover and resolve data anomalies. We’ll also review the anticipated next steps for the Army with AI that are designed to prevent future data quality issues and actively monitor their ERP environment.
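
The unsupervised side of this workflow can be sketched with an off-the-shelf anomaly detector. The records, feature names, and values below are invented for illustration; they are not the Army's ERP data, and the actual project combined a progression of supervised and unsupervised approaches.

```python
# Illustrative unsupervised anomaly detection on synthetic ERP-style
# purchase records (hypothetical data, not the project's datasets).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 "normal" records: (quantity, unit_price)
normal = np.column_stack([rng.normal(50, 5, 200), rng.normal(10, 1, 200)])
# One obviously corrupted record, e.g. a data-entry error
outlier = np.array([[5000.0, 0.01]])
X = np.vstack([normal, outlier])

model = IsolationForest(random_state=0).fit(X)
labels = model.predict(X)  # +1 = inlier, -1 = anomaly
```

Flagged records would then be routed to root-cause analysis rather than silently dropped, which is the data-quality loop the session describes.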


Speakers

Tanya Cashorali

Founder, TCB Analytics
Tanya Cashorali is the founder of TCB Analytics, a data and analytics consultancy. She leads a world-wide community network of 400 data enthusiasts, has helped universities launch data science programs, and is a frequent speaker at tech conferences.

Thursday August 22, 2019 TBA

TBA

 Minding the Gap: Understanding and Mitigating Bias in AI

The presentation will highlight the ways inherent bias can exist in AI programs and how market researchers can identify and navigate these potential land mines.

Speakers

Jackie Anderson

Growth Strategist, ScaleHouse
Jackie helps businesses identify strengths, mitigate weaknesses, and accelerate growth. Previously, Jackie served as Chief Client Officer at Simmons Research and held various roles at Forrester and J.D. Power & Associates. She is currently the head of WIRe Boston.

Thursday August 22, 2019 TBA

TBA

 Patient centric AI: Saving lives with ML driven hospital interventions

This presentation will cover the use of machine learning for maximizing the impact of a hospital readmissions intervention program. With machine learning, clinical care teams can identify and focus their intervention efforts on patients with the highest risk of readmission. The talk will go over the goals, logistics, and considerations for defining, implementing, and measuring our ML driven intervention program. While covering some technical details, this presentation will focus on the business implementation of advanced technology for helping people live healthier lives.

Speakers

Miguel Martinez

Data Scientist, Optum
Miguel Martinez is a Data Scientist at Optum Enterprise Analytics. Relied on as a tech lead in advancing AI healthcare initiatives, he is passionate about identifying and developing data science solutions for the benefit of organizations and people.

Thursday August 22, 2019 TBA

TBA

 Predicting Leaders (and Followers) from online interactions

What if we could take video (and text, chat, etc) and use what we know about human signals and social physics to *augment* our online interactions and start to make them better and maybe even more informative than in-person meetings? What could we learn? How might that help us be more effective in remote teams?

Speakers

Beth Porter, MA

CEO, Riff Learning Inc
Beth’s philosophy is that people learn best from each other, and learning fosters both personal growth and organizational innovation and change. She teaches IT Strategies at Boston University and in the Media Ventures team at the MIT Media Lab.

Thursday August 22, 2019 TBA

TBA

 Using Ontologies to Power AI Systems

There’s a great deal of confusion about the role of a knowledge architecture in artificial intelligence projects. Some people don’t believe that any reference data is necessary. But in reality, reference data is required: even if no metadata or architecture definitions are specified externally for an AI algorithm, someone has made decisions about architecture and classification within the program. Generic defaults will not work for every organization, however, because there are terms, workflows, product attributes, and organizing principles that are unique to the organization and that need to be defined for AI tools to work most effectively.
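
At its smallest, organization-specific reference data of the kind described here can be a lookup from an organization's own terms to canonical categories, applied before data reaches an AI component. Every term and category below is hypothetical.

```python
# A "micro-ontology": hypothetical org-specific terms mapped to
# canonical categories, used to normalize inputs for downstream AI.
ONTOLOGY = {
    "laptop": "computing",
    "notebook": "computing",   # this org's synonym for laptop
    "ultrabook": "computing",
    "monograph": "publishing",
}

def canonical_category(term: str) -> str:
    """Normalize a free-text term to a canonical category, or 'unknown'."""
    return ONTOLOGY.get(term.strip().lower(), "unknown")
```

Real knowledge architectures are far richer (hierarchies, attributes, workflows), but even this tiny table makes the talk's point: someone has to decide the classifications.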

Speakers

Seth Earley

CEO, Earley Information Science
Seth Earley is a published author and public speaker on artificial intelligence and information architecture. He wrote “There’s no AI without IA”, which has become an industry catchphrase used by a number of people, including Ginni Rometty, the CEO of IBM.

Thursday August 22, 2019 TBA

TBA

 A Review of Data Sources used in Sports Analytics

Sports analytics is a growing industry, and many practitioners have asked me how to get started answering their own questions. This talk will review and discuss the publicly available datasets for baseball, basketball, American football, hockey, and soccer.

Speakers

Andy Andres, PhD

Senior Lecturer, Boston University
Andy Andres created and taught the still popular Sports Analytics course on edx.org (over 50,000 registered students), “Sabermetrics 101: An Introduction to Baseball Analytics”

Thursday August 22, 2019 TBA

TBA

 Accelerate delivery of ML-based products

This talk will examine a case study in using open source tools (ioModel) and a grounding in statistics to rapidly develop and deliver to market an instrument capable of forecasting the onset of dementia 1-10 years before a doctor could diagnose it. This same disciplined method and approach can and should be leveraged in the commercial industry to rapidly accelerate the delivery of ML-based products and features while reducing cost and the rate of failure of data science initiatives.

Speakers

Matt Hogan

Founder, Twin Tech Labs
Matt is a passionate technologist and futurist and believes that we stand to build the greatest products by studying the intersection of people and technology, both when products are being built and in how they are being used.

Thursday August 22, 2019 TBA

TBA

 Data Lake for Innovation

This session will discuss how building on-premises, hybrid, and cloud-based data lakes is driving organizations’ quest for innovation and helping them become data driven. It will focus on architecture and technologies for on-premises and cloud-based data lakes, with case studies of organizations’ journeys in big data.

Speakers

Shahidul Mannan

Head, Data Engineering & Innovation, Partners Healthcare
Shahidul Mannan is Founder/CEO of MBS Analytics, now Head of Data Engineering and Innovation at Partners Healthcare. With 15+ years of executive leadership, he was previously Chief Analytics Officer, Human Health, and Global Head, Data Analytics at EMC.

Thursday August 22, 2019 TBA

TBA

 Data Science for a Home Insurance Product

Plymouth Rock Home Group (Bunker Hill Insurance) launched a new product in 2016 with the best customer-service experience in mind. When customers come shopping for their insurance needs, instead of the traditional lengthy quoting process, we offer them fast quoting by pre-rating all properties. As the core of the business strategy, data science plays a vital role for the product. Our home product is heavily digitalized, leaning on multiple data sources and advanced predictive analytics. We apply data science to our multiple data assets to identify customers who could be good targets for our product and reach out to them proactively. In this presentation, we will review how our data science team supports business strategies around marketing, underwriting, claims, and renewals.

Speakers

Shawn Jin

Head of Analytics, Bunkerhill Insurance
Shawn Jin leads Data Science team to support rapid growth of the home product in Plymouth Rock Home Group (Bunker Hill Insurance) through advanced analytics.  Previously Shawn focused on analytics in AIG, McKinsey, Targetbase, Merkle and CapitalOne.

Thursday August 22, 2019 TBA

TBA

 Deep Learning for Radiology Text Report Classification

In this talk, I will discuss our proposed advanced deep learning models for classifying free-text radiology reports based on the presence of pulmonary emboli (PE). The models are trained on a subset of the Stanford training set (2,512 reports) and evaluated on reports collected from four major healthcare centers. Our experiments suggest the feasibility of broader use of neural networks in the automated classification of multi-institutional imaging text reports for various applications, including evaluation of imaging utilization, imaging yield, and clinical decision support tools.
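
For readers new to report classification, a classical bag-of-words baseline (not the deep models proposed in the talk) looks like the sketch below. The example sentences are synthetic stand-ins, not drawn from any real reports or from the Stanford dataset.

```python
# A classical TF-IDF + logistic regression baseline for labeling
# free-text reports by PE presence. Toy synthetic sentences only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "findings consistent with acute pulmonary embolism",
    "filling defect in the right pulmonary artery",
    "no evidence of pulmonary embolism",
    "lungs are clear without embolus",
]
labels = [1, 1, 0, 0]  # 1 = PE present, 0 = PE absent

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, labels)
pred = clf.predict(reports)
```

Deep models like those in the talk replace the bag-of-words features with learned representations, which is what lets them generalize across institutions and phrasing.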

Speakers

Sadid Hasan, PhD

Senior Scientist (Tech Lead), Philips Research
Sadid Hasan is a Senior Scientist (Tech Lead) of the Artificial Intelligence Group at Philips Research, Cambridge, MA. His recent research focuses on solving NLP problems related to Information Extraction and Text Classification using Deep Learning.

Thursday August 22, 2019 TBA

TBA

 Deep learning image recognition and classification models for fashion items

Large-scale image recognition and classification is an interesting and challenging problem. This case study uses the Fashion-MNIST dataset, which comprises 60,000 training images and 10,000 test images. Several popular deep learning models are explored in this study to arrive at a suitable model with high accuracy. Although convolutional neural networks have emerged as the gold standard for image recognition and classification problems due to their speed and accuracy advantages, arriving at an optimal model, and making the many choices required when specifying the model architecture, is still a challenging task. This case study provides best practices and interesting insights.
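
The core operations of a convolutional network can be sketched in plain NumPy to show the shape arithmetic involved. This toy forward pass is illustrative only; it is not one of the models explored in the case study.

```python
# Convolution + ReLU + max pooling on a Fashion-MNIST-sized input,
# written out by hand to expose the shapes at each stage.
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))         # 28x28 input
feat = np.maximum(conv2d(img, np.ones((3, 3)) / 9.0), 0)  # conv + ReLU -> 26x26
pooled = max_pool(feat)                                   # downsample -> 13x13
```

Choices such as kernel size, stride, and pooling are exactly the architecture decisions the abstract calls challenging.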

Speakers

Bharatendra Rai

Professor, UMass Dartmouth
Bharatendra Rai, Ph.D. is Professor of Business Analytics in the Charlton College of Business at UMass Dartmouth. His research interests include machine learning & deep learning applications.

Thursday August 22, 2019 TBA

TBA

 Did a Machine Write My Talk?

For years, news agencies have been automatically generating formulaic news articles.   Now, however, deep learning architectures enable computers to summarize highly technical or specialized texts at a high level.   What is the state of the art, where is it going and should we be concerned?

Speakers

Brian Ulicny, PhD

VP, Americas, Thomson Reuters
Dr. Brian Ulicny is VP, Thomson Reuters Labs – Americas. The Labs partner with customers, internal teams, start-ups and academics, to create new data-driven innovations utilizing Thomson Reuters’ vast, curated data sets across many disciplines.

Thursday August 22, 2019 TBA

TBA

 Experiences in using data science and machine learning tasks for K-12 education

In this presentation, attendees will learn how the K-12 education sector can use classification techniques such as decision trees to uncover new insights about students. Various linear regression models, and even failed experiments, will also be discussed, along with what we have learned from them. Since the use of machine learning and data science is relatively new in the K-12 education space, I also plan to discuss the challenges of the skills gap and of introducing these concepts to decision-makers in education.
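
A decision-tree classifier of the kind mentioned above can be sketched in a few lines. The student features (attendance rate, quiz average) and labels below are entirely invented for illustration.

```python
# Hypothetical example: flag students who may need extra support
# from two synthetic features. Not real student data.
from sklearn.tree import DecisionTreeClassifier

X = [[0.95, 88], [0.90, 92], [0.60, 55], [0.55, 40],
     [0.85, 79], [0.50, 62], [0.97, 95], [0.58, 48]]
y = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = flag for extra support

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict([[0.92, 90], [0.52, 45]])
```

Shallow trees like this one are popular in education settings because the learned rules are easy to explain to non-technical decision-makers.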

Speakers

Rich Huebner

Director, Data Science & Architecture, Houghton Mifflin Harcourt
Dr. Huebner has held both industry and academic roles for the last 20+ years that have centered around business intelligence, data science, and analytics solutions. He also teaches MBA and doctoral classes at New England College of Business.

Thursday August 22, 2019 TBA

TBA

 Multilingual NLP for clinical text: impacting healthcare with big data, globally

At Droice, we leverage massive repositories of clinical text to build deep learning/NLP solutions that help clinicians make better decisions for individual patients. With the widespread adoption of electronic medical records (EMRs) and recent advances in machine learning, natural language processing has come to the forefront in clinical AI. Despite the challenges of working with unstructured text, doctors’ notes and other clinical text contain some of the richest information about a patient. However, building systems that can work with clinical text in languages other than English remains a challenge to this day. In this talk, we will present several real-world use cases of NLP-powered solutions across multiple languages.

Speakers

Mayur Saxena, PhD

CEO, Droice Labs
Mayur serves as the CEO of Droice Labs, an AI/Big Data healthcare company based in NY. Mayur earlier co-founded Ardent Cell Technologies, a cell therapeutics company in the diabetes space, where the technology is undergoing human clinical trials.

Tasha Nagamine, MS, PhD

Chief AI Officer, Droice Labs

Thursday August 22, 2019 TBA

TBA

 Reducing ML prediction uncertainty with systems-thinking in high-stakes events

This presentation will cover our current research on an unsolved problem: how to apply systems thinking to reduce prediction uncertainty from machine learning models applied to high-consequence outcomes. The specific focus will be on machine damage progression to catastrophic failure. Every wrong prediction about the health of a machine can either lead to a machine failure costing millions of dollars or cause an operator to stop a machine thinking it is damaged when it is not; both scenarios are bad. This presentation will focus on a specific kind of large machine, wind turbines, and discuss our current challenges in dealing with prediction uncertainty and the related adverse business outcomes when trying to do this for thousands of such turbines.

The presentation will show our attempts at building a representation of the entire system of factors that cause large machines to experience damage progression and eventual failure. The presenter will illustrate why this is hard to build and deploy as features in machine learning models and discuss our various attempts at it. Uncertainty is a less-talked-about topic in machine learning, but it is critical when these algorithms directly impact the physical world. In terms of technical content, this will touch on topics such as signal-to-noise ratio in stochastic time series, differences in the spatial and temporal resolution of various data sources, uncertainty quantification and propagation frameworks (e.g., Kalman filters) for a variety of machine learning models and, last but not least, the adverse impact of benign human habits; we will cover ideas that we have tried to deploy to quantify and detect these issues.

The overall goal of this presentation is to (a) describe a hard, unsolved problem of significant consequence, not only economically but also for one of the greatest challenges facing us, climate change, and (b) stimulate a discussion and exchange of ideas in the wider community.
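
The Kalman-filter idea mentioned above for uncertainty propagation can be illustrated with a one-dimensional update: each noisy measurement shrinks the variance of the damage-state estimate. The numbers are invented, and real turbine models are far more complex.

```python
# Minimal 1-D Kalman update: fuse noisy sensor readings into a
# damage-state estimate whose variance shrinks with each observation.
def kalman_update(mean, var, z, meas_var):
    """Incorporate measurement z (with variance meas_var) into (mean, var)."""
    k = var / (var + meas_var)      # Kalman gain
    new_mean = mean + k * (z - mean)
    new_var = (1.0 - k) * var
    return new_mean, new_var

mean, var = 0.0, 4.0                # uncertain prior estimate
for z in [1.2, 0.8, 1.1]:           # illustrative sensor readings
    mean, var = kalman_update(mean, var, z, meas_var=1.0)
```

The challenge the talk describes is doing this at scale, with heterogeneous data resolutions and model forms, rather than the single clean state shown here.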


Speakers

Vijayant Kumar, PhD

Vice President – Data Science & Engineering, Sentient Science
Vijayant Kumar leads the predictive analytics team at Sentient Science and is focused on using data science and physics-driven modeling to provide diagnostics and prognostics to allow optimized predictive maintenance of large machinery.

Thursday August 22, 2019 TBA

TBA

 Regularization and Functional Methods to Predict Nutrients

Mineral nutrients play an important role in the biochemistry of the grapevine and its growth. Grapevines are known to store significant quantities of certain nutrients to overcome short-term scarcities in the soil. Hence viticulturists have developed great interest in studying the relationship between the biochemistry of the leaf/petiole and its spectral reflectance to understand fruit ripening rate, water status, nutrient levels, and disease.

The dataset was obtained by measuring spectral reflectance, defined as the ratio of the backscattered radiance from a surface to the incident radiance on that surface, directly over the leaves during the bloom period of growth, in the wavelength region of 330–2500 nanometers. This yields high-dimensional reflectance data with an ill-conditioned covariance matrix. Four regularization methods and one functional regression method are compared to improve estimation accuracy and enhance model interpretability by selecting continuous, unbiased, sparse, and useful variables (wavelengths).
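
The effect of regularization on an ill-conditioned covariance matrix can be shown directly: adding a ridge penalty lambda*I to X^T X sharply reduces its condition number. The synthetic, nearly collinear columns below merely stand in for reflectance spectra; this is not the grapevine dataset.

```python
# Ridge regularization as a conditioning fix: nearly collinear
# "wavelength" columns give an ill-conditioned X^T X, which the
# penalty term repairs. Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
X = base + 1e-4 * rng.normal(size=(40, 30))   # 30 almost identical columns
XtX = X.T @ X

lam = 1.0
cond_plain = np.linalg.cond(XtX)              # astronomically large
cond_ridge = np.linalg.cond(XtX + lam * np.eye(30))  # modest
```

This is the numerical motivation behind all four regularization methods the talk compares: without the penalty, coefficient estimates along the tiny-eigenvalue directions are essentially noise.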


Speakers

Uday Jha, MS

Full-Time Lecturer, University of Massachusetts, Dartmouth
Uday Jha teaches Business Statistics and Business Analytics at University of Massachusetts, Dartmouth.

Thursday August 22, 2019 TBA

TBA

 Technical drivers of cloud centralization and megacorporate domination

Find out why data is becoming more centralized and how analytics drive the collection of data.

Speakers

Andrew Oram

Editor, O’Reilly Media
Andy Oram brought to publication O’Reilly’s Linux series, the ground-breaking book Peer-to-Peer, and the best-seller Beautiful Code. Andy has also authored many reports on technical topics such as data lakes and open source software.

Thursday August 22, 2019 TBA

TBA

 What do you do with limited data?

While businesses are utilizing big data in predictive analytics, what can you do when data is limited in quantity or quality? Come to this session to learn best practices in generating insights from insurance companies that have implemented analytics solutions in sales, marketing, operations, claims, and finance. You are encouraged to submit questions ahead of time.

Speakers

Nirav Dagli

President and CEO, Spinnaker Analytics LLC
Nirav is the founder and CEO of Spinnaker Analytics. He was previously a partner at Oliver Wyman and worked at the MITRE Corporation. He serves on the boards of financial services and energy companies and as Chairman of the Boston Children’s Museum board.

Manish Gupta

Data Scientist, Spinnaker Analytics LLC

Thursday August 22, 2019 TBA

TBA

 A Bayesian Look at Clinical Risk Prediction

Building reliable predictive and prognostic models that leverage the growing scale of medical data and clinical records can make a tremendous impact on the healthcare industry. Traditional survival analysis, which originated in clinical research, focuses on identifying covariates and factors that affect the hazard function; time-series modelling approaches emphasize predicting future values based on previous observations; machine learning models are often formulated as mappings from large feature spaces to binary outcomes. However, all of these methods have their own limitations, and there is still much more to explore in treating clinical survival analysis with machine learning models. In this talk, we will present our recent work on a new Bayesian framework that uniquely connects machine learning tasks (classification/regression) with event-time analysis to provide risk prediction capabilities. We validate and demonstrate the utility of this approach with simulation data where the ground truths are known. We will then show a specific use case of this approach for risk prediction with real medical datasets. We will also discuss how this model can be implemented in clinical solutions.
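
As a much simpler illustration of the Bayesian ingredient (not the framework presented in the talk), a conjugate Beta-Binomial update shows how an event-risk estimate shifts as data arrive. The prior and counts below are invented.

```python
# Conjugate Beta-Binomial update for an event probability, e.g. the
# risk of a clinical event. All numbers are hypothetical.
def beta_posterior(alpha, beta, events, n):
    """Beta(alpha, beta) prior updated after `events` occurrences in n patients."""
    return alpha + events, beta + (n - events)

a, b = 2.0, 8.0                       # prior: mean risk 2/(2+8) = 0.2
a, b = beta_posterior(a, b, events=30, n=100)
posterior_mean = a / (a + b)          # shifts toward the observed rate 0.3
```

The talk's framework replaces this single probability with machine learning outputs tied to event times, but the update-your-risk-with-evidence logic is the same.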

Speakers

Kang Liu

Applied Data Scientist, Wolters Kluwer Health
Dr. Kang Liu graduated with his Ph.D. from Boston University in 2013 and now works as an Applied Data scientist at Wolters Kluwer Health where he builds machine learning models for the early prediction of hospital-acquired infection.

Thursday August 22, 2019 TBA

TBA

 Advancing Cancer Research with Deep Learning Image Analysis

Histopathological images are the gold-standard tool for cancer diagnosis, and their interpretation requires manual inspection by expert pathologists. This process is time-consuming for patients and subject to human error. Recent advances in deep learning models, particularly convolutional neural networks, combined with big databases of patient histopathology images, will pave the way for cancer researchers to create more accurate guiding tools for pathologists. In this talk, I will review the latest advances of big data in healthcare analytics and focus on deep learning applications in cancer research. Targeted at a general audience, I will provide a high-level overview of technical concepts in deep learning image analysis and describe a typical cloud-based workflow for tackling such big data problems. I will conclude my talk by sharing some of our most recent results across a wide range of cancer types.

Speakers

Mohammad Soltanieh-ha

Clinical Assistant Professor, Boston University – Questrom
Mohammad is a faculty member at Boston University, Questrom School of Business, where he teaches data analytics and big data to master’s students. Mohammad’s current research area involves deep learning and its applications in cancer research.

Thursday August 22, 2019 TBA

TBA

 Applying Artificial Intelligence to Python Coding Courses

A pilot was run at Southern New Hampshire University testing the efficacy of personalized feedback in sections of an online Introduction to Python course. We saw a statistically significant increase in student submission and success rates, and are continuing the pilot as we improve the course.

Speakers

avatar for Candace Sleeman, MS, PhD

Candace Sleeman, MS, PhD

STEM Technical Program Facilitator, Southern New Hampshire University
Dr. Sleeman specializes in technical research team leadership, big data analytics and applied data science. She holds a PhD in Mathematics from Drexel University, and is a member of the Board of Trustees for the NorthEast Regional Computing Program.

Thursday August 22, 2019 TBA

TBA

 Deep Learning Framework for Joint POI discovery & Scene Classification of ground level imagery

We propose a deep learning framework that uses geotagged ground-level imagery for scene classification and accurate identification of Point-of-Interest (POI) categories (e.g., restaurants, hotels, and schools), so as to augment efforts to improve location intelligence, such as context-aware POI mapping and land-use classification.

Speakers

avatar for Seema Chouhan

Seema Chouhan

Manager, Genpact
Seema is currently a machine learning research associate at ORNL. She earned her B.S. and M.S. in Chemical Engineering from the Indian Institute of Technology Delhi and an M.S. in Environmental Science from the University of Massachusetts Amherst.

Thursday August 22, 2019 TBA

TBA

 Enterprise Studio Collections: Machine Learning at Scale

The Disease Prediction and Progression OptumIQ Studio Collection contains over 200 models built to allow earlier detection of at-risk individuals, enabling providers to intervene more effectively, among many other use cases. These models span two lines of business, four modeling domains, and 25 distinct conditions. To build this Collection, we first needed to construct a generalized modeling framework. Leveraging the Customer Reporting Mart data and the cloud-computing resources made available via the OptumIQ Studio Workbench, we developed a scalable modeling pipeline, reducing the time required to train hundreds of supervised machine learning models to a matter of days.

Speakers

avatar for Ahmed Kayal, MS

Ahmed Kayal, MS

Data Scientist, Optum
As an Optum Data Scientist, Ahmed’s work centers around the development and implementation of machine learning models in the Healthcare space. With an interest in improving patient outcomes, he is largely focused on disease progression use cases.

Thursday August 22, 2019 TBA

TBA

 Future of HPC & AI Convergence

The talk covers the convergence of HPC & AI, then provides examples of AI workloads in Imaging, High Content Screening, de novo Chemical Structure Generation, and Genomics.

Speakers

avatar for Michael McManus

Michael McManus

Principal Engineer, Intel
Dr. McManus has over 30 years of scientific software/hardware and business experience.

Thursday August 22, 2019 TBA

TBA

 Rapid Data Science

Most companies today require fast, traceable, and actionable answers to their data questions. This talk will present the structure of the data science process along with cutting edge developments in computing and data science technology (DST) with direct applications to real world problems (with a lot of pictures!). Everything from modeling to team building will be discussed, with clear business applications.

Speakers

avatar for Erez Kaminski

Erez Kaminski

Leaders Global Operations Fellow, MIT
Erez has spent his career helping companies solve problems using data science. He is currently a graduate student in computer science and business at MIT. Previously, he worked in data science at Amgen Inc. and as a technologist at Wolfram Research.

Thursday August 22, 2019 TBA

TBA

 Recommendation systems modeling

This presentation will go over some key components of successful recommendation system building, including algorithm selection, accuracy metrics selection and other modeling aspects applicable across multiple industries.

Speakers

avatar for Lily Lavitas

Lily Lavitas

Senior Data Scientist, TripAdvisor
Lily Lavitas is a Data Scientist with a Ph.D. in Statistics and 8+ years of professional experience in various companies, including a start-up, a Fortune 100 company, and Amazon.

Thursday August 22, 2019 TBA

TBA

 Shapelets via Procrustes Tangent Distance matching

Shapelets are compact abstracted feature descriptors mined from time series. They have been used for qualitative and quantitative cross-correlation of series, and for classifying phenomena and behaviors evident within series, particularly when the record is of human activity. Traditionally these are devised by hand, with domain knowledge, and then procedures like variable selection are used to winnow and tune them. Automatic discovery has been explored using techniques like Dynamic Time Warping, but these can be expensive. Matching using Procrustes Tangent Distance (Dryden, Mardia, Kent, Rohlf, 1993, 1994, 1999) is introduced here, and libraries of features are built by auto-correlating series using this operator. Libraries from different series can then be compared to discern similarities and generalize. Applications to electricity consumption series and to hydrological stream flow illustrate the approach.
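The speaker's tangent-distance construction follows Dryden, Mardia, Kent, and Rohlf and is not reproduced here. As a rough sketch of the underlying idea, the ordinary (full) Procrustes distance between two equally sampled series treated as 2-D point sets has a closed form; this simplification, with names of our choosing, looks like:

```python
import math

def procrustes_distance(shape_a, shape_b):
    """Ordinary Procrustes distance between two equal-length 2-D point
    sequences: remove translation, scale, and rotation, then compare."""
    def normalize(pts):
        n = len(pts)
        cx = sum(x for x, _ in pts) / n
        cy = sum(y for _, y in pts) / n
        centered = [(x - cx, y - cy) for x, y in pts]
        norm = math.sqrt(sum(x * x + y * y for x, y in centered))
        return [(x / norm, y / norm) for x, y in centered]

    a, b = normalize(shape_a), normalize(shape_b)
    # Optimal rotation of a onto b has a closed form in two dimensions
    num = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in a]
    return math.sqrt(sum((rx - bx) ** 2 + (ry - by) ** 2
                         for (rx, ry), (bx, by) in zip(rotated, b)))
```

Two copies of the same shapelet, however shifted, scaled, or rotated, score near zero; genuinely different shapes score higher, which is what makes the operator usable for building and comparing feature libraries.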

Speakers

avatar for Jan Galkowski

Jan Galkowski

Statistician, Quantitative Engineer, Westwood Statistical Studios
Professionally, I do Internet sociology; I also volunteer statistical skills for sustainability and environmental purposes and causes; I am also teaching an online course “Climate Science for Climate Activists.”

Thursday August 22, 2019 TBA

TBA

 Anomaly Detection Introduction

If you have a data pipeline, you will have data anomalies in it sooner or later. If you don’t notice them, at best you’ll get degraded model performance, and at worst you’ll miss important changes in your business environment.

In this talk you will learn how to apply regression and dimensionality reduction techniques to the anomaly detection problem. We’ll also discuss investigation and diagnosis of a potential anomaly once it has been detected. Don’t be intimidated by all the papers about novel anomaly detection algorithms – idiosyncratic algorithms are not required, and in fact can sometimes produce warnings that are harder to diagnose than those from simpler techniques.

To benefit from this talk, attendees should have a conceptual familiarity with at least one of linear regression, principal component analysis, and chi-squared tests.
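A minimal version of the regression-based technique the abstract alludes to, flagging points whose residual from an ordinary least-squares line is unusually large, might look like this (the threshold and data are illustrative, not the speaker's material):

```python
from statistics import mean, pstdev

def regression_anomalies(xs, ys, z_thresh=3.0):
    """Flag indices whose residual from an OLS line exceeds z_thresh sigmas."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residuals from the fitted line; OLS guarantees they average to zero
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > z_thresh * sigma]
```

Note the trade-off the talk mentions: a single large anomaly inflates both the fitted line and sigma, so on small samples a lower threshold (or a robust fit) may be needed to catch it.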


Speakers

avatar for Terran Melconian, SM

Terran Melconian, SM

Data Science Trainer and Consultant, Independent
Terran has worked in all aspects of the data lifecycle at companies like TripAdvisor and Google: software, ranking, warehousing, and modeling. He is now a data science educator and consultant, sharing his knowledge with developing professionals.

Thursday August 22, 2019 TBA

TBA

 Scaling ML/AI analytic execution for production datasets

Technical overview of how we are scaling dozens of ML models to run on data from tens of millions of patients using Azure, Kubernetes, and Spark.

Speakers

avatar for John Lavoie

John Lavoie

Sr Principal Engineer, Optum
John Lavoie is a Senior Principal Engineer at Optum focused on scaling up analytics for Optum-sized datasets.  Major projects include the Analytics Common Capability and Optum IQ Studio.

Thursday August 22, 2019 TBA

TBA

 The NPU Era

Behind the use of artificial intelligence capabilities is a new and foundational piece of technology: the Neural Processing Unit. These AI-only processors are changing the rules for machine learning power and affordability, creating new and ideal conditions for intelligent devices. In this talk, we will explore the history, recent breakthroughs, and future impact, covering everything you need to know about the Age of the NPU.

Speakers

avatar for Dan Abdinoor

Dan Abdinoor

CEO & Cofounder, Fritz
Dan Abdinoor is CEO and Cofounder of Fritz, an AI platform that enables mobile apps to see, hear, sense, and think. Dan leads teams in the Boston tech startup community, and previously scaled businesses at HubSpot, BabbaCo, Wyth, and Jana.

Thursday August 22, 2019 TBA

Read Full Post »


eProceedings for BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA, Real Time Coverage by Stephen J. Williams, PhD @StephenJWillia2

 

CONFERENCE OVERVIEW

Real Time Coverage of BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/05/31/real-time-coverage-of-bio-international-convention-june-3-6-2019-philadelphia-convention-center-philadelphia-pa/

 

LECTURES & PANELS

Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-machine-learning-and-artificial-intelligence-realizing-precision-medicine-one-patient-at-a-time/

 

Real Time Coverage @BIOConvention #BIO2019: Genome Editing and Regulatory Harmonization: Progress and Challenges, 6/5/2019. Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-genome-editing-and-regulatory-harmonization-progress-and-challenges/

 

Real Time Coverage @BIOConvention #BIO2019: Precision Medicine Beyond Oncology June 5, 2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-precision-medicine-beyond-oncology-june-5-philadelphia-pa/

 

Real Time @BIOConvention #BIO2019:#Bitcoin Your Data! From Trusted Pharma Silos to Trustless Community-Owned Blockchain-Based Precision Medicine Data Trials, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-bioconvention-bio2019bitcoin-your-data-from-trusted-pharma-silos-to-trustless-community-owned-blockchain-based-precision-medicine-data-trials/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote Address Jamie Dimon CEO @jpmorgan June 5, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-keynote-address-jamie-dimon-ceo-jpmorgan-june-5-philadelphia/

 

Real Time Coverage @BIOConvention #BIO2019: Chat with @FDA Commissioner, & Challenges in Biotech & Gene Therapy June 4, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-chat-with-fda-commissioner-challenges-in-biotech-gene-therapy-june-4-philadelphia/

 

Falling in Love with Science: Championing Science for Everyone, Everywhere June 4 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-falling-in-love-with-science-championing-science-for-everyone-everywhere/

 

Real Time Coverage @BIOConvention #BIO2019: June 4 Morning Sessions; Global Biotech Investment & Public-Private Partnerships, 6/4/2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-june-4-morning-sessions-global-biotech-investment-public-private-partnerships/

 

Real Time Coverage @BIOConvention #BIO2019: Understanding the Voices of Patients: Unique Perspectives on Healthcare; June 4, 2019, 11:00 AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-understanding-the-voices-of-patients-unique-perspectives-on-healthcare-june-4/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote: Siddhartha Mukherjee, Oncologist and Pulitzer Author; June 4 2019, 9AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD. @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-keynote-siddhartha-mukherjee-oncologist-and-pulitzer-author-june-4-9am-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019:  Issues of Risk and Reproduceability in Translational and Academic Collaboration; 2:30-4:00 June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-issues-of-risk-and-reproduceability-in-translational-and-academic-collaboration-230-400-june-3-philadelphia-pareal-time-coverage-bioconvention-bi/

 

Real Time Coverage @BIOConvention #BIO2019: What’s Next: The Landscape of Innovation in 2019 and Beyond. 3-4 PM June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-whats-next-the-landscape-of-innovation-in-2019-and-beyond-3-4-pm-june-3-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: After Trump’s Drug Pricing Blueprint: What Happens Next? A View from Washington; June 3, 2019 1:00 PM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-after-trumps-drug-pricing-blueprint-what-happens-next-a-view-from-washington-june-3-2019-100-pm-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: International Cancer Clusters Showcase June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-international-cancer-clusters-showcase-june-3-philadelphia-pa/

Read Full Post »


Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time

Reporter: Stephen J Williams, PhD @StephenJWillia2

The impact of Machine Learning (ML) and Artificial Intelligence (AI) over the last decade has been tremendous. With the rise of infobesity, ML/AI is evolving into an essential capability for mining the sheer volume of patient genomics, omics, sensor/wearable, and real-world data, and for unraveling the knot of healthcare’s most complex questions.

Despite the advancements in technology, organizations struggle to prioritize and implement ML/AI to achieve the anticipated value, whilst managing the disruption that comes with it. In this session, panelists will discuss ML/AI implementation and adoption strategies that work. Panelists will draw upon their experiences as they share their success stories, discuss how to implement digital diagnostics, track disease progression and treatment, and increase commercial value and ROI compared against traditional approaches.

  • Most trials to date are still training AI/ML algorithms on training data sets. The best results so far have been about 80% accuracy on training sets, which needs to improve.
  • All data sets can be biased. For example, a professor measuring heart rate with an IR detector on a wearable found that different skin types generated different signals, so training sets may carry population biases (you are getting data from one group).
  • Clinical-grade equipment often hasn’t been trained on data sets as large as those behind commercial wearables; commercial-grade devices are tested on larger study populations. This can affect the AI/ML algorithms.
  • Regulations: which regulatory body is responsible is still up for debate. Whether the FDA or the FTC oversees AI/ML in healthcare, health tech, and IT is not fully decided, and we don’t yet have guidances for these new technologies.
  • Some rules: never use your own encryption; always use industry standards, especially when collecting personal data from wearables. One hospital corrupted its systems because its computers were not up to date and could not protect against a virus transmitted by a wearable.
  • Pharma companies understand they need to increase the value of their products, so they are very interested in how AI/ML can be used.

Please follow LIVE on TWITTER using the following @ handles and # hashtags:

@Handles

@pharma_BI

@AVIVA1950

@BIOConvention

# Hashtags

#BIO2019 (official meeting hashtag)

Read Full Post »


Real Time Coverage of BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

Please follow LIVE on TWITTER using the following @ handles and # hashtags:

@Handles

@pharma_BI

@AVIVA1950

@BIOConvention

# Hashtags

#BIO2019 (official meeting hashtag)

Please check daily on this OPEN ACCESS JOURNAL for updates on one of the most important BIO Conferences of the year for meeting notes, posts, as well as occasional PODCASTS.

 

The BIO International Convention is the largest global event for the biotechnology industry and attracts the biggest names in biotech, offers key networking and partnering opportunities, and provides insights and inspiration on the major trends affecting the industry. The event features keynotes and sessions from key policymakers, scientists, CEOs, and celebrities.  The Convention also features the BIO Business Forum (One-on-One Partnering), hundreds of sessions covering biotech trends, policy issues and technological innovations, and the world’s largest biotechnology exhibition – the BIO Exhibition.

The BIO International Convention is hosted by the Biotechnology Innovation Organization (BIO). BIO represents more than 1,100 biotechnology companies, academic institutions, state biotechnology centers and related organizations across the United States and in more than 30 other nations. BIO members are involved in the research and development of innovative healthcare, agricultural, industrial and environmental biotechnology products.

 

Keynote Speakers INCLUDE:

Fireside Chat with Margaret (Peggy) Hamburg, MD, Foreign Secretary, National Academy of Medicine; Chairman of the Board, American Association for the Advancement of Science

Tuesday Keynote: Siddhartha Mukherjee (author of the bestsellers The Emperor of All Maladies: A Biography of Cancer and The Gene: An Intimate History)

Fireside Chat with Jeffrey Solomon, Chief Executive Officer, COWEN

Fireside Chat with Christi Shaw, Senior Vice President and President, Lilly BIO-Medicines, Eli Lilly and Company

Wednesday Keynote: Jamie Dimon (Chairman JP Morgan Chase)

Fireside Chat with Kenneth C. Frazier, Chairman of the Board and Chief Executive Officer, Merck & Co., Inc.

Fireside Chat: Understanding the Voices of Patients: Unique Perspectives on Healthcare

Fireside Chat: FDA Town Hall

 

ALSO SUPERSESSIONS including:

Super Session: What’s Next: The Landscape of Innovation in 2019 and Beyond

Super Session: Falling in Love with Science: Championing Science for Everyone, Everywhere

Super Session: Digital Health in Practice: A Conversation with Ameet Nathawani, Chief Digital Officer, Chief Medical Officer

Super Session: Realizing the Promise of Gene Therapies for Patients Around the World

Super Session: Biotech’s Contribution to Innovation: Current and Future Drivers of Success

Super Session: The Art & Science of R&D Innovation and Productivity

Super Session: Dealmaker’s Intentions: 2019 Market Outlook

Super Session: The State of the Vaccine Industry: Stimulating Sustainable Growth

 

See here for full AGENDA

Link for Registration: https://convention.bio.org/register/

The BIO International Convention is literally where hundreds of deals and partnerships have been made over the years.

 

BIO performs many services for members, but none of them are more visible than the BIO International Convention. The BIO International Convention helps BIO fulfill its mission to help grow the global biotech industry. Profits from the BIO International Convention are returned to the biotechnology industry by supporting BIO programs and initiatives. BIO works throughout the year to create a policy environment that enables the industry to continue to fulfill its vision of bettering the world through biotechnology innovation.

The key benefits of attending the BIO International Convention are access to global biotech and pharma leaders via BIO One-on-One Partnering, exposure to industry thought-leaders with over 1,500 education sessions at your fingertips, and unparalleled networking opportunities with 16,000+ attendees from 74 countries.

In addition, we produce BIOtechNOW, an online blog chronicling ‘innovations transforming our world’ and the BIO Newsletter, the organization’s bi-weekly email newsletter. Subscribe to the BIO Newsletter.

 

Membership with the Biotechnology Innovation Organization (BIO)

BIO has a diverse membership comprising companies from all facets of biotechnology. Corporate R&D members range from entrepreneurial companies developing a first product to Fortune 100 multinationals. The majority of our members are small companies – 90 percent have annual revenues of $25 million or less, reflecting the broader biotechnology industry. Learn more about how you can save with BIO Membership.

BIO also represents academic centers, state and regional biotech associations and service providers to the industry, including financial and consulting firms.

  • 66% R&D-Intensive Companies (of those, 89% have annual revenues under $25 million, 4% between $25 million and $1 billion, and 7% over $1 billion)
  • 16% Nonprofit/Academic
  • 11% Service Providers
  • 7% State/International Affiliate Organizations

Other posts on LIVE CONFERENCE COVERAGE using Social Media on this OPEN ACCESS JOURNAL and OTHER Conferences Covered please see the following link at https://pharmaceuticalintelligence.com/press-coverage/

 

Notable Conferences Covered THIS YEAR INCLUDE: (see full list from 2013 at this link)

  • Koch Institute 2019 Immune Engineering Symposium, January 28-29, 2019, Kresge Auditorium, MIT

https://calendar.mit.edu/event/immune_engineering_symposium_2019#.XBrIDc9Kgcg

http://kochinstituteevents.cvent.com/events/koch-institute-2019-immune-engineering-symposium/event-summary-8d2098bb601a4654991060d59e92d7fe.aspx?dvce=1

 

  • 2019 MassBio’s Annual Meeting, State of Possible Conference ​, March 27 – 28, 2019, Royal Sonesta, Cambridge

http://files.massbio.org/file/MassBio-State-Of-Possible-Conference-Agenda-Feb-22-2019.pdf

 

  • World Medical Innovation Forum, Partners Innovations, ARTIFICIAL INTELLIGENCE | APRIL 8–10, 2019 | Westin, BOSTON

https://worldmedicalinnovation.org/agenda-list/

https://worldmedicalinnovation.org/

 

  • 18th Annual 2019 BioIT, Conference & Expo, April 16-18, 2019, Boston, Seaport World Trade Center, Track 5 Next-Gen Sequencing Informatics – Advances in Large-Scale Computing

http://www.giiconference.com/chi653337/

https://pharmaceuticalintelligence.com/2019/04/22/18th-annual-2019-bioit-conference-expo-april-16-18-2019-boston-seaport-world-trade-center-track-5-next-gen-sequencing-informatics-advances-in-large-scale-computing/

 

  • Translating Genetics into Medicine, April 25, 2019, 8:30 AM – 6:00 PM, The New York Academy of Sciences, 7 World Trade Center, 250 Greenwich St Fl 40, New York

https://pharmaceuticalintelligence.com/2019/04/25/translating-genetics-into-medicine-april-25-2019-830-am-600-pm-the-new-york-academy-of-sciences-7-world-trade-center-250-greenwich-st-fl-40-new-york/

 

  • 13th Annual US-India BioPharma & Healthcare Summit, May 9, 2019, Marriott, Cambridge

https://pharmaceuticalintelligence.com/2019/04/30/13th-annual-biopharma-healthcare-summit-thursday-may-9-2019/

 

  • 2019 Petrie-Flom Center Annual Conference: Consuming Genetics: Ethical and Legal Considerations of New Technologies, May 17, 2019, Harvard Law School

http://petrieflom.law.harvard.edu/events/details/2019-petrie-flom-center-annual-conference

https://pharmaceuticalintelligence.com/2019/01/11/2019-petrie-flom-center-annual-conference-consuming-genetics-ethical-and-legal-considerations-of-new-technologies/

 

  • 2019 Koch Institute Symposium – Machine Learning and Cancer, June 14, 2019, 8:00 AM-5:00 PM  ET MIT Kresge Auditorium, 48 Massachusetts Ave, Cambridge, MA

https://pharmaceuticalintelligence.com/2019/03/12/2019-koch-institute-symposium-machine-learning-and-cancer-june-14-2019-800-am-500-pmet-mit-kresge-auditorium-48-massachusetts-ave-cambridge-ma/

 

Read Full Post »


Seven Alternative Designs to Quantum Computing Platform – The Race by IBM, Google, Microsoft, and Others

 

Reporter: Aviva Lev-Ari, PhD, RN

 

Business Bets on a Quantum Leap

Quantum computing could help companies address problems as huge as supply chains and climate change. Here’s how IBM, Google, Microsoft, and others are racing to bring the tech from theory to practice.
May 21, 2019

quantum computer at IonQ, an Alphabet-backed startup

A version of this article appears in the June 2019 issue of Fortune with the headline “The Race for Quantum Domination.”

Medicine

One day, your health may depend on a quantum leap.

  • Pharmaceutical giant Biogen teamed up with consultancy Accenture and startup 1QBit on a quantum computing experiment in 2017 aimed at molecular modeling, one of the more complex disciplines in medicine. The goal: finding candidate drugs to treat neurodegenerative diseases.
  • Microsoft is collaborating with Case Western Reserve University to improve the accuracy of MRI machines, which help detect cancer, using so-called quantum-inspired algorithms.

 

7 ways to win the quantum race

There are multiple ways that quantum computing could work.

Here’s a guide to which companies are backing which tech.

Superconducting uses an electrical current, flowing through special semiconductor chips cooled to near absolute zero, to produce computational “qubits.” Google, IBM, and Intel are pursuing this approach, which has so far been the front-runner.

Ion trap relies on charged atoms that are manipulated by lasers in a vacuum, which helps to reduce noisy interference that can contribute to errors. Industrial giant Honeywell is betting on this technique. So is IonQ, a startup with backing from Alphabet.

Neutral atom is similar to the ion-trap method, except that it uses, you guessed it, neutral atoms. Physicist Mikhail Lukin’s lab at Harvard is a pioneer.

Annealing is designed to find the lowest-energy (and therefore speediest) solutions to math problems. Canadian firm D-Wave has sold multimillion-dollar machines based on the idea to Google and NASA. They’re fast, but skeptics question whether they qualify as “quantum.”

Silicon spin uses single electrons trapped in transistors. Intel is hedging its bets between the more mature superconducting qubits and this younger, equally semiconductor-friendly method.

Topological uses exotic, highly stable quasi-particles called “anyons.” Microsoft deems this unproven moonshot the best candidate in the long run, though the company has yet to produce a single one.

Photonics uses light particles sent through special silicon chips. The particles interact with one another very little (good), but can scatter and disappear (bad). Three-year-old stealth startup Psi Quantum is tinkering away on this idea.

SOURCE

http://fortune.com/longform/business-quantum-computing/

 

Other related articles published in this Open Access Online Scientific Journal include the following:

 

  • R&D for Artificial Intelligence Tools & Applications: Google’s Research Efforts in 2018

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/01/16/rd-for-artificial-intelligence-tools-applications-googles-research-efforts-in-2018/

 

  • LIVE Day Two – World Medical Innovation Forum ARTIFICIAL INTELLIGENCE, Boston, MA USA, Monday, April 9, 2019

www.worldmedicalinnovation.org

https://pharmaceuticalintelligence.com/2019/04/09/live-day-two-world-medical-innovation-forum-artificial-intelligence-boston-ma-usa-monday-april-9-2019/

 

  • Research and Development (R&D) Expenditure by Country represent time, capital, and effort being put into researching and designing the products of the future – Data from the UNESCO Institute for Statistics adjusted for purchasing-power parity (PPP).

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/05/26/research-and-development-rd-expenditure-by-country-represent-time-capital-and-effort-being-put-into-researching-and-designing-the-products-of-the-future-data-from-the-unesco-institute-for-s/

 

  • Resources on Artificial Intelligence in Health Care and in Medicine: Articles of Note at PharmaceuticalIntelligence.com @AVIVA1950 @pharma_BI

https://www.linkedin.com/pulse/resources-artificial-intelligence-health-care-note-lev-ari-phd-rn/

 

  • IBM’s Watson Health division – How will the Future look?

Reporter: Aviva Lev-Ari, PhD, RN

https://pharmaceuticalintelligence.com/2019/04/24/ibms-watson-health-division-how-will-the-future-look-like/

Read Full Post »


BioInformatic Resources at the Environmental Protection Agency: Tools and Webinars on Toxicity Prediction

Curator Stephen J. Williams Ph.D.

New GenRA Module in EPA’s CompTox Dashboard Will Help Predict Potential Chemical Toxicity

Published September 25, 2018

As part of its ongoing computational toxicology research, EPA is developing faster and improved approaches to evaluate chemicals for potential health effects. One commonly applied approach is known as chemical read-across. Read-across uses information about how a chemical with known data behaves to predict the behavior of a “similar” chemical that does not have as much data. Current read-across, while cost-effective, relies on subjective assessment, which leads to varying predictions and justifications depending on who undertakes and evaluates the assessment.

To reduce uncertainties and develop a more objective approach, EPA researchers have developed an automated read-across tool called Generalized Read-Across (GenRA), and added it to the newest version of the EPA Computational Toxicology Dashboard. The goal of GenRA is to encode as many expert considerations used within current read-across approaches as possible and combine these with data-driven approaches to transition read-across towards a more systematic and data-based method of making predictions.

EPA chemist Dr. Grace Patlewicz says it was this uncertainty that motivated the development of GenRA. “You don’t actually know if you’ve been successful at using read-across to help predict chemical toxicity because it’s a judgement call based on one person versus the next. That subjectivity is something we were trying to move away from.” Patlewicz says.

Since toxicologists and risk assessors are already familiar with read-across, EPA researchers saw value in creating a tool that was aligned with the current read-across workflow but addressed uncertainty using data-analysis methods in what they call a “harmonized-hybrid workflow.”

In its current form, GenRA lets users find analogues, or chemicals that are similar to their target chemical, based on chemical structural similarity. The user can then select which analogues they want to carry forward into the GenRA prediction by exploring the consistency and concordance of the underlying experimental data for those analogues. Next, the tool predicts toxicity effects of specific repeated dose studies. Then, a plot with these outcomes is generated based on a similarity-weighted activity of the analogue chemicals the user selected. Finally, the user is presented with a data matrix view showing whether a chemical is predicted to be toxic (yes or no) for a chosen set of toxicity endpoints, with a quantitative measure of uncertainty.
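GenRA's exact scoring is not documented in this article, but a generic similarity-weighted read-across prediction, in the spirit described above, can be sketched as follows (the function name and the 0.5 decision threshold are illustrative assumptions, not EPA's):

```python
def read_across(similarities, activities, threshold=0.5):
    """Predict a target chemical's toxicity call from its analogues.

    similarities: structural similarity of each analogue to the target (0..1)
    activities:   observed toxicity calls for those analogues (1 toxic, 0 not)
    Returns the similarity-weighted activity score and a binary prediction.
    """
    if not similarities:
        raise ValueError("need at least one analogue")
    # Analogues more similar to the target get more say in the prediction
    score = (sum(s * a for s, a in zip(similarities, activities))
             / sum(similarities))
    return score, score >= threshold
```

The continuous score also serves as a rough uncertainty signal: values near 0 or 1 mean the selected analogues agree, while values near the threshold mean the analogue evidence is conflicting.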

The team is also comparing chemicals based on other similarity contexts, such as physicochemical characteristics or metabolic similarity, as well as extending the approach to make quantitative predictions of toxicity.

Patlewicz thinks incorporating other contexts and similarity measures will refine GenRA to make better toxicity predictions, fulfilling the goal of creating a read-across method capable of assessing thousands of chemicals that currently lack toxicity data.

“That’s the direction that we’re going in,” Patlewicz says. “Recognizing where we are and trying to move towards something a little bit more objective, showing how aspects of the current read-across workflow could be refined.”

Learn more at: https://comptox.epa.gov

 

A listing of EPA Tools for Air Quality Assessment

Tools

  • Atmospheric Model Evaluation Tool (AMET)
    AMET helps in the evaluation of meteorological and air quality simulations.
  • Benchmark Dose Software (BMDS)
    EPA developed the Benchmark Dose Software (BMDS) as a tool to help estimate dose or exposure of a chemical or chemical mixture associated with a given response level. The methodology is used by EPA risk assessors and is fast becoming the world’s standard for dose-response analysis for risk assessments, including air pollution risk assessments.
  • BenMAP
    BenMAP is a Windows-based computer program that uses a Geographic Information System (GIS)-based approach to estimate the health impacts and economic benefits occurring when populations experience changes in air quality.
  • Community-Focused Exposure and Risk Screening Tool (C-FERST)
    C-FERST is an online tool developed by EPA in collaboration with stakeholders to provide access to resources that can be used with communities to help identify and learn more about their environmental health issues and explore exposure and risk reduction options.
  • Community Health Vulnerability Index
    EPA scientists developed a Community Health Vulnerability Index that can be used to help identify communities at higher health risk from wildfire smoke. Breathing smoke from a nearby wildfire is a health threat, especially for people with lung or heart disease, diabetes, or high blood pressure, as well as for older adults and those living in communities with poverty, unemployment, and other indicators of social stress. Health officials can use the tool, in combination with air quality models, to focus public health strategies on vulnerable populations living in areas where air quality is impaired, whether by wildfire smoke or other sources of pollution. The work was published in Environmental Science & Technology.
  • Critical Loads Mapper Tool
    The Critical Loads Mapper Tool can be used to help protect terrestrial and aquatic ecosystems from atmospheric deposition of nitrogen and sulfur, two pollutants emitted from fossil fuel burning and agricultural emissions. The interactive tool provides easy access to information on deposition levels through time; critical loads, which identify thresholds when pollutants have reached harmful levels; and exceedances of these thresholds.
  • EnviroAtlas
    EnviroAtlas provides interactive tools and resources for exploring the benefits people receive from nature or “ecosystem goods and services”. Ecosystem goods and services are critically important to human health and well-being, but they are often overlooked due to lack of information. Using EnviroAtlas, many types of users can access, view, and analyze diverse information to better understand the potential impacts of various decisions.
  • EPA Air Sensor Toolbox for Citizen Scientists
    EPA’s Air Sensor Toolbox for Citizen Scientists provides information and guidance on new low-cost compact technologies for measuring air quality. Citizens are interested in learning more about local air quality where they live, work and play. EPA’s Toolbox includes information about: Sampling methodologies; Calibration and validation approaches; Measurement methods options; Data interpretation guidelines; Education and outreach; and Low cost sensor performance information.
  • ExpoFIRST
    The Exposure Factors Interactive Resource for Scenarios Tool (ExpoFIRST) brings data from EPA’s Exposure Factors Handbook: 2011 Edition (EFH) to an interactive tool that maximizes flexibility and transparency for exposure assessors. ExpoFIRST represents a significant advance for regional, state, and local scientists in performing and documenting calculations for community and site-specific exposure assessments, including air pollution exposure assessments.
  • EXPOsure toolbox (ExpoBox)
    This is a toolbox created to assist individuals from within government, industry, academia, and the general public with assessing exposure, including exposure to air contaminants, fate and transport processes of air pollutants and their potential exposure concentrations. It is a compendium of exposure assessment tools that links to guidance documents, databases, models, reference materials, and other related resources.
  • Federal Reference & Federal Equivalency Methods
    EPA scientists develop and evaluate Federal Reference Methods and Federal Equivalency Methods for accurately and reliably measuring six primary air pollutants in outdoor air. These methods are used by states and other organizations to assess implementation actions needed to attain National Ambient Air Quality Standards.
  • Fertilizer Emission Scenario Tool for CMAQ (FEST-C)
    FEST-C facilitates the definition and simulation of new cropland farm management system scenarios or editing of existing scenarios to drive Environmental Policy Integrated Climate model (EPIC) simulations.  For the standard 12km continental Community Multi-Scale Air Quality model (CMAQ) domain, this amounts to about 250,000 simulations for the U.S. alone. It also produces gridded daily EPIC weather input files from existing hourly Meteorology-Chemistry Interface Processor (MCIP) files, transforms EPIC output files to CMAQ-ready input files and links directly to Visual Environment for Rich Data Interpretation (VERDI) for spatial visualization of input and output files. The December 2012 release will perform all these functions for any CMAQ grid scale or domain.
  • Instruction Guide and Macro Analysis Tool for Community-led Air Monitoring 
    EPA has developed two tools for evaluating the performance of low-cost sensors and interpreting the data they collect to help citizen scientists, communities, and professionals learn about local air quality.
  • Integrated Climate and Land use Scenarios (ICLUS)
    Climate change and land-use change are global drivers of environmental change. Impact assessments frequently show that interactions between climate and land-use changes can create serious challenges for aquatic ecosystems, water quality, and air quality. Population projections to 2100 were used to model the distribution of new housing across the landscape. In addition, housing density was used to estimate changes in impervious surface cover.  A final report, datasets, the ICLUS+ Web Viewer and ArcGIS tools are available.
  • Indoor Semi-Volatile Organic Compound (i-SVOC)
    i-SVOC Version 1.0 is a general-purpose software application for dynamic modeling of the emission, transport, sorption, and distribution of semi-volatile organic compounds (SVOCs) in indoor environments. i-SVOC supports a variety of uses, including exposure assessment and the evaluation of mitigation options. SVOCs are a diverse group of organic chemicals that can be found in:

    • Pesticides;
    • Ingredients in cleaning agents and personal care products;
    • Additives to vinyl flooring, furniture, clothing, cookware, food packaging, and electronics.

    Many are also present in indoor air, where they tend to bind to interior surfaces and particulate matter (dust).
  • Municipal Solid Waste Decision Support Tool (MSW DST)
    This tool is designed to aid solid waste planners in evaluating the cost and environmental aspects of integrated municipal solid waste management strategies. The tool is the result of collaboration between EPA and RTI International and its partners.
  • Optical Noise-Reduction Averaging (ONA) Program Improves Black Carbon Particle Measurements Using Aethalometers
    ONA is a program that reduces noise in real-time black carbon data obtained using Aethalometers. Aethalometers optically measure the concentration of light absorbing or “black” particles that accumulate on a filter as air flows through it. These particles are produced by incomplete fossil fuel, biofuel and biomass combustion. Under polluted conditions, they appear as smoke or haze.
  • RETIGO tool
    Real Time Geospatial Data Viewer (RETIGO) is a free, web-based tool that shows air quality data that are collected while in motion (walking, biking or in a vehicle). The tool helps users overcome technical barriers to exploring air quality data. After collecting measurements, citizen scientists and other users can import their own data and explore the data on a map.
  • Remote Sensing Information Gateway (RSIG)
    RSIG offers a new way for users to get the multi-terabyte, environmental datasets they want via an interactive, Web browser-based application. A file download and parsing process that now takes months will be reduced via RSIG to minutes.
  • Simulation Tool Kit for Indoor Air Quality and Inhalation Exposure (IAQX)
    IAQX version 1.1 is an indoor air quality (IAQ) simulation software package that complements and supplements existing IAQ simulation programs. IAQX is for advanced users who have experience with exposure estimation, pollution control, risk assessment, and risk management. There are many sources of indoor air pollution, such as building materials, furnishings, and chemical cleaners. Since most people spend a large portion of their time indoors, it is important to be able to estimate exposure to these pollutants. IAQX helps users analyze the impact of pollutant sources and sinks, ventilation, and air cleaners. It performs conventional IAQ simulations to calculate the pollutant concentration and/or personal exposure as a function of time. It can also estimate adequate ventilation rates based on user-provided air quality criteria. This is a unique feature useful for product stewardship and risk management.
  • Spatial Allocator
    The Spatial Allocator provides tools that could be used by the air quality modeling community to perform commonly needed spatial tasks without requiring the use of a commercial Geographic Information System (GIS).
  • Traceability Protocol for Assay and Certification of Gaseous Calibration Standards
    This is used to certify calibration gases for ambient and continuous emission monitors. It specifies methods for assaying gases and establishing traceability to National Institute of Standards and Technology (NIST) reference standards. Traceability is required under EPA ambient and continuous emission monitoring regulations.
  • Watershed Deposition Mapping Tool (WDT)
    WDT provides an easy to use tool for mapping the deposition estimates from CMAQ to watersheds to provide the linkage of air and water needed for TMDL (Total Maximum Daily Load) and related nonpoint-source watershed analyses.
  • Visual Environment for Rich Data Interpretation (VERDI)
    VERDI is a flexible, modular, Java-based program for visualizing multivariate gridded meteorology, emissions, and air quality modeling data created by environmental modeling systems such as CMAQ and the Weather Research and Forecasting (WRF) model.
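The concentration-versus-time calculation that tools like IAQX perform can be illustrated with the simplest case, a single well-mixed zone with a constant source and ventilation, where the mass balance dC/dt = S/V − (Q/V)·C has an analytic solution. This is a deliberate simplification (IAQX handles multiple zones, sinks, and sorption), and the room values below are invented:

```python
import math

def indoor_concentration(t_hours, source_mg_h, airflow_m3_h, volume_m3, c0=0.0):
    """Pollutant concentration (mg/m^3) at time t in one well-mixed zone:
    dC/dt = S/V - (Q/V)*C, starting from concentration c0."""
    c_ss = source_mg_h / airflow_m3_h       # steady-state concentration S/Q
    k = airflow_m3_h / volume_m3            # air-change rate (1/h)
    return c_ss + (c0 - c_ss) * math.exp(-k * t_hours)

# Hypothetical room: 30 m^3, one air change per hour (Q = 30 m^3/h),
# and a 3 mg/h source; the concentration rises toward 0.1 mg/m^3.
c_after_2h = indoor_concentration(2.0, 3.0, 30.0, 30.0)
```

After two air changes the zone is already within about 14% of its steady state, which is why ventilation rate dominates this kind of screening calculation.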

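Similarly, the benchmark-dose idea behind BMDS can be shown for the simplest dose-response form, the one-hit model, where extra risk is 1 − exp(−k·d) and the benchmark dose solves for a chosen benchmark response (BMR) in closed form. BMDS itself fits many model families by maximum likelihood and reports confidence limits; the potency value here is hypothetical:

```python
import math

def extra_risk(dose, k):
    """One-hit model extra risk: fraction responding beyond background."""
    return 1.0 - math.exp(-k * dose)

def benchmark_dose(k, bmr=0.10):
    """Dose at which extra risk equals the benchmark response (default 10%)."""
    return -math.log(1.0 - bmr) / k

# Hypothetical potency k = 0.05 per (mg/kg-day): BMD for 10% extra risk.
bmd = benchmark_dose(0.05)
```

By construction, plugging `bmd` back into `extra_risk` recovers the 10% benchmark response, which is the defining property BMDS checks for whichever model it fits.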
 

Databases

  • Air Quality Data for the CDC National Environmental Public Health Tracking Network 
    EPA’s Exposure Research scientists are collaborating with the Centers for Disease Control and Prevention (CDC) on a CDC initiative to build a National Environmental Public Health Tracking (EPHT) network. Working with state, local and federal air pollution and health agencies, the EPHT program is facilitating the collection, integration, analysis, interpretation, and dissemination of data from environmental hazard monitoring, and from human exposure and health effects surveillance. These data provide scientific information to develop surveillance indicators, and to investigate possible relationships between environmental exposures, chronic disease, and other diseases, that can lead to interventions to reduce the burden of these illnesses. An important part of the initiative is air quality modeling estimates and air quality monitoring data, combined through Bayesian modeling that can be linked with health outcome data.
  • EPAUS9R – An Energy Systems Database for use with the Market Allocation (MARKAL) Model
    The EPAUS9r is a regional database representation of the United States energy system. The database uses the MARKAL model. MARKAL is an energy system optimization model used by local and federal governments, national and international communities and academia. EPAUS9r represents energy supply, technology, and demand throughout the major sectors of the U.S. energy system.
  • Fused Air Quality Surfaces Using Downscaling
    This database provides access to the most recent O3 and PM2.5 surfaces datasets using downscaling.
  • Health & Environmental Research Online (HERO)
    HERO provides access to scientific literature used to support EPA’s Integrated Science Assessments (ISAs), which feed into the National Ambient Air Quality Standards (NAAQS) reviews.
  • SPECIATE 4.5 Database
    SPECIATE is a repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources.
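The Bayesian combination of model estimates with monitor data mentioned for the EPHT network above can be illustrated in its simplest conjugate-normal form: each source is weighted by its precision (inverse variance). The real EPHT/CMAQ fusion uses much richer spatial models, and the numbers below are invented:

```python
def fuse_estimates(model_mean, model_var, monitor_mean, monitor_var):
    """Precision-weighted (conjugate normal) fusion of a model estimate with a
    monitor observation; returns the posterior mean and variance."""
    w_model = 1.0 / model_var
    w_monitor = 1.0 / monitor_var
    var = 1.0 / (w_model + w_monitor)
    mean = var * (w_model * model_mean + w_monitor * monitor_mean)
    return mean, var

# Hypothetical PM2.5 at one grid cell: the model says 12 ug/m^3 (variance 9),
# a nearby monitor says 9 ug/m^3 (variance 1); the precise monitor dominates.
mean, var = fuse_estimates(12.0, 9.0, 9.0, 1.0)
```

The fused value lands much closer to the monitor than to the model, and its variance is smaller than either input's, which is the point of combining the two surfaces.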

A listing of EPA Tools and Databases for Water Contaminant Exposure Assessment

Exposure and Toxicity

  • EPA ExpoBox (A Toolbox for Exposure Assessors)
    This toolbox assists individuals from within government, industry, academia, and the general public with assessing exposure from multiple media, including water and sediment. It is a compendium of exposure assessment tools that links to guidance documents, databases, models, reference materials, and other related resources.

Chemical and Product Categories (CPCat) Database
CPCat is a database containing information mapping more than 43,000 chemicals to a set of terms categorizing their usage or function. The comprehensive list of chemicals with associated categories of chemical and product use was compiled from publicly available sources. Unique use category taxonomies from each source are mapped onto a single common set of approximately 800 terms. Users can search for chemicals by chemical name, Chemical Abstracts Registry Number, or by CPCat terms associated with chemicals.
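The kind of lookup CPCat supports, chemical-to-term and term-to-chemical, amounts to querying a many-to-many mapping. The sketch below uses real CAS Registry Numbers but invented use-category assignments, standing in for actual CPCat records:

```python
# Hypothetical stand-ins for CPCat records: CASRN -> name and use-category terms.
CHEMICALS = {
    "50-00-0": {"name": "formaldehyde", "terms": {"adhesive", "preservative"}},
    "80-05-7": {"name": "bisphenol A",  "terms": {"plastics", "resin"}},
}

def by_name(name):
    """CASRNs whose chemical name matches exactly."""
    return [cas for cas, rec in CHEMICALS.items() if rec["name"] == name]

def by_term(term):
    """CASRNs associated with a given use-category term, sorted for stable output."""
    return sorted(cas for cas, rec in CHEMICALS.items() if term in rec["terms"])
```

Collapsing each source's taxonomy onto the common ~800-term vocabulary is what makes the `by_term` direction meaningful across data sources.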

A listing of EPA Tools and Databases for Chemical Toxicity Prediction & Assessment

  • Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)
    SeqAPASS is a fast, online screening tool that allows researchers and regulators to extrapolate toxicity information across species. For some species, such as humans, mice, rats, and zebrafish, the EPA has a large amount of data regarding their toxicological susceptibility to various chemicals. However, the toxicity data for numerous other plants and animals is very limited. SeqAPASS extrapolates from these data rich model organisms to thousands of other non-target species to evaluate their specific potential chemical susceptibility.

 

A listing of EPA Webinar and Literature on Bioinformatic Tools and Projects

Comparative Bioinformatics Applications for Developmental Toxicology

Discusses how the US EPA/NCCT is trying to solve the problem of too many chemicals, too high cost, and too much biological uncertainty, and the solution the ToxCast Program is proposing: a data-rich system to screen, classify, and rank chemicals for further evaluation.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=186844

CHEMOINFORMATIC AND BIOINFORMATIC CHALLENGES AT THE US ENVIRONMENTAL PROTECTION AGENCY.

This presentation will provide an overview of both the scientific program and the regulatory activities related to computational toxicology.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=154013

How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?

The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are now standard tools for probing the cellular state and determining important cellular behaviors at the genomic/proteomic levels. While these newer technologies are beginning to provide important information on cellular reactions to toxicant exposure (toxicogenomics), a major challenge that remains is the formulation of a strategy to integrate transcript, protein, metabolite, and toxicity data. This integration will require new concepts and tools in bioinformatics. The U.S. National Library of Medicine’s PubMed site includes 19 million citations and abstracts and continues to grow. The BDSM team is now working on assembling the literature’s unstructured data into a structured database and linking it to BDSM within a system that can then be used for testing and generating new hypotheses. This effort will generate databases of entities (such as genes, proteins, metabolites, and gene ontology processes) linked to PubMed identifiers/abstracts, with information on the relationships between them. The end result will be an online/standalone tool that will help researchers to focus on the papers most relevant to their query and uncover hidden connections and obvious information gaps.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=227345

ADVANCED PROTEOMICS AND BIOINFORMATICS TOOLS IN TOXICOLOGY RESEARCH: OVERCOMING CHALLENGES TO PROVIDE SIGNIFICANT RESULTS

This presentation specifically addresses the advantages and limitations of state of the art gel, protein arrays and peptide-based labeling proteomic approaches to assess the effects of a suite of model T4 inhibitors on the thyroid axis of Xenopus laevis.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NHEERL&dirEntryId=152823

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the United Medical Language System, which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, and demonstrated relevant MIEs and early KEs for AOPs to describe thyroid-related AOs. 
The ToxRefDB to gene mapping for thyroid resulted in >50 unique gene to chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and Poster for 2019 Society of Toxicology annual meeting in March 2019

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=344452
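At its core, EMCON's "measure of mutual information" between a gene and a MeSH term can be estimated from co-occurrence counts across a corpus of abstracts, treating "mentions gene G" and "indexed with term T" as two binary events. This is a simplification of the actual tool, and the counts below are invented:

```python
import math

def mutual_information(n_both, n_gene_only, n_term_only, n_neither):
    """Mutual information (nats) between two binary events -- e.g. 'abstract
    mentions gene G' and 'abstract is indexed with MeSH term T' -- from a
    2x2 contingency table of corpus counts."""
    n = n_both + n_gene_only + n_term_only + n_neither
    table = [[n_neither, n_term_only],   # gene absent:  term absent / present
             [n_gene_only, n_both]]      # gene present: term absent / present
    mi = 0.0
    for g in (0, 1):
        for t in (0, 1):
            n_gt = table[g][t]
            if n_gt == 0:
                continue
            p_gt = n_gt / n
            p_g = (n_gene_only + n_both if g else n_neither + n_term_only) / n
            p_t = (n_term_only + n_both if t else n_neither + n_gene_only) / n
            mi += p_gt * math.log(p_gt / (p_g * p_t))
    return mi

# Hypothetical corpus of 1000 abstracts: a gene that co-occurs strongly with a
# thyroid MeSH term versus one whose co-occurrence is near chance level.
strong = mutual_information(40, 10, 10, 940)
weak = mutual_information(3, 47, 47, 903)
```

Ranking genes by this score for a given MeSH term is the step that yields the endpoint-specific gene sets the abstract describes.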

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the United Medical Language System, which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, and demonstrated relevant MIEs and early KEs for AOPs to describe thyroid-related AOs. 
The ToxRefDB to gene mapping for thyroid resulted in >50 unique gene to chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and Poster for 2019 Society of Toxicology annual meeting in March 2019

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dateBeginPublishedPresented=03%2F26%2F2014&dateEndPublishedPresented=03%2F26%2F2019&dirEntryId=344452&keyword=Chemical+Safety&showCriteria=2&sortBy=pubDateYear&subject=Chemical+Safety+Research

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the United Medical Language System, which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, and demonstrated relevant MIEs and early KEs for AOPs to describe thyroid-related AOs. 
The ToxRefDB to gene mapping for thyroid resulted in >50 unique gene to chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and Poster for 2019 Society of Toxicology annual meeting in March 2019

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dateBeginPublishedPresented=04%2F02%2F2014&dateEndPublishedPresented=04%2F02%2F2019&dirEntryId=344452&keyword=Chemical+Safety&showCriteria=2&sortBy=pubDateYear&subject=Chemical+Safety+Research

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the United Medical Language System, which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, and demonstrated relevant MIEs and early KEs for AOPs to describe thyroid-related AOs. 
The ToxRefDB to gene mapping for thyroid resulted in >50 unique gene to chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and Poster for 2019 Society of Toxicology annual meeting in March 2019

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dateBeginPublishedPresented=04%2F02%2F2014&dateEndPublishedPresented=04%2F02%2F2019&dirEntryId=344452&fed_org_id=111&keyword=Chemical+Safety&showCriteria=2&sortBy=pubDateYear&subject=Chemical+Safety+Research

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the United Medical Language System, which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, and demonstrated relevant MIEs and early KEs for AOPs to describe thyroid-related AOs. 
The ToxRefDB to gene mapping for thyroid resulted in >50 unique gene to chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and Poster for 2019 Society of Toxicology annual meeting in March 2019

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=344452&fed_org_id=111&keyword=Chemical+Safety&showCriteria=2&sortBy=pubDateYear&subject=Chemical+Safety+Research

 


A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data

Next-Generation Sequencing (NGS) produces large data sets that include tens of thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow containing best-of-breed bioinformatics packages that may leverage multiple programming languages (e.g., Python, R, Java). The process to transform raw NGS data into usable operational taxonomic units (OTUs) can be tedious due to the number of quality control (QC) steps used in QIIME and other software packages for sample processing. Therefore, the purpose of this work was to simplify the analysis of 16S NGS data from a large number of samples by integrating QC, demultiplexing, and QIIME (Quantitative Insights Into Microbial Ecology) analysis in an accessible R project. User command-line operations for each of the pipeline steps were automated into a workflow. In addition, the R server allows multi-user access to the automated pipeline via separate user accounts while providing access to the same large set of underlying data. We demonstrate the applicability of this pipeline automation using 16S NGS data from approximately 100 stormwater runoff samples collected in a mixed-land-use watershed in northeast Georgia. OTU tables were generated for each sample, and the relative taxonomic abundances were compared for different periods over storm hydrographs to determine how the microbial ecology of a stream changes with the rise and fall of stream stage. Our approach simplifies the pipeline analysis of multiple 16S NGS samples by automating multiple preprocessing, QC, analysis, and post-processing command-line steps that are called by a sequence of R scripts. Presented at ASM 2015, Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NERL&dirEntryId=309890
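
The orchestration pattern described, wrapping each command-line pipeline step in a script that runs them in order, can be sketched as follows (in Python for illustration, rather than the paper's R). The QIIME 1 script names are real, but the exact flags, file names, and directory layout here are assumptions, not the authors' configuration:

```python
import subprocess

def build_16s_pipeline(run_dir, mapping_file):
    """Ordered command-line steps of a toy 16S workflow:
    read QC, demultiplexing, then OTU picking (illustrative flags/paths)."""
    return [
        ["fastqc", f"{run_dir}/reads.fastq"],                        # raw-read QC
        ["split_libraries_fastq.py", "-i", f"{run_dir}/reads.fastq",
         "-m", mapping_file, "-o", f"{run_dir}/demux"],              # demultiplexing
        ["pick_open_reference_otus.py", "-i", f"{run_dir}/demux/seqs.fna",
         "-o", f"{run_dir}/otus"],                                   # OTU table generation
    ]

def run_pipeline(steps, runner=subprocess.run):
    """Execute each step in order, stopping on the first failure.
    `runner` is injectable so the sequence can be tested without QIIME installed."""
    for cmd in steps:
        runner(cmd, check=True)
```

Each sample (or batch) then needs only one call to `run_pipeline`, which is the point of the automation: the QC and demultiplexing commands no longer have to be typed per sample.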

Developing Computational Tools Necessary for Applying Toxicogenomics to Risk Assessment and Regulatory Decision Making

Genomics, proteomics, and metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum when appropriate bioinformatic and computational methods are applied toward integrating molecular, chemical, and toxicological information.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=156264

The Human Toxome Project

The Human Toxome project, funded as an NIH Transformative Research grant (2011–2016), is focused on developing the concepts and the means for deducing, validating, and sharing molecular Pathways of Toxicity (PoT). Using the test case of estrogenic endocrine disruption, the responses of MCF-7 human breast cancer cells are being phenotyped by transcriptomics and mass-spectrometry-based metabolomics. The bioinformatics tools for PoT deduction represent a core deliverable. A number of challenges for quality and standardization of cell systems, omics technologies, and bioinformatics are being addressed. In parallel, concepts for annotation, validation, and sharing of PoT information, as well as their link to adverse outcomes, are being developed. A reasonably comprehensive public database of PoT, the Human Toxome Knowledgebase, could become a point of reference for toxicological research and regulatory test strategies.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=309453

High-Resolution Metabolomics for Environmental Chemical Surveillance and Bioeffect Monitoring

High-Resolution Metabolomics for Environmental Chemical Surveillance and Bioeffect Monitoring (Presented by: Dean Jones, PhD, Department of Medicine, Emory University) (2/28/2013)

https://www.epa.gov/chemical-research/high-resolution-metabolomics-environmental-chemical-surveillance-and-bioeffect

Identification of Absorption, Distribution, Metabolism, and Excretion (ADME) Genes Relevant to Steatosis Using a Gene Expression Approach

Absorption, distribution, metabolism, and excretion (ADME) impact chemical concentration and the activation of molecular initiating events of Adverse Outcome Pathways (AOPs) in cellular, tissue, and organ-level targets. In order to better describe ADME parameters and how they modulate potential hazards posed by chemical exposure, our goal is to investigate the relationship between AOPs and ADME-related genes and functional information. Given the scope of this task, we began using hepatic steatosis as a case study. To identify ADME genes related to steatosis, we used the publicly available toxicogenomics database Open TG-GATEs. This database contains standardized rodent chemical exposure data for 170 chemicals (mostly drugs), along with differential gene expression data and corresponding associated pathological changes. We examined the microarray data set gathered from 9 chemical exposure treatments resulting in pathologically confirmed (minimal, moderate, and severe) incidences of hepatic steatosis. From this data set, we used differential expression analysis to identify gene changes resulting from the chemical exposures leading to hepatic steatosis. We then selected differentially expressed genes (DEGs) related to ADME by filtering all genes based on their ADME functional identities. These DEGs include enzymes such as the cytochrome P450, UDP glucuronosyltransferase, and flavin-containing monooxygenase families, and transporter genes such as the solute carrier and ATP-binding cassette transporter families. Up- and downregulated genes were identified across these treatments: a total of 61 genes were upregulated and 68 genes were downregulated in all treatments, while 25 genes were upregulated in some treatments and downregulated in others. This work highlights the application of bioinformatics in linking AOPs with gene modulations, specifically in relation to ADME and chemical exposures.
This abstract does not necessarily reflect U.S. EPA policy. This work highlights the application of bioinformatics tools to identify genes that are modulated by adverse outcomes. Specifically, we delineate a method to identify genes that are related to ADME and can impact target tissue dose in response to chemical exposures. The computational method outlined in this work is applicable to any adverse outcome pathway and provides a linkage between chemical exposure, target tissue dose, and adverse outcomes. Application of this method will allow for the rapid screening of chemicals for their impact on ADME-related genes using available gene databases in the literature.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NHEERL&dirEntryId=341273
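
The filtering step described, selecting differentially expressed genes by ADME functional identity and intersecting them across treatments, can be sketched as below. The gene-family prefixes and fold-change tables are hypothetical stand-ins, not the Open TG-GATEs data:

```python
# Gene-symbol prefixes used as a crude proxy for ADME families:
# cytochrome P450s, UDP glucuronosyltransferases, flavin-containing
# monooxygenases, solute carriers, and ABC transporters.
ADME_PREFIXES = ("CYP", "UGT", "FMO", "SLC", "ABC")

def is_adme(gene):
    return gene.upper().startswith(ADME_PREFIXES)

def adme_degs_across_treatments(deg_tables):
    """deg_tables: list of {gene: log2 fold change}, one dict per treatment.
    Returns the ADME genes upregulated in every treatment and those
    downregulated in every treatment."""
    up = [{g for g, fc in t.items() if fc > 0 and is_adme(g)} for t in deg_tables]
    down = [{g for g, fc in t.items() if fc < 0 and is_adme(g)} for t in deg_tables]
    return set.intersection(*up), set.intersection(*down)

# Two made-up steatosis-inducing treatments
t1 = {"CYP1A1": 2.1, "SLC10A1": -1.4, "ACTB": 1.0}
t2 = {"CYP1A1": 1.3, "SLC10A1": -0.9, "UGT1A1": 0.5}
up_all, down_all = adme_degs_across_treatments([t1, t2])
```

Genes that change direction between treatments (the "both up- and downregulated" set in the abstract) fall out of both intersections and can be collected separately the same way.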

Development of Environmental Fate and Metabolic Simulators

Presented at the Bioinformatics Open Source Conference (BOSC), Detroit, MI, June 23–24, 2005.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NERL&dirEntryId=257172

 

Useful Webinars on EPA Computational Tools and Informatics

 

Computational Toxicology Communities of Practice

Computational Toxicology Research

EPA’s Computational Toxicology Communities of Practice is composed of hundreds of stakeholders from over 50 public and private sector organizations (ranging from EPA, other federal agencies, industry, academic institutions, professional societies, nongovernmental organizations, environmental non-profit groups, state environmental agencies and more) who have an interest in using advances in computational toxicology and exposure science to evaluate the safety of chemicals.

The Communities of Practice is open to the public. Monthly webinars are held at EPA’s RTP campus, on the fourth Thursday of the month (occasionally rescheduled in November and December to accommodate holiday schedules), from 11am-Noon EST/EDT. Remote participation is available. For more information or to be added to the meeting email list, contact: Monica Linnenbrink (linnenbrink.monica@epa.gov).

Related Links

Past Webinar Presentations

Presentation File Presented By Date
OPEn structure-activity Relationship App (OPERA) PowerPoint (Video) Dr. Kamel Mansouri, Lead Computational Chemist contractor for Integrated Laboratory Systems in the National Institute of Environmental Health Sciences 2019/4/25
CompTox Chemicals Dashboard and InVitroDB V3 (Video) Dr. Antony Williams, Chemist in EPA’s National Center for Computational Toxicology and Dr. Katie Paul-Friedman, Toxicologist in EPA’s National Center for Computational Toxicology 2019/3/28
The Systematic Empirical Evaluation of Models (SEEM) framework (Video) Dr. John Wambaugh, Physical Scientist in EPA’s National Center for Computational Toxicology 2019/2/28
ToxValDB: A comprehensive database of quantitative in vivo study results from over 25,000 chemicals (Video) Dr. Richard Judson, Research Chemist in EPA’s National Center for Computational Toxicology 2018/12/20
Sequence Alignment to Predict Across Species Susceptibility (seqAPASS) (Video) Dr. Carlie LaLone, Bioinformaticist, EPA’s National Health and Environmental Effects Research Laboratory 2018/11/29
Chemicals and Products Database (Video) Dr. Kathie Dionisio, Environmental Health Scientist, EPA’s National Exposure Research Laboratory 2018/10/25
CompTox Chemicals Dashboard V3 (Video) Dr. Antony Williams, Chemist, EPA National Center for Computational Toxicology (NCCT) 2018/09/27
Generalised Read-Across (GenRA) (Video) Dr. Grace Patlewicz, Chemist, EPA National Center for Computational Toxicology (NCCT) 2018/08/23
EPA’s ToxCast Owner’s Manual (Video) Monica Linnenbrink, Strategic Outreach and Communication lead, EPA National Center for Computational Toxicology (NCCT) 2018/07/26
EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT) (Video) Elin Ulrich, Research Chemist in the Public Health Chemistry Branch, EPA National Exposure Research Laboratory (NERL) 2018/06/28
ECOTOX Knowledgebase: New Tools and Data Visualizations (Video) Colleen Elonen, Translational Toxicology Branch, and Dr. Jennifer Olker, Systems Toxicology Branch, in the Mid-Continent Ecology Division of EPA’s National Health & Environmental Effects Research Laboratory (NHEERL) 2018/05/24
Investigating Chemical-Microbiota Interactions in Zebrafish (Video) Tamara Tal, Biologist in the Systems Biology Branch, Integrated Systems Toxicology Division, EPA’s National Health & Environmental Effects Research Laboratory (NHEERL) 2018/04/26
The CompTox Chemistry Dashboard v2.6: Delivering Improved Access to Data and Real Time Predictions (Video) Tony Williams, Computational Chemist, EPA’s National Center for Computational Toxicology (NCCT) 2018/03/29
mRNA Transfection Retrofits Cell-Based Assays with Xenobiotic Metabolism (Video; audio starts at 10:17) Steve Simmons, Research Toxicologist, EPA’s National Center for Computational Toxicology (NCCT) 2018/02/22
Development and Distribution of ToxCast and Tox21 High-Throughput Chemical Screening Assay Method Description (Video) Stacie Flood, National Student Services Contractor, EPA’s National Center for Computational Toxicology (NCCT) 2018/01/25
High-throughput H295R steroidogenesis assay: utility as an alternative and a statistical approach to characterize effects on steroidogenesis (Video) Derik Haggard, ORISE Postdoctoral Fellow, EPA’s National Center for Computational Toxicology (NCCT) 2017/12/14
Systematic Review for Chemical Assessments: Core Elements and Considerations for Rapid Response (Video) Kris Thayer, Director, Integrated Risk Information System (IRIS) Division of EPA’s National Center for Environmental Assessment (NCEA) 2017/11/16
High Throughput Transcriptomics (HTTr) Concentration-Response Screening in MCF7 Cells (Video) Joshua Harrill, Toxicologist, EPA’s National Center for Computational Toxicology (NCCT) 2017/10/26
Learning Boolean Networks from ToxCast High-Content Imaging Data Todor Antonijevic, ORISE Postdoc, EPA’s National Center for Computational Toxicology (NCCT) 2017/09/28
Suspect Screening of Chemicals in Consumer Products Katherine Phillips, Research Chemist, Human Exposure and Dose Modeling Branch, Computational Exposure Division, EPA’s National Exposure Research Laboratory (NERL) 2017/08/31
The EPA CompTox Chemistry Dashboard: A Centralized Hub for Integrating Data for the Environmental Sciences (Video) Antony Williams, Chemist, EPA’s National Center for Computational Toxicology (NCCT) 2017/07/27
Navigating Through the Minefield of Read-Across Tools and Frameworks: An Update on Generalized Read-Across (GenRA) (Video)

 

Read Full Post »


A Nonlinear Methodology to Explain Complexity of the Genome and Bioinformatic Information

Reporter: Stephen J. Williams, Ph.D.

Multifractal bioinformatics: A proposal to the nonlinear interpretation of genome

The following is an open-access article by Pedro Moreno on a methodology to analyze genetic information across species, in particular the evolutionary trends of complex genomes, by a nonlinear analytic approach utilizing fractal geometry, coined “nonlinear bioinformatics”. This fractal approach stems from the complex nature of higher eukaryotic genomes, including mosaicism and multiple interspersed genomic elements such as intronic regions, noncoding regions, and mobile elements such as transposable elements. Although seemingly random, these elements have a repetitive character. Such complexity of DNA regulation, structure, and genomic variation is arguably best understood by developing algorithms based on fractal analysis, which can model the regionalized and repetitive variability and structure within complex genomes by elucidating the individual components that contribute to an overall complex structure, rather than by a “linear” or “reductionist” approach that looks only at individual coding regions and does not take into consideration the factors leading to genetic complexity and diversity.

Indeed, many other attempts to describe the complexities of DNA as a fractal geometric pattern have been described. In the paper “Fractals and Hidden Symmetries in DNA“, Carlo Cattani uses fractal analysis to construct a simple geometric pattern of the influenza A virus by modeling the primary sequence of its genome, namely the bases A, G, C, and T. The main conclusions, namely that

fractal shapes and symmetries in DNA sequences and DNA walks have been shown and compared with random and deterministic complex series. DNA sequences are structured in such a way that there exists some fractal behavior which can be observed both on the correlation matrix and on the DNA walks. Wavelet analysis confirms by a symmetrical clustering of wavelet coefficients the existence of scale symmetries.

suggest that, at least, the influenza viral genome structure can be analyzed into its basic components by fractal geometry.
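
The “DNA walk” referred to in the quote is simple to reproduce: map each base to a step and accumulate, then read fractal and correlation properties off the resulting path. A minimal sketch using the common purine/pyrimidine rule:

```python
def dna_walk(seq):
    """Purine/pyrimidine DNA walk: +1 for A or G, -1 for C or T.
    Long-range correlations in the sequence show up as anomalous
    (non-random-walk) spreading of this cumulative path."""
    position, path = 0, []
    for base in seq.upper():
        position += 1 if base in "AG" else -1
        path.append(position)
    return path
```

For example, `dna_walk("AAGC")` steps +1, +1, +1, -1, giving the path [1, 2, 3, 2].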
This approach has been used to model the complex nature of cancer, as discussed in a 2011 Seminars in Oncology paper.
Abstract: Cancer is a highly complex disease due to the disruption of tissue architecture. Thus, tissues, and not individual cells, are the proper level of observation for the study of carcinogenesis. This paradigm shift from a reductionist approach to a systems biology approach is long overdue. Indeed, cell phenotypes are emergent modes arising through collective non-linear interactions among different cellular and microenvironmental components, generally described by “phase space diagrams”, where stable states (attractors) are embedded into a landscape model. Within this framework, cell states and cell transitions are generally conceived as mainly specified by gene-regulatory networks. However, the system’s dynamics is not reducible to the integrated functioning of the genome-proteome network alone; the epithelium-stroma interacting system must be taken into consideration in order to give a more comprehensive picture. Given that cell shape represents the spatial geometric configuration acquired as a result of the integrated set of cellular and environmental cues, we posit that fractal-shape parameters represent “omics” descriptors of the epithelium-stroma system. Within this framework, function appears to follow form, and not the other way around.

As the authors conclude,

“Transitions from one phenotype to another are reminiscent of phase transitions observed in physical systems. The description of such transitions could be obtained by a set of morphological, quantitative parameters, like fractal measures. These parameters provide reliable information about system complexity.”

Gene expression also displays a fractal nature. In a Frontiers in Physiology paper by Mahboobeh Ghorbani, Edmond A. Jonckheere and Paul Bogdan, “Gene Expression Is Not Random: Scaling, Long-Range Cross-Dependence, and Fractal Characteristics of Gene Regulatory Networks“,

the authors report that gene expression time series display fractal and long-range dependence characteristics.

Abstract: Gene expression is a vital process through which cells react to the environment and express functional behavior. Understanding the dynamics of gene expression could prove crucial in unraveling the physical complexities involved in this process. Specifically, understanding the coherent complex structure of transcriptional dynamics is the goal of numerous computational studies aiming to study and finally control cellular processes. Here, we report the scaling properties of gene expression time series in Escherichia coli and Saccharomyces cerevisiae. Unlike previous studies, which report the fractal and long-range dependency of DNA structure, we investigate the individual gene expression dynamics as well as the cross-dependency between them in the context of gene regulatory network. Our results demonstrate that the gene expression time series display fractal and long-range dependence characteristics. In addition, the dynamics between genes and linked transcription factors in gene regulatory networks are also fractal and long-range cross-correlated. The cross-correlation exponents in gene regulatory networks are not unique. The distribution of the cross-correlation exponents of gene regulatory networks for several types of cells can be interpreted as a measure of the complexity of their functional behavior.
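
Long-range dependence of this sort is commonly quantified with detrended fluctuation analysis (DFA): the scaling exponent sits near 0.5 for uncorrelated noise and rises toward and past 1 for persistent, long-memory series. A dependency-free sketch of DFA-1 (an illustration of the standard method, not the authors' code):

```python
from math import log, sqrt

def _linfit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def dfa_exponent(series, scales=(4, 8, 16, 32)):
    """DFA-1: integrate the mean-subtracted series, detrend it linearly
    in non-overlapping windows of each scale, then fit the slope of
    log(fluctuation) vs. log(scale)."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for v in series:
        running += v - mean
        profile.append(running)
    flucts = []
    for s in scales:
        sq = []
        for i in range(len(profile) // s):
            seg = profile[i * s:(i + 1) * s]
            t = list(range(s))
            b, a = _linfit(t, seg)
            sq.append(sum((y - (a + b * x)) ** 2 for x, y in zip(t, seg)) / s)
        flucts.append(sqrt(sum(sq) / len(sq)))
    slope, _ = _linfit([log(s) for s in scales], [log(f) for f in flucts])
    return slope
```

On white noise the estimate lands near 0.5; on a random walk (integrated noise) it rises well above 1, which is how "fractal and long-range dependent" expression dynamics are distinguished from uncorrelated fluctuations.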

 

Given that a multitude of complex biomolecular networks and biomolecules can be described by fractal patterns, the development of bioinformatic algorithms would enhance our understanding of the interdependence and cross-functionality of these multiple biological networks, particularly in disease and drug resistance. The article below by Pedro Moreno describes the development of such bioinformatic algorithms.

Pedro A. Moreno
Escuela de Ingeniería de Sistemas y Computación, Facultad de Ingeniería, Universidad del Valle, Cali, Colombia
E-mail: pedro.moreno@correounivalle.edu.co

Eje temático: Ingeniería de sistemas / System engineering
Recibido: 19 de septiembre de 2012
Aceptado: 16 de diciembre de 2013


 

 


Abstract

The first draft of the human genome (HG) sequence was published in 2001 by two competing consortia. Since then, several structural and functional characteristics of the HG organization have been revealed. Today, more than 2,000 HGs have been sequenced, and these findings are having a strong impact on academia and public health. Despite all this, a major bottleneck, called genome interpretation, persists: that is, the lack of a theory that explains the complex puzzle of coding and non-coding features that compose the HG as a whole. Ten years after the HG was sequenced, two recent studies, discussed within the multifractal formalism, allow proposing a nonlinear theory that helps interpret the structural and functional variation of the genetic information of genomes. The present review article discusses this new approach, called “multifractal bioinformatics”.

Keywords: Omics sciences, bioinformatics, human genome, multifractal analysis.


1. Introduction

Omic Sciences and Bioinformatics

In order to study genomes, their life properties, and the pathological consequences of their impairment, the Human Genome Project (HGP) was created in 1990. Since then, about 500 Gbp (EMBL) represented in thousands of prokaryotic genomes and tens of different eukaryotic genomes have been sequenced (NCBI, 1000 Genomes, ENCODE). Today, genomics is defined as the set of sciences and technologies dedicated to the comprehensive study of the structure, function, and origin of genomes. Several types of genomics have arisen as a result of the expansion and application of genomics to the study of the Central Dogma of Molecular Biology (CDMB), Figure 1 (above). The catalog of different types of genomics uses the Latin suffix “-omic”, meaning “set of”, to denote the new massive approaches of the new omics sciences (Moreno et al., 2009). Given the large amount of genomic information available in the databases and the urgency of its actual interpretation, the balance has begun to lean heavily toward the bioinformatics infrastructure requirements of research laboratories, Figure 1 (below).

Bioinformatics, or computational biology, is defined as the application of computer and information technology to the analysis of biological data (Mount, 2004). It is an interdisciplinary science that draws on computing, applied mathematics, statistics, computer science, artificial intelligence, biophysics, biochemistry, genetics, and molecular biology. Bioinformatics was born from the need to understand the sequences of nucleotide or amino acid symbols that make up DNA and proteins, respectively. These analyses are made possible by the development of powerful algorithms that predict and reveal a wealth of structural and functional features in genomic sequences, such as gene location, discovery of homologies between macromolecules in databases (BLAST), and algorithms for phylogenetic analysis, regulatory analysis, or the prediction of protein folding, among others. This great development has created a multiplicity of approaches giving rise to new types of bioinformatics, such as the Multifractal Bioinformatics (MFB) proposed here.

1.1 Multifractal Bioinformatics and Theoretical Background

MFB is a proposal to analyze the information content of genomes and their life properties in a non-linear way. It is part of a specialized sub-discipline called "nonlinear bioinformatics", which uses a number of related techniques for the study of nonlinearity (fractal geometry, Hurst exponents, power laws, wavelets, among others) applied to biological problems (https://pharmaceuticalintelligence.com/tag/fractal-geometry/). Its application requires detailed knowledge of the structure of the genome to be analyzed and an appropriate command of multifractal analysis.

1.2 From the Worm Genome toward Human Genome

To explore a complex genome such as the HG, it is useful to first implement multifractal analysis (MFA) in a simpler genome in order to show its practical utility. For example, the genome of the small nematode Caenorhabditis elegans is an excellent model from which many lessons can be extrapolated to complex organisms. Thus, if the MFA explains some structural properties of that genome, the same analysis is expected to reveal similar properties in the HG.

The C. elegans nuclear genome is composed of about 100 Mbp, with six chromosomes distributed into five autosomes and one sex chromosome. The molecular structure of the genome is particularly homogeneous along the chromosome sequences, due to the presence of several regular features, including large contents of genes and introns of similar sizes. The C. elegans genome also has a regional organization of the chromosomes, mainly because the majority of the repeated sequences are located in the chromosome arms, Figure 2 (left) (C. elegans Sequencing Consortium, 1998). Given these regular and irregular features, the MFA could be an appropriate approach to analyze such distributions.

Meanwhile, HG sequencing revealed a surprising mosaicism of coding (genes) and noncoding (repetitive DNA) sequences, Figure 2 (right) (Venter et al., 2001). This structure of 6 Gbp is divided into 23 pairs of chromosomes (diploid cells), and these highly regionalized sequences introduce complex patterns of regularity and irregularity for understanding gene structure, the composition of repetitive DNA sequences, and their role in the study and application of the life sciences. The coding regions of the genome are estimated at ~25,000 genes, which constitute 1.4% of the HG. These genes are embedded in a giant sea of various types of non-coding sequences, which compose 98.6% of the HG (popularly misnamed "junk DNA"). The non-coding regions are characterized by many types of repeated DNA sequences: 10.6% consists of Alu sequences, a type of SINE (short interspersed nuclear element) preferentially located toward genes. LINEs, MIR, MER, LTR, DNA transposons and introns are other types of non-coding sequences, which together form about 86% of the genome. Some of these sequences overlap with one another, as with CpG islands, which complicates the analysis of the genomic landscape. This standard genomic landscape was recently clarified: the latest studies show that 80.4% of the HG is functional, owing to the discovery of more than five million "switches" that operate and regulate gene activity, re-evaluating the concept of "junk DNA" (The ENCODE Project Consortium, 2012).

Given that all these genomic variations, in both worm and human, produce regionalized genomic landscapes, it is proposed that Fractal Geometry (FG) would allow measuring how the genetic information content is fragmented. In this paper, the methodology and the nonlinear descriptive models for each of these genomes are reviewed.

1.3 The MFA and its Application to Genome Studies

Most problems in physics are implicitly non-linear in nature, generating phenomena such as those studied by chaos theory, the science of (non-linear) dynamic systems that are very sensitive to initial conditions yet deterministic in rigor; that is, their behavior can be completely determined by knowing the initial conditions (Peitgen et al, 1992). In turn, FG is an appropriate tool to study chaotic dynamic systems (CDS). In other words, FG and chaos are closely related, because the region of space toward which a chaotic orbit tends asymptotically has a fractal structure (a strange attractor). Therefore, FG allows studying the framework on which CDS are defined (Moon, 1992). And this is how the genome's structure and function are expected to be organized.

The MFA is an extension of FG related to (Shannon) information theory, disciplines that have been very useful for studying the information content of a sequence of symbols. Mandelbrot established FG in the 1980s as a geometry capable of measuring the irregularity of nature by calculating the fractal dimension (D), an exponent derived from a power law (Mandelbrot, 1982). The value of D gives a measure of the level of fragmentation, or the information content, of a complex phenomenon, because D measures the scaling degree of the system's fragmented self-similarity. Thus, FG looks for self-similar properties in structures and processes at different scales of resolution, and these self-similarities are organized following scaling or power laws.

Sometimes one exponent is not sufficient to characterize a complex phenomenon, and more exponents are required. The multifractal formalism allows for this; it applies when many subsets of fractals with different scaling properties, and thus a large number of exponents or fractal dimensions, coexist simultaneously. As a result, when a multifractal singularity spectrum is generated, the scaling behavior of the frequency of symbols in a sequence can be quantified (Vélez et al, 2010).

The MFA has been implemented to study the spatial heterogeneity of theoretical and experimental fractal patterns in different disciplines. In post-genomic times, the MFA has been used to study multiple biological problems (Vélez et al, 2010). Nonetheless, very little attention has been given to the use of MFA to characterize the structural genetic information content of genomes from the images generated by the Chaos Game Representation (CGR). The first studies at this level were made recently in analyses of the C. elegans genome (Vélez et al, 2010) and the human genome (Moreno et al, 2011). The MFA methodology applied to the study of these genomes is developed below.

2. Methodology

The Multifractal Formalism from the CGR

2.1 Data Acquisition and Molecular Parameters

Databases for the C. elegans genome and the 36.2 Hs_ refseq HG version were downloaded from the NCBI FTP server. Then, several strategies were designed to fragment the genomic DNA sequences into different length ranges. For example, the C. elegans genome was divided into 18 fragments, Figure 2 (left), and the human genome into 9,379 fragments. According to their annotation systems, the contents of molecular parameters of coding sequences (genes, exons and introns), noncoding sequences (repetitive DNA, Alu, LINEs, MIR, MER, LTR, promoters, etc.) and coding/non-coding DNA (TTAGGC, AAAAT, AAATT, TTTTC, TTTTT, CpG islands, etc.) were counted for each sequence.
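The fragmentation-and-counting step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the fragment count, motif list, and toy sequence are hypothetical, and a real run would parse the NCBI annotation files rather than count raw motifs in the sequence.

```python
def fragment_and_count(seq, n_fragments, motifs):
    """Split a DNA sequence into equal-sized fragments and count the
    (non-overlapping) occurrences of each motif in every fragment."""
    size = len(seq) // n_fragments
    counts = []
    for i in range(n_fragments):
        frag = seq[i * size:(i + 1) * size]
        counts.append({m: frag.count(m) for m in motifs})
    return counts

# toy sequence: a telomeric TTAGGC run, an AT-rich run, and a GC-rich run
seq = "TTAGGC" * 50 + "AAATT" * 60 + "GCGCGC" * 40
result = fragment_and_count(seq, 3, ["TTAGGC", "AAATT"])
```

Plotting such per-fragment counts against each fragment's multifractal parameters is what the later correlation analyses operate on.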

2.2 Construction of the CGR

2.3 Fractal Measurement by the Box Counting Method

Subsequently, the CGR, a recursive algorithm (Jeffrey, 1990; Restrepo et al, 2009), is applied to each selected DNA sequence, Figure 3 (above, left), and from it an image is obtained, which is then quantified by the box-counting algorithm. For example, Figure 3 (above, left) shows a CGR image for a human DNA sequence 80,000 bp in length. Here, dark regions represent sub-quadrants with a high number of points (nucleotides), and clear regions sections with a low number of points. The calculation of D for the Koch curve by the box-counting method is illustrated by a progression of changes in grid size, and its Cartesian graph, in Table 1.
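The CGR recursion is simple to sketch. The version below follows the usual convention of assigning one corner of the unit square to each nucleotide (the specific corner assignment is an illustrative choice, not taken from this article): starting at the center, each base moves the current point halfway toward its corner, and the accumulated points form the image that is later box-counted.

```python
def cgr_points(seq):
    """Chaos Game Representation: map a DNA sequence to points in the unit
    square by repeatedly moving halfway toward the current base's corner."""
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
    x, y = 0.5, 0.5                      # start at the center of the square
    points = []
    for base in seq:
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

pts = cgr_points("ACGT")
```

Each k-mer ends up in its own sub-quadrant at resolution 2^-k, which is what makes the image amenable to box counting.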

The CGR image for a given DNA sequence is quantified by a standard fractal analysis. A fractal is a fragmented geometric figure whose parts are approximate copies of the whole at reduced scale; that is, the figure is self-similar. The fractal dimension D is essentially the scaling rule the figure obeys. In general, the power law is given by the following expression:

N(E) = E^D     (1)

where N(E) is the number of parts required to cover the figure when a scaling factor E is applied. The power law permits calculating the fractal dimension as:

D = log N(E) / log E     (2)

To obtain D, the box-counting algorithm covers the figure with disjoint boxes of size ɛ = 1/E and counts the number of boxes required. Figure 4 (above, left) shows the multifractal measure at moment q = 1.
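A box-counting estimate of D from a CGR-style point set can be sketched as below. The choice of power-of-two grid sizes and the plain least-squares fit are illustrative assumptions, not the article's exact procedure: per equation (2), D is the slope of ln N(ɛ) against ln(1/ɛ).

```python
import math

def box_count_dimension(points, exponents=(2, 3, 4, 5, 6)):
    """Estimate the fractal dimension D of a point set in the unit square:
    N(eps) ~ eps^-D, so D is the slope of ln N versus ln(1/eps)."""
    xs, ys = [], []
    for k in exponents:
        n = 2 ** k                                   # n x n grid, eps = 1/n
        occupied = {(min(int(x * n), n - 1), min(int(y * n), n - 1))
                    for x, y in points}              # boxes containing points
        xs.append(math.log(n))                       # ln(1/eps)
        ys.append(math.log(len(occupied)))           # ln N(eps)
    # least-squares slope of ys against xs
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# a dense uniform grid of points fills the plane, so D should be close to 2
d = box_count_dimension([(i / 100, j / 100) for i in range(100) for j in range(100)])
```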

2.4 Multifractal Measurement

When the box-counting algorithm is generalized to the multifractal case according to the method of moments q, we obtain equation (3) (Gutiérrez et al, 1998; Yu et al, 2001):

Σ_i (M_i / M)^q ∝ ɛ^((q − 1) D_q)     (3)

where M_i is the number of points falling in the i-th box, M is the total number of points, and ɛ is the box size. Thus, the MFA is used when multiple scaling rules apply. Figure 4 (above, right) shows the calculation of the multifractal measures at different moments q (partition function). Here, the linear regressions must have a coefficient of determination equal or close to 1. From each linear regression a D_q is obtained, generating a spectrum of generalized fractal dimensions D_q for all integers q, Figure 4 (below, left). The multifractal spectrum is obtained as the limit:

D_q = lim_{ɛ→0} [1/(q − 1)] · ln Σ_i (M_i / M)^q / ln ɛ     (4)

Varying the integer q emphasizes different regions and discriminates their fractal properties: positive values of q emphasize the dense regions, where a high D_q indicates rich structure, while negative values emphasize the scarce regions. In real-world applications, the limit D_q is readily approximated from the data using a linear fit; transforming equation (3) yields:

ln Σ_i (M_i / M)^q = (q − 1) D_q ln ɛ + constant     (5)

which shows that, for a fixed q, ln Σ_i (M_i / M)^q is a linear function of ln(ɛ); D_q can therefore be evaluated as the slope of ln Σ_i (M_i / M)^q against (q − 1) ln(ɛ). The methodologies and approaches for the box-counting method and MFA are detailed in Moreno et al, 2000; Yu et al, 2001; Moreno, 2005. For a rigorous mathematical development of MFA from images, consult the Multifractal system article on Wikipedia.
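The method of moments can be sketched directly from equations (3)-(5): at each box size ɛ, the box occupancies M_i are counted, and for each q the slope of ln Σ(M_i/M)^q against (q − 1) ln ɛ estimates D_q, with the usual Σ p ln p form at q = 1. The grid sizes and the pure-Python least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import math

def generalized_dimensions(points, qs, exponents=(3, 4, 5, 6)):
    """Estimate the generalized dimensions D_q of a 2-D point set
    (e.g. a CGR image) by the method of moments."""
    M = len(points)
    scales = []                                  # (eps, [M_i ...]) per grid
    for k in exponents:
        n = 2 ** k
        counts = {}
        for x, y in points:
            box = (min(int(x * n), n - 1), min(int(y * n), n - 1))
            counts[box] = counts.get(box, 0) + 1
        scales.append((1.0 / n, list(counts.values())))

    def slope(xs, ys):
        m = len(xs)
        mx, my = sum(xs) / m, sum(ys) / m
        return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                / sum((a - mx) ** 2 for a in xs))

    Dq = {}
    for q in qs:
        xs, ys = [], []
        for eps, Mi in scales:
            if q == 1:   # information dimension: D_1 = lim sum p ln p / ln eps
                xs.append(math.log(eps))
                ys.append(sum((c / M) * math.log(c / M) for c in Mi))
            else:        # slope of ln sum (M_i/M)^q versus (q-1) ln eps
                xs.append((q - 1) * math.log(eps))
                ys.append(math.log(sum((c / M) ** q for c in Mi)))
        Dq[q] = slope(xs, ys)
    return Dq

# a uniform grid is a monofractal: every D_q should be close to 2
uniform = [(i / 200, j / 200) for i in range(200) for j in range(200)]
Dq = generalized_dimensions(uniform, [0, 1, 2])
```

For a genuinely multifractal point set, such as a CGR image of genomic DNA, the D_q values would differ across q instead of all sitting near 2.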

2.5 Measurement of Information Content

Subsequently, from the spectrum of generalized dimensions D_q, the degree of multifractality ΔD_q (MD) is calculated as the difference between its maximum and minimum values: ΔD_q = D_qmax − D_qmin (Ivanov et al, 1999). When ΔD_q is high, the multifractal spectrum is rich in information and highly aperiodic; when ΔD_q is small, the resulting dimension spectrum is poor in information and highly periodic. It is expected, then, that aperiodicity in the genome relates to highly polymorphic aperiodic genomic structures, and periodic regions to highly repetitive, not very polymorphic genomic structures. The correlation exponent τ(q) = (q − 1) · D_q, Figure 4 (below, right), can also be obtained from the multifractal dimension D_q. The generalized dimensions also provide specific information: D(q = 0) equals the capacity dimension, which in this analysis is the box-counting dimension; D(q = 1) equals the information dimension; and D(q = 2) the correlation dimension. Based on these multifractal parameters, many structural genomic properties can be quantified, related, and interpreted.
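Both quantities follow directly from a computed D_q spectrum; the toy spectra below are hypothetical values for illustration only.

```python
def multifractality_degree(Dq):
    """Degree of multifractality: spread of the D_q spectrum (Dq_max - Dq_min)."""
    return max(Dq.values()) - min(Dq.values())

def correlation_exponent(Dq):
    """Mass exponent tau(q) = (q - 1) * D_q for each q in the spectrum."""
    return {q: (q - 1) * d for q, d in Dq.items()}

# hypothetical spectra: a monofractal is flat (delta-D = 0),
# a multifractal decreases with q (delta-D > 0)
mono = {q: 2.0 for q in range(-5, 6)}
multi = {-2: 2.4, 0: 2.0, 2: 1.7}
tau = correlation_exponent(multi)
```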

2.6 Multifractal Parameters and Statistical and Discrimination Analyses

Once the multifractal parameters are calculated (D_q for q in (−20, 20), ΔD_q, τ(q), etc.), correlations with the molecular parameters are sought. These relations are established by plotting the number of genome molecular parameters versus the MD, by discriminant analysis with Cartesian graphs in 2-D, Figure 5 (below, left), and 3-D, and by combining multifractal and molecular parameters. Finally, simple linear regression, multivariate analysis, and analyses by ranges and clusterings are performed to establish statistical significance.
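A minimal version of the correlation step might look like the following; the Alu-count and ΔD_q values are hypothetical, and the published analyses used full regression, multivariate and discriminant analyses rather than a single correlation coefficient.

```python
def pearson_r(xs, ys):
    """Pearson correlation between a molecular parameter (e.g. Alu count per
    fragment) and a multifractal parameter (e.g. delta-D_q per fragment)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical per-fragment values: Alu density versus degree of multifractality
alu_counts = [10, 25, 40, 55, 70]
delta_dq = [0.31, 0.42, 0.55, 0.61, 0.74]
r = pearson_r(alu_counts, delta_dq)
```

A strong positive r here would mirror the article's claim that multifractality tracks Alu content.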

3. Results and Discussion

3.1 Non-linear Descriptive Model for the C. elegans Genome

Analyzing the C. elegans genome with the multifractal formalism revealed what the symmetry and asymmetry of the genome's nucleotide composition had suggested. The multifractal scaling of the C. elegans genome is of interest because it indicates that the molecular structure of the chromosome may be organized as a system operating far from equilibrium following nonlinear laws (Ivanov et al, 1999; Burgos and Moreno-Tovar, 1996). This can be discussed from two points of view:

1) When comparing C. elegans chromosomes with each other, the X chromosome showed the lowest multifractality, Figure 5 (above). This means that the X chromosome is operating close to equilibrium, which results in increased genetic instability. This instability of the X could selectively contribute to the molecular mechanism that determines sex (XX or X0) during meiosis; the X chromosome would thus operate closer to equilibrium in order to maintain its particular sexual dimorphism.

2) When comparing different chromosome regions of the C. elegans genome, changes in multifractality were found in relation to the regional organization (at the center and arms) exhibited by the chromosomes, Figure 5 (below, left). These behaviors are associated with changes in the content of repetitive DNA, Figure 5 (below, right). The results indicated that the chromosome arms are even more complex than previously anticipated. Thus, TTAGGC telomere sequences would be operating far from equilibrium to protect the genetic information encoded by the entire chromosome.

All these biological arguments may explain why the C. elegans genome is organized in a nonlinear way. These findings provide insight for quantifying and understanding the organization of the non-linear structure of the C. elegans genome, which may be extended to other genomes, including the HG (Vélez et al, 2010).

3.2 Nonlinear Descriptive Model for the Human Genome

Once the multifractal approach was validated in the C. elegans genome, the HG was analyzed exhaustively. This allowed us to propose a nonlinear model for HG structure, which will be discussed from three points of view.

1) It was found that the HG's high multifractality depends strongly on the content of Alu sequences and, to a lesser extent, on the content of CpG islands. These contents would be located primarily in highly aperiodic regions, taking the chromosome far from equilibrium and giving it greater genetic stability, protection and attraction of mutations, Figure 6 (A-C). Thus, hundreds of regions in the HG may have high genetic stability, and the most important genetic information of the HG, the genes, would be safeguarded from environmental fluctuations. Other repeated elements (LINEs, MIR, MER, LTRs) showed no significant relationship, Figure 6 (D). Consequently, the human multifractal map developed in Moreno et al, 2011 constitutes a good tool to identify regions rich in genetic information and genomic stability.

2) The multifractal context seems to be a significant requirement for the structural and functional organization of thousands of genes and gene families. Thus, a high multifractal (aperiodic) context appears to act as a "genomic attractor" for many genes (KOGs, KEGGs), Figure 6 (E), and some gene families, Figure 6 (F), involved in genetic and deterministic processes, in order to maintain deterministic regulatory control in the genome, even though most HG sequences may be subject to complex epigenetic control.

3) The classification of human chromosomes and the analysis of chromosome regions may have medical implications (Moreno et al, 2002; Moreno et al, 2009). The low-nonlinearity structure exhibited by some chromosomes (or chromosome regions) implies an environmental predisposition, making them potential targets for structural or numerical chromosomal alterations, Figure 6 (G). Additionally, sex chromosomes should have low multifractality to maintain sexual dimorphism and probably X-chromosome inactivation.

All these fractal and biological arguments could explain why Alu elements are shaping the HG in a nonlinear manner (Moreno et al, 2011). Finally, the multifractal modeling of the HG serves as a theoretical framework to examine new discoveries made by the ENCODE project and new approaches to human epigenomes. That is, the nonlinear organization of the HG might help to explain why most of the HG is expected to be functional.

4. Conclusions

All these results show that the multifractal formalism is appropriate for quantifying and evaluating the genetic information content of genomes and for relating it to the known molecular anatomy of the genome and some of its expected properties. Thus, MFB allows interpreting in a logical manner the structural nature and variation of the genome.

The MFB allows understanding why a number of chromosomal diseases are likely to occur in the genome, thus opening a new perspective toward personalized medicine for studying and interpreting the HG and its diseases.

The entire genome contains nonlinear information that organizes it and presumably makes it function, leading to the conclusion that virtually 100% of the HG is functional. Bioinformatics in general is enriched with a novel approach (MFB) that makes it possible to quantify the genetic information content of any DNA sequence, with practical applications in biology, medicine and agriculture. This novel breakthrough in computational genomic analysis and disease research contributes to defining biology as a "hard" science.

MFB opens a door to developing a research program toward the establishment of an integrative discipline that contributes to "breaking" the code of human life (http://pharmaceuticalintelligence.com/page/3/).

5. Acknowledgements

Thanks to the directives of the EISC, the Universidad del Valle, and the School of Engineering for offering an academic, scientific and administrative space for conducting this research. Likewise, thanks to the co-authors (professors and students) who participated in the works cited here. Finally, thanks to Colciencias for biotechnology project grant # 1103-12-16765.


6. References

Blanco, S., & Moreno, P.A. (2007). Representación del juego del caos para el análisis de secuencias de ADN y proteínas mediante el análisis multifractal (método "box-counting"). In The Second International Seminar on Genomics and Proteomics, Bioinformatics and Systems Biology (pp. 17-25). Popayán, Colombia.

Burgos, J.D., & Moreno-Tovar, P. (1996). Zipf scaling behavior in the immune system. BioSystems, 39, 227-232.

C. elegans Sequencing Consortium. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012-2018.

Gutiérrez, J.M., Iglesias, A., Rodríguez, M.A., Burgos, J.D., & Moreno, P.A. (1998). Analyzing the multifractal structure of DNA nucleotide sequences. In M. Barbie & S. Chillemi (Eds.), Chaos and Noise in Biology and Medicine (chap. 4). Hackensack, NJ: World Scientific Publishing Co.

Ivanov, P.Ch., Nunes, L.A., Goldberger, A.L., Havlin, S., Rosenblum, M.G., Struzik, Z.R., & Stanley, H.E. (1999). Multifractality in human heartbeat dynamics. Nature, 399, 461-465.

Jeffrey, H.J. (1990). Chaos game representation of gene structure. Nucleic Acids Research, 18, 2163-2175.

Mandelbrot, B. (1982). La geometría fractal de la naturaleza. Barcelona, España: Tusquets Editores.

Moon, F.C. (1992). Chaotic and fractal dynamics. New York: John Wiley.

Moreno, P.A. (2005). Large scale and small scale bioinformatics studies on the Caenorhabditis elegans genome. Doctoral thesis. Department of Biology and Biochemistry, University of Houston, Houston, USA.

Moreno, P.A., Burgos, J.D., Vélez, P.E., Gutiérrez, J.M., et al. (2000). Multifractal analysis of complete genomes. In Proceedings of the 12th International Genome Sequencing and Analysis Conference (pp. 80-81). Miami Beach, FL.

Moreno, P.A., Rodríguez, J.G., Vélez, P.E., Cubillos, J.R., & Del Portillo, P. (2002). La genómica aplicada en salud humana. Colombia Ciencia y Tecnología. Colciencias, 20, 14-21.

Moreno, P.A., Vélez, P.E., & Burgos, J.D. (2009). Biología molecular, genómica y post-genómica. Pioneros, principios y tecnologías. Popayán, Colombia: Editorial Universidad del Cauca.

Moreno, P.A., Vélez, P.E., Martínez, E., Garreta, L., Díaz, D., Amador, S., Gutiérrez, J.M., et al. (2011). The human genome: a multifractal analysis. BMC Genomics, 12, 506.

Mount, D.W. (2004). Bioinformatics: Sequence and genome analysis. New York: Cold Spring Harbor Laboratory Press.

Peitgen, H.O., Jürgens, H., & Saupe, D. (1992). Chaos and Fractals: New Frontiers of Science. New York: Springer-Verlag.

Restrepo, S., Pinzón, A., Rodríguez, L.M., Sierra, R., Grajales, A., Bernal, A., Barreto, E., et al. (2009). Computational biology in Colombia. PLoS Computational Biology, 5(10), e1000535.

The ENCODE Project Consortium. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74.

Vélez, P.E., Garreta, L.E., Martínez, E., Díaz, N., Amador, S., Gutiérrez, J.M., Tischer, I., & Moreno, P.A. (2010). The Caenorhabditis elegans genome: a multifractal analysis. Genetics and Molecular Research, 9, 949-965.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., et al. (2001). The sequence of the human genome. Science, 291, 1304-1351.

Yu, Z.G., Anh, V., & Lau, K.S. (2001). Measure representation and multifractal analysis of complete genomes. Physical Review E, 64, 031903.

 

Other articles on Bioinformatics on this Open Access Journal include:

Bioinformatics Tool Review: Genome Variant Analysis Tools

2017 Agenda – BioInformatics: Track 6: BioIT World Conference & Expo ’17, May 23-35, 2017, Seaport World Trade Center, Boston, MA

Better bioinformatics

Broad Institute, Google Genomics combine bioinformatics and computing expertise

Autophagy-Modulating Proteins and Small Molecules Candidate Targets for Cancer Therapy: Commentary of Bioinformatics Approaches

CRACKING THE CODE OF HUMAN LIFE: The Birth of BioInformatics & Computational Genomics
