
Archive for the ‘BioIT: BioInformatics’ Category


Diversity and Health Disparity Issues Need to be Addressed for GWAS and Precision Medicine Studies

Curator: Stephen J. Williams, PhD

 

 

From the POLICY FORUM ETHICS AND DIVERSITY Section of Science

Ethics of inclusion: Cultivate trust in precision medicine


Science  07 Jun 2019:
Vol. 364, Issue 6444, pp. 941-942
DOI: 10.1126/science.aaw8299

Precision medicine is at a crossroads. Progress toward its central goal, to address persistent health inequities, will depend on enrolling populations in research that have been historically underrepresented, thus eliminating longstanding exclusions from such research (1). Yet the history of ethical violations related to protocols for inclusion in biomedical research, as well as the continued misuse of research results (such as white nationalists looking to genetic ancestry to support claims of racial superiority), continue to engender mistrust among these populations (2). For precision medicine research (PMR) to achieve its goal, all people must believe that there is value in providing information about themselves and their families, and that their participation will translate into equitable distribution of benefits. This requires an ethics of inclusion that considers what constitutes inclusive practices in PMR, what goals and values are being furthered through efforts to enhance diversity, and who participates in adjudicating these questions. The early stages of PMR offer a critical window in which to intervene before research practices and their consequences become locked in (3).

Initiatives such as the All of Us program have set out to collect and analyze health information and biological samples from millions of people (1). At the same time, questions of trust in biomedical research persist. For example, although the recent assertions of white nationalists were eventually denounced by the American Society of Human Genetics (4), the misuse of ancestry testing may have already undermined public trust in genetic research.

There are also infamous failures in research that included historically underrepresented groups, including practices of deceit, as in the Tuskegee Syphilis Study, or the misuse of samples, as with the Havasupai tribe (5). Many people who are being asked to give their data and samples for PMR must not only reconcile such past research abuses, but also weigh future risks of potential misuse of their data.

To help assuage these concerns, ongoing PMR studies should open themselves up to research, conducted by social scientists and ethicists, that examines how their approaches enhance diversity and inclusion. Empirical studies are needed to account for how diversity is conceptualized and how goals of inclusion are operationalized throughout the life course of PMR studies. This is not limited to selection and recruitment of populations but extends to efforts to engage participants and communities, through data collection and measurement, and interpretations and applications of study findings. A commitment to transparency is an important step toward cultivating public trust in PMR’s mission and practices.

From Inclusion to Inclusive

The lack of diverse representation in precision medicine and other biomedical research is a well-known problem. For example, rare genetic variants may be overlooked—or their association with common, complex diseases can be misinterpreted—as a result of sampling bias in genetics research (6). Concentrating research efforts on samples with largely European ancestry has limited the ability of scientists to make generalizable inferences about the relationships among genes, lifestyle, environmental exposures, and disease risks, and thereby threatens the equitable translation of PMR for broad public health benefit (7).

However, recruiting for diverse research participation alone is not enough. As with any push for “diversity,” related questions arise about how to describe, define, measure, compare, and explain inferred similarities and differences among individuals and groups (8). In the face of ambivalence about how to represent population variation, there is ample evidence that researchers resort to using definitions of diversity that are heterogeneous, inconsistent, and sometimes competing (9). Varying approaches are not inherently problematic; depending on the scientific question, some measures may be more theoretically justified than others and, in many cases, a combination of measures can be leveraged to offer greater insight (10). For example, studies have shown that American adults who do not self-identify as white report better mental and physical health if they think others perceive them as white (11, 12).

The benefit of using multiple measures of race and ancestry also extends to genetic studies. In a study of hypertension in Puerto Rico, not only did classifications based on skin color and socioeconomic status better predict blood pressure than genetic ancestry, the inclusion of these sociocultural measures also revealed an association between a genetic polymorphism and hypertension that was otherwise hidden (13). Thus, practices that allow for a diversity of measurement approaches, when accompanied by a commitment to transparency about the rationales for chosen approaches, are likely to benefit PMR research more than striving for a single gold standard that would apply across all studies. These definitional and measurement issues are not merely semantic. They also are socially consequential to broader perceptions of PMR research and the potential to achieve its goals of inclusion.

Study Practices, Improve Outcomes

Given the uncertainty and complexities of the current, early phase of PMR, the time is ripe for empirical studies that enable assessment and modulation of research practices and scientific priorities in light of their social and ethical implications. Studying ongoing scientific practices in real time can help to anticipate unintended consequences that would limit researchers’ ability to meet diversity recruitment goals, address both social and biological causes of health disparities, and distribute the benefits of PMR equitably. We suggest at least two areas for empirical attention and potential intervention.

First, we need to understand how “upstream” decisions about how to characterize study populations and exposures influence “downstream” research findings of what are deemed causal factors. For example, when precision medicine researchers rely on self-identification with U.S. Census categories to characterize race and ethnicity, this tends to circumscribe their investigation of potential gene-environment interactions that may affect health. The convenience and routine nature of Census categories seemed to lead scientists to infer that the reasons for differences among groups were self-evident and required no additional exploration (9). The ripple effects of initial study design decisions go beyond issues of recruitment to shape other facets of research across the life course of a project, from community engagement and the return of results to the interpretation of study findings for human health.

Second, PMR studies are situated within an ecosystem of funding agencies, regulatory bodies, disciplines, and other scholars. This partly explains the use of varied terminology, different conceptual understandings and interpretations of research questions, and heterogeneous goals for inclusion. It also makes it important to explore how expectations related to funding and regulation influence research definitions of diversity and benchmarks for inclusion.

For example, who defines a diverse study population, and how might those definitions vary across different institutional actors? Who determines the metrics that constitute successful inclusion, and why? Within a research consortium, how are expectations for data sharing and harmonization reconciled with individual studies’ goals for recruitment and analysis? In complex research fields that include multiple investigators, organizations, and agendas, how are heterogeneous, perhaps even competing, priorities negotiated? To date, no studies have addressed these questions or investigated how decisions facilitate, or compromise, goals of diversity and inclusion.

The life course of individual studies and the ecosystems in which they reside cannot be easily separated and therefore must be studied in parallel to understand how meanings of diversity are shaped and how goals of inclusion are pursued. Empirically “studying the studies” will also be instrumental in creating mechanisms for transparency about how PMR is conducted and how trade-offs among competing goals are resolved. Establishing open lines of inquiry that study upstream practices may allow researchers to anticipate and address downstream decisions about how results can be interpreted and should be communicated, with a particular eye toward the consequences for communities recruited to augment diversity. Understanding how scientists negotiate the challenges and barriers to achieving diversity that go beyond fulfilling recruitment numbers is a critical step toward promoting meaningful inclusion in PMR.

Transparent Reflection, Cultivation of Trust

Emerging research on public perceptions of PMR suggests that although there is general support, questions of trust loom large. What we learn from studies that examine on-the-ground approaches aimed at enhancing diversity and inclusion, and how the research community reflects and responds with improvements in practices as needed, will play a key role in building a culture of openness that is critical for cultivating public trust.

Cultivating long-term, trusting relationships with participants underrepresented in biomedical research has been linked to a broad range of research practices. Some of these include the willingness of researchers to (i) address the effect of history and experience on marginalized groups’ trust in researchers and clinicians; (ii) engage concerns about potential group harms and risks of stigmatization and discrimination; (iii) develop relationships with participants and communities that are characterized by transparency, clear communication, and mutual commitment; and (iv) integrate participants’ values and expectations of responsible oversight beyond initial informed consent (14). These findings underscore the importance of multidisciplinary teams that include social scientists, ethicists, and policy-makers, who can identify and help to implement practices that respect the histories and concerns of diverse publics.

A commitment to an ethics of inclusion begins with a recognition that risks from the misuse of genetic and biomedical research are unevenly distributed. History makes plain that a multitude of research practices ranging from unnecessarily limited study populations and taken-for-granted data collection procedures to analytic and interpretive missteps can unintentionally bolster claims of racial superiority or inferiority and provoke group harm (15). Sustained commitment to transparency about the goals, limits, and potential uses of research is key to further cultivating trust and building long-term research relationships with populations underrepresented in biomedical studies.

As calls for increasing diversity and inclusion in PMR grow, funding and organizational pathways must be developed that integrate empirical studies of scientific practices and their rationales to determine how goals of inclusion and equity are being addressed and to identify where reform is required. In-depth, multidisciplinary empirical investigations of how diversity is defined, operationalized, and implemented can provide important insights and lessons learned for guiding emerging science, and in so doing, meet our ethical obligations to ensure transparency and meaningful inclusion.

References and Notes

  1. C. P. Jones et al., Ethn. Dis. 18, 496 (2008).
  2. C. C. Gravlee, A. L. Non, C. J. Mulligan
  3. S. A. Kraft et al., Am. J. Bioeth. 18, 3 (2018).
  4. A. E. Shields et al., Am. Psychol. 60, 77 (2005).



Data Science & Analytics: What do Data Scientists Do in 2020 and a Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals

Curator: Aviva Lev-Ari, PhD, RN

Based on Jesse Anderson’s work on data teams, Kathleen Walch, in “Why Data Scientists Aren’t Data Engineers,” makes several keen distinctions between the two skill sets.

I can attest that she is absolutely correct. See below, a Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals

 

These key distinctions are:

Data Scientists vs Data Engineers

In the mid-2000s, we saw the emergence of the Data Scientist position. As cited in the O’Reilly article: “This increase in the demand for data scientists has been driven by the success of the major Internet companies. Google, Facebook, LinkedIn, and Amazon have all made their marks by using data creatively: not just warehousing data, but turning it into something of value.” Not surprisingly, any organization that has data of value is looking at data science and data scientists to increasingly extract more value from that information.

Originating from roots in statistical modeling and data analysis, data scientists have backgrounds in advanced math and statistics, advanced analytics, and increasingly machine learning / AI. The focus of data scientists is, unsurprisingly, data science: how to extract useful information from a sea of data, and how to translate business and scientific informational needs into the language of information and math. Data scientists need to be masters of statistics, probability, mathematics, and algorithms that help to glean useful insights from huge piles of information. These data scientists usually learned programming out of necessity, in order to run advanced analyses on data. As a result, the code that data scientists are tasked to write is usually minimal: only what is necessary to accomplish a data science task (R is a common language for them to use). They work best when they are provided clean data to run advanced analytics on. A data scientist is a scientist who creates hypotheses, runs tests and analyses of the data, and then translates the results for someone else in the organization to easily view and understand.

On the other hand, data scientists can’t perform their jobs without access to large volumes of clean data. Extracting, cleaning, and moving data is not really the role of a data scientist, but rather that of a data engineer. Data engineers have programming and technology expertise, and have previously been involved with data integration, middleware, analytics, business data portals, and extract-transform-load (ETL) operations. The data engineer’s center of gravity and skills are focused around big data and distributed systems, along with experience in programming languages such as Java, Python, and Scala, and in scripting tools and techniques. Data engineers are challenged with taking data from a wide range of systems, in structured and unstructured formats, data which is usually not “clean”, with missing fields, mismatched data types, and other data-related issues. Data engineers need to use their programming, integration, architecture, and systems skills to clean all the data and put it into a format and system that data scientists can then use to analyze, build their data models with, and provide value to the organization. In this way, the data engineer is the engineer who designs, builds, and arranges the data.
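As a small, hypothetical sketch of the cleaning work described above (the column names, values, and rules are invented for illustration, not taken from any study), a data engineer might normalize a messy extract with pandas before handing it to a data scientist:

```python
import pandas as pd

# Hypothetical raw extract: numbers stored as strings, missing fields,
# inconsistent free-text labels
raw = pd.DataFrame({
    "patient_id": ["001", "002", "003", "004"],
    "age": ["34", "n/a", "52", "41"],
    "diagnosis": ["Diabetes", "diabetes ", None, "Hypertension"],
})

clean = raw.copy()
# Coerce types: unparseable values become NaN instead of crashing downstream code
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
# Normalize free-text labels before any grouping or modeling
clean["diagnosis"] = clean["diagnosis"].str.strip().str.lower()
# Flag (rather than silently drop) records with missing fields
clean["incomplete"] = clean[["age", "diagnosis"]].isna().any(axis=1)

print(clean)
```

The point is the division of labor: this mechanical normalization happens once, in the pipeline, so the data scientist downstream can go straight to analysis.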

Can there be a combined Data Scientist-Engineer role?

While it might seem that the roles of a data scientist and data engineer are distinct, data scientists and data engineers share many traits and skill sets. These overlapping skills include the necessity to work with and manipulate big data sets, programming skills to apply operations to the data, data analytics skills, and general fluency with systems operations.

Rather than engineering- and programming-centric tools, data scientists need data science-centric tools. Right now there’s a growing collection of these tools, often emerging from data or predictive analytics environments, that suit the needs of data scientists. However, it’s possible that even more business-centric tools might be appropriate, especially as data scientists become more embedded with the line of business. For example, decades ago, operating on large volumes of data in a spreadsheet-like format involved programming, but tools like Excel introduced things like pivot tables, and now business managers are able to perform all sorts of analyses. It’s only a matter of time before tools like Excel embed data science capabilities, or business-centric data mining and analysis features, into their products.

As the talent gap for data scientists continues to widen, there is no doubt that we will see new tools created out of necessity to allow non-technical (read: business) people to run, test, and analyze data. Strategic business managers will begin to learn data science, without needing or wanting programming or data integration experience. Traditional data scientists will still be needed to run very complex analyses of data. For the most part, however, basic analysis will move to the business unit due to increasingly easy-to-use tools. This means we have yet to see which tool or technology will be the dominant one for ML and data science in the enterprise.

 

 

My SOURCES for the evolution of the field of Data Science are the following:

 Jesse Anderson’s work on data teams

Learn How to Create and Manage Big Data Teams

This Free, 73 Page E-Book is the Complete Guide to Successful Big Data projects

I’m really tired of seeing Big Data projects fail. They fail for both technical and managerial reasons. They all fail for similar reasons and that’s just sad because we can fix or prevent them. Gartner’s research shows that 85% of Big Data projects don’t even make it into production.

“Only 15 percent of businesses reported deploying their big data project to production, effectively unchanged from last year (14 percent).”

October 4, 2016 Gartner Press Release

https://www.bigdatainstitute.io/data-engineering-teams-book/

 

December 1, 2019, 9:48 am

Why Data Scientists Aren’t Data Engineers

Kathleen Walch

Managing Partner & Principal Analyst at AI Focused Research and Advisory firm Cognilytica

https://www.forbes.com/sites/cognitiveworld/2019/12/01/why-data-scientists-arent-data-engineers/amp/?__twitter_impression=true

 

Translating Between Computer Science and Statistics

Posted on December 1, 2019

Gil Press

https://whatsthebigdata.com/2019/12/01/translating-between-computer-science-and-statistics/

 

Jan 8, 2019, 06:18am

The AI Chronicles: Combining Statistical Analysis And Computing From Hollerith To Zuckerberg

Gil Press Contributor

Enterprise & Cloud

https://www.forbes.com/sites/gilpress/2019/01/08/the-ai-chronicles-combining-statistical-analysis-and-computing-from-hollerith-to-zuckerberg/#23cf507c73b3

 

Jan 2, 2015, 10:48am

A Very Short History Of The Internet And The Web

Gil Press Contributor

Enterprise & Cloud

https://www.forbes.com/sites/gilpress/2015/01/02/a-very-short-history-of-the-internet-and-the-web-2/#a45c9307a4e2

 

May 28, 2013, 09:09am

A Very Short History Of Data Science

Gil Press Contributor

Enterprise & Cloud

https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#1e7db3e155cf

 

May 9, 2013, 09:45am

A Very Short History Of Big Data

Gil Press Contributor

Enterprise & Cloud

https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/#16c2043b65a1

 

Apr 8, 2013, 09:16am

A Very Short History of Information Technology (IT)

Gil Press Contributor

Enterprise & Cloud

https://www.forbes.com/sites/gilpress/2013/04/08/a-very-short-history-of-information-technology-it/#3f5491022440

 

A Pioneer Practitioner’s Portfolio of Algorithm-based Decision Support Systems for Operations Management in Several Industrial Verticals: Analytics Designer, Aviva Lev-Ari, PhD, RN

On this landscape of IT, the Internet, analytics, statistics, Big Data, Data Science, and Artificial Intelligence, I aim to tell the story of my own pioneering work in data science: algorithm-based decision support systems designed for different organizations in several sectors of the US economy:

  • Startups:
  1. TimeØ Group
  2. Concept Five Technologies, Inc.
  3. MDSS, Inc.
  4. LPBI Group
  • Top Tier Management Consulting: SRI International, Monitor Group;
  • OEM: Amdahl Corporation;
  • Top 6th System Integrator: Perot System Corporation;
  • FFRDC: MITRE Corporation.
  • Publishing industry: Director of Research at McGraw-Hill/CTB.
  • Northeastern University, Researcher on Cardiovascular Pharmaco-therapy at Bouve College of Health Sciences (Independent research guided by Professor of Pharmacology)

Type of institutions:

  • For-Profit corporations: Amdahl Corp, PSC, McGraw-Hill
  • For-Profit Top Tier Consulting: Monitor Company, Now Deloitte
  • Not-for-Profit Top Tier Consulting: SRI International
  • FFRDC: MITRE
  • eScientific Publishing: LPBI Group: Developers of Curation methodology for e-Articles [N = 3,700], electronic Table of Contents for e-Books in Medicine [N = 16, https://lnkd.in/ekWGNqA] and e-Proceedings of Biotech Conferences [N = 70].

 

Autobiographical Annotations: Tribute to My Professors

 

Pioneering implementations of analytics to business decision making: contributions to domain knowledge conceptualization, research design, methodology development, data modeling and statistical data analysis: Aviva Lev-Ari, UCB, PhD’83; HUJI MA’76

https://pharmaceuticalintelligence.com/2018/05/28/pioneering-implementations-of-analytics-to-business-decision-making-contributions-to-domain-knowledge-conceptualization-research-design-methodology-development-data-modeling-and-statistical-data-a/

Recollections of Years at UC, Berkeley, Part 1 and Part 2

  • Recollections: Part 1 – My days at Berkeley, 9/1978 – 12/1983 – About my doctoral advisor, Allan Pred, other professors and other peers

https://pharmaceuticalintelligence.com/2018/03/15/recollections-my-days-at-berkeley-9-1978-12-1983-about-my-doctoral-advisor-allan-pred-other-professors-and-other-peer/

  • Recollections: Part 2 – “While Rolling” is preceded by “While Enrolling” Autobiographical Alumna Recollections of Berkeley – Aviva Lev-Ari, PhD’83

https://pharmaceuticalintelligence.com/2018/05/24/recollections-part-2-while-rolling-is-preceded-by-while-enrolling-autobiographical-alumna-recollections-of-berkeley-aviva-lev-ari-phd83/

Accomplishments

The Digital Age Gave Rise to New Definitions – New Benchmarks were born on the World Wide Web for the Intangible Asset of Firm’s Reputation: Pay a Premium for buying e-Reputation

For @AVIVA1950, Founder, LPBI Group @pharma_BI: Twitter Analytics [Engagement Rate, Link Clicks, Retweets, Likes, Replies] & Tweet Highlights [Tweets, Impressions, Profile Visits, Mentions, New Followers] https://analytics.twitter.com/user/AVIVA1950/tweets

Thriving at the Survival Calls during Careers in the Digital Age – An AGE like no Other, also known as, DIGITAL

Reflections on a Four-phase Career: Aviva Lev-Ari, PhD, RN, March 2018

Was prepared for publication in American Friends of the Hebrew University (AFHU), May 2018 Newsletter, Hebrew University’s HUJI Alumni Spotlight Section.

Aviva Lev-Ari’s profile was up on 5/3/2018 on AFHU website under the Alumni Spotlight at https://www.afhu.org/

On 5/11/2018, Excerpts were Published in AFHU e-news.

https://us10.campaign-archive.com/?u=5c25136c60d4dfc4d3bb36eee&id=757c5c3aae&e=d09d2b8d72

https://www.afhu.org/2018/05/03/aviva-lev-ari/

 



MinneBOS 2019, Field Guide to Data Science & Emerging Tech in the Boston Community

August 22, 2019, 8AM to 5PM at Boston University Questrom School of Business, 595 Commonwealth Avenue, Boston, MA

 

 

MinneBOS – Boston’s Field Guide to Data Science & Emerging Tech

Announcement

Leaders in Pharmaceutical Business Intelligence (LPBI) Group

 

REAL TIME Press Coverage for

 http://pharmaceuticalintelligence.com 

by

 Aviva Lev-Ari, PhD, RN

Director & Founder, Leaders in Pharmaceutical Business Intelligence (LPBI) Group, Boston

Editor-in-Chief, Open Access Online Scientific Journal, http://pharmaceuticalintelligence.com

Editor-in-Chief, BioMed e-Series, 16 Volumes in Medicine, https://pharmaceuticalintelligence.com/biomed-e-books/

@pharma_BI

@AVIVA1950

#MinneBos

 

Logo, Leaders in Pharmaceutical Business Intelligence (LPBI) Group, Boston

Our BioMed e-series

WE ARE ON AMAZON.COM

 

https://lnkd.in/ekWGNqA

 

UPDATED AGENDA

Thursday, August 22 • 9:30am – 10:15am
Histopathological images are the gold standard tool for cancer diagnosis, whose interpretation requires manual inspection by expert pathologists. This process is time-consuming for the patients and subject to human error. Recent advances in deep learning models, particularly convolutional neural networks, combined with big databases of patient histopathology images will pave the path for cancer researchers to create more accurate guiding tools for pathologists. In this talk, I will review the latest advances of big data in healthcare analytics and focus on deep learning applications in cancer research. Targeted at a general audience, I will provide a high-level overview of technical concepts in deep learning image analysis, and describe a typical cloud-based workflow for tackling such big data problems. I will conclude my talk by sharing some of our most recent results based on a wide range of cancer types.

Speakers

avatar for Mohammad Soltanieh-ha, PhD

Mohammad Soltanieh-ha, PhD

Clinical Assistant Professor, Boston University – Questrom
Mohammad is a faculty member at Boston University, Questrom School of Business, where he teaches data analytics and big data to master’s students. Mohammad’s current research area involves deep learning and its applications in cancer research.

10:15am

10:30am

Thursday, August 22 • 10:30am – 11:00am

Deep learning image recognition and classification models for fashion items

Large-scale image recognition and classification is an interesting and challenging problem. This case study uses the Fashion-MNIST dataset, which involves 60,000 training images and 10,000 testing images. Several popular deep learning models are explored in this study to arrive at a suitable model with high accuracy. Although convolutional neural networks have emerged as a gold standard for image recognition and classification problems due to their speed and accuracy advantages, arriving at an optimal model, and making the several choices involved in specifying a model architecture, is still a challenging task. This case study provides best practices and interesting insights.

Speakers

avatar for Bharatendra Rai

Bharatendra Rai

Professor, UMass Dartmouth
Bharatendra Rai, Ph.D. is Professor of Business Analytics in the Charlton College of Business at UMass Dartmouth. His research interests include machine learning & deep learning applications.
  • Train data: 60,000 images
  • Test data: 10,000 images
  • Dataset: Google’s MNIST Fashion data; items in the DB are already labelled
  • Each label has a description
  • Architecture: Input >> Conv >> Conv >> Pooling >> Dropout >> Flatten >> Dense >> Dropout >> Output
  • CNN vs fully connected parameters: the first conv layer has 3x3x1x32 weights + 32 bias terms = 320 parameters
  • A comparable fully connected network has about 16 million parameters
  • Train the model: 15 iterations, with training and validation
  • Actual vs predicted: 94% classified correctly (accuracy: 94%); 5,974 vs 4,700 (78%)
  • Confusion matrix: 720 test images correctly classified for item 6; probability vs actual vs predicted
  • Image generation: a noise-fed generator network produces fake images vs real images; GAN loss vs discriminator loss
  • A CNN helps reduce the number of parameters
  • Dropout layers can help reduce overfitting
  • A validation split of x% chooses the last x% of the training data
  • Generation of new data is challenging
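The parameter arithmetic in the notes above can be checked directly. The first conv layer (3x3 kernel, 1 input channel, 32 filters, one bias per filter) comes to 320 parameters; by contrast, even a single dense layer on the flattened 28x28 image quickly reaches millions (the exact 16-million figure depends on the architecture the speaker assumed, so the dense layer below is just an illustrative comparison):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Conv layer: one weight per kernel cell per channel pair, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

def dense_params(in_features, out_features):
    """Fully connected layer: in*out weights plus one bias per output unit."""
    return in_features * out_features + out_features

# First conv layer of the Fashion-MNIST model: 3*3*1*32 + 32 = 320
print(conv2d_params(3, 3, 1, 32))   # 320

# An illustrative dense layer from the flattened 28x28 image to 4096 units
print(dense_params(28 * 28, 4096))  # 3215360
```

This is why the notes say a CNN helps reduce the number of parameters: weight sharing in the kernel replaces a full weight per input pixel.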

11:00am

11:15am

Thursday, August 22 • 11:15am – 12:00pm

Rapid Data Science

Most companies today require fast, traceable, and actionable answers to their data questions. This talk will present the structure of the data science process along with cutting edge developments in computing and data science technology (DST) with direct applications to real world problems (with a lot of pictures!). Everything from modeling to team building will be discussed, with clear business applications.

Speakers

avatar for Erez Kaminski

Erez Kaminski

Leaders Global Operations Fellow, MIT
Erez has spent his career helping companies solve problems using data science. He is currently a graduate student in computer science and business at MIT. Previously, he worked in data science at Amgen Inc. and as a technologist at Wolfram Research.

12:00pm

1:00pm

Thursday, August 22 • 1:00pm – 1:45pm

Health and Healthcare Data Visualization – See how you’re doing

Health and healthcare organizations are swimming in data, but few have the skills to show and see the story in their data using the best practices of data visualization. This presentation raises awareness of the research that informs these best practices, and shares stories from the front lines of groups who are embracing them and re-imagining how they display their data and information. These groups include the NYC Dept of Health & Mental Hygiene, the Centers for Medicare & Medicaid Services (CMS), and leading medical centers and providers across the country.

Speakers

avatar for Katherine Rowell

Katherine Rowell

Co-Founder & Principal, Health Data Viz
Katherine Rowell is a health, healthcare, and data visualization expert. She is Co-founder and Principal of HealthDataViz, a Boston firm that specializes in helping healthcare organizations organize, design and present visual displays of data to inform their decisions and stimulate…
  • dashboard for Hospital CEOs

1:45pm

2:00pm

Thursday, August 22 • 2:00pm – 2:45pm

AI in Healthcare

Benefits, challenges and impact of AI and Cybersecurity on medicine.

Speakers

avatar for Vinit Nijhawan

Vinit Nijhawan

Lecturer, Boston University
Vinit Nijhawan is an Entrepreneur, Academic, and Board Member with a track record of success, including 4 startups in 20 years.
  • US: spends the most on health care (HC), yet deaths per 100K people are the highest
  • Eric Topol: diagnosis is often not done correctly; AI will help with diagnosis
  • Diagnosis is where AI will have the most impact; viral infections are diagnosed as bacterial infections and treated with antibiotics
  • Image classification by ML: error rates have declined below human misclassification rates
  • Training data sets: Big Data
  • Algorithms are getting better
  • Data capture is getting better, in HC as well
  • Investment in HC is the greatest
  • Security of implantable medical devices: security attacks, i.e., hacking and sending signals to implantable devices

2:45pm

3:00pm

Empower

Thursday, August 22 • 3:00pm – 3:30pm

Patient centric AI: Saving lives with ML driven hospital interventions

This presentation will cover the use of machine learning for maximizing the impact of a hospital readmissions intervention program. With machine learning, clinical care teams can identify and focus their intervention efforts on patients with the highest risk of readmission. The talk will go over the goals, logistics, and considerations for defining, implementing, and measuring our ML driven intervention program. While covering some technical details, this presentation will focus on the business implementation of advanced technology for helping people live healthier lives.
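A minimal sketch of the ranking step described above, under loudly stated assumptions: the feature names, weights, and care-team capacity below are all invented for illustration, and a real program (like the one the talk describes) would use a model trained on clinical data rather than hand-set coefficients.

```python
import math

# Hypothetical logistic risk score; in practice the weights come from a trained model
WEIGHTS = {"prior_admissions": 0.8, "age_over_65": 0.5, "chronic_conditions": 0.6}
BIAS = -2.0

def readmission_risk(patient):
    """Predicted probability of readmission for one patient record."""
    z = BIAS + sum(WEIGHTS[k] * patient[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

patients = [
    {"id": "A", "prior_admissions": 3, "age_over_65": 1, "chronic_conditions": 2},
    {"id": "B", "prior_admissions": 0, "age_over_65": 0, "chronic_conditions": 1},
    {"id": "C", "prior_admissions": 1, "age_over_65": 1, "chronic_conditions": 0},
]

# Rank by predicted risk and focus the intervention on the highest-risk patients
ranked = sorted(patients, key=readmission_risk, reverse=True)
intervene = [p["id"] for p in ranked[:2]]  # assumed care-team capacity: 2 patients
print(intervene)  # ['A', 'C']
```

The business logic is in the last two lines: the model only prioritizes; the capacity cutoff and the intervention itself are clinical and operational decisions.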

Speakers

avatar for Miguel Martinez

Miguel Martinez

Data Scientist, Optum
Miguel Martinez is a Data Scientist at Optum Enterprise Analytics. Relied on as a tech lead in advancing AI healthcare initiatives, he is passionate about identifying and developing data science solutions for the benefit of organizations and people.

 

Thursday, August 22 • 3:45pm – 4:15pm

Using Ontologies to Power AI Systems

There’s a great deal of confusion about the role of a knowledge architecture in artificial intelligence projects. Some people don’t believe that any reference data is necessary. In reality, reference data is required: even when no metadata or architecture definitions are supplied externally to an AI algorithm, someone has made the decisions about architecture and classification within the program. These defaults will not work for every organization, however, because there are terms, workflows, product attributes, and organizing principles that are unique to the organization and that need to be defined for AI tools to work most effectively.

Speakers

avatar for Seth Earley

Seth Earley

CEO, Earley Information Science
Seth Earley is a published author and public speaker on artificial intelligence and information architecture. He wrote “There’s no AI without IA,” which has become an industry catchphrase used by a number of people, including Ginni Rometty, the CEO of IBM.
  • Ontologies, taxonomies, thesauri – conceptual relationships
  • Object-oriented programming and information architecture applied with AI: old wine in new bottles
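
Earley's point about organization-specific vocabularies can be illustrated with a minimal sketch: a tiny hand-built taxonomy used to normalize free-text terms before they reach a downstream AI component. All terms and synonyms below are invented for illustration:

```python
# Minimal sketch: a tiny taxonomy used to normalize free-text terms
# to an organization's preferred labels. The terms and synonyms here
# are invented for illustration.
taxonomy = {
    "myocardial infarction": {"synonyms": {"heart attack", "mi"},
                              "broader": "cardiovascular disease"},
    "hypertension": {"synonyms": {"high blood pressure", "htn"},
                     "broader": "cardiovascular disease"},
}

def normalize(term):
    """Map a free-text term to its preferred label, if known."""
    t = term.lower().strip()
    if t in taxonomy:
        return t
    for preferred, entry in taxonomy.items():
        if t in entry["synonyms"]:
            return preferred
    return None  # unknown to this organization's vocabulary

print(normalize("Heart Attack"))  # myocardial infarction
print(normalize("HTN"))           # hypertension
```

A production knowledge architecture would use a managed ontology (with broader/narrower relations, as the `broader` field hints at) rather than a dict, but the normalization role it plays for AI tools is the same.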

Thursday, August 22

TBA

 Senior Leadership Panel: Future Directions of Analytics

This panel includes senior leaders from across industry, academia, and government to discuss the challenges they are tackling, the needs they anticipate, and the goals they will achieve.

Moderators

avatar for Bonnie Holub, PhD

Bonnie Holub, PhD

Industry & Business Data Science, Teradata
Bonnie has a PhD in Artificial Intelligence and specializes in correlating disparate sets of Big Data for actionable results.

Read Full Post »


scPopCorn: A New Computational Method for Subpopulation Detection and their Comparative Analysis Across Single-Cell Experiments

Reporter and Curator: Dr. Sudipta Saha, Ph.D.

 

Present day technological advances have facilitated unprecedented opportunities for studying biological systems at single-cell level resolution. For example, single-cell RNA sequencing (scRNA-seq) enables the measurement of transcriptomic information of thousands of individual cells in one experiment. Analyses of such data provide information that was not accessible using bulk sequencing, which can only assess average properties of cell populations. Single-cell measurements, however, can capture the heterogeneity of a population of cells. In particular, single-cell studies allow for the identification of novel cell types, states, and dynamics.

 

One of the most prominent uses of the scRNA-seq technology is the identification of subpopulations of cells present in a sample and comparing such subpopulations across samples. Such information is crucial for understanding the heterogeneity of cells in a sample and for comparative analysis of samples from different conditions, tissues, and species. A frequently used approach is to cluster every dataset separately, inspect marker genes for each cluster, and compare these clusters in an attempt to determine which cell types were shared between samples. This approach, however, relies on the existence of predefined or clearly identifiable marker genes and their consistent measurement across subpopulations.
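
The "cluster each dataset separately, then compare" baseline described above can be sketched as follows. Synthetic data stand in for scRNA-seq expression profiles, and ground-truth labels stand in for the per-dataset clustering step to keep the example short:

```python
# Sketch of the baseline approach: cluster each experiment separately,
# then match clusters across experiments by the similarity of their
# mean expression profiles. Data are synthetic; a real analysis would
# cluster actual scRNA-seq counts and inspect marker genes.
import numpy as np

rng = np.random.default_rng(0)
n_genes = 50

# Two experiments, each with 3 subpopulations drawn from shared "cell type" profiles.
type_profiles = rng.normal(size=(3, n_genes))

def make_experiment():
    cells, labels = [], []
    for t in range(3):
        cells.append(type_profiles[t] + 0.1 * rng.normal(size=(30, n_genes)))
        labels += [t] * 30
    return np.vstack(cells), np.array(labels)

X1, lab1 = make_experiment()
X2, lab2 = make_experiment()

# Centroid (mean profile) of each cluster in each experiment.
c1 = np.array([X1[lab1 == t].mean(axis=0) for t in range(3)])
c2 = np.array([X2[lab2 == t].mean(axis=0) for t in range(3)])

# Match each cluster in experiment 1 to the most correlated cluster in experiment 2.
corr = np.corrcoef(np.vstack([c1, c2]))[:3, 3:]
mapping = corr.argmax(axis=1)
print(mapping)  # with this clean synthetic data, clusters map to their counterparts
```

This two-stage matching is exactly what can fail when marker genes are inconsistent across samples, which is the motivation for scPopCorn's joint identification-and-mapping formulation.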

 

One way to approach this problem is to first perform a global alignment of the datasets. Although the aligned data can then be clustered to reveal subpopulations and their correspondence, solving the subpopulation-mapping problem by performing global alignment first and clustering second overlooks the original information about subpopulations existing in each experiment. In contrast, an approach addressing this problem directly might represent a more suitable solution. With this in mind, the researchers developed a computational method, single-cell subpopulations comparison (scPopCorn), that allows for comparative analysis of two or more single-cell populations.

 

The performance of scPopCorn was tested in three distinct settings. First, its potential was demonstrated in identifying and aligning subpopulations from human and mouse pancreatic single-cell data. Next, scPopCorn was applied to the task of aligning biological replicates of mouse kidney single-cell data, where it achieved the best performance among previously published tools. Finally, it was applied to compare populations of cells from cancer and healthy brain tissues, revealing the relation of neoplastic cells to neural cells and astrocytes. As a result of this integrative approach, scPopCorn provides a powerful tool for comparative analysis of single-cell populations.

 

scPopCorn is, at its core, a computational method for identifying subpopulations of cells within individual single-cell experiments and mapping these subpopulations across experiments. Unlike other approaches, scPopCorn performs the tasks of population identification and mapping simultaneously by optimizing a function that combines both objectives. When applied to complex biological data, scPopCorn outperforms previous methods. It should be kept in mind, however, that scPopCorn assumes the input single-cell data consist of separable subpopulations; it is not designed for comparative analysis of single-cell trajectory datasets that do not fulfill this constraint.

 

Several innovations developed in this work contributed to the performance of scPopCorn. First, unifying the above-mentioned tasks into a single problem statement allowed the signal from different experiments to be integrated while identifying subpopulations within each experiment. This integration helps reduce biological and experimental noise. The researchers believe the ideas introduced in scPopCorn not only enabled a highly accurate subpopulation identification and mapping approach, but can also provide a stepping stone for other tools that interrogate the relationships between single-cell experiments.

 

References:

 

https://www.sciencedirect.com/science/article/pii/S2405471219301887

 

https://www.tandfonline.com/doi/abs/10.1080/23307706.2017.1397554

 

https://ieeexplore.ieee.org/abstract/document/4031383

 

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0927-y

 

https://www.sciencedirect.com/science/article/pii/S2405471216302666

 

 

Read Full Post »


eProceedings for BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA, Real Time Coverage by Stephen J. Williams, PhD @StephenJWillia2

 

CONFERENCE OVERVIEW

Real Time Coverage of BIO 2019 International Convention, June 3-6, 2019 Philadelphia Convention Center; Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/05/31/real-time-coverage-of-bio-international-convention-june-3-6-2019-philadelphia-convention-center-philadelphia-pa/

 

LECTURES & PANELS

Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-machine-learning-and-artificial-intelligence-realizing-precision-medicine-one-patient-at-a-time/

 

Real Time Coverage @BIOConvention #BIO2019: Genome Editing and Regulatory Harmonization: Progress and Challenges, 6/5/2019. Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-genome-editing-and-regulatory-harmonization-progress-and-challenges/

 

Real Time Coverage @BIOConvention #BIO2019: Precision Medicine Beyond Oncology June 5, 2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-precision-medicine-beyond-oncology-june-5-philadelphia-pa/

 

Real Time @BIOConvention #BIO2019:#Bitcoin Your Data! From Trusted Pharma Silos to Trustless Community-Owned Blockchain-Based Precision Medicine Data Trials, 6/5/2019, Philadelphia PA

Reporter: Stephen J Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-bioconvention-bio2019bitcoin-your-data-from-trusted-pharma-silos-to-trustless-community-owned-blockchain-based-precision-medicine-data-trials/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote Address Jamie Dimon CEO @jpmorgan June 5, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/05/real-time-coverage-bioconvention-bio2019-keynote-address-jamie-dimon-ceo-jpmorgan-june-5-philadelphia/

 

Real Time Coverage @BIOConvention #BIO2019: Chat with @FDA Commissioner, & Challenges in Biotech & Gene Therapy June 4, 2019, Philadelphia, PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-chat-with-fda-commissioner-challenges-in-biotech-gene-therapy-june-4-philadelphia/

 

Falling in Love with Science: Championing Science for Everyone, Everywhere June 4 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-falling-in-love-with-science-championing-science-for-everyone-everywhere/

 

Real Time Coverage @BIOConvention #BIO2019: June 4 Morning Sessions; Global Biotech Investment & Public-Private Partnerships, 6/4/2019, Philadelphia PA

Reporter: Stephen J Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-june-4-morning-sessions-global-biotech-investment-public-private-partnerships/

 

Real Time Coverage @BIOConvention #BIO2019: Understanding the Voices of Patients: Unique Perspectives on Healthcare; June 4, 2019, 11:00 AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-understanding-the-voices-of-patients-unique-perspectives-on-healthcare-june-4/

 

Real Time Coverage @BIOConvention #BIO2019: Keynote: Siddhartha Mukherjee, Oncologist and Pulitzer Author; June 4 2019, 9AM, Philadelphia PA

Reporter: Stephen J. Williams, PhD. @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/04/real-time-coverage-bioconvention-bio2019-keynote-siddhartha-mukherjee-oncologist-and-pulitzer-author-june-4-9am-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019:  Issues of Risk and Reproduceability in Translational and Academic Collaboration; 2:30-4:00 June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-issues-of-risk-and-reproduceability-in-translational-and-academic-collaboration-230-400-june-3-philadelphia-pareal-time-coverage-bioconvention-bi/

 

Real Time Coverage @BIOConvention #BIO2019: What’s Next: The Landscape of Innovation in 2019 and Beyond. 3-4 PM June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-whats-next-the-landscape-of-innovation-in-2019-and-beyond-3-4-pm-june-3-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: After Trump’s Drug Pricing Blueprint: What Happens Next? A View from Washington; June 3, 2019 1:00 PM, Philadelphia PA

Reporter: Stephen J. Williams, PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-after-trumps-drug-pricing-blueprint-what-happens-next-a-view-from-washington-june-3-2019-100-pm-philadelphia-pa/

 

Real Time Coverage @BIOConvention #BIO2019: International Cancer Clusters Showcase June 3, 2019, Philadelphia PA

Reporter: Stephen J. Williams PhD @StephenJWillia2

https://pharmaceuticalintelligence.com/2019/06/03/real-time-coverage-bioconvention-bio2019-international-cancer-clusters-showcase-june-3-philadelphia-pa/

Read Full Post »


Real Time Coverage @BIOConvention #BIO2019: Machine Learning and Artificial Intelligence: Realizing Precision Medicine One Patient at a Time

Reporter: Stephen J Williams, PhD @StephenJWillia2

The impact of Machine Learning (ML) and Artificial Intelligence (AI) during the last decade has been tremendous. With the rise of infobesity, ML/AI is evolving to an essential capability to help mine the sheer volume of patient genomics, omics, sensor/wearables and real-world data, and unravel the knot of healthcare’s most complex questions.

Despite the advancements in technology, organizations struggle to prioritize and implement ML/AI to achieve the anticipated value, whilst managing the disruption that comes with it. In this session, panelists will discuss ML/AI implementation and adoption strategies that work. Panelists will draw upon their experiences as they share their success stories, discuss how to implement digital diagnostics, track disease progression and treatment, and increase commercial value and ROI compared against traditional approaches.

  • Most trials are still at the stage of training AI/ML algorithms on training data sets. The best results so far have been about 80% accuracy on training sets, which needs to improve.
  • All data sets can be biased. For example, a professor measuring heart rate with an IR detector on a wearable found that different skin types generate different signals, so training sets may carry population biases (data collected from one group).
  • Clinical-grade equipment often has not been trained on data sets as large as those used for commercial wearables; commercial-grade devices are tested on larger study populations. This can affect the AI/ML algorithms.
  • Regulations: which regulatory body is responsible is still up for debate. Whether the FDA or the FTC oversees AI/ML in healthcare, health tech, and IT is not yet decided, and we don't have guidances for these new technologies.
  • Some rules: never use your own encryption; always use industry standards, especially when collecting personal data from wearables. One hospital corrupted its system because its computers were not up to date and could not protect against a virus transmitted by a wearable.
  • Pharma companies understand they need to increase the value of their products, so they are very interested in how AI/ML can be used.
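
The skin-type example above is an instance of subgroup performance disparity, which can be surfaced by evaluating accuracy per subgroup rather than in aggregate. The sketch below simulates a hypothetical sensor that is noisier for one group; the groups, noise levels, and tolerance are invented for illustration:

```python
# Sketch: per-subgroup accuracy check to surface population bias.
# The "sensor" is synthetic and deliberately noisier for group B.
import random

random.seed(1)

def sensor_reading(true_hr, group):
    # Hypothetical device: measurement noise depends on subgroup.
    noise = random.gauss(0, 2 if group == "A" else 10)
    return true_hr + noise

def within_tolerance(true_hr, measured, tol=5.0):
    return abs(true_hr - measured) <= tol

records = []
for _ in range(2000):
    group = random.choice(["A", "B"])
    hr = random.uniform(50, 110)
    records.append((group, within_tolerance(hr, sensor_reading(hr, group))))

# Aggregate accuracy would hide the disparity; per-group accuracy reveals it.
acc = {}
for g in ("A", "B"):
    hits = [ok for grp, ok in records if grp == g]
    acc[g] = sum(hits) / len(hits)
    print(g, round(acc[g], 2))
```

If a training set were collected only from group A, the model's reported accuracy would badly overstate real-world performance, which is the population-bias concern raised in the session.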

Please follow LIVE on TWITTER using the following @ handles and # hashtags:

@Handles

@pharma_BI

@AVIVA1950

@BIOConvention

# Hashtags

#BIO2019 (official meeting hashtag)

Read Full Post »


BioInformatic Resources at the Environmental Protection Agency: Tools and Webinars on Toxicity Prediction

Curator Stephen J. Williams Ph.D.

New GenRA Module in EPA’s CompTox Dashboard Will Help Predict Potential Chemical Toxicity

Published September 25, 2018

As part of its ongoing computational toxicology research, EPA is developing faster and improved approaches to evaluate chemicals for potential health effects.  One commonly applied approach is known as chemical read-across. Read-across uses information about how a chemical with known data behaves to make a prediction about the behavior of another chemical that is “similar” but does not have as much data. Current read-across, while cost-effective, relies on a subjective assessment, which leads to varying predictions and justifications depending on who undertakes and evaluates the assessment.

To reduce uncertainties and develop a more objective approach, EPA researchers have developed an automated read-across tool called Generalized Read-Across (GenRA), and added it to the newest version of the EPA Computational Toxicology Dashboard. The goal of GenRA is to encode as many expert considerations used within current read-across approaches as possible and combine these with data-driven approaches to transition read-across towards a more systematic and data-based method of making predictions.

EPA chemist Dr. Grace Patlewicz says it was this uncertainty that motivated the development of GenRA. “You don’t actually know if you’ve been successful at using read-across to help predict chemical toxicity because it’s a judgement call based on one person versus the next. That subjectivity is something we were trying to move away from.” Patlewicz says.

Since toxicologists and risk assessors are already familiar with read-across, EPA researchers saw value in creating a tool that was aligned with the current read-across workflow but which addressed uncertainty using data analysis methods, in what they call a “harmonized-hybrid workflow.”

In its current form, GenRA lets users find analogues, or chemicals that are similar to their target chemical, based on chemical structural similarity. The user can then select which analogues they want to carry forward into the GenRA prediction by exploring the consistency and concordance of the underlying experimental data for those analogues. Next, the tool predicts toxicity effects of specific repeated dose studies. Then, a plot with these outcomes is generated based on a similarity-weighted activity of the analogue chemicals the user selected. Finally, the user is presented with a data matrix view showing whether a chemical is predicted to be toxic (yes or no) for a chosen set of toxicity endpoints, with a quantitative measure of uncertainty.
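
The similarity-weighted prediction step can be sketched as follows. This is a minimal illustration in the spirit of GenRA, not its actual implementation: the analogues, fingerprint bits, and toxicity calls are invented, and Tanimoto similarity on bit sets stands in for GenRA's similarity measures:

```python
# Sketch of similarity-weighted read-across: predict a target
# chemical's toxicity outcome as the similarity-weighted average of
# its analogues' known outcomes. All data here are invented.
def tanimoto(fp1, fp2):
    """Jaccard/Tanimoto similarity of two structural fingerprints (bit sets)."""
    inter = len(fp1 & fp2)
    union = len(fp1 | fp2)
    return inter / union if union else 0.0

# Hypothetical analogues: fingerprint bits and a binary toxicity call (1 = toxic).
analogues = [
    ({1, 2, 3, 4}, 1),
    ({1, 2, 3, 9}, 1),
    ({7, 8, 9},    0),
]
target_fp = {1, 2, 3, 5}

weights = [tanimoto(target_fp, fp) for fp, _ in analogues]
prediction = sum(w * tox for w, (_, tox) in zip(weights, analogues)) / sum(weights)

print(round(prediction, 2))  # 1.0: both close analogues are toxic
```

The dissimilar analogue contributes nothing here because its similarity weight is zero; GenRA additionally reports a quantitative uncertainty measure alongside the binary call, which this toy sketch omits.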

The team is also comparing chemicals based on other similarity contexts, such as physicochemical characteristics or metabolic similarity, as well as extending the approach to make quantitative predictions of toxicity.

Patlewicz thinks incorporating other contexts and similarity measures will refine GenRA to make better toxicity predictions, fulfilling the goal of creating a read-across method capable of assessing thousands of chemicals that currently lack toxicity data.

“That’s the direction that we’re going in,” Patlewicz says. “Recognizing where we are and trying to move towards something a little bit more objective, showing how aspects of the current read-across workflow could be refined.”

Learn more at: https://comptox.epa.gov

 

A listing of EPA Tools for Air Quality Assessment

Tools

  • Atmospheric Model Evaluation Tool (AMET)
    AMET helps in the evaluation of meteorological and air quality simulations.
  • Benchmark Dose Software (BMDS)
    EPA developed the Benchmark Dose Software (BMDS) as a tool to help estimate dose or exposure of a chemical or chemical mixture associated with a given response level. The methodology is used by EPA risk assessors and is fast becoming the world’s standard for dose-response analysis for risk assessments, including air pollution risk assessments.
  • BenMAP
    BenMAP is a Windows-based computer program that uses a Geographic Information System (GIS)-based approach to estimate the health impacts and economic benefits occurring when populations experience changes in air quality.
  • Community-Focused Exposure and Risk Screening Tool (C-FERST)
    C-FERST is an online tool developed by EPA in collaboration with stakeholders to provide access to resources that can be used with communities to help identify and learn more about their environmental health issues and explore exposure and risk reduction options.
  • Community Health Vulnerability Index
    EPA scientists developed a Community Health Vulnerability Index that can be used to help identify communities at higher health risk from wildfire smoke. Breathing smoke from a nearby wildfire is a health threat, especially for people with lung or heart disease, diabetes and high blood pressure as well as older adults, and those living in communities with poverty, unemployment and other indicators of social stress. Health officials can use the tool, in combination with air quality models, to focus public health strategies on vulnerable populations living in areas where air quality is impaired, either by wildfire smoke or other sources of pollution. The work was published in Environmental Science & Technology.
  • Critical Loads Mapper Tool
    The Critical Loads Mapper Tool can be used to help protect terrestrial and aquatic ecosystems from atmospheric deposition of nitrogen and sulfur, two pollutants emitted from fossil fuel burning and agricultural emissions. The interactive tool provides easy access to information on deposition levels through time; critical loads, which identify thresholds when pollutants have reached harmful levels; and exceedances of these thresholds.
  • EnviroAtlas
    EnviroAtlas provides interactive tools and resources for exploring the benefits people receive from nature or “ecosystem goods and services”. Ecosystem goods and services are critically important to human health and well-being, but they are often overlooked due to lack of information. Using EnviroAtlas, many types of users can access, view, and analyze diverse information to better understand the potential impacts of various decisions.
  • EPA Air Sensor Toolbox for Citizen Scientists
    EPA’s Air Sensor Toolbox for Citizen Scientists provides information and guidance on new low-cost compact technologies for measuring air quality. Citizens are interested in learning more about local air quality where they live, work and play. EPA’s Toolbox includes information about: Sampling methodologies; Calibration and validation approaches; Measurement methods options; Data interpretation guidelines; Education and outreach; and Low cost sensor performance information.
  • ExpoFIRST
    The Exposure Factors Interactive Resource for Scenarios Tool (ExpoFIRST) brings data from EPA’s Exposure Factors Handbook: 2011 Edition (EFH) to an interactive tool that maximizes flexibility and transparency for exposure assessors. ExpoFIRST represents a significant advance for regional, state, and local scientists in performing and documenting calculations for community and site-specific exposure assessments, including air pollution exposure assessments.
  • EXPOsure toolbox (ExpoBox)
    This is a toolbox created to assist individuals from within government, industry, academia, and the general public with assessing exposure, including exposure to air contaminants, fate and transport processes of air pollutants and their potential exposure concentrations. It is a compendium of exposure assessment tools that links to guidance documents, databases, models, reference materials, and other related resources.
  • Federal Reference & Federal Equivalency Methods
    EPA scientists develop and evaluate Federal Reference Methods and Federal Equivalency Methods for accurately and reliably measuring six primary air pollutants in outdoor air. These methods are used by states and other organizations to assess implementation actions needed to attain National Ambient Air Quality Standards.
  • Fertilizer Emission Scenario Tool for CMAQ (FEST-C)
    FEST-C facilitates the definition and simulation of new cropland farm management system scenarios or editing of existing scenarios to drive Environmental Policy Integrated Climate model (EPIC) simulations.  For the standard 12km continental Community Multi-Scale Air Quality model (CMAQ) domain, this amounts to about 250,000 simulations for the U.S. alone. It also produces gridded daily EPIC weather input files from existing hourly Meteorology-Chemistry Interface Processor (MCIP) files, transforms EPIC output files to CMAQ-ready input files and links directly to Visual Environment for Rich Data Interpretation (VERDI) for spatial visualization of input and output files. The December 2012 release will perform all these functions for any CMAQ grid scale or domain.
  • Instruction Guide and Macro Analysis Tool for Community-led Air Monitoring 
    EPA has developed two tools for evaluating the performance of low-cost sensors and interpreting the data they collect to help citizen scientists, communities, and professionals learn about local air quality.
  • Integrated Climate and Land use Scenarios (ICLUS)
    Climate change and land-use change are global drivers of environmental change. Impact assessments frequently show that interactions between climate and land-use changes can create serious challenges for aquatic ecosystems, water quality, and air quality. Population projections to 2100 were used to model the distribution of new housing across the landscape. In addition, housing density was used to estimate changes in impervious surface cover.  A final report, datasets, the ICLUS+ Web Viewer and ArcGIS tools are available.
  • Indoor Semi-Volatile Organic Compound (i-SVOC)
    i-SVOC Version 1.0 is a general-purpose software application for dynamic modeling of the emission, transport, sorption, and distribution of semi-volatile organic compounds (SVOCs) in indoor environments. i-SVOC supports a variety of uses, including exposure assessment and the evaluation of mitigation options. SVOCs are a diverse group of organic chemicals that can be found in:

    • Pesticides;
    • Ingredients in cleaning agents and personal care products;
    • Additives to vinyl flooring, furniture, clothing, cookware, food packaging, and electronics.

    Many are also present in indoor air, where they tend to bind to interior surfaces and particulate matter (dust).
  • Municipal Solid Waste Decision Support Tool (MSW DST)
    This tool is designed to aid solid waste planners in evaluating the cost and environmental aspects of integrated municipal solid waste management strategies. The tool is the result of collaboration between EPA and RTI International and its partners.
  • Optical Noise-Reduction Averaging (ONA) Program Improves Black Carbon Particle Measurements Using Aethalometers
    ONA is a program that reduces noise in real-time black carbon data obtained using Aethalometers. Aethalometers optically measure the concentration of light-absorbing or “black” particles that accumulate on a filter as air flows through it. These particles are produced by incomplete combustion of fossil fuels, biofuels, and biomass. Under polluted conditions, they appear as smoke or haze.
  • RETIGO tool
    Real Time Geospatial Data Viewer (RETIGO) is a free, web-based tool that shows air quality data that are collected while in motion (walking, biking or in a vehicle). The tool helps users overcome technical barriers to exploring air quality data. After collecting measurements, citizen scientists and other users can import their own data and explore the data on a map.
  • Remote Sensing Information Gateway (RSIG)
    RSIG offers a new way for users to get the multi-terabyte, environmental datasets they want via an interactive, Web browser-based application. A file download and parsing process that now takes months will be reduced via RSIG to minutes.
  • Simulation Tool Kit for Indoor Air Quality and Inhalation Exposure (IAQX)
    IAQX version 1.1 is an indoor air quality (IAQ) simulation software package that complements and supplements existing IAQ simulation programs. IAQX is for advanced users who have experience with exposure estimation, pollution control, risk assessment, and risk management. There are many sources of indoor air pollution, such as building materials, furnishings, and chemical cleaners, and since most people spend a large portion of their time indoors, it is important to be able to estimate exposure to these pollutants. IAQX helps users analyze the impact of pollutant sources and sinks, ventilation, and air cleaners. It performs conventional IAQ simulations to calculate the pollutant concentration and/or personal exposure as a function of time. It can also estimate adequate ventilation rates based on user-provided air quality criteria, a unique feature useful for product stewardship and risk management.
  • Spatial Allocator
    The Spatial Allocator provides tools that could be used by the air quality modeling community to perform commonly needed spatial tasks without requiring the use of a commercial Geographic Information System (GIS).
  • Traceability Protocol for Assay and Certification of Gaseous Calibration Standards
    This is used to certify calibration gases for ambient and continuous emission monitors. It specifies methods for assaying gases and establishing traceability to National Institute of Standards and Technology (NIST) reference standards. Traceability is required under EPA ambient and continuous emission monitoring regulations.
  • Watershed Deposition Mapping Tool (WDT)
    WDT provides an easy to use tool for mapping the deposition estimates from CMAQ to watersheds to provide the linkage of air and water needed for TMDL (Total Maximum Daily Load) and related nonpoint-source watershed analyses.
  • Visual Environment for Rich Data Interpretation (VERDI)
    VERDI is a flexible, modular, Java-based program for visualizing multivariate gridded meteorology, emissions, and air quality modeling data created by environmental modeling systems such as CMAQ and the Weather Research and Forecasting (WRF) model.
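
The kind of time-dependent concentration calculation described for IAQX above can be illustrated with a one-zone, well-mixed mass-balance model, dC/dt = S/V - (Q/V)·C. The parameter values below are illustrative assumptions, not IAQX defaults:

```python
# One-zone well-mixed indoor air model: dC/dt = S/V - (Q/V) * C,
# integrated with forward Euler. Parameter values are illustrative.
S = 10.0   # source emission rate, mg/h
V = 50.0   # room volume, m^3
Q = 25.0   # ventilation rate, m^3/h (air-change rate Q/V = 0.5 per hour)
dt = 0.01  # time step, h

C = 0.0    # indoor concentration, mg/m^3
for _ in range(int(8 / dt)):  # simulate 8 hours
    C += dt * (S / V - (Q / V) * C)

steady_state = S / Q  # analytic steady state, 0.4 mg/m^3
print(round(C, 3), steady_state)  # C approaches the steady state from below
```

Running the balance in reverse, i.e. solving for the Q that keeps the steady state below a target concentration, is the same arithmetic behind the ventilation-rate estimation feature mentioned for IAQX.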

 

Databases

  • Air Quality Data for the CDC National Environmental Public Health Tracking Network 
    EPA’s Exposure Research scientists are collaborating with the Centers for Disease Control and Prevention (CDC) on a CDC initiative to build a National Environmental Public Health Tracking (EPHT) network. Working with state, local, and federal air pollution and health agencies, the EPHT program is facilitating the collection, integration, analysis, interpretation, and dissemination of data from environmental hazard monitoring and from human exposure and health effects surveillance. These data provide scientific information to develop surveillance indicators and to investigate possible relationships between environmental exposures, chronic disease, and other diseases, which can lead to interventions that reduce the burden of these illnesses. An important part of the initiative is air quality modeling estimates and air quality monitoring data, combined through Bayesian modeling, that can be linked with health outcome data.
  • EPAUS9R – An Energy Systems Database for use with the Market Allocation (MARKAL) Model
    The EPAUS9r is a regional database representation of the United States energy system. The database uses the MARKAL model. MARKAL is an energy system optimization model used by local and federal governments, national and international communities and academia. EPAUS9r represents energy supply, technology, and demand throughout the major sectors of the U.S. energy system.
  • Fused Air Quality Surfaces Using Downscaling
    This database provides access to the most recent O3 and PM2.5 surfaces datasets using downscaling.
  • Health & Environmental Research Online (HERO)
    HERO provides access to scientific literature used to support EPA’s integrated science assessments, including the  Integrated Science Assessments (ISA) that feed into the National Ambient Air Quality (NAAQS) reviews.
  • SPECIATE 4.5 Database
    SPECIATE is a repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources.

A listing of EPA Tools and Databases for Water Contaminant Exposure Assessment

Exposure and Toxicity

  • EPA ExpoBox (A Toolbox for Exposure Assessors)
    This toolbox assists individuals from within government, industry, academia, and the general public with assessing exposure from multiple media, including water and sediment. It is a compendium of exposure assessment tools that links to guidance documents, databases, models, reference materials, and other related resources.

Chemical and Product Categories (CPCat) Database
CPCat is a database mapping more than 43,000 chemicals to a set of terms categorizing their usage or function. The comprehensive list of chemicals with associated categories of chemical and product use was compiled from publicly available sources. Unique use-category taxonomies from each source are mapped onto a single common set of approximately 800 terms. Users can search for chemicals by chemical name, Chemical Abstracts Registry Number, or by CPCat terms associated with chemicals.
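As an illustration of the kind of lookup CPCat supports, the sketch below maps CAS Registry Numbers to use-category terms and performs a reverse search by term. The records are hypothetical miniatures written for this example, not actual CPCat entries or its real schema.

```python
# Hypothetical miniature of a CPCat-style mapping:
# CAS Registry Number -> chemical name plus use-category terms.
CPCAT = {
    "80-05-7": {"name": "Bisphenol A", "terms": {"plastics", "resin", "consumer_use"}},
    "1912-24-9": {"name": "Atrazine", "terms": {"herbicide", "pesticide"}},
}

def chemicals_with_term(term):
    """Reverse lookup: all CAS numbers annotated with a given use-category term."""
    return sorted(cas for cas, rec in CPCAT.items() if term in rec["terms"])
```

A user query by term then reduces to a single dictionary scan, e.g. `chemicals_with_term("pesticide")`.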

A listing of EPA Tools and Databases for Chemical Toxicity Prediction & Assessment

  • Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)
    SeqAPASS is a fast, online screening tool that allows researchers and regulators to extrapolate toxicity information across species. For some species, such as humans, mice, rats, and zebrafish, the EPA has a large amount of data regarding their toxicological susceptibility to various chemicals. However, toxicity data for numerous other plants and animals are very limited. SeqAPASS extrapolates from these data-rich model organisms to thousands of other non-target species to evaluate their potential chemical susceptibility.
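The extrapolation idea behind SeqAPASS can be caricatured as a sequence-similarity threshold: if a species’ target protein is similar enough to that of a data-rich model organism, flag the species as potentially susceptible. The sketch below assumes pre-aligned sequences and an arbitrary threshold; SeqAPASS itself evaluates similarity at multiple levels (primary sequence, functional domains, key residues), none of which is reproduced here.

```python
def percent_identity(a, b):
    """Fraction of matching residues between two pre-aligned, equal-length sequences."""
    assert len(a) == len(b), "sequences must be pre-aligned to equal length"
    return sum(x == y for x, y in zip(a, b)) / len(a)

def susceptible(model_seq, target_seq, threshold=0.8):
    """Flag a species as potentially susceptible when its target protein is
    sufficiently similar to the data-rich model organism's (threshold is arbitrary)."""
    return percent_identity(model_seq, target_seq) >= threshold
```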

 

A listing of EPA Webinar and Literature on Bioinformatic Tools and Projects

Comparative Bioinformatics Applications for Developmental Toxicology

This webinar discusses how the US EPA/NCCT is trying to solve the problem of too many chemicals, too high a cost, and too much biological uncertainty, and the solution the ToxCast program proposes: a data-rich system to screen, classify, and rank chemicals for further evaluation.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=186844

CHEMOINFORMATIC AND BIOINFORMATIC CHALLENGES AT THE US ENVIRONMENTAL PROTECTION AGENCY.

This presentation provides an overview of both the scientific program and the regulatory activities related to computational toxicology.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=154013

How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?

The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are now standard tools for probing the cellular state and determining important cellular behaviors at the genomic/proteomic levels. While these newer technologies are beginning to provide important information on cellular reactions to toxicant exposure (toxicogenomics), a major challenge that remains is the formulation of a strategy to integrate transcript, protein, metabolite, and toxicity data. This integration will require new concepts and tools in bioinformatics. The U.S. National Library of Medicine’s PubMed site includes 19 million citations and abstracts and continues to grow. The BDSM team is now working on assembling the literature’s unstructured data into a structured database and linking it to BDSM within a system that can then be used for testing and generating new hypotheses. This effort will generate databases of entities (such as genes, proteins, metabolites, and Gene Ontology processes) linked to PubMed identifiers/abstracts, providing information on the relationships between them. The end result will be an online/standalone tool that will help researchers focus on the papers most relevant to their query and uncover hidden connections and obvious information gaps.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=227345
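The entity-to-PubMed linking described in the abstract above can be sketched as a simple inverted index from a controlled vocabulary to abstract identifiers. Production systems rely on curated thesauri and NLP rather than the naive case-insensitive substring match used in this hypothetical example.

```python
def index_entities(abstracts, vocabulary):
    """Build an inverted index: entity term -> set of PubMed IDs whose
    abstract text mentions it.

    abstracts: {pmid: abstract_text}; vocabulary: iterable of entity names.
    Matching here is a naive substring test, for illustration only."""
    index = {term: set() for term in vocabulary}
    for pmid, text in abstracts.items():
        lowered = text.lower()
        for term in vocabulary:
            if term.lower() in lowered:
                index[term].add(pmid)
    return index
```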

ADVANCED PROTEOMICS AND BIOINFORMATICS TOOLS IN TOXICOLOGY RESEARCH: OVERCOMING CHALLENGES TO PROVIDE SIGNIFICANT RESULTS

This presentation specifically addresses the advantages and limitations of state-of-the-art gel, protein array, and peptide-based labeling proteomic approaches to assess the effects of a suite of model T4 inhibitors on the thyroid axis of Xenopus laevis.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NHEERL&dirEntryId=152823

Bioinformatic Integration of in vivo Data and Literature-based Gene Associations for Prioritization of Adverse Outcome Pathway Development

Adverse outcome pathways (AOPs) describe a sequence of events, beginning with a molecular initiating event (MIE), proceeding via key events (KEs), and culminating in an adverse outcome (AO). A challenge for use of AOPs in a safety evaluation context has been identification of MIEs and KEs relevant for AOs observed in regulatory toxicity studies. In this work, we implemented a bioinformatic approach that leverages mechanistic information in the literature and the AOs measured in regulatory toxicity studies to prioritize putative MIEs and/or early KEs for AOP development relevant to chemical safety evaluation. The US Environmental Protection Agency Toxicity Reference Database (ToxRefDB, v2.0) contains effect information for >1000 chemicals curated from >5000 studies or summaries from sources including data evaluation records from the US EPA Office of Pesticide Programs, the National Toxicology Program (NTP), peer-reviewed literature, and pharmaceutical preclinical studies. To increase ToxRefDB interoperability, endpoint and effect information were cross-referenced with codes from the Unified Medical Language System (UMLS), which enabled mapping of in vivo pathological effects from ToxRefDB to PubMed (via Medical Subject Headings, or MeSH). This enabled linkage to any resource that is also connected to PubMed or indexed with MeSH. A publicly available bioinformatic tool, the Entity-MeSH Co-occurrence Network (EMCON), uses multiple data sources and a measure of mutual information to identify the genes most related to a MeSH term. Using EMCON, gene sets were generated for endpoints of toxicological relevance in ToxRefDB, linking putative KEs and/or MIEs. The Comparative Toxicogenomics Database was used to further filter important associations. As a proof of concept, thyroid-related effects and their highly associated genes were examined, demonstrating relevant MIEs and early KEs for AOPs to describe thyroid-related AOs.
The ToxRefDB-to-gene mapping for thyroid resulted in >50 unique gene-to-chemical relationships. Integrated use of EMCON and ToxRefDB data provides a basis for rapid and robust putative AOP development, as well as a novel means to generate mechanistic hypotheses for specific chemicals. This abstract does not necessarily reflect U.S. EPA policy. Abstract and poster for the 2019 Society of Toxicology annual meeting, March 2019.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=344452
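EMCON’s “measure of mutual information” is not specified in the abstract above; pointwise mutual information (PMI) over term–gene co-occurrence counts is one common choice, sketched here with illustrative counts and hypothetical gene names.

```python
import math

def pmi(n_term_gene, n_term, n_gene, n_total):
    """Pointwise mutual information between a MeSH term and a gene,
    estimated from abstract co-occurrence counts:
    log2( P(term, gene) / (P(term) * P(gene)) )."""
    p_joint = n_term_gene / n_total
    p_term = n_term / n_total
    p_gene = n_gene / n_total
    return math.log2(p_joint / (p_term * p_gene))

def rank_genes(term_counts):
    """Rank genes by descending PMI with the term.
    term_counts: {gene: (n_term_gene, n_term, n_gene, n_total)}."""
    return sorted(term_counts, key=lambda g: pmi(*term_counts[g]), reverse=True)
```

With counts `{"geneA": (10, 100, 100, 10000), "geneB": (1, 100, 100, 10000)}`, geneA co-occurs with the term ten times more often than chance and ranks first.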


A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data

Next-Generation Sequencing (NGS) produces large data sets that include tens of thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow containing best-of-breed bioinformatics packages that may leverage multiple programming languages (e.g., Python, R, Java, etc.). The process to transform raw NGS data to usable operational taxonomic units (OTUs) can be tedious due to the number of quality control (QC) steps used in QIIME and other software packages for sample processing. Therefore, the purpose of this work was to simplify the analysis of 16S NGS data from a large number of samples by integrating QC, demultiplexing, and QIIME (Quantitative Insights Into Microbial Ecology) analysis in an accessible R project. User command-line operations for each of the pipeline steps were automated into a workflow. In addition, the R server allows multi-user access to the automated pipeline via separate user accounts while providing access to the same large set of underlying data. We demonstrate the applicability of this pipeline automation using 16S NGS data from approximately 100 stormwater runoff samples collected in a mixed-land-use watershed in northeast Georgia. OTU tables were generated for each sample, and the relative taxonomic abundances were compared for different periods over storm hydrographs to determine how the microbial ecology of a stream changes with the rise and fall of stream stage. Our approach simplifies the pipeline analysis of multiple 16S NGS samples by automating multiple preprocessing, QC, analysis, and post-processing command-line steps that are called by a sequence of R scripts. Presented at ASM 2015: Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NERL&dirEntryId=309890
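The published workflow wraps its command-line steps in R; as a language-neutral illustration of the same automation pattern, the Python sketch below runs an ordered list of steps and stops at the first failure. The step names and commands are placeholders, not the actual QC, demultiplexing, or QIIME invocations.

```python
import subprocess

# Hypothetical ordered pipeline; each echo stands in for a real command-line
# step (QC, demultiplexing, OTU picking) an operator would otherwise run by hand.
PIPELINE = [
    ("quality_control", ["echo", "qc"]),
    ("demultiplex", ["echo", "demux"]),
    ("pick_otus", ["echo", "otus"]),
]

def run_pipeline(steps):
    """Run each step in order, recording its exit code; stop at the first failure."""
    results = {}
    for name, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode
        if proc.returncode != 0:
            break  # downstream steps depend on this one's output
    return results
```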

DEVELOPING COMPUTATIONAL TOOLS NECESSARY FOR APPLYING TOXICOGENOMICS TO RISK ASSESSMENT AND REGULATORY DECISION MAKING.

Genomics, proteomics, and metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum when appropriate bioinformatic and computational methods are applied toward integrating molecular, chemical, and toxicological information.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=156264

The Human Toxome Project

The Human Toxome project, funded as an NIH Transformative Research grant (2011-2016), is focused on developing the concepts and the means for deducing, validating, and sharing molecular Pathways of Toxicity (PoT). Using the test case of estrogenic endocrine disruption, the responses of MCF-7 human breast cancer cells are being phenotyped by transcriptomics and mass-spectrometry-based metabolomics. The bioinformatics tools for PoT deduction represent a core deliverable. A number of challenges for quality and standardization of cell systems, omics technologies, and bioinformatics are being addressed. In parallel, concepts for annotation, validation, and sharing of PoT information, as well as their link to adverse outcomes, are being developed. A reasonably comprehensive public database of PoT, the Human Toxome Knowledge-base, could become a point of reference for toxicological research and regulatory test strategies.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NCCT&dirEntryId=309453

High-Resolution Metabolomics for Environmental Chemical Surveillance and Bioeffect Monitoring

High-Resolution Metabolomics for Environmental Chemical Surveillance and Bioeffect Monitoring (Presented by: Dean Jones, PhD, Department of Medicine, Emory University) (2/28/2013)

https://www.epa.gov/chemical-research/high-resolution-metabolomics-environmental-chemical-surveillance-and-bioeffect

Identification of Absorption, Distribution, Metabolism, and Excretion (ADME) Genes Relevant to Steatosis Using a Gene Expression Approach

Absorption, distribution, metabolism, and excretion (ADME) impact chemical concentration and activation of molecular initiating events of Adverse Outcome Pathways (AOPs) in cellular, tissue, and organ-level targets. In order to better describe ADME parameters and how they modulate potential hazards posed by chemical exposure, our goal is to investigate the relationship between AOPs and ADME-related genes and functional information. Given the scope of this task, we began using hepatic steatosis as a case study. To identify ADME genes related to steatosis, we used the publicly available toxicogenomics database Open TG-GATEs™. This database contains standardized rodent chemical exposure data for 170 chemicals (mostly drugs), along with differential gene expression data and corresponding associated pathological changes. We examined the microarray data from 9 chemical exposure treatments resulting in pathologically confirmed (minimal, moderate, and severe) incidences of hepatic steatosis. From this data set, we used differential expression analysis to identify gene changes resulting from the chemical exposures leading to hepatic steatosis. We then selected differentially expressed genes (DEGs) related to ADME by filtering all genes on their ADME functional identities. These DEGs include enzymes such as cytochrome P450s, UDP-glucuronosyltransferases, and flavin-containing monooxygenases, and transporter genes such as the solute carrier and ATP-binding cassette transporter families. Across all treatments, a total of 61 genes were upregulated and 68 genes were downregulated, while 25 genes were upregulated in some treatments and downregulated in others. This work highlights the application of bioinformatics in linking AOPs with gene modulations, specifically in relation to ADME and chemical exposures.
This abstract does not necessarily reflect U.S. EPA policy. This work highlights the application of bioinformatics tools to identify genes that are modulated by adverse outcomes. Specifically, we delineate a method to identify genes that are related to ADME and can impact target-tissue dose in response to chemical exposures. The computational method outlined in this work is applicable to any adverse outcome pathway and provides a linkage between chemical exposure, target-tissue dose, and adverse outcomes. Application of this method will allow for the rapid screening of chemicals for their impact on ADME-related genes using available gene databases in the literature.
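The filtering-and-tallying logic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the gene symbols, log2 fold-change cutoff, and the family prefixes used to flag ADME genes are made-up stand-ins, not the study's actual gene lists or criteria.

```python
# Keep only genes whose symbols match ADME-related families (CYPs, UGTs,
# FMOs, SLC and ABC transporters), then tally direction of change across
# treatments. Gene names, fold changes, and the cutoff are example values.

ADME_PREFIXES = ("Cyp", "Ugt", "Fmo", "Slc", "Abc")

def is_adme(gene):
    return gene.startswith(ADME_PREFIXES)

def tally_adme_degs(treatments, cutoff=1.0):
    """treatments: {treatment: {gene: log2 fold change}}.
    Returns ADME genes upregulated in every treatment, downregulated in
    every treatment, and mixed (up in some treatments, down in others)."""
    up, down = {}, {}
    for genes in treatments.values():
        for gene, lfc in genes.items():
            if not is_adme(gene):
                continue  # skip non-ADME genes entirely
            if lfc >= cutoff:
                up[gene] = up.get(gene, 0) + 1
            elif lfc <= -cutoff:
                down[gene] = down.get(gene, 0) + 1
    n = len(treatments)
    always_up = {g for g, c in up.items() if c == n and g not in down}
    always_down = {g for g, c in down.items() if c == n and g not in up}
    mixed = set(up) & set(down)
    return always_up, always_down, mixed

data = {
    "drugA": {"Cyp1a1": 2.3, "Slc10a1": -1.8, "Abcb4": 1.4, "Actb": 0.1},
    "drugB": {"Cyp1a1": 1.9, "Slc10a1": -2.0, "Abcb4": -1.2, "Actb": 0.0},
}
up, down, mixed = tally_adme_degs(data)
print(up, down, mixed)  # → {'Cyp1a1'} {'Slc10a1'} {'Abcb4'}
```

Filtering on curated symbol prefixes is a crude proxy for the functional-identity annotation the abstract describes; in practice one would filter against an ADME gene list rather than name patterns.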

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NHEERL&dirEntryId=341273

Development of Environmental Fate and Metabolic Simulators

Presented at the Bioinformatics Open Source Conference (BOSC), Detroit, MI, June 23-24, 2005.

https://cfpub.epa.gov/si/si_public_record_report.cfm?Lab=NERL&dirEntryId=257172

 

Useful Webinars on EPA Computational Tools and Informatics

 

Computational Toxicology Communities of Practice

Computational Toxicology Research

EPA’s Computational Toxicology Communities of Practice is composed of hundreds of stakeholders from over 50 public and private sector organizations (ranging from EPA, other federal agencies, industry, academic institutions, professional societies, nongovernmental organizations, environmental non-profit groups, state environmental agencies and more) who have an interest in using advances in computational toxicology and exposure science to evaluate the safety of chemicals.

The Communities of Practice is open to the public. Monthly webinars are held at EPA’s RTP campus, on the fourth Thursday of the month (occasionally rescheduled in November and December to accommodate holiday schedules), from 11am-Noon EST/EDT. Remote participation is available. For more information or to be added to the meeting email list, contact: Monica Linnenbrink (linnenbrink.monica@epa.gov).

Related Links

Past Webinar Presentations

Presentation File Presented By Date
OPEn structure-activity Relationship App (OPERA) PowerPoint (Video) Dr. Kamel Mansouri, Lead Computational Chemist, contractor for Integrated Laboratory Systems in the National Institute of Environmental Health Sciences 2019/4/25
CompTox Chemicals Dashboard and InVitroDB V3 (Video) Dr. Antony Williams, Chemist in EPA's National Center for Computational Toxicology, and Dr. Katie Paul-Friedman, Toxicologist in EPA's National Center for Computational Toxicology 2019/3/28
The Systematic Empirical Evaluation of Models (SEEM) framework (Video) Dr. John Wambaugh, Physical Scientist in EPA's National Center for Computational Toxicology 2019/2/28
ToxValDB: A comprehensive database of quantitative in vivo study results from over 25,000 chemicals (Video) Dr. Richard Judson, Research Chemist in EPA's National Center for Computational Toxicology 2018/12/20
Sequence Alignment to Predict Across Species Susceptibility (seqAPASS) (Video) Dr. Carlie LaLone, Bioinformaticist, EPA's National Health and Environmental Effects Research Laboratory 2018/11/29
Chemicals and Products Database (Video) Dr. Kathie Dionisio, Environmental Health Scientist, EPA's National Exposure Research Laboratory 2018/10/25
CompTox Chemicals Dashboard V3 (Video) Dr. Antony Williams, Chemist, EPA National Center for Computational Toxicology (NCCT) 2018/09/27
Generalised Read-Across (GenRA) (Video) Dr. Grace Patlewicz, Chemist, EPA National Center for Computational Toxicology (NCCT) 2018/08/23
EPA's ToxCast Owner's Manual (Video) Monica Linnenbrink, Strategic Outreach and Communication lead, EPA National Center for Computational Toxicology (NCCT) 2018/07/26
EPA's Non-Targeted Analysis Collaborative Trial (ENTACT) (Video) Elin Ulrich, Research Chemist in the Public Health Chemistry Branch, EPA National Exposure Research Laboratory (NERL) 2018/06/28
ECOTOX Knowledgebase: New Tools and Data Visualizations (Video) Colleen Elonen, Translational Toxicology Branch, and Dr. Jennifer Olker, Systems Toxicology Branch, in the Mid-Continent Ecology Division of EPA's National Health & Environmental Effects Research Laboratory (NHEERL) 2018/05/24
Investigating Chemical-Microbiota Interactions in Zebrafish (Video) Tamara Tal, Biologist in the Systems Biology Branch, Integrated Systems Toxicology Division, EPA's National Health & Environmental Effects Research Laboratory (NHEERL) 2018/04/26
The CompTox Chemistry Dashboard v2.6: Delivering Improved Access to Data and Real Time Predictions (Video) Tony Williams, Computational Chemist, EPA's National Center for Computational Toxicology (NCCT) 2018/03/29
mRNA Transfection Retrofits Cell-Based Assays with Xenobiotic Metabolism (Video; audio starts at 10:17) Steve Simmons, Research Toxicologist, EPA's National Center for Computational Toxicology (NCCT) 2018/02/22
Development and Distribution of ToxCast and Tox21 High-Throughput Chemical Screening Assay Method Description (Video) Stacie Flood, National Student Services Contractor, EPA's National Center for Computational Toxicology (NCCT) 2018/01/25
High-throughput H295R steroidogenesis assay: utility as an alternative and a statistical approach to characterize effects on steroidogenesis (Video) Derik Haggard, ORISE Postdoctoral Fellow, EPA's National Center for Computational Toxicology (NCCT) 2017/12/14
Systematic Review for Chemical Assessments: Core Elements and Considerations for Rapid Response (Video) Kris Thayer, Director, Integrated Risk Information System (IRIS) Division of EPA's National Center for Environmental Assessment (NCEA) 2017/11/16
High Throughput Transcriptomics (HTTr) Concentration-Response Screening in MCF7 Cells (Video) Joshua Harrill, Toxicologist, EPA's National Center for Computational Toxicology (NCCT) 2017/10/26
Learning Boolean Networks from ToxCast High-Content Imaging Data Todor Antonijevic, ORISE Postdoc, EPA's National Center for Computational Toxicology (NCCT) 2017/09/28
Suspect Screening of Chemicals in Consumer Products Katherine Phillips, Research Chemist, Human Exposure and Dose Modeling Branch, Computational Exposure Division, EPA's National Exposure Research Laboratory (NERL) 2017/08/31
The EPA CompTox Chemistry Dashboard: A Centralized Hub for Integrating Data for the Environmental Sciences (Video) Antony Williams, Chemist, EPA's National Center for Computational Toxicology (NCCT) 2017/07/27
Navigating Through the Minefield of Read-Across Tools and Frameworks: An Update on Generalized Read-Across (GenRA) (Video)

 
