Feeds:
Posts
Comments

Posts Tagged ‘curation communities’


Leaders in Pharmaceutical Intelligence Presentation at The Life Sciences Collaborative

Curator: Stephen J. Williams, Ph.D. Website Analytics: Adam Sonnenberg, BSc Leaders in Pharmaceutical Intelligence presented their ongoing efforts to develop an open-access scientific and medical publishing and curation platform to The Life Science Collaborative, an executive pharmaceutical and biopharma networking group in the Philadelphia/New Jersey area.

Our Team

Slide1

For more information on the Vision, Funding Deals and Partnerships please see our site at https://pharmaceuticalintelligence.com/vision/

Slide2

For more information about our Team please see our site at https://pharmaceuticalintelligence.com/contributors-biographies/

Slide5

For more information of LPBI Deals and Partnerships please see our site at https://pharmaceuticalintelligence.com/joint-ventures/

Slide4

For more information about our BioMed E-Series please see our site at https://pharmaceuticalintelligence.com/biomed-e-books/

E-Book Titles by LPBI

LPBI book titles slide Slide8Slide3

Slide6

For more information on Real-Time Conference Coverage including a full list of Conferences Covered by LPBI please go to https://pharmaceuticalintelligence.com/press-coverage/

For more information on Real-Time Conference Coverage and a full listing of Conferences Covered by LPBI please go to:

https://pharmaceuticalintelligence.com/press-coverage/ Slide7

Slide10

The Pennsylvania (PA) and New Jersey (NJ) Biotech environment had been hit hard by the recession and loss of anchor big pharma companies however as highlighted by our interviews in “The Vibrant Philly Biotech Scene” and other news outlets, additional issues are preventing the PA/NJ area from achieving its full potential (discussions also with LSC)

Slide9Download the PowerPoint slides here: Presentationlsc

Read Full Post »


Artificial Intelligence Versus the Scientist: Who Will Win?

Will DARPA Replace the Human Scientist: Not So Fast, My Friend!

Writer, Curator: Stephen J. Williams, Ph.D.

scientistboxingwithcomputer

Last month’s issue of Science article by Jia You “DARPA Sets Out to Automate Research”[1] gave a glimpse of how science could be conducted in the future: without scientists. The article focused on the U.S. Defense Advanced Research Projects Agency (DARPA) program called ‘Big Mechanism”, a $45 million effort to develop computer algorithms which read scientific journal papers with ultimate goal of extracting enough information to design hypotheses and the next set of experiments,

all without human input.

The head of the project, artificial intelligence expert Paul Cohen, says the overall goal is to help scientists cope with the complexity with massive amounts of information. As Paul Cohen stated for the article:

“‘

Just when we need to understand highly connected systems as systems,

our research methods force us to focus on little parts.

                                                                                                                                                                                                               ”

The Big Mechanisms project aims to design computer algorithms to critically read journal articles, much as scientists will, to determine what and how the information contributes to the knowledge base.

As a proof of concept DARPA is attempting to model Ras-mutation driven cancers using previously published literature in three main steps:

  1. Natural Language Processing: Machines read literature on cancer pathways and convert information to computational semantics and meaning

One team is focused on extracting details on experimental procedures, using the mining of certain phraseology to determine the paper’s worth (for example using phrases like ‘we suggest’ or ‘suggests a role in’ might be considered weak versus ‘we prove’ or ‘provide evidence’ might be identified by the program as worthwhile articles to curate). Another team led by a computational linguistics expert will design systems to map the meanings of sentences.

  1. Integrate each piece of knowledge into a computational model to represent the Ras pathway on oncogenesis.
  2. Produce hypotheses and propose experiments based on knowledge base which can be experimentally verified in the laboratory.

The Human no Longer Needed?: Not So Fast, my Friend!

The problems the DARPA research teams are encountering namely:

  • Need for data verification
  • Text mining and curation strategies
  • Incomplete knowledge base (past, current and future)
  • Molecular biology not necessarily “requires casual inference” as other fields do

Verification

Notice this verification step (step 3) requires physical lab work as does all other ‘omics strategies and other computational biology projects. As with high-throughput microarray screens, a verification is needed usually in the form of conducting qPCR or interesting genes are validated in a phenotypical (expression) system. In addition, there has been an ongoing issue surrounding the validity and reproducibility of some research studies and data.

See Importance of Funding Replication Studies: NIH on Credibility of Basic Biomedical Studies

Therefore as DARPA attempts to recreate the Ras pathway from published literature and suggest new pathways/interactions, it will be necessary to experimentally validate certain points (protein interactions or modification events, signaling events) in order to validate their computer model.

Text-Mining and Curation Strategies

The Big Mechanism Project is starting very small; this reflects some of the challenges in scale of this project. Researchers were only given six paragraph long passages and a rudimentary model of the Ras pathway in cancer and then asked to automate a text mining strategy to extract as much useful information. Unfortunately this strategy could be fraught with issues frequently occurred in the biocuration community namely:

Manual or automated curation of scientific literature?

Biocurators, the scientists who painstakingly sort through the voluminous scientific journal to extract and then organize relevant data into accessible databases, have debated whether manual, automated, or a combination of both curation methods [2] achieves the highest accuracy for extracting the information needed to enter in a database. Abigail Cabunoc, a lead developer for Ontario Institute for Cancer Research’s WormBase (a database of nematode genetics and biology) and Lead Developer at Mozilla Science Lab, noted, on her blog, on the lively debate on biocuration methodology at the Seventh International Biocuration Conference (#ISB2014) that the massive amounts of information will require a Herculaneum effort regardless of the methodology.

Although I will have a future post on the advantages/disadvantages and tools/methodologies of manual vs. automated curation, there is a great article on researchinformation.infoExtracting More Information from Scientific Literature” and also see “The Methodology of Curation for Scientific Research Findings” and “Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison” for manual curation methodologies and A MOD(ern) perspective on literature curation for a nice workflow paper on the International Society for Biocuration site.

The Big Mechanism team decided on a full automated approach to text-mine their limited literature set for relevant information however was able to extract only 40% of relevant information from these six paragraphs to the given model. Although the investigators were happy with this percentage most biocurators, whether using a manual or automated method to extract information, would consider 40% a low success rate. Biocurators, regardless of method, have reported ability to extract 70-90% of relevant information from the whole literature (for example for Comparative Toxicogenomics Database)[3-5].

Incomplete Knowledge Base

In an earlier posting (actually was a press release for our first e-book) I had discussed the problem with the “data deluge” we are experiencing in scientific literature as well as the plethora of ‘omics experimental data which needs to be curated.

Tackling the problem of scientific and medical information overload

pubmedpapersoveryears

Figure. The number of papers listed in PubMed (disregarding reviews) during ten year periods have steadily increased from 1970.

Analyzing and sharing the vast amounts of scientific knowledge has never been so crucial to innovation in the medical field. The publication rate has steadily increased from the 70’s, with a 50% increase in the number of original research articles published from the 1990’s to the previous decade. This massive amount of biomedical and scientific information has presented the unique problem of an information overload, and the critical need for methodology and expertise to organize, curate, and disseminate this diverse information for scientists and clinicians. Dr. Larry Bernstein, President of Triplex Consulting and previously chief of pathology at New York’s Methodist Hospital, concurs that “the academic pressures to publish, and the breakdown of knowledge into “silos”, has contributed to this knowledge explosion and although the literature is now online and edited, much of this information is out of reach to the very brightest clinicians.”

Traditionally, organization of biomedical information has been the realm of the literature review, but most reviews are performed years after discoveries are made and, given the rapid pace of new discoveries, this is appearing to be an outdated model. In addition, most medical searches are dependent on keywords, hence adding more complexity to the investigator in finding the material they require. Third, medical researchers and professionals are recognizing the need to converse with each other, in real-time, on the impact new discoveries may have on their research and clinical practice.

These issues require a people-based strategy, having expertise in a diverse and cross-integrative number of medical topics to provide the in-depth understanding of the current research and challenges in each field as well as providing a more conceptual-based search platform. To address this need, human intermediaries, known as scientific curators, are needed to narrow down the information and provide critical context and analysis of medical and scientific information in an interactive manner powered by web 2.0 with curators referred to as the “researcher 2.0”. This curation offers better organization and visibility to the critical information useful for the next innovations in academic, clinical, and industrial research by providing these hybrid networks.

Yaneer Bar-Yam of the New England Complex Systems Institute was not confident that using details from past knowledge could produce adequate roadmaps for future experimentation and noted for the article, “ “The expectation that the accumulation of details will tell us what we want to know is not well justified.”

In a recent post I had curated findings from four lung cancer omics studies and presented some graphic on bioinformatic analysis of the novel genetic mutations resulting from these studies (see link below)

Multiple Lung Cancer Genomic Projects Suggest New Targets, Research Directions for

Non-Small Cell Lung Cancer

which showed, that while multiple genetic mutations and related pathway ontologies were well documented in the lung cancer literature there existed many significant genetic mutations and pathways identified in the genomic studies but little literature attributed to these lung cancer-relevant mutations.

KEGGinliteroanalysislungcancer

  This ‘literomics’ analysis reveals a large gap between our knowledge base and the data resulting from large translational ‘omic’ studies.

Different Literature Analyses Approach Yeilding

A ‘literomics’ approach focuses on what we don NOT know about genes, proteins, and their associated pathways while a text-mining machine learning algorithm focuses on building a knowledge base to determine the next line of research or what needs to be measured. Using each approach can give us different perspectives on ‘omics data.

Deriving Casual Inference

Ras is one of the best studied and characterized oncogenes and the mechanisms behind Ras-driven oncogenenis is highly understood.   This, according to computational biologist Larry Hunt of Smart Information Flow Technologies makes Ras a great starting point for the Big Mechanism project. As he states,” Molecular biology is a good place to try (developing a machine learning algorithm) because it’s an area in which common sense plays a minor role”.

Even though some may think the project wouldn’t be able to tackle on other mechanisms which involve epigenetic factors UCLA’s expert in causality Judea Pearl, Ph.D. (head of UCLA Cognitive Systems Lab) feels it is possible for machine learning to bridge this gap. As summarized from his lecture at Microsoft:

“The development of graphical models and the logic of counterfactuals have had a marked effect on the way scientists treat problems involving cause-effect relationships. Practical problems requiring causal information, which long were regarded as either metaphysical or unmanageable can now be solved using elementary mathematics. Moreover, problems that were thought to be purely statistical, are beginning to benefit from analyzing their causal roots.”

According to him first

1) articulate assumptions

2) define research question in counter-inference terms

Then it is possible to design an inference system using calculus that tells the investigator what they need to measure.

To watch a video of Dr. Judea Pearl’s April 2013 lecture at Microsoft Research Machine Learning Summit 2013 (“The Mathematics of Causal Inference: with Reflections on Machine Learning”), click here.

The key for the Big Mechansism Project may me be in correcting for the variables among studies, in essence building a models system which may not rely on fully controlled conditions. Dr. Peter Spirtes from Carnegie Mellon University in Pittsburgh, PA is developing a project called the TETRAD project with two goals: 1) to specify and prove under what conditions it is possible to reliably infer causal relationships from background knowledge and statistical data not obtained under fully controlled conditions 2) develop, analyze, implement, test and apply practical, provably correct computer programs for inferring causal structure under conditions where this is possible.

In summary such projects and algorithms will provide investigators the what, and possibly the how should be measured.

So for now it seems we are still needed.

References

  1. You J: Artificial intelligence. DARPA sets out to automate research. Science 2015, 347(6221):465.
  2. Biocuration 2014: Battle of the New Curation Methods [http://blog.abigailcabunoc.com/biocuration-2014-battle-of-the-new-curation-methods]
  3. Davis AP, Johnson RJ, Lennon-Hopkins K, Sciaky D, Rosenstein MC, Wiegers TC, Mattingly CJ: Targeted journal curation as a method to improve data currency at the Comparative Toxicogenomics Database. Database : the journal of biological databases and curation 2012, 2012:bas051.
  4. Wu CH, Arighi CN, Cohen KB, Hirschman L, Krallinger M, Lu Z, Mattingly C, Valencia A, Wiegers TC, John Wilbur W: BioCreative-2012 virtual issue. Database : the journal of biological databases and curation 2012, 2012:bas049.
  5. Wiegers TC, Davis AP, Mattingly CJ: Collaborative biocuration–text-mining development task for document prioritization for curation. Database : the journal of biological databases and curation 2012, 2012:bas037.

Other posts on this site on include: Artificial Intelligence, Curation Methodology, Philosophy of Science

Inevitability of Curation: Scientific Publishing moves to embrace Open Data, Libraries and Researchers are trying to keep up

A Brief Curation of Proteomics, Metabolomics, and Metabolism

The Methodology of Curation for Scientific Research Findings

Scientific Curation Fostering Expert Networks and Open Innovation: Lessons from Clive Thompson and others

The growing importance of content curation

Data Curation is for Big Data what Data Integration is for Small Data

Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

Exploring the Impact of Content Curation on Business Goals in 2013

Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison

conceived: NEW Definition for Co-Curation in Medical Research

Reconstructed Science Communication for Open Access Online Scientific Curation

Search Results for ‘artificial intelligence’

 The Simple Pictures Artificial Intelligence Still Can’t Recognize

Data Scientist on a Quest to Turn Computers Into Doctors

Vinod Khosla: “20% doctor included”: speculations & musings of a technology optimist or “Technology will replace 80% of what doctors do”

Where has reason gone?

Read Full Post »


10:15AM 11/13/2014 – 10th Annual Personalized Medicine Conference at the Harvard Medical School, Boston

REAL TIME Coverage of this Conference by Dr. Aviva Lev-Ari, PhD, RN – Director and Founder of LEADERS in PHARMACEUTICAL BUSINESS INTELLIGENCE, Boston http://pharmaceuticalintelligence.com

10:15 a.m. Panel Discussion — IT/Big Data

IT/Big Data

The human genome is composed of 6 billion nucleotides (using the genetic alphabet of T, C, G and A). As the cost of sequencing the human genome is decreasing at a rapid rate, it might not be too far into the future that every human being will be sequenced at least once in their lifetime. The sequence data together with the clinical data are going to be used more and more frequently to make clinical decisions. If that is true, we need to have secure methods of storing, retrieving and analyzing all of these data.  Some people argue that this is a tsunami of data that we are not ready to handle. The panel will discuss the types and volumes of data that are being generated and how to deal with it.

IT/Big Data

   Moderator:

Amy Abernethy, M.D.
Chief Medical Officer, Flatiron

Role of Informatics, SW and HW in PM. Big data and Healthcare

How Lab and Clinics can be connected. Oncologist, Hematologist use labs in clinical setting, Role of IT and Technology in the environment of the Clinicians

Compare Stanford Medical Center and Harvard Medical Center and Duke Medical Center — THREE different models in Healthcare data management

Create novel solutions: Capture the voice of the patient for integration of component: Volume, Veracity, Value

Decisions need to be made in short time frame, documentation added after the fact

No system can be perfect in all aspects

Understanding clinical record for conversion into data bases – keeping quality of data collected

Key Topics

Panelists:

Stephen Eck, M.D., Ph.D.
Vice President, Global Head of Oncology Medical Sciences,
Astellas, Inc.

Small data expert, great advantage to small data. Populations data allows for longitudinal studies,

Big Mac Big Data – Big is Good — Is data been collected suitable for what is it used, is it robust, limitations, of what the data analysis mean

Data analysis in Chemical Libraries – now annotated

Diversity data in NOTED by MDs, nuances are very great, Using Medical Records for building Billing Systems

Cases when the data needed is not known or not available — use data that is available — limits the scope of what Valuable solution can be arrived at

In Clinical Trial: needs of researchers, billing clinicians — in one system

Translation of data on disease to data object

Signal to Noise Problem — Thus Big data provided validity and power

 

J. Michael Gaziano, M.D., M.P.H., F.R.C.P.
Scientific Director, Massachusetts Veterans Epidemiology Research
and Information Center (MAVERIC), VA Boston Healthcare System;
Chief Division of Aging, Brigham and Women’s Hospital;
Professor of Medicine, Harvard Medical School

at BWH since 1987 at 75% – push forward the Genomics Agenda, VA system 25% – VA is horizontally data integrated embed research and knowledge — baseline questionnaire 200,000 phenotypes – questionnaire and Genomics data to be integrated, Data hierarchical way to be curated, Simple phenotypes, validate phenotypes, Probability to have susceptibility for actual disease, Genomics Medicine will benefit Clinicians

Data must be of visible quality, collect data via Telephone VA – on Med compliance study, on Ability to tolerate medication

–>>Annotation assisted in building a tool for Neurologist on Alzheimer’s Disease (AlzSWAN knowledge base) (see also Genotator , a Disease-Agnostic Tool for Annotation)

–>>Curation of data is very different than statistical analysis of Clinical Trial Data

–>>Integration of data at VA and at BWH are tow different models of SUCCESSFUL data integration models, accessing the data is also using a different model

–>>Data extraction from the Big data — an issue

–>>Where the answers are in the data, build algorithms that will pick up causes of disease: Alzheimer’s – very difficult to do

–>>system around all stakeholders: investment in connectivity, moving data, individual silo, HR, FIN, Clinical Research

–>>Biobank data and data quality

 

Krishna Yeshwant, M.D.
General Partner, Google Ventures;
Physician, Brigham and Women’s Hospital

Computer Scientist and Medical Student. Were the technology is going?

Messy situation, interaction IT and HC, Boston and Silicon Valley are focusing on Consumers, Google Engineers interested in developing Medical and HC applications — HUGE interest. Application or Wearable – new companies in this space, from Computer Science world to Medicine – Enterprise level – EMR or Consumer level – Wearable — both areas are very active in Silicon Valley

IT stuff in the hospital HARDER that IT in any other environment, great progress in last 5 years, security of data, privacy. Sequencing data cost of big data management with highest security

Constrained data vs non-constrained data

Opportunities for Government cooperation as a Lead needed for standardization of data objects

 

Questions from the Podium:

  • Where is the Truth: do we have all the tools or we don’t for Genomic data usage
  • Question on Interoperability
  • Big Valuable data — vs Big data
  • quality, uniform, large cohort, comprehensive Cancer Centers
  • Volume of data can compensate quality of data
  • Data from Imaging – Quality and interpretation – THREE radiologist will read cancer screening

 

 

 

– See more at: http://personalizedmedicine.partners.org/Education/Personalized-Medicine-Conference/Program.aspx#sthash.qGbGZXXf.dpuf

 

@HarvardPMConf

#PMConf

@SachsAssociates

@Duke_Medicine

@AstellasUS

@GoogleVentures

@harvardmed

@BrighamWomens

@kyeshwant

Read Full Post »


Scientific Curation Fostering Expert Networks and Open Innovation: Lessons from Clive Thompson

Life-cycle of Science 2

 

 

 

 

 

 

 

 

 

 

 

Curators and Writer: Stephen J. Williams, Ph.D. with input from Curators Larry H. Bernstein, MD, FCAP, Dr. Justin D. Pearlman, MD, PhD, FACC and Dr. Aviva Lev-Ari, PhD, RN

(this discussion is in a three part series including:

Using Scientific Content Curation as a Method for Validation and Biocuration

Using Scientific Content Curation as a Method for Open Innovation)

 

Every month I get my Wired Magazine (yes in hard print, I still like to turn pages manually plus I don’t mind if I get grease or wing sauce on my magazine rather than on my e-reader) but I always love reading articles written by Clive Thompson. He has a certain flair for understanding the techno world we live in and the human/technology interaction, writing about interesting ways in which we almost inadvertently integrate new technologies into our day-to-day living, generating new entrepreneurship, new value.   He also writes extensively about tech and entrepreneurship.

October 2013 Wired article by Clive Thompson, entitled “How Successful Networks Nurture Good Ideas: Thinking Out Loud”, describes how the voluminous writings, postings, tweets, and sharing on social media is fostering connections between people and ideas which, previously, had not existed. The article was generated from Clive Thompson’s book Smarter Than you Think: How Technology is Changing Our Minds for the Better.Tom Peters also commented about the article in his blog (see here).

Clive gives a wonderful example of Ory Okolloh, a young Kenyan-born law student who, after becoming frustrated with the lack of coverage of problems back home, started a blog about Kenyan politics. Her blog not only got interest from movie producers who were documenting female bloggers but also gained the interest of fellow Kenyans who, during the upheaval after the 2007 Kenyan elections, helped Ory to develop a Google map for reporting of violence (http://www.ushahidi.com/, which eventually became a global organization using open-source technology to affect crises-management. There are a multitude of examples how networks and the conversations within these circles are fostering new ideas. As Clive states in the article:

 

Our ideas are PRODUCTS OF OUR ENVIRONMENT.

They are influenced by the conversations around us.

However the article got me thinking of how Science 2.0 and the internet is changing how scientists contribute, share, and make connections to produce new and transformative ideas.

But HOW MUCH Knowledge is OUT THERE?

 

Clive’s article listed some amazing facts about the mountains of posts, tweets, words etc. out on the internet EVERY DAY, all of which exemplifies the problem:

  • 154.6 billion EMAILS per DAY
  • 400 million TWEETS per DAY
  • 1 million BLOG POSTS (including this one) per DAY
  • 2 million COMMENTS on WordPress per DAY
  • 16 million WORDS on Facebook per DAY
  • TOTAL 52 TRILLION WORDS per DAY

As he estimates this would be 520 million books per DAY (book with average 100,000 words).

A LOT of INFO. But as he suggests it is not the volume but how we create and share this information which is critical as the science fiction writer Theodore Sturgeon noted “Ninety percent of everything is crap” AKA Sturgeon’s Law.

 

Internet live stats show how congested the internet is each day (http://www.internetlivestats.com/). Needless to say Clive’s numbers are a bit off. As of the writing of this article:

 

  • 2.9 billion internet users
  • 981 million websites (only 25,000 hacked today)
  • 128 billion emails
  • 385 million Tweets
  • > 2.7 million BLOG posts today (including this one)

 

The Good, The Bad, and the Ugly of the Scientific Internet (The Wild West?)

 

So how many science blogs are out there? Well back in 2008 “grrlscientistasked this question and turned up a total of 19,881 blogs however most were “pseudoscience” blogs, not written by Ph.D or MD level scientists. A deeper search on Technorati using the search term “scientist PhD” turned up about 2,000 written by trained scientists.

So granted, there is a lot of

goodbadugly

 

              ….. when it comes to scientific information on the internet!

 

 

 

 

 

I had recently re-posted, on this site, a great example of how bad science and medicine can get propagated throughout the internet:

https://pharmaceuticalintelligence.com/2014/06/17/the-gonzalez-protocol-worse-than-useless-for-pancreatic-cancer/

 

and in a Nature Report:Stem cells: Taking a stand against pseudoscience

http://www.nature.com/news/stem-cells-taking-a-stand-against-pseudoscience-1.15408

Drs.Elena Cattaneo and Gilberto Corbellini document their long, hard fight against false and invalidated medical claims made by some “clinicians” about the utility and medical benefits of certain stem-cell therapies, sacrificing their time to debunk medical pseudoscience.

 

Using Curation and Science 2.0 to build Trusted, Expert Networks of Scientists and Clinicians

 

Establishing networks of trusted colleagues has been a cornerstone of the scientific discourse for centuries. For example, in the mid-1640s, the Royal Society began as:

 

“a meeting of natural philosophers to discuss promoting knowledge of the

natural world through observation and experiment”, i.e. science.

The Society met weekly to witness experiments and discuss what we

would now call scientific topics. The first Curator of Experiments

was Robert Hooke.”

 

from The History of the Royal Society

 

Royal Society CoatofArms

 

 

 

 

 

 

The Royal Society of London for Improving Natural Knowledge.

(photo credit: Royal Society)

(Although one wonders why they met “in-cognito”)

Indeed as discussed in “Science 2.0/Brainstorming” by the originators of OpenWetWare, an open-source science-notebook software designed to foster open-innovation, the new search and aggregation tools are making it easier to find, contribute, and share information to interested individuals. This paradigm is the basis for the shift from Science 1.0 to Science 2.0. Science 2.0 is attempting to remedy current drawbacks which are hindering rapid and open scientific collaboration and discourse including:

  • Slow time frame of current publishing methods: reviews can take years to fashion leading to outdated material
  • Level of information dissemination is currently one dimensional: peer-review, highly polished work, conferences
  • Current publishing does not encourage open feedback and review
  • Published articles edited for print do not take advantage of new web-based features including tagging, search-engine features, interactive multimedia, no hyperlinks
  • Published data and methodology incomplete
  • Published data not available in formats which can be readably accessible across platforms: gene lists are now mandated to be supplied as files however other data does not have to be supplied in file format

(put in here a brief blurb of summary of problems and why curation could help)

 

Curation in the Sciences: View from Scientific Content Curators Larry H. Bernstein, MD, FCAP, Dr. Justin D. Pearlman, MD, PhD, FACC and Dr. Aviva Lev-Ari, PhD, RN

Curation is an active filtering of the web’s  and peer reviewed literature found by such means – immense amount of relevant and irrelevant content. As a result content may be disruptive. However, in doing good curation, one does more than simply assign value by presentation of creative work in any category. Great curators comment and share experience across content, authors and themes. Great curators may see patterns others don’t, or may challenge or debate complex and apparently conflicting points of view.  Answers to specifically focused questions comes from the hard work of many in laboratory settings creatively establishing answers to definitive questions, each a part of the larger knowledge-base of reference. There are those rare “Einstein’s” who imagine a whole universe, unlike the three blind men of the Sufi tale.  One held the tail, the other the trunk, the other the ear, and they all said this is an elephant!
In my reading, I learn that the optimal ratio of curation to creation may be as high as 90% curation to 10% creation. Creating content is expensive. Curation, by comparison, is much less expensive.

– Larry H. Bernstein, MD, FCAP

Curation is Uniquely Distinguished by the Historical Exploratory Ties that Bind –Larry H. Bernstein, MD, FCAP

The explosion of information by numerous media, hardcopy and electronic, written and video, has created difficulties tracking topics and tying together relevant but separated discoveries, ideas, and potential applications. Some methods to help assimilate diverse sources of knowledge include a content expert preparing a textbook summary, a panel of experts leading a discussion or think tank, and conventions moderating presentations by researchers. Each of those methods has value and an audience, but they also have limitations, particularly with respect to timeliness and pushing the edge. In the electronic data age, there is a need for further innovation, to make synthesis, stimulating associations, synergy and contrasts available to audiences in a more timely and less formal manner. Hence the birth of curation. Key components of curation include expert identification of data, ideas and innovations of interest, expert interpretation of the original research results, integration with context, digesting, highlighting, correlating and presenting in novel light.

Justin D Pearlman, MD, PhD, FACC from The Voice of Content Consultant on The  Methodology of Curation in Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

 

In Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison, Drs. Larry Bernstein and Aviva Lev-Ari likens the medical and scientific curation process to curation of musical works into a thematic program:

 

Work of Original Music Curation and Performance:

 

Music Review and Critique as a Curation

Work of Original Expression what is the methodology of Curation in the context of Medical Research Findings Exposition of Synthesis and Interpretation of the significance of the results to Clinical Care

… leading to new, curated, and collaborative works by networks of experts to generate (in this case) ebooks on most significant trends and interpretations of scientific knowledge as relates to medical practice.

 

In Summary: How Scientific Content Curation Can Help

 

Given the aforementioned problems of:

        I.            the complex and rapid deluge of scientific information

      II.            the need for a collaborative, open environment to produce transformative innovation

    III.            need for alternative ways to disseminate scientific findings

CURATION MAY OFFER SOLUTIONS

        I.            Curation exists beyond the review: curation decreases time for assessment of current trends adding multiple insights, analyses WITH an underlying METHODOLOGY (discussed below) while NOT acting as mere reiteration, regurgitation

 

      II.            Curation providing insights from WHOLE scientific community on multiple WEB 2.0 platforms

 

    III.            Curation makes use of new computational and Web-based tools to provide interoperability of data, reporting of findings (shown in Examples below)

 

Therefore a discussion is given on methodologies, definitions of best practices, and tools developed to assist the content curation community in this endeavor.

Methodology in Scientific Content Curation as Envisioned by Aviva lev-Ari, PhD, RN

 

At Leaders in Pharmaceutical Business Intelligence, site owner and chief editor Aviva lev-Ari, PhD, RN has been developing a strategy “for the facilitation of Global access to Biomedical knowledge rather than the access to sheer search results on Scientific subject matters in the Life Sciences and Medicine”. According to Aviva, “for the methodology to attain this complex goal it is to be dealing with popularization of ORIGINAL Scientific Research via Content Curation of Scientific Research Results by Experts, Authors, Writers using the critical thinking process of expert interpretation of the original research results.” The following post:

Cardiovascular Original Research: Cases in Methodology Design for Content Curation and Co-Curation

 

https://pharmaceuticalintelligence.com/2013/07/29/cardiovascular-original-research-cases-in-methodology-design-for-content-curation-and-co-curation/

demonstrate two examples how content co-curation attempts to achieve this aim and develop networks of scientist and clinician curators to aid in the active discussion of scientific and medical findings, and use scientific content curation as a means for critique offering a “new architecture for knowledge”. Indeed, popular search engines such as Google, Yahoo, or even scientific search engines such as NCBI’s PubMed and the OVID search engine rely on keywords and Boolean algorithms …

which has created a need for more context-driven scientific search and discourse.

In Science and Curation: the New Practice of Web 2.0, Célya Gruson-Daniel (@HackYourPhd) states:

To address this need, human intermediaries, empowered by the participatory wave of web 2.0, naturally started narrowing down the information and providing an angle of analysis and some context. They are bloggers, regular Internet users or community managers – a new type of profession dedicated to the web 2.0. A new use of the web has emerged, through which the information, once produced, is collectively spread and filtered by Internet users who create hierarchies of information.

.. where Célya considers curation an essential practice to manage open science and this new style of research.

As mentioned above in her article, Dr. Lev-Ari represents two examples of how content curation expanded thought, discussion, and eventually new ideas.

  1. Curator edifies content through analytic process = NEW form of writing and organizations leading to new interconnections of ideas = NEW INSIGHTS

i)        Evidence: curation methodology leading to new insights for biomarkers

 

  1. Same as #1 but multiple players (experts) each bringing unique insights, perspectives, skills yielding new research = NEW LINE of CRITICAL THINKING

ii)      Evidence: co-curation methodology among cardiovascular experts leading to cardiovascular series ebooks

Life-cycle of Science 2

The Life Cycle of Science 2.0. Due to Web 2.0, new paradigms of scientific collaboration are rapidly emerging.  Originally, scientific discovery were performed by individual laboratories or “scientific silos” where the main method of communication was peer-reviewed publication, meeting presentation, and ultimately news outlets and multimedia. In this digital era, data was organized for literature search and biocurated databases. In an era of social media, Web 2.0, a group of scientifically and medically trained “curators” organize the piles of data of digitally generated data and fit data into an organizational structure which can be shared, communicated, and analyzed in a holistic approach, launching new ideas due to changes in organization structure of data and data analytics.

 

The result, in this case, is a collaborative written work above the scope of the review. Currently review articles are written by experts in the field and summarize the state of a research are. However, using collaborative, trusted networks of experts, the result is a real-time synopsis and analysis of the field with the goal in mind to

INCREASE THE SCIENTIFIC CURRENCY.

For detailed description of methodology please see Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

 

In her paper, Curating e-Science Data, Maureen Pennock, from The British Library, emphasized the importance of using a diligent, validated, and reproducible, and cost-effective methodology for curation by e-science communities over the ‘Grid:

“The digital data deluge will have profound repercussions for the infrastructure of research and beyond. Data from a wide variety of new and existing sources will need to be annotated with metadata, then archived and curated so that both the data and the programmes used to transform the data can be reproduced for use in the future. The data represent a new foundation for new research, science, knowledge and discovery”

— JISC Senior Management Briefing Paper, The Data Deluge (2004)

 

As she states proper data and content curation is important for:

  • Post-analysis
  • Data and research result reuse for new research
  • Validation
  • Preservation of data in newer formats to prolong life-cycle of research results

However she laments the lack of

  • Funding for such efforts
  • Training
  • Organizational support
  • Monitoring
  • Established procedures

 

Tatiana Aders wrote a nice article based on an interview with Microsoft’s Robert Scoble, where he emphasized the need for curation in a world where “Twitter is the replacement of the Associated Press Wire Machine” and new technologic platforms are knocking out old platforms at a rapid pace. In addition he notes that curation is also a social art form where primary concerns are to understand an audience and a niche.

Indeed, part of the reason the need for curation is unmet, as writes Mark Carrigan, is the lack of appreciation by academics of the utility of tools such as Pinterest, Storify, and Pearl Trees to effectively communicate and build collaborative networks.

And teacher Nancy White, in her article Understanding Content Curation on her blog Innovations in Education, shows examples of how curation in an educational tool for students and teachers by demonstrating students need to CONTEXTUALIZE what the collect to add enhanced value, using higher mental processes such as:

  • Knowledge
  • Comprehension
  • Application
  • Analysis
  • Synthesis
  • Evaluation

curating-tableA GREAT table about the differences between Collecting and Curating by Nancy White at http://d20innovation.d20blogs.org/2012/07/07/understanding-content-curation/

 

 

 

 

 

 

 

 

 

 

 

University of Massachusetts Medical School has aggregated some useful curation tools at http://esciencelibrary.umassmed.edu/data_curation

Although many tools are related to biocuration and building databases but the common idea is curating data with indexing, analyses, and contextual value to provide for an audience to generate NETWORKS OF NEW IDEAS.

See here for a curation of how networks fosters knowledge, by Erika Harrison on ScoopIt

(http://www.scoop.it/t/mobilizing-knowledge-through-complex-networks)

 

“Nowadays, any organization should employ network scientists/analysts who are able to map and analyze complex systems that are of importance to the organization (e.g. the organization itself, its activities, a country’s economic activities, transportation networks, research networks).”

Andrea Carafa insight from World Economic Forum New Champions 2012 “Power of Networks

 

Creating Content Curation Communities: Breaking Down the Silos!

 

An article by Dr. Dana Rotman “Facilitating Scientific Collaborations Through Content Curation Communities” highlights how scientific information resources, traditionally created and maintained by paid professionals, are being crowdsourced to professionals and nonprofessionals in which she termed “content curation communities”, consisting of professionals and nonprofessional volunteers who create, curate, and maintain the various scientific database tools we use such as Encyclopedia of Life, ChemSpider (for Slideshare see here), biowikipedia etc. Although very useful and openly available, these projects create their own challenges such as

  • information integration (various types of data and formats)
  • social integration (marginalized by scientific communities, no funding, no recognition)

The authors set forth some ways to overcome these challenges of the content curation community including:

  1. standardization in practices
  2. visualization to document contributions
  3. emphasizing role of information professionals in content curation communities
  4. maintaining quality control to increase respectability
  5. recognizing participation to professional communities
  6. proposing funding/national meeting – Data Intensive Collaboration in Science and Engineering Workshop

A few great presentations and papers from the 2012 DICOSE meeting are found below

Judith M. Brown, Robert Biddle, Stevenson Gossage, Jeff Wilson & Steven Greenspan. Collaboratively Analyzing Large Data Sets using Multitouch Surfaces. (PDF) NotesForBrown

 

Bill Howe, Cecilia Aragon, David Beck, Jeffrey P. Gardner, Ed Lazowska, Tanya McEwen. Supporting Data-Intensive Collaboration via Campus eScience Centers. (PDF) NotesForHowe

 

Kerk F. Kee & Larry D. Browning. Challenges of Scientist-Developers and Adopters of Existing Cyberinfrastructure Tools for Data-Intensive Collaboration, Computational Simulation, and Interdisciplinary Projects in Early e-Science in the U.S.. (PDF) NotesForKee

 

Ben Li. The mirages of big data. (PDF) NotesForLiReflectionsByBen

 

Betsy Rolland & Charlotte P. Lee. Post-Doctoral Researchers’ Use of Preexisting Data in Cancer Epidemiology Research. (PDF) NoteForRolland

 

Dana Rotman, Jennifer Preece, Derek Hansen & Kezia Procita. Facilitating scientific collaboration through content curation communities. (PDF) NotesForRotman

 

Nicholas M. Weber & Karen S. Baker. System Slack in Cyberinfrastructure Development: Mind the Gaps. (PDF) NotesForWeber

Indeed, the movement of Science 2.0 from Science 1.0 had originated because these “silos” had frustrated many scientists, resulting in changes in the area of publishing (Open Access) but also communication of protocols (online protocol sites and notebooks like OpenWetWare and BioProtocols Online) and data and material registries (CGAP and tumor banks). Some examples are given below.

Open Science Case Studies in Curation

1. Open Science Project from Digital Curation Center

This project looked at what motivates researchers to work in an open manner with regard to their data, results and protocols, and whether advantages are delivered by working in this way.

The case studies consider the benefits and barriers to using ‘open science’ methods, and were carried out between November 2009 and April 2010 and published in the report Open to All? Case studies of openness in research. The Appendices to the main report (pdf) include a literature review, a framework for characterizing openness, a list of examples, and the interview schedule and topics. Some of the case study participants kindly agreed to us publishing the transcripts. This zip archive contains transcripts of interviews with researchers in astronomy, bioinformatics, chemistry, and language technology.

 

see: Pennock, M. (2006). “Curating e-Science Data”. DCC Briefing Papers: Introduction to Curation. Edinburgh: Digital Curation Centre. Handle: 1842/3330. Available online: http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation– See more at: http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/curating-e-science-data#sthash.RdkPNi9F.dpuf

 

2.      cBIO -cBio’s biological data curation group developed and operates using a methodology called CIMS, the Curation Information Management System. CIMS is a comprehensive curation and quality control process that efficiently extracts information from publications.

 

3. NIH Topic Maps – This website provides a database and web-based interface for searching and discovering the types of research awarded by the NIH. The database uses automated, computer generated categories from a statistical analysis known as topic modeling.

 

4. SciKnowMine (USC)- We propose to create a framework to support biocuration called SciKnowMine (after ‘Scientific Knowledge Mine’), cyberinfrastructure that supports biocuration through the automated mining of text, images, and other amenable media at the scale of the entire literature.

 

  1. OpenWetWareOpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering. Learn more about us.   If you would like edit access, would be interested in helping out, or want your lab website hosted on OpenWetWare, pleasejoin us. OpenWetWare is managed by the BioBricks Foundation. They also have a wiki about Science 2.0.

6. LabTrove: a lightweight, web based, laboratory “blog” as a route towards a marked up record of work in a bioscience research laboratory. Authors in PLOS One article, from University of Southampton, report the development of an open, scientific lab notebook using a blogging strategy to share information.

7. OpenScience ProjectThe OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

8. Open Science Grid is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.

 

9. Some ongoing biomedical knowledge (curation) projects at ISI

IICurate
This project is concerned with developing a curation and documentation system for information integration in collaboration with the II Group at ISI as part of the BIRN.

BioScholar
It’s primary purpose is to provide software for experimental biomedical scientists that would permit a single scientific worker (at the level of a graduate student or postdoctoral worker) to design, construct and manage a shared knowledge repository for a research group derived on a local store of PDF files. This project is funded by NIGMS from 2008-2012 ( RO1-GM083871).

10. Tools useful for scientific content curation

 

Research Analytic and Curation Tools from University of Queensland

 

Thomson Reuters information curation services for pharma industry

 

Microblogs as a way to communicate information about HPV infection among clinicians and patients; use of Chinese microblog SinaWeibo as a communication tool

 

VIVO for scientific communities– In order to connect this information about research activities across institutions and make it available to others, taking into account smaller players in the research landscape and addressing their need for specific information (for example, by proving non-conventional research objects), the open source software VIVO that provides research information as linked open data (LOD) is used in many countries.  So-called VIVO harvesters collect research information that is freely available on the web, and convert the data collected in conformity with LOD standards. The VIVO ontology builds on prevalent LOD namespaces and, depending on the needs of the specialist community concerned, can be expanded.

 

 

11. Examples of scientific curation in different areas of Science/Pharma/Biotech/Education

 

From Science 2.0 to Pharma 3.0 Q&A with Hervé Basset

http://digimind.com/blog/experts/pharma-3-0/

Hervé Basset, specialist librarian in the pharmaceutical industry and owner of the blog “Science Intelligence“, to talk about the inspiration behind his recent book  entitled “From Science 2.0 to Pharma 3.0″, published by Chandos Publishing and available on Amazon and how health care companies need a social media strategy to communicate and convince the health-care consumer, not just the practicioner.

 

Thomson Reuters and NuMedii Launch Ground-Breaking Initiative to Identify Drugs for Repurposing. Companies leverage content, Big Data analytics and expertise to improve success of drug discovery

 

Content Curation as a Context for Teaching and Learning in Science

 

#OZeLIVE Feb2014

http://www.youtube.com/watch?v=Ty-ugUA4az0

Creative Commons license

 

DigCCur: A graduate level program initiated by University of North Carolina to instruct the future digital curators in science and other subjects

 

Syracuse University offering a program in eScience and digital curation

 

Curation Tips from TED talks and tech experts

Steven Rosenbaum from Curation Nation

http://www.youtube.com/watch?v=HpncJd1v1k4

 

Pawan Deshpande form Curata on how content curation communities evolve and what makes a good content curation:

http://www.youtube.com/watch?v=QENhIU9YZyA

 

How the Internet of Things is Promoting the Curation Effort

Update by Stephen J. Williams, PhD 3/01/19

Up till now, curation efforts like wikis (Wikipedia, Wikimedicine, Wormbase, GenBank, etc.) have been supported by a largely voluntary army of citizens, scientists, and data enthusiasts.  I am sure all have seen the requests for donations to help keep Wikipedia and its other related projects up and running.  One of the obscure sister projects of Wikipedia, Wikidata, wants to curate and represent all information in such a way in which both machines, computers, and humans can converse in.  About an army of 4 million have Wiki entries and maintain these databases.

Enter the Age of the Personal Digital Assistants (Hellooo Alexa!)

In a March 2019 WIRED article “Encyclopedia Automata: Where Alexa Gets Its Information”  senior WIRED writer Tom Simonite reports on the need for new types of data structure as well as how curated databases are so important for the new fields of AI as well as enabling personal digital assistants like Alexa or Google Assistant decipher meaning of the user.

As Mr. Simonite noted, many of our libraries of knowledge are encoded in an “ancient technology largely opaque to machines-prose.”   Search engines like Google do not have a problem with a question asked in prose as they just have to find relevant links to pages. Yet this is a problem for Google Assistant, for instance, as machines can’t quickly extract meaning from the internet’s mess of “predicates, complements, sentences, and paragraphs. It requires a guide.”

Enter Wikidata.  According to founder Denny Vrandecic,

Language depends on knowing a lot of common sense, which computers don’t have access to

A wikidata entry (of which there are about 60 million) codes every concept and item with a numeric code, the QID code number. These codes are integrated with tags (like tags you use on Twitter as handles or tags in WordPress used for Search Engine Optimization) so computers can identify patterns of recognition between these codes.

Now human entry into these databases are critical as we add new facts and in particular meaning to each of these items.  Else, machines have problems deciphering our meaning like Apple’s Siri, where they had complained of dumb algorithms to interpret requests.

The knowledge of future machines could be shaped by you and me, not just tech companies and PhDs.

But this effort needs money

Wikimedia’s executive director, Katherine Maher, had prodded and cajoled these megacorporations for tapping the free resources of Wiki’s.  In response, Amazon and Facebook had donated millions for the Wikimedia projects.  Google recently gave 3.1 million USD$ in donations.

 

Future postings on the relevance and application of scientific curation will include:

Using Scientific Content Curation as a Method for Validation and Biocuration

 

Using Scientific Content Curation as a Method for Open Innovation

 

Other posts on this site related to Content Curation and Methodology include:

The growing importance of content curation

Data Curation is for Big Data what Data Integration is for Small Data

6 Steps to More Effective Content Curation

Stem Cells and Cardiac Repair: Content Curation & Scientific Reporting

Cancer Research: Curations and Reporting

Cardiovascular Diseases and Pharmacological Therapy: Curations

Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

Exploring the Impact of Content Curation on Business Goals in 2013

Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison

conceived: NEW Definition for Co-Curation in Medical Research

The Young Surgeon and The Retired Pathologist: On Science, Medicine and HealthCare Policy – The Best Writers Among the WRITERS

Reconstructed Science Communication for Open Access Online Scientific Curation

 

 

Read Full Post »