Posts Tagged ‘Open data’


 

Yay! Bloomberg View Seems to Be On the Side of the Lowly Scientist!

 

Reporter: Stephen J. Williams, Ph.D.

Justin Fox at BloombergView has just published an article near and dear to the hearts of all those #openaccess scientists and those of us @Pharma_BI and @MozillaScience who feel strongly about #openscience #opendata and the movement to make scientific discourse freely accessible.

His article “Academic Publishing Can’t Remain Such a Great Business” discusses the history of academic publishing and how the consolidation of smaller publishers into large scientific publishing houses (bigger publishers bought smaller ones) has produced a monopoly-like environment in which prices for journal subscriptions keep rising. He also discusses how the open access movement is challenging this model and may one day replace the big publishing houses.

A few tidbits from his article:

Publishers of academic journals have a great thing going. They generally don’t pay for the articles they publish, or for the primary editing and peer reviewing essential to preparing them for publication (they do fork over some money for copy editing). Most of this gratis labor is performed by employees of academic institutions. Those institutions, along with government agencies and foundations, also fund all the research that these journal articles are based upon.

Yet the journal publishers are able to get authors to sign over copyright to this content, and sell it in the form of subscriptions to university libraries. Most journals are now delivered in electronic form, which you think would cut the cost, but no, the price has been going up and up:

 

This isn’t just inflation at work: in 1994, journal subscriptions accounted for 51 percent of all library spending on information resources. In 2012 it was 69 percent.

Who exactly is getting that money? The largest academic publisher is Elsevier, which is also the biggest, most profitable division of RELX, the Anglo-Dutch company that was known until February as Reed Elsevier.

 

RELX reports results in British pounds; I converted to dollars in part because the biggest piece of the company’s revenue comes from the U.S. And yes, those are pretty great operating-profit margins: 33 percent in 2014, 39 percent in 2013. The next biggest academic publisher is Springer Nature, which is closely held (by German publisher Holtzbrinck and U.K. private-equity firm BC Partners) but reportedly has annual revenue of about $1.75 billion. Other biggies that are part of publicly traded companies include Wiley-Blackwell, a division of John Wiley & Sons; Wolters Kluwer Health, a division of Wolters Kluwer; and Taylor & Francis, a division of Informa.

He also gives a brief history of academic publishing:

The history here is that most early scholarly journals were the work of nonprofit scientific societies. The goal was to disseminate research as widely as possible, not to make money — a key reason why nobody involved got paid. After World War II, the explosion in both the production of and demand for academic research outstripped the capabilities of the scientific societies, and commercial publishers stepped into the breach. At a time when journals had to be printed and shipped all over the world, this made perfect sense.

Once it became possible to effortlessly copy and disseminate digital files, though, the economics changed. For many content producers, digital copying is a threat to their livelihoods. As Peter Suber, the director of Harvard University’s Office for Scholarly Communication, puts it in his wonderful little book, “Open Access”:

And on how the NIH tried to push these publishing houses to accept open access:

About a decade ago, the universities and funding agencies began fighting back. The National Institutes of Health in the U.S., the world’s biggest funder of medical research, began requiring in 2008 that all recipients of its grants submit electronic versions of their final peer-reviewed manuscripts when they are accepted for publication in journals, to be posted a year later on the NIH’s open-access PubMed depository. Publishers grumbled, but didn’t want to turn down the articles.

Big publishers are making money either by charging as much as they can or by focusing on new customers and services:

For the big publishers, meanwhile, the choice is between positioning themselves for the open-access future or maximizing current returns. In its most recent annual report, RELX leans toward the latter while nodding toward the former:

Over the past 15 years alternative payment models for the dissemination of research such as “author-pays” or “author’s funder-pays” have emerged. While it is expected that paid subscription will remain the primary distribution model, Elsevier has long invested in alternative business models to address the needs of customers and researchers.

Elsevier’s extra services can add new avenues of revenue:

https://www.elsevier.com/social-sciences/business-and-management

https://www.elsevier.com/rd-solutions

but they may be seeing the light on OpenAccess (possibly due to online advocacy, an army of scientific curators and online scientific communities):

Elsevier’s Mendeley and Academia.edu – How We Distribute Scientific Research: A Case in Advocacy for Open Access Journals

SAME SCIENTIFIC IMPACT: Scientific Publishing – Open Journals vs. Subscription-based

e-Recognition via Friction-free Collaboration over the Internet: “Open Access to Curation of Scientific Research”

Indeed, we recently put up an interesting authored paper, “A Patient’s Perspective: On Open Heart Surgery from Diagnosis and Intervention to Recovery” (free of charge), letting the scientific community freely peruse and comment. It was generally well accepted by both author and community as a nice way to share academic discourse without the enormous fees, especially for opinion papers for which a rigorous peer review may not be necessary.

But it was very nice to see a major news outlet like Bloomberg View understand the lowly scientist’s aggravations.

Thanks Bloomberg!


Mozilla Science Lab Promotes Data Reproduction Through Open Access: Report from 9/10/2015 Online Meeting

Reporter: Stephen J. Williams, Ph.D.

Mozilla Inc. is developing a platform for scientists to discuss the issues involved in building a framework for sharing scientific data, as well as to tackle the problems of scientific reproducibility in an Open Access manner. According to their blog:

https://blog.mozilla.org/blog/2013/06/14/5992/

We’re excited to announce the launch of the Mozilla Science Lab, a new initiative that will help researchers around the world use the open web to shape science’s future.

Scientists created the web — but the open web still hasn’t transformed scientific practice to the same extent we’ve seen in other areas like media, education and business. For all of the incredible discoveries of the last century, science is still largely rooted in the “analog” age. Credit systems in science are still largely based around “papers,” for example, and as a result researchers are often discouraged from sharing, learning, reusing, and adopting the type of open and collaborative learning that the web makes possible.

The Science Lab will foster dialog between the open web community and researchers to tackle this challenge. Together they’ll share ideas, tools, and best practices for using next-generation web solutions to solve real problems in science, and explore ways to make research more agile and collaborative.

On their blog they highlight various projects related to promoting Open Access for scientific data.

On September 10, 2015 Mozilla Science Lab held their scheduled meeting on scientific data reproducibility. The meeting was free and covered on Etherpad and on social media. The Twitter hashtag for updates and meeting discussion is #mozscience (https://twitter.com/search?q=%23mozscience)

Open Access Meeting Announcement on Twitter

https://twitter.com/MozillaScience/status/641642491532283904

Mozilla Science Lab @MozillaScience

Join @khinsen @abbycabs + @EvoMRI tmrw (11AM ET) to hear about replication, publishing + #openscience. Details: https://etherpad.mozilla.org/sciencelab-calls-sep10-2015 …

AGENDA:

  • Mozilla Science Lab Updates
  • Staff welcomes and thank yous:
  • Welcoming Zannah Marsh, our first Instructional Designer
  • Workshopping the “Working Open” guide:
    • Discussion of Future foundation and GitHub projects
    • Discussion of submission for open science project funding
  • Contributorship Badges Pilot – an update! – Abby Cabunoc Mayes – @abbycabs
  • Will be live on GigaScience September 17th!
  • Where you can jump in: https://github.com/mozillascience/paperbadger/issues/17
  • Questions regarding coding projects – Abby will coordinate efforts on coding into their codebase
  • The journal will publish, and authors and reviewers get a badge; their efforts and comments will appear on GigaScience. GigaScience will give credit for your reviews, which supports an Open Science discussion

Roadmap for

  • Fellows review is in full swing!
  • MozFest update:
  • Miss the submission deadline? You can still apply to join our Open Research Accelerator and join us for the event (PLUS get a DOI for your submission and 1:1 help)

Konrad Hinsen (@khinsen) presented ReScience, a journal focused on scientific replication:

  • ReScience – a new journal for replications – Konrad Hinsen @khinsen
  • ReScience is dedicated to publishing replications of previously published computational studies, along with all the code required to replicate the results.
  • ReScience lives entirely on GitHub. Submissions take the form of a Git repository, and review takes place in the open through GitHub issues. This also means that ReScience is free for everyone (authors, readers, reviewers, editors… well, I said everyone, right?), as long as GitHub is willing to host it.
  • ReScience was launched just a few days ago and is evolving quickly. To stay up to date, follow @ReScienceEds on Twitter. If you want to volunteer as a reviewer, please contact the editorial board.

The ReScience Journal: Reproducible Science is Good. Replicated Science is Better.

ReScience is a peer-reviewed journal that targets computational research and encourages the explicit reproduction of already published research promoting new and open-source implementations in order to ensure the original research is reproducible. To achieve such a goal, the whole editing chain is radically different from any other traditional scientific journal. ReScience lives on github where each new implementation is made available together with the comments, explanations and tests. Each submission takes the form of a pull request that is publicly reviewed and tested in order to guarantee any researcher can re-use it. If you ever reproduced computational result from the literature, ReScience is the perfect place to publish this new implementation. The Editorial Board

Notes from his talk:

– must be able to replicate paper’s results as written according to experimental methods

– All authors on ReScience need to be on GitHub

– not accepting MATLAB replications; replication can involve computational replication

  • Research Ideas and Outcomes Journal – Daniel Mietchen @EvoMRI
    • Postdoc at Natural Museum of London doing data mining; sees it as a huge waste that 90% of research proposals don’t get used, so this journal allows for publishing proposals
    • Learned how to write proposals by finding a proposal online open access
    • Reviewing system based on online reviews like GoogleDocs where people view, comment
    • Growing editorial and advisory board; venturing into new subject areas like humanities, economics, biological research so they are trying to link diverse areas under SOCIAL IMPACT labeling
    • BIG question: how to get scientists to publish their proposals, especially to improve the efficiency of collaboration and reagent sharing and to reduce duplicated efforts
    • Crowdfunding platform used as post publication funding mechanism; still in works
    • They need a lot of help on the editorial board so if you have a PhD PLEASE JOIN
  • Website:
  • Background:
  • Science article:
  • Some key features:
  • for publishing all steps of the research cycle, from proposals (funded and not yet funded) onwards
  • maps submissions to societal challenges
  • focus on post-publication peer review; pre-submission endorsement; all reviews public
  • lets authors choose which publishing services they want, e.g. whether they’d like journal-mediated peer review
  • collaborative WYSIWYG authoring and publishing platform based on JATS XML

A brief discussion of upcoming events on @MozillaScience

Meetings are held on the 2nd Thursday of each month

Additional plugins, coding, and new publishing formats are available at https://www.mozillascience.org/

Other related articles on OPEN ACCESS Publishing published in this Open Access Online Scientific Journal include the following:

Archives of Medicine (AOM) to Publish from “Leaders in Pharmaceutical Business Intelligence (LPBI)” Open Access On-Line Scientific Journal http://pharmaceuticalintelligence.com

Annual Growth in NIH Clicks: 32% Open Access Online Scientific Journal http://pharmaceuticalintelligence.com

Collaborations and Open Access Innovations – CHI, BioIT World, 4/29 – 5/1/2014, Seaport World Trade Center, Boston

Elsevier’s Mendeley and Academia.edu – How We Distribute Scientific Research: A Case in Advocacy for Open Access Journals

Reconstructed Science Communication for Open Access Online Scientific Curation

The Fatal Self Distraction of the Academic Publishing Industry: The Solution of the Open Access Online Scientific Journals


Scientific Curation Fostering Expert Networks and Open Innovation: Lessons from Clive Thompson

Curators and Writer: Stephen J. Williams, Ph.D. with input from Curators Larry H. Bernstein, MD, FCAP, Dr. Justin D. Pearlman, MD, PhD, FACC and Dr. Aviva Lev-Ari, PhD, RN

(This discussion is part of a three-part series, including:

Using Scientific Content Curation as a Method for Validation and Biocuration

Using Scientific Content Curation as a Method for Open Innovation)

 

Every month I get my Wired Magazine (yes, in hard print: I still like to turn pages manually, plus I don’t mind if I get grease or wing sauce on my magazine rather than on my e-reader), and I always love reading articles written by Clive Thompson. He has a certain flair for understanding the techno world we live in and the human/technology interaction, writing about interesting ways in which we almost inadvertently integrate new technologies into our day-to-day living, generating new entrepreneurship and new value. He also writes extensively about tech and entrepreneurship.

The October 2013 Wired article by Clive Thompson, entitled “How Successful Networks Nurture Good Ideas: Thinking Out Loud”, describes how the voluminous writings, postings, tweets, and sharing on social media are fostering connections between people and ideas which, previously, had not existed. The article was generated from Clive Thompson’s book Smarter Than You Think: How Technology is Changing Our Minds for the Better. Tom Peters also commented about the article in his blog (see here).

Clive gives a wonderful example of Ory Okolloh, a young Kenyan-born law student who, after becoming frustrated with the lack of coverage of problems back home, started a blog about Kenyan politics. Her blog not only got interest from movie producers who were documenting female bloggers but also gained the interest of fellow Kenyans who, during the upheaval after the 2007 Kenyan elections, helped Ory to develop a Google map for reporting of violence (http://www.ushahidi.com/), which eventually became a global organization using open-source technology to aid crisis management. There are a multitude of examples of how networks and the conversations within these circles are fostering new ideas. As Clive states in the article:

 

Our ideas are PRODUCTS OF OUR ENVIRONMENT.

They are influenced by the conversations around us.

However, the article got me thinking about how Science 2.0 and the internet are changing how scientists contribute, share, and make connections to produce new and transformative ideas.

But HOW MUCH Knowledge is OUT THERE?

 

Clive’s article listed some amazing facts about the mountains of posts, tweets, words etc. out on the internet EVERY DAY, all of which exemplifies the problem:

  • 154.6 billion EMAILS per DAY
  • 400 million TWEETS per DAY
  • 1 million BLOG POSTS (including this one) per DAY
  • 2 million COMMENTS on WordPress per DAY
  • 16 million WORDS on Facebook per DAY
  • TOTAL 52 TRILLION WORDS per DAY

By his estimate, this would amount to 520 million books per DAY (assuming an average book of 100,000 words).
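As a quick sanity check on that conversion, here is a minimal Python sketch. It takes the two figures exactly as quoted in this post (52 trillion words per day and 100,000 words per book) and does not verify them against the original article; the function name books_per_day is purely illustrative.

```python
# Figures as quoted in this post; they are assumptions, not verified data.
WORDS_PER_DAY = 52 * 10**12   # "52 trillion words per day"
WORDS_PER_BOOK = 100_000      # assumed average book length


def books_per_day(words_per_day: int, words_per_book: int) -> int:
    """Convert a daily word count into an equivalent number of whole books."""
    return words_per_day // words_per_book


if __name__ == "__main__":
    # Print the equivalent number of books with thousands separators.
    print(f"{books_per_day(WORDS_PER_DAY, WORDS_PER_BOOK):,} books per day")
```

Swapping in a different daily word total or a different average book length shows how sensitive the "books per day" headline is to those two assumptions.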

A LOT of INFO. But as he suggests, it is not the volume but how we create and share this information that is critical. As the science fiction writer Theodore Sturgeon noted, “Ninety percent of everything is crap” (AKA Sturgeon’s Law).

 

Internet live stats show how congested the internet is each day (http://www.internetlivestats.com/). Needless to say Clive’s numbers are a bit off. As of the writing of this article:

 

  • 2.9 billion internet users
  • 981 million websites (only 25,000 hacked today)
  • 128 billion emails
  • 385 million Tweets
  • > 2.7 million BLOG posts today (including this one)

 

The Good, The Bad, and the Ugly of the Scientific Internet (The Wild West?)

 

So how many science blogs are out there? Well, back in 2008 “grrlscientist” asked this question and turned up a total of 19,881 blogs; however, most were “pseudoscience” blogs, not written by Ph.D.- or MD-level scientists. A deeper search on Technorati using the search term “scientist PhD” turned up about 2,000 written by trained scientists.

So granted, there is a lot of good, bad, and ugly when it comes to scientific information on the internet!

I recently re-posted, on this site, a great example of how bad science and medicine can get propagated throughout the internet:

https://pharmaceuticalintelligence.com/2014/06/17/the-gonzalez-protocol-worse-than-useless-for-pancreatic-cancer/

 

and in a Nature Report: Stem cells: Taking a stand against pseudoscience

http://www.nature.com/news/stem-cells-taking-a-stand-against-pseudoscience-1.15408

Drs. Elena Cattaneo and Gilberto Corbellini document their long, hard fight against false and invalidated medical claims made by some “clinicians” about the utility and medical benefits of certain stem-cell therapies, sacrificing their time to debunk medical pseudoscience.

 

Using Curation and Science 2.0 to build Trusted, Expert Networks of Scientists and Clinicians

 

Establishing networks of trusted colleagues has been a cornerstone of the scientific discourse for centuries. For example, in the mid-1640s, the Royal Society began as:

 

“a meeting of natural philosophers to discuss promoting knowledge of the natural world through observation and experiment, i.e. science. The Society met weekly to witness experiments and discuss what we would now call scientific topics. The first Curator of Experiments was Robert Hooke.”

 

from The History of the Royal Society

 

Royal Society Coat of Arms

The Royal Society of London for Improving Natural Knowledge.

(photo credit: Royal Society)

(Although one wonders why they met “in-cognito”)

Indeed, as discussed in “Science 2.0/Brainstorming” by the originators of OpenWetWare, an open-source science-notebook software designed to foster open innovation, the new search and aggregation tools are making it easier to find, contribute, and share information with interested individuals. This paradigm is the basis for the shift from Science 1.0 to Science 2.0. Science 2.0 is attempting to remedy current drawbacks which are hindering rapid and open scientific collaboration and discourse, including:

  • Slow time frame of current publishing methods: reviews can take years to fashion leading to outdated material
  • Level of information dissemination is currently one dimensional: peer-review, highly polished work, conferences
  • Current publishing does not encourage open feedback and review
  • Published articles edited for print do not take advantage of new web-based features including tagging, search-engine features, interactive multimedia, and hyperlinks
  • Published data and methodology are often incomplete
  • Published data are not available in formats which are readily accessible across platforms: gene lists are now mandated to be supplied as files; however, other data does not have to be supplied in file format

 

Curation in the Sciences: View from Scientific Content Curators Larry H. Bernstein, MD, FCAP, Dr. Justin D. Pearlman, MD, PhD, FACC and Dr. Aviva Lev-Ari, PhD, RN

Curation is an active filtering of the immense amount of relevant and irrelevant content found on the web and in the peer-reviewed literature. As a result content may be disruptive. However, in doing good curation, one does more than simply assign value by presentation of creative work in any category. Great curators comment and share experience across content, authors and themes. Great curators may see patterns others don’t, or may challenge or debate complex and apparently conflicting points of view. Answers to specifically focused questions come from the hard work of many in laboratory settings creatively establishing answers to definitive questions, each a part of the larger knowledge-base of reference. There are those rare “Einsteins” who imagine a whole universe, unlike the three blind men of the Sufi tale. One held the tail, the other the trunk, the other the ear, and they all said this is an elephant!
In my reading, I learn that the optimal ratio of curation to creation may be as high as 90% curation to 10% creation. Creating content is expensive. Curation, by comparison, is much less expensive.

– Larry H. Bernstein, MD, FCAP

Curation is Uniquely Distinguished by the Historical Exploratory Ties that Bind –Larry H. Bernstein, MD, FCAP

The explosion of information by numerous media, hardcopy and electronic, written and video, has created difficulties tracking topics and tying together relevant but separated discoveries, ideas, and potential applications. Some methods to help assimilate diverse sources of knowledge include a content expert preparing a textbook summary, a panel of experts leading a discussion or think tank, and conventions moderating presentations by researchers. Each of those methods has value and an audience, but they also have limitations, particularly with respect to timeliness and pushing the edge. In the electronic data age, there is a need for further innovation, to make synthesis, stimulating associations, synergy and contrasts available to audiences in a more timely and less formal manner. Hence the birth of curation. Key components of curation include expert identification of data, ideas and innovations of interest, expert interpretation of the original research results, integration with context, digesting, highlighting, correlating and presenting in novel light.

Justin D Pearlman, MD, PhD, FACC from The Voice of Content Consultant on The  Methodology of Curation in Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

 

In Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison, Drs. Larry Bernstein and Aviva Lev-Ari liken the medical and scientific curation process to curation of musical works into a thematic program:

 

Work of Original Music Curation and Performance:

 

Music Review and Critique as a Curation

Work of Original Expression what is the methodology of Curation in the context of Medical Research Findings Exposition of Synthesis and Interpretation of the significance of the results to Clinical Care

… leading to new, curated, and collaborative works by networks of experts to generate (in this case) ebooks on most significant trends and interpretations of scientific knowledge as relates to medical practice.

 

In Summary: How Scientific Content Curation Can Help

 

Given the aforementioned problems of:

I. the complex and rapid deluge of scientific information

II. the need for a collaborative, open environment to produce transformative innovation

III. the need for alternative ways to disseminate scientific findings

CURATION MAY OFFER SOLUTIONS

I. Curation exists beyond the review: curation decreases time for assessment of current trends, adding multiple insights and analyses WITH an underlying METHODOLOGY (discussed below) while NOT acting as mere reiteration, regurgitation

II. Curation provides insights from the WHOLE scientific community on multiple WEB 2.0 platforms

III. Curation makes use of new computational and Web-based tools to provide interoperability of data, reporting of findings (shown in Examples below)

Therefore a discussion is given on methodologies, definitions of best practices, and tools developed to assist the content curation community in this endeavor.

Methodology in Scientific Content Curation as Envisioned by Aviva Lev-Ari, PhD, RN

 

At Leaders in Pharmaceutical Business Intelligence, site owner and chief editor Aviva Lev-Ari, PhD, RN has been developing a strategy “for the facilitation of Global access to Biomedical knowledge rather than the access to sheer search results on Scientific subject matters in the Life Sciences and Medicine”. According to Aviva, “for the methodology to attain this complex goal it is to be dealing with popularization of ORIGINAL Scientific Research via Content Curation of Scientific Research Results by Experts, Authors, Writers using the critical thinking process of expert interpretation of the original research results.” The following post:

Cardiovascular Original Research: Cases in Methodology Design for Content Curation and Co-Curation

 

https://pharmaceuticalintelligence.com/2013/07/29/cardiovascular-original-research-cases-in-methodology-design-for-content-curation-and-co-curation/

demonstrates two examples of how content co-curation attempts to achieve this aim and develop networks of scientist and clinician curators to aid in the active discussion of scientific and medical findings, and use scientific content curation as a means for critique offering a “new architecture for knowledge”. Indeed, popular search engines such as Google, Yahoo, or even scientific search engines such as NCBI’s PubMed and the OVID search engine rely on keywords and Boolean algorithms …

which has created a need for more context-driven scientific search and discourse.

In Science and Curation: the New Practice of Web 2.0, Célya Gruson-Daniel (@HackYourPhd) states:

To address this need, human intermediaries, empowered by the participatory wave of web 2.0, naturally started narrowing down the information and providing an angle of analysis and some context. They are bloggers, regular Internet users or community managers – a new type of profession dedicated to the web 2.0. A new use of the web has emerged, through which the information, once produced, is collectively spread and filtered by Internet users who create hierarchies of information.

Here Célya considers curation an essential practice for managing open science and this new style of research.

As mentioned above, in her article Dr. Lev-Ari presents two examples of how content curation expanded thought, discussion, and eventually new ideas.

  1. Curator edifies content through analytic process = NEW form of writing and organizations leading to new interconnections of ideas = NEW INSIGHTS

i)        Evidence: curation methodology leading to new insights for biomarkers

 

  2. Same as #1 but multiple players (experts) each bringing unique insights, perspectives, skills yielding new research = NEW LINE of CRITICAL THINKING

ii)      Evidence: co-curation methodology among cardiovascular experts leading to cardiovascular series ebooks

Life-cycle of Science 2

The Life Cycle of Science 2.0. Due to Web 2.0, new paradigms of scientific collaboration are rapidly emerging. Originally, scientific discovery was performed by individual laboratories or “scientific silos” where the main method of communication was peer-reviewed publication, meeting presentation, and ultimately news outlets and multimedia. In the digital era, data was organized for literature search and biocurated databases. In an era of social media, Web 2.0, a group of scientifically and medically trained “curators” organize the piles of digitally generated data and fit data into an organizational structure which can be shared, communicated, and analyzed in a holistic approach, launching new ideas due to changes in organization structure of data and data analytics.

 

The result, in this case, is a collaborative written work above the scope of the review. Currently review articles are written by experts in the field and summarize the state of a research area. However, using collaborative, trusted networks of experts, the result is a real-time synopsis and analysis of the field with the goal in mind to

INCREASE THE SCIENTIFIC CURRENCY.

For detailed description of methodology please see Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

 

In her paper, Curating e-Science Data, Maureen Pennock, from The British Library, emphasized the importance of using a diligent, validated, reproducible, and cost-effective methodology for curation by e-science communities over the ‘Grid’:

“The digital data deluge will have profound repercussions for the infrastructure of research and beyond. Data from a wide variety of new and existing sources will need to be annotated with metadata, then archived and curated so that both the data and the programmes used to transform the data can be reproduced for use in the future. The data represent a new foundation for new research, science, knowledge and discovery”

— JISC Senior Management Briefing Paper, The Data Deluge (2004)

 

As she states, proper data and content curation is important for:

  • Post-analysis
  • Data and research result reuse for new research
  • Validation
  • Preservation of data in newer formats to prolong life-cycle of research results

However she laments the lack of

  • Funding for such efforts
  • Training
  • Organizational support
  • Monitoring
  • Established procedures

 

Tatiana Aders wrote a nice article based on an interview with Microsoft’s Robert Scoble, where he emphasized the need for curation in a world where “Twitter is the replacement of the Associated Press Wire Machine” and new technological platforms are knocking out old platforms at a rapid pace. In addition, he notes that curation is also a social art form where primary concerns are to understand an audience and a niche.

Indeed, part of the reason the need for curation is unmet, as Mark Carrigan writes, is the lack of appreciation by academics of the utility of tools such as Pinterest, Storify, and Pearl Trees to effectively communicate and build collaborative networks.

And teacher Nancy White, in her article Understanding Content Curation on her blog Innovations in Education, shows examples of how curation is an educational tool for students and teachers by demonstrating that students need to CONTEXTUALIZE what they collect to add enhanced value, using higher mental processes such as:

  • Knowledge
  • Comprehension
  • Application
  • Analysis
  • Synthesis
  • Evaluation

A GREAT table about the differences between Collecting and Curating by Nancy White at http://d20innovation.d20blogs.org/2012/07/07/understanding-content-curation/

University of Massachusetts Medical School has aggregated some useful curation tools at http://esciencelibrary.umassmed.edu/data_curation

Although many of the tools relate to biocuration and building databases, the common idea is curating data with indexing, analyses, and contextual value so that an audience can generate NETWORKS OF NEW IDEAS.

See here for a curation of how networks foster knowledge, by Erika Harrison on ScoopIt

(http://www.scoop.it/t/mobilizing-knowledge-through-complex-networks)

 

“Nowadays, any organization should employ network scientists/analysts who are able to map and analyze complex systems that are of importance to the organization (e.g. the organization itself, its activities, a country’s economic activities, transportation networks, research networks).”

Andrea Carafa’s insight from World Economic Forum New Champions 2012, “Power of Networks”

 

Creating Content Curation Communities: Breaking Down the Silos!

 

An article by Dr. Dana Rotman, “Facilitating Scientific Collaborations Through Content Curation Communities”, highlights how scientific information resources, traditionally created and maintained by paid professionals, are now being crowdsourced to what she terms “content curation communities”: professional and nonprofessional volunteers who create, curate, and maintain the various scientific database tools we use, such as Encyclopedia of Life, ChemSpider (for Slideshare see here), biowikipedia, etc. Although very useful and openly available, these projects create their own challenges, such as

  • information integration (various types of data and formats)
  • social integration (marginalized by scientific communities, no funding, no recognition)

The authors set forth some ways to overcome these challenges of the content curation community including:

  1. standardization in practices
  2. visualization to document contributions
  3. emphasizing role of information professionals in content curation communities
  4. maintaining quality control to increase respectability
  5. recognizing participation to professional communities
  6. proposing funding/national meeting – Data Intensive Collaboration in Science and Engineering Workshop

A few great presentations and papers from the 2012 DICOSE meeting are found below

Judith M. Brown, Robert Biddle, Stevenson Gossage, Jeff Wilson & Steven Greenspan. Collaboratively Analyzing Large Data Sets using Multitouch Surfaces. (PDF) NotesForBrown

 

Bill Howe, Cecilia Aragon, David Beck, Jeffrey P. Gardner, Ed Lazowska, Tanya McEwen. Supporting Data-Intensive Collaboration via Campus eScience Centers. (PDF) NotesForHowe

 

Kerk F. Kee & Larry D. Browning. Challenges of Scientist-Developers and Adopters of Existing Cyberinfrastructure Tools for Data-Intensive Collaboration, Computational Simulation, and Interdisciplinary Projects in Early e-Science in the U.S.. (PDF) NotesForKee

 

Ben Li. The mirages of big data. (PDF) NotesForLiReflectionsByBen

 

Betsy Rolland & Charlotte P. Lee. Post-Doctoral Researchers’ Use of Preexisting Data in Cancer Epidemiology Research. (PDF) NoteForRolland

 

Dana Rotman, Jennifer Preece, Derek Hansen & Kezia Procita. Facilitating scientific collaboration through content curation communities. (PDF) NotesForRotman

 

Nicholas M. Weber & Karen S. Baker. System Slack in Cyberinfrastructure Development: Mind the Gaps. (PDF) NotesForWeber

Indeed, the movement from Science 1.0 to Science 2.0 originated because these “silos” had frustrated many scientists, resulting in changes not only in publishing (Open Access) but also in the communication of protocols (online protocol sites and notebooks like OpenWetWare and BioProtocols Online) and in data and material registries (CGAP and tumor banks). Some examples are given below.

Open Science Case Studies in Curation

1. Open Science Project from Digital Curation Centre

This project looked at what motivates researchers to work in an open manner with regard to their data, results and protocols, and whether advantages are delivered by working in this way.

The case studies consider the benefits and barriers to using ‘open science’ methods, and were carried out between November 2009 and April 2010 and published in the report Open to All? Case studies of openness in research. The Appendices to the main report (pdf) include a literature review, a framework for characterizing openness, a list of examples, and the interview schedule and topics. Some of the case study participants kindly agreed to us publishing the transcripts. This zip archive contains transcripts of interviews with researchers in astronomy, bioinformatics, chemistry, and language technology.

 

see: Pennock, M. (2006). “Curating e-Science Data”. DCC Briefing Papers: Introduction to Curation. Edinburgh: Digital Curation Centre. Handle: 1842/3330. Available online: http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation (see also http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/curating-e-science-data)

 

2. cBio – cBio’s biological data curation group developed and operates using a methodology called CIMS, the Curation Information Management System. CIMS is a comprehensive curation and quality control process that efficiently extracts information from publications.

 

3. NIH Topic Maps – This website provides a database and web-based interface for searching and discovering the types of research awarded by the NIH. The database uses automated, computer generated categories from a statistical analysis known as topic modeling.

 

4. SciKnowMine (USC) – We propose to create a framework to support biocuration called SciKnowMine (after ‘Scientific Knowledge Mine’), cyberinfrastructure that supports biocuration through the automated mining of text, images, and other amenable media at the scale of the entire literature.

 

5. OpenWetWare – OpenWetWare is an effort to promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering. Learn more about us. If you would like edit access, would be interested in helping out, or want your lab website hosted on OpenWetWare, please join us. OpenWetWare is managed by the BioBricks Foundation. They also have a wiki about Science 2.0.

6. LabTrove: a lightweight, web based, laboratory “blog” as a route towards a marked up record of work in a bioscience research laboratory. Authors in PLOS One article, from University of Southampton, report the development of an open, scientific lab notebook using a blogging strategy to share information.

7. OpenScience Project – The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

8. Open Science Grid is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.

 

9. Some ongoing biomedical knowledge (curation) projects at ISI

IICurate
This project is concerned with developing a curation and documentation system for information integration in collaboration with the II Group at ISI as part of the BIRN.

BioScholar
Its primary purpose is to provide software for experimental biomedical scientists that would permit a single scientific worker (at the level of a graduate student or postdoctoral worker) to design, construct and manage a shared knowledge repository for a research group derived from a local store of PDF files. This project is funded by NIGMS from 2008-2012 (RO1-GM083871).

10. Tools useful for scientific content curation

 

Research Analytic and Curation Tools from University of Queensland

 

Thomson Reuters information curation services for pharma industry

 

Microblogs as a way to communicate information about HPV infection among clinicians and patients; use of Chinese microblog SinaWeibo as a communication tool

 

VIVO for scientific communities – In order to connect this information about research activities across institutions and make it available to others, taking into account smaller players in the research landscape and addressing their need for specific information (for example, by providing non-conventional research objects), the open source software VIVO, which provides research information as linked open data (LOD), is used in many countries. So-called VIVO harvesters collect research information that is freely available on the web, and convert the data collected in conformity with LOD standards. The VIVO ontology builds on prevalent LOD namespaces and, depending on the needs of the specialist community concerned, can be expanded.

 

 

11. Examples of scientific curation in different areas of Science/Pharma/Biotech/Education

 

From Science 2.0 to Pharma 3.0 Q&A with Hervé Basset

http://digimind.com/blog/experts/pharma-3-0/

A Q&A with Hervé Basset, specialist librarian in the pharmaceutical industry and owner of the blog “Science Intelligence”, about the inspiration behind his recent book entitled “From Science 2.0 to Pharma 3.0”, published by Chandos Publishing and available on Amazon, and about how health care companies need a social media strategy to communicate with and convince the health-care consumer, not just the practitioner.

 

Thomson Reuters and NuMedii Launch Ground-Breaking Initiative to Identify Drugs for Repurposing. Companies leverage content, Big Data analytics and expertise to improve success of drug discovery

 

Content Curation as a Context for Teaching and Learning in Science

 

#OZeLIVE Feb2014

http://www.youtube.com/watch?v=Ty-ugUA4az0

Creative Commons license

 

DigCCur: A graduate level program initiated by University of North Carolina to instruct the future digital curators in science and other subjects

 

Syracuse University offering a program in eScience and digital curation

 

Curation Tips from TED talks and tech experts

Steven Rosenbaum from Curation Nation

http://www.youtube.com/watch?v=HpncJd1v1k4

 

Pawan Deshpande from Curata on how content curation communities evolve and what makes for good content curation:

http://www.youtube.com/watch?v=QENhIU9YZyA

 

How the Internet of Things is Promoting the Curation Effort

Update by Stephen J. Williams, PhD 3/01/19

Up till now, curation efforts like wikis (Wikipedia, Wikimedicine, Wormbase, GenBank, etc.) have been supported by a largely voluntary army of citizens, scientists, and data enthusiasts. I am sure all have seen the requests for donations to help keep Wikipedia and its related projects up and running. One of the more obscure sister projects of Wikipedia, Wikidata, wants to curate and represent all information in a form in which both machines and humans can converse. An army of about 4 million people have Wiki entries and maintain these databases.

Enter the Age of the Personal Digital Assistants (Hellooo Alexa!)

In a March 2019 WIRED article, “Encyclopedia Automata: Where Alexa Gets Its Information”, senior WIRED writer Tom Simonite reports on the need for new types of data structures and on how important curated databases are, both for the new fields of AI and for enabling personal digital assistants like Alexa or Google Assistant to decipher the user’s meaning.

As Mr. Simonite noted, many of our libraries of knowledge are encoded in an “ancient technology largely opaque to machines – prose.” Search engines like Google do not have a problem with a question asked in prose, as they just have to find relevant links to pages. Yet this is a problem for Google Assistant, for instance, as machines can’t quickly extract meaning from the internet’s mess of “predicates, complements, sentences, and paragraphs. It requires a guide.”

Enter Wikidata.  According to founder Denny Vrandecic,

Language depends on knowing a lot of common sense, which computers don’t have access to

A Wikidata entry (of which there are about 60 million) codes every concept and item with a numeric code, the QID code number. These codes are integrated with tags (like the hashtags you use on Twitter or the tags used in WordPress for Search Engine Optimization) so computers can recognize patterns among these codes.
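To make the idea of coded entries a little more concrete, here is a minimal sketch in Python of how items identified by QIDs might be linked to one another so that a machine can follow the connections. The `Item` class, the `describe` helper and the in-memory `items` dictionary are hypothetical names invented for this illustration and are not Wikidata's own software or API. The identifiers Q42 (Douglas Adams), Q5 (human) and the property P31 (instance of) are used only as familiar examples, and the use of numbered properties to link items goes slightly beyond what the WIRED article describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """A toy stand-in for one coded entry (hypothetical structure)."""
    qid: str    # numeric-style identifier, e.g. "Q42"
    label: str  # human-readable name attached by a curator
    # property id -> list of QIDs this item points to
    statements: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, prop: str, value_qid: str) -> None:
        """Record that this item is linked to another item via a property."""
        self.statements.setdefault(prop, []).append(value_qid)


# A tiny in-memory "database" (an assumption for illustration only).
items = {
    "Q42": Item("Q42", "Douglas Adams"),
    "Q5": Item("Q5", "human"),
}
property_labels = {"P31": "instance of"}

# A human curator adds meaning: Q42 is an instance of Q5.
items["Q42"].add("P31", "Q5")


def describe(qid: str) -> List[str]:
    """Turn the coded links back into readable statements."""
    item = items[qid]
    return [
        f"{item.label} -> {property_labels[prop]} -> {items[target].label}"
        for prop, targets in item.statements.items()
        for target in targets
    ]


print(describe("Q42"))  # ['Douglas Adams -> instance of -> human']
```

The point of the sketch is simply that the machine never has to parse prose: it follows codes, while the labels that curators attach keep the same record readable for people.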

Now human entry into these databases is critical, as we add new facts and, in particular, meaning to each of these items. Otherwise, machines have problems deciphering our meaning, as with Apple’s Siri, where users complained of dumb algorithms interpreting their requests.

The knowledge of future machines could be shaped by you and me, not just tech companies and PhDs.

But this effort needs money

Wikimedia’s executive director, Katherine Maher, has prodded and cajoled these megacorporations over their tapping of the free resources of the Wikis. In response, Amazon and Facebook have donated millions to the Wikimedia projects. Google recently gave 3.1 million USD in donations.

 

Future postings on the relevance and application of scientific curation will include:

Using Scientific Content Curation as a Method for Validation and Biocuration

 

Using Scientific Content Curation as a Method for Open Innovation

 

Other posts on this site related to Content Curation and Methodology include:

The growing importance of content curation

Data Curation is for Big Data what Data Integration is for Small Data

6 Steps to More Effective Content Curation

Stem Cells and Cardiac Repair: Content Curation & Scientific Reporting

Cancer Research: Curations and Reporting

Cardiovascular Diseases and Pharmacological Therapy: Curations

Cardiovascular Original Research: Cases in Methodology Design for Content Co-Curation The Art of Scientific & Medical Curation

Exploring the Impact of Content Curation on Business Goals in 2013

Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison

conceived: NEW Definition for Co-Curation in Medical Research

The Young Surgeon and The Retired Pathologist: On Science, Medicine and HealthCare Policy – The Best Writers Among the WRITERS

Reconstructed Science Communication for Open Access Online Scientific Curation

 

 



Benefits of Open Data for Economic Research

By Angela Guess on October 24, 2012 6:00 PM

Guo of OpenEconomics.net recently discussed the benefits of open data for economic research. He writes, “There used to be a time when data was costly: There was not much data around. Comparable GDP data, for example, has only been collected starting in the early mid 20th Century. Computing power was expensive and costly: Data and commands were stored on punch cards, and researchers only had limited hours to run their statistical analyses at the few computers available at hand.”

He goes on, “Today, however, statistics and econometric analysis has arrived in every office: Open Data initiatives at the World Bank and governments have made it possible to download cross-country GDP and related data using a few mouse-clicks. The availability of open source statistical packages such as R allows virtually everyone to run quantitative analyses on their own laptops and computers. Consequently, the number of empirical papers have increased substantially. The [above] figure (taken from Espinosa et al. 2012) plots the number of econometric (statistical) outputs per article in a given year: Quantitative research has really taken off since the 1960s. Where researchers used datasets with a few dozens of observations, modern applied econometricians now often draw upon datasets boasting millions of detailed micro-level observations.”

Image: Courtesy OpenEconomics

SOURCE:

http://semanticweb.com/benefits-of-open-data-for-economic-research_b32917

The Benefits of Open Data (part II) – Impact on Economic Research

October 21, 2012 in Open Economics

A couple of weeks ago, I wrote the first part of the three part series on Open Data in Economics. Drawing upon examples from top research that focused on how providing information and data can help increase the quality of public service provision, the article explored economic research on open data. In this second part, I would like to explore the impact of openness on economic research.

We live in a data-driven age

There used to be a time when data was costly: There was not much data around. Comparable GDP data, for example, has only been collected starting in the early mid 20th Century. Computing power was expensive and costly: Data and commands were stored on punch cards, and researchers only had limited hours to run their statistical analyses at the few computers available at hand.

Today, however, statistics and econometric analysis has arrived in every office: Open Data initiatives at the World Bank and governments have made it possible to download cross-country GDP and related data using a few mouse-clicks. The availability of open source statistical packages such as R allows virtually everyone to run quantitative analyses on their own laptops and computers. Consequently, the number of empirical papers have increased substantially. The left figure (taken from Espinosa et al. 2012) plots the number of econometric (statistical) outputs per article in a given year: Quantitative research has really taken off since the 1960s. Where researchers used datasets with a few dozens of observations, modern applied econometricians now often draw upon datasets boasting millions of detailed micro-level observations.

Why we need open data and access

The main economic argument in favour of open data is gains from trade. These gains come in several dimensions: First, open data helps avoid redundancy. As a researcher, you may know there are often the same basic procedures (such as cleaning datasets, merging datasets) that have been done thousands of times, by hundreds of different researchers. You may also have experienced the time wasted compiling a dataset someone else already put together, but was unwilling to share: Open data in these cases can save a lot of time, allowing you to build upon the work of others. By feeding your additions back to the ecosystem, you again ensure that others can build on your data work. Just like there is no need to re-invent the wheel several times, the sharing of data allows researchers to build on existing data work and devote valuable time to genuinely new research.

Second, open data ensures the most efficient allocation of scarce resources – in this case datasets. Again, as a researcher, you may know that academics often treat their datasets as private gold mines. Indeed, entire research careers are often built on possessing a unique dataset. This hoarding often results in valuable data lying around on a forgotten hard disk, not fully used and ultimately wasted. What’s worse, the researcher – even though owning a unique dataset – may not be the most skilled to make full use of the dataset, while someone else may possess the necessary skills but not the data. Only recently, I had the opportunity to talk to a group of renowned economists who – over the past decades – have compiled an incredibly rich dataset. During the conversation, it was mentioned that they themselves may have only exploited 10% of the data – and were urgently looking for fresh PhDs and talented researchers to unlock the full potential of their data. But when data is open, there is no need to search, and data can be allocated to the most skilled researcher.

Finally, and perhaps most importantly, open data – by increasing transparency – also fosters scientific rigour: When datasets and statistical procedures are made available to everyone, a curious undergraduate student may be able to replicate and possibly refute the results of a senior researcher. Indeed, journals are increasingly asking researchers to publish their datasets along with the paper. But while this is a great step forward, most journals still keep the actual publication closed, asking for horrendous subscription fees. For example, readers of my first post may have noticed that many of the research articles linked could not be downloaded without a subscription or university affiliation. Since dissemination, replication and falsification are key features of science, the role of both open data and open access become essential to knowledge generation.

But there are of course challenges ahead: For example, while a wider access to data and statistical tools is a good thing, the ease of running regressions with a few mouse-clicks also results in a lot of mindless data mining and nonsensical econometric outputs. Quality control, hence, is and remains important. There are and in some cases also should be some barriers to data sharing. In some cases, researchers have invested a substantial time of their lives to construct their datasets, in which case it is understandable why some are uncomfortable to share their “baby” with just anyone. In addition, releasing (even anonymized) micro-level data often raises concerns of privacy protection. These issues – and existing solutions – will be discussed in the next post.

SOURCE:

http://openeconomics.net/2012/10/21/the-benefits-of-open-data-part-ii-impact-on-economic-research/

Advocacy for Open Access Publishing and the Economic Synergy obtained from Open Data

Anatomy of open access publishing: a study of longitudinal development and internal structure

Mikael Laakso* and Bo-Christer Björk

Author Affiliations

Hanken School of Economics, Helsinki, Finland

BMC Medicine 2012, 10:124 doi:10.1186/1741-7015-10-124

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1741-7015/10/124

Received: 27 July 2012
Accepted: 28 September 2012
Published: 22 October 2012

© 2012 Laakso and Björk; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Open access (OA) is a revolutionary way of providing access to the scholarly journal literature made possible by the Internet. The primary aim of this study was to measure the volume of scientific articles published in full immediate OA journals from 2000 to 2011, while observing longitudinal internal shifts in the structure of OA publishing concerning revenue models, publisher types and relative distribution among scientific disciplines. The secondary aim was to measure the share of OA articles of all journal articles, including articles made OA by publishers with a delay and individual author-paid OA articles in subscription journals (hybrid OA), as these subsets of OA publishing have mostly been ignored in previous studies.

Methods

Stratified random sampling of journals in the Directory of Open Access Journals (n = 787) was performed. The annual publication volumes spanning 2000 to 2011 were retrieved from major publication indexes and through manual data collection.

Results

An estimated 340,000 articles were published by 6,713 full immediate OA journals during 2011. OA journals requiring article-processing charges have become increasingly common, publishing 166,700 articles in 2011 (49% of all OA articles). This growth is related to the growth of commercial publishers, who, despite only a marginal presence a decade ago, have grown to become key actors on the OA scene, responsible for 120,000 of the articles published in 2011. Publication volume has grown within all major scientific disciplines, however, biomedicine has seen a particularly rapid 16-fold growth between 2000 (7,400 articles) and 2011 (120,900 articles). Over the past decade, OA journal publishing has steadily increased its relative share of all scholarly journal articles by about 1% annually. Approximately 17% of the 1.66 million articles published during 2011 and indexed in the most comprehensive article-level index of scholarly articles (Scopus) are available OA through journal publishers, most articles immediately (12%) but some within 12 months of publication (5%).
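As a quick sanity check on the figures quoted in these Results, the short Python sketch below recomputes the share of articles in journals charging article-processing charges and the fold-growth in biomedicine directly from the numbers above. The function names are made up for this illustration, and the compound annual growth rate is a figure derived here under the assumption of steady year-on-year growth; it is not a number reported by the authors.

```python
def share(part: float, whole: float) -> float:
    """Fraction of the whole represented by part."""
    return part / whole


def fold_growth(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming steady growth every year."""
    return fold_growth(start, end) ** (1 / years) - 1


# Figures as quoted in the abstract above.
total_oa_2011 = 340_000   # articles in full immediate OA journals, 2011
apc_oa_2011 = 166_700     # articles in journals charging APCs, 2011
biomed_2000 = 7_400       # biomedicine OA articles, 2000
biomed_2011 = 120_900     # biomedicine OA articles, 2011

print(f"APC share 2011: {share(apc_oa_2011, total_oa_2011):.0%}")
print(f"Biomedicine fold-growth: {fold_growth(biomed_2000, biomed_2011):.1f}x")
print(f"Implied annual growth: "
      f"{annual_growth_rate(biomed_2000, biomed_2011, 2011 - 2000):.1%}")
```

The first two values agree with the 49% and 16-fold figures in the abstract; the implied annual rate is only a rough summary, since real growth was unlikely to be perfectly even from year to year.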

Conclusions

OA journal publishing is disrupting the dominant subscription-based model of scientific publishing, having rapidly grown in relative annual share of published journal articles during the last decade.

Keywords:

Open access; scientific publishing

Background

Open access (OA) has expanded the possibilities for disseminating one’s own research and accessing that of others [1,2]. OA, in the context of scholarly publishing, is a term widely used to refer to unrestricted online access to articles published in scholarly journals. There are two distinct ways for scholarly articles to become available OA, either directly provided by the journal publisher (gold OA), or indirectly by being uploaded and made freely available somewhere else on the Web (green OA). Both options increase the potential readership of any article to over a billion individuals with Internet access and indirectly speed up the spread of new research ideas. While the majority of OA journals do not charge authors anything for the services provided, a growing minority of professionally operating journals charge authors fees ranging from 20 to 3800 USD, with an estimated average of 900 USD [3].

OA is closely related to developments in other media content delivery businesses, and its ethos is well aligned with the fundamental openness principle of science itself as well as the ideologies behind Wikipedia and open source software. However, what makes scientific publishing distinct is the influence journal prestige and rankings have on journal selection for authors submitting article manuscripts [4]. There are also vested interests to preserve the status quo of the current subscription market among stakeholders, with dominant publishers seeing OA as a potential threat to the bottom-line. Friction caused by these and other factors can be argued to slow down the process of OA adoption because journals are not direct substitutes for each other and subscription-based journal copyright agreements can prohibit parallel distribution of published content. However, following in the footsteps of the National Institutes of Health in the US, public research funders in the UK have recently launched strategies to increase OA to publicly funded research [5]. While the ultimate goal of increasing access to publicly funded research is known and widely accepted it is difficult to reach compromises that balance the long- and short-term interests of the stakeholders involved [6].

Important changes in policy facilitating growth of OA happen on many levels, influencing research publishing both upstream and downstream. The examples from the public funders in the US and UK are merely the most ambitious movements so far: public and private research funders large and small, universities, publishers and research institutes all contribute to forming the evolving OA landscape. The problem that has persisted with OA since the start is the lack of readily available data for how this particular subset of journal publishing is developing over time, an aspect which is described in closer detail in the Methods section. Policymakers should have an interest in knowing how common OA is today, how fast the share of OA has increased and what proportion of journal articles are currently OA? The purpose of this study is to provide answers to these types of questions.

Aim of the study

This study focuses on providing measurement of the longitudinal development of gold OA publication volume for the years 2000 to 2011 as a whole and by subtype: full immediate journal OA, delayed OA and hybrid OA. As will be described in more detail further on, earlier studies have mostly ignored the subset of delayed OA journals. This is partly because there is no comprehensive index of such journals similar to the service the Directory of Open Access Journals (DOAJ) provides for immediate OA journals, and partly because of the divisive acceptance of delayed OA as a valid form of OA. However, the subset of delayed OA journals is both substantial in volume and is populated with many high-quality journals; five of the 10 most-cited journals within Thomson Reuters Web of Knowledge in the period from 1999 to 2009 are currently delayed OA while the others are subscription-access only [7]. Hybrid OA is the term commonly used for describing individual articles being provided openly within subscription-only journals through an optional author payment; it is only recently that this type of OA has been properly studied [8].

The chosen research aim is related to some existing areas of OA research that warrant mention to clarify the specific contribution of this study. Green OA is not part of the scope of this study as that is a wholly different research problem and one that requires its own set of methods, as different versions of articles are scattered around on the Web. Furthermore, this study does not extensively discuss or evaluate the pros or cons of OA, since there is already a well-developed body of literature focusing on issues such as relationships between OA and readership, citation or impact [9–12]. In summary, the aim is to provide comprehensive and up-to-date quantitative measurement of gold OA journals and articles. The results and data of this study can then potentially act as a foundation for more targeted research enquiries.

Previous studies

Researchers have applied different methods to cope with the lack of readily available quantitative data to study the OA phenomenon, ranging from labor-intensive manual article-counting [13–15] to automated Web-crawling [16,17]. What is known about the early years of OA, both gold and green, is mostly through a series of independent studies providing snapshots for individual years based on sampling various publication indexes. The fact that studies have been based around OA prevalence within different publication indexes and the diverse adopted sampling methods makes comparisons or composition of longitudinal development inexact. Nevertheless, these are the best figures currently available. The earliest comprehensive study suggests the 2003 share for gold OA to have been 2.9% for articles included in the Thomson Reuters Web of Knowledge [18]. The next study was performed for the 2006 publication volume based on data from UlrichsWeb [19] and the DOAJ [20], where a gold OA share of 8.1% and a green OA share of 11.3% resulted in a combined OA share of 19.4% [14]. For 2008 articles, the Thomson Reuters Web of Knowledge gold OA share was measured to be 6.6% and green OA 14%, resulting in a figure of total OA of 20.6% [21]. Also for 2008, a large-scale study based on English-language journals listed in the DOAJ calculated that 120,000 articles were published OA either through full immediate OA journals or as individual hybrid OA articles [22]. The first comprehensive longitudinal study on the volume of articles published by full immediate OA journals in the DOAJ resulted in an average annual year-on-year growth rate of 30% from 2000 to 2009, with some 191,000 articles published during 2009 [13]. Another longitudinal study, including both gold and green OA, produced a total OA share of 23.1% for Thomson Reuters Web of Knowledge indexed articles published during 2010 [16]. 
Outside of this 2010 study of Thomson Reuters Web of Knowledge, there are no comprehensive measurements for OA volume since 2009. This study is designed to provide a longitudinal study implementing a well-documented and easily replicable methodology, producing results applicable to multiple publication indexes, producing results that are easy to follow-up and compare with future measurements.

Methods

Sampling

The study is founded on the assumption that the full population of OA journals is listed in the DOAJ. There are OA journals not indexed in this database, but systematically identifying them is not feasible. Because the majority of the 7,372 journals listed in the DOAJ on 1 January 2012 were not included in any indexing service that would reliably keep track of their article output, or of the exact year in which previously subscription-based journals converted to OA, gathering data is largely a manual task and one of the major practical challenges for the execution of studies of this type. To strike a balance between feasibility and reliability, stratified random sampling with unequal probabilities was utilized, a sampling method that has proven suitable for similar studies in the past [13]. An argument for adopting this approach in favor of fully random sampling is that the population of OA journals is highly heterogeneous, where a small number of titles output a large proportion of the total article volume [22]. The fact that large journals can be identified with a high degree of certainty through various indexing services also means that reliable, readily available article count information can be used for journals responsible for a major part of the total OA output. A visualization of the sampling is provided in Figure 1. A cross-analysis of data available from SCImago [23], Thomson Reuters Web of Knowledge [24] and the DOAJ identified 103 OA journals that had published over 200 articles annually during 2009, 2010 or 2011; these were included in the large journal stratum. The remaining 7,269 DOAJ journals were represented by a second stratum with a sample of 684 journals selected at random among them, each given an observation weight of 10.62719 (684 × 10.62719 ≈ 7,269). The large journal stratum was given an observation weight of 1 because that stratum is exhaustively sampled.

Figure 1. Visualization of the sampling.
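The weighting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual analysis code: the function names `stratum_weight` and `estimate_total_articles` are hypothetical, and the article counts in the usage lines are made up. Only the population and sample sizes (7,269 and 684) come from the text.

```python
from typing import Sequence


def stratum_weight(population_size: int, sample_size: int) -> float:
    """Observation weight for a stratum: journals represented per sampled journal."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_size / sample_size


def estimate_total_articles(
    large_journal_counts: Sequence[int],
    sampled_small_counts: Sequence[int],
    small_population_size: int,
) -> float:
    """Estimate one year's article volume from the two strata."""
    # The large journal stratum is exhaustively sampled, so its weight is 1.
    large_total = sum(large_journal_counts)
    # Each sampled journal in the second stratum stands in for
    # population_size / sample_size journals.
    weight = stratum_weight(small_population_size, len(sampled_small_counts))
    return large_total + weight * sum(sampled_small_counts)


if __name__ == "__main__":
    # Sizes reported in the text: 7,269 journals represented by 684 sampled ones.
    print(round(stratum_weight(7269, 684), 5))
    # Hypothetical counts, for illustration only.
    print(estimate_total_articles([300, 250], [10, 20, 30], 30))
```

Dividing the stratum population by the sample size reproduces the observation weight quoted in the text, which is why the weighted sample sums back to the stratum's full population.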

Data collection

Through a previous study using identical sampling and data collection methodology [13], data for 565 journals spanning publication volumes for 2000 to 2009 could be re-used, with only the need to gather publication volumes for two additional years. Since the existing data material lacked coverage for journals added to the DOAJ during 2010 and 2011, an additional randomly selected sample was drawn out of the journals added within the two missing years adhering to the same sampling probability as the pre-existing sample (0.1011), with 222 new journals added to the existing sample of 565 journals.

Where journal publication volumes could be retrieved from either SCImago or Thomson Reuters Web of Knowledge, such data was used. For the majority of journals, the individual journal websites were visited and the annual entries collected manually. It is worthwhile to note that journals often include editorials, news, book reviews, obituaries and other non-research content. Such material was excluded from all measurements in this study. To provide an accurate representation of retrospective OA volume, articles were not collected for subscription-only journals prior to publishing OA. Determining when a journal has initiated OA publishing often requires manual investigation as the information is not always made explicit on the webpages, and the data concerning this is often incorrect in the journal metadata available in the DOAJ. To support the analysis of the sampled journals, additional data from Scopus [25] and Thomson Reuters Web of Knowledge was utilized in addition to the data that is already available through the DOAJ.

Results

The longitudinal development of full immediate OA article volume spanning 2000 to 2011 is presented visually in Figure 2 and numerically in Table 1, where a breakdown of the total volume is provided for articles split into three different categories: online-only journals that require an article-processing charge, online-only journals that do not require an article-processing charge, and journals that still output print versions for subscribers but have all articles available OA online. It is important to point out that journals still producing a print version might also require an article-processing charge in addition to having income from subscriptions. However, such differentiation is not provided here due to the relative rarity of such journals as well as a desire to focus on these three mutually exclusive business models specifically.

Figure 2. Annual volumes of articles in full immediate open access journals, split by type of open access journal.

Table 1. Estimated annual article and journal counts in full immediate open access journals

Overall there has been growth in the annual output among all three categories since the year 2000, going from a total volume of 20,700 articles in 2000 to 340,000 in 2011. Not depicted in Figure 2 but provided in Table 1 is the number of active OA journals for each respective year (journals with at least one article published during the respective year), which has increased from 744 journals in the year 2000 to 6,713 in 2011. The average number of articles per journal has also seen a constant increase, with an average of 26 articles per journal in 2000, 33 in 2005, and 51 for 2011. However, a reminder about the skewed nature of article distribution among journals is relevant here. There is a handful of journals publishing more than 1,000 articles per year and thousands of journals publishing only a few articles annually.

Inspecting the internal structure of the total article mass reveals some major shifts that have happened over the course of a decade. Journals that also publish a parallel print version, which are often old, established journals that decided to make the online version free when they started putting their content on the Web, provided the majority of the OA content up until the year 2008 where, for the first time, online-only journals took the lead in terms of output volume. Since 2008, the online-only journals have sustained a much stronger growth while the OA output provided by journals outputting a print version has plateaued to annual volumes between 100,000 and 110,000 articles. The latter group includes a lot of society journals registered with dedicated portals like SciELO [26], Redalyc [27] and J-Stage [28] providing the technical platform for electronic publishing. Journals with author-processing charges have seen breakout growth during the last three years, going from 80,700 articles in 2009 to 166,700 articles in 2011.
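To put the volume figures quoted in the Results on a common footing, one can convert them to an average compound annual growth rate. The sketch below assumes growth at a constant rate between the two endpoints, which is a simplification rather than something the study claims; the function name is hypothetical, and the inputs are the rounded volumes from the text.

```python
def compound_annual_growth_rate(start_volume: float, end_volume: float, years: int) -> float:
    """Average yearly growth rate that takes start_volume to end_volume in `years` steps."""
    if start_volume <= 0 or years <= 0:
        raise ValueError("start_volume and years must be positive")
    return (end_volume / start_volume) ** (1 / years) - 1


# Total full immediate OA volume: 20,700 articles (2000) to 340,000 (2011).
total_growth = compound_annual_growth_rate(20_700, 340_000, 2011 - 2000)

# Journals with article-processing charges: 80,700 (2009) to 166,700 (2011).
apc_growth = compound_annual_growth_rate(80_700, 166_700, 2011 - 2009)

if __name__ == "__main__":
    print(f"total volume: {total_growth:.1%} per year")
    print(f"APC journals: {apc_growth:.1%} per year")
```

With these rounded inputs the total volume grows by a little under 30% per year on average, while the journals charging article-processing fees grow considerably faster over 2009 to 2011.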

Cross-analysis of the sample with the titles listed in the Thomson Reuters Web of Knowledge index and Elsevier’s Scopus index was performed, only including the titles present in the respective index to calculate the share of OA articles of all peer-reviewed articles. Table 2 provides the main results of this analysis, presented as longitudinal breakdowns of publisher-provided OA in the two indexing services. Nearly half of all full immediate OA articles published during 2011 were outside of Scopus and two thirds outside of Thomson Reuters Web of Knowledge, meaning that a large portion of OA article volume lacks coverage in major publication indexes. This issue highlights the importance of using manual data collection methods in OA studies because data available from indexes provide only part of the total picture. In addition to the results concerning full immediate OA journals, Table 2 also contains volume data for two other types of publisher-provided OA in each respective index: delayed OA and hybrid OA.

Table 2. Proportion of publisher-provided (gold) open access in major indexes

Of the 1.66 million articles indexed by Scopus in 2011, 11% were published in full immediate OA journals, 0.7% as hybrid OA and 5.2% in journals that have a maximum OA delay of 12 months. Together, these account for almost 17% of the total article volume in the whole index. The figures for articles indexed by Thomson Reuters Web of Knowledge are comparable to those of Scopus, with a total publisher-provided OA rate of 16.2% for 2011. Of the 1.29 million articles indexed by Thomson Reuters Web of Knowledge, 7.9% are available in full immediate OA journals, 0.7% as hybrid OA and 6.4% in journals that have a maximum OA delay of 12 months. Overall the results suggest that there has been an increase of about one percentage point annually in relative OA volume in both Scopus and Thomson Reuters Web of Knowledge during 2008 to 2011.
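The index shares above can be cross-checked against the earlier statement about how much full immediate OA output falls outside each index. The following sketch uses only rounded figures quoted in the Results (340,000 full immediate OA articles in 2011, and the index sizes and shares given above); the names are hypothetical, and the result is only as precise as those rounded inputs.

```python
TOTAL_FULL_OA_2011 = 340_000  # estimated full immediate OA articles, 2011

# Index size and share of indexed articles in full immediate OA journals (2011).
INDEXES = {
    "Scopus": {"indexed_articles": 1_660_000, "full_oa_share": 0.11},
    "Web of Knowledge": {"indexed_articles": 1_290_000, "full_oa_share": 0.079},
}


def share_outside_index(
    indexed_articles: int,
    full_oa_share: float,
    total_full_oa: int = TOTAL_FULL_OA_2011,
) -> float:
    """Fraction of all full immediate OA articles that the index does not cover."""
    indexed_full_oa = indexed_articles * full_oa_share
    return 1 - indexed_full_oa / total_full_oa


if __name__ == "__main__":
    for name, figures in INDEXES.items():
        outside = share_outside_index(**figures)
        print(f"{name}: {outside:.0%} of full immediate OA articles not indexed")
```

The computed fractions come out at a bit under one half for Scopus and roughly two thirds for Web of Knowledge, consistent with the coverage gap described earlier in the Results.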

Figure 3 presents the longitudinal development of OA publisher output as measured by the number of articles output by publishers based in different regions of the world. This figure, and all that follow, only includes full immediate OA journals, excluding delayed and hybrid OA. Prior to interpretation it needs to be noted that this is a publisher-centric analysis. In some cases, the publisher is not registered within the same country, or even region of the world, as the journal. The results suggest that Latin American countries were early to have substantial OA output, possibly due to the early availability of the SciELO portal. However, the region has not increased its output at a rate similar to that of North America, Asia or Europe, which have multiplied their outputs between 2005 and 2011.

Figure 3. Open access publisher output across geographic regions.

Figure 4 presents the total OA article volume for 2000, 2005 and 2011 split according to publisher type. The analysis shows that the early years of OA publishing were largely driven by scientific societies, professional associations, universities and their departments as well as individual scientists. Scientific societies and universities have maintained strong growth throughout the decade, while scientist-driven publication has been overshadowed by the article volume produced by the more formally organized publisher types. The most dramatic development since 2005 is the rapid increase in articles published by commercial publishers, jumping from 13,400 articles in 2005 to 119,900 in 2011, resulting in commercial publishers currently being the most common publisher of OA articles. The category of professional non-commercial publishers is a new type of publisher that has rapidly emerged during the last few years, largely attributed to the journals published by the Public Library of Science.

Figure 4. Open access publisher type analysis.

Figure 5 presents the OA article volumes for the years 2000, 2005 and 2011 split across the major scientific disciplines, with an additional category for general science journals. Throughout the decade, articles in journals broadly related to biomedicine have held the lead in terms of article volume, and since 2005 the gap to the other disciplines has been further extended. Biomedical journals published 120,900 articles in 2011, constituting 35.5% of the total OA article output for the year. In second place in terms of volume for 2011 is the social sciences and humanities, almost tied with earth and environmental sciences in third place, publishing 56,000 and 54,900 articles respectively. Coming in fourth place in terms of size is engineering, which is the discipline that has seen the most dramatic relative growth between 2005 and 2011, from publishing only 4,800 articles in 2005 to 37,500 articles in 2011. In fifth place for 2011 is physics and astronomy with 16,000 articles; however, previous studies have shown there to be particularly strong practice and supporting infrastructure for parallel publication within this discipline, potentially lessening the demand for OA journals [21]. Chemistry and chemical engineering is sixth in terms of size with 12,700 articles in 2011, followed by general science journals and mathematics at the tail end with 12,600 and 7,200 articles respectively. The category of general science journals is a relatively new one with only marginal volume until recently. Journals belonging to this category have little or no limitations with regards to research subject or scope. Though it could be argued that PLOS ONE is a general science journal, the vast majority of actual articles published so far have been within the scope of biomedicine, thus that specific journal was placed within the biomedicine category for this coarse disciplinary breakdown.

Figure 5. Open access across major scientific disciplines.
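The disciplinary figures quoted for 2011 can be turned into shares of the total output with a short calculation. This is an illustrative sketch using the rounded article counts from the text and the 340,000 total given in the Results; the function names are hypothetical, and the shares are only as exact as those rounded inputs.

```python
TOTAL_2011 = 340_000  # total full immediate OA articles in 2011, from the Results

# Article counts for 2011 as quoted in the text.
ARTICLES_2011 = {
    "biomedicine": 120_900,
    "social sciences and humanities": 56_000,
    "earth and environmental sciences": 54_900,
    "engineering": 37_500,
    "physics and astronomy": 16_000,
    "chemistry and chemical engineering": 12_700,
    "general science": 12_600,
    "mathematics": 7_200,
}


def percentage_shares(counts: dict, total: int) -> dict:
    """Share of the total volume held by each discipline, in percent."""
    return {name: 100 * count / total for name, count in counts.items()}


def growth_factor(start: int, end: int) -> float:
    """How many times larger the end volume is than the start volume."""
    return end / start


if __name__ == "__main__":
    shares = percentage_shares(ARTICLES_2011, TOTAL_2011)
    for name, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{name}: {share:.1f}%")
    # Engineering: 4,800 articles in 2005 to 37,500 in 2011.
    print(f"engineering grew {growth_factor(4_800, 37_500):.1f}-fold")
```

The biomedicine share computed this way agrees with the roughly 35.5% quoted above, and the engineering figures correspond to a nearly eightfold increase between 2005 and 2011.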

Discussion

Over the course of the last decade, OA journal publishing has grown universally across diverse types of journal publishers, geographical regions and scientific disciplines. This has resulted in a continuously growing proportion of journal articles being published OA for each year that has passed, with the most recent measurement from this study being 17% when delayed OA articles with a maximum embargo of 12 months are included. However, despite all the studied dimensions showing increases in annual article output over the decade, the results of the study show that growth has not been uniform across the board. OA publishing seems to be in a very dynamic growth phase, with major shifts in the internal composition happening in a relatively short span of time.

A major strength of the study is associated with the labor-intensive manual approach to data collection, where the annual article volumes for each journal included in the sample were recorded for the years 2000 to 2011. This approach reduces the risk of using incorrect, skewed or incomplete source data. The methodological transparency should also enable others to produce comparable numbers to follow up and compare with the measurements provided here. What can be held as a weakness is the reliance on sampling rather than complete population coverage. However, complete coverage is not feasible with the indexing tools currently available, and manually collecting the data for over 7,000 journals would be a very labor-intensive task.

In comparison with existing studies, this is not only the first study to provide comprehensive gold OA measurement for 2010 and 2011, but the results for the earlier years studied are also more accurate and representative of the actual volumes published at the time. The previous directly comparable study suggested that 191,000 articles were published by full immediate OA journals during 2009 [13], whereas this study suggests the volume for the same year to actually be 225,600. The discrepancy in retrospective annual volumes between these two studies, or any other earlier study using data from the DOAJ, is influenced by the time-lag between the time journals actually start publishing OA and the time they get registered to the DOAJ. In part, this is because journals have to submit a request to the DOAJ to be added, meaning that journals rarely are registered from the first issue they publish, if at all. Another issue is the time the DOAJ takes to process new addition requests; as of September 2012 the backlog of journals currently in queue for evaluation is described as being ‘huge’ on the DOAJ contact page [20]. Exploring this issue more closely through the sampled journals, it appears that over half of the sampled journals added to the DOAJ during 2010 and 2011 had been publishing OA already prior to 2010, with a handful of cases publishing OA for over a decade prior to DOAJ registration. As was noted in the introduction, most other earlier studies have been limited by only looking at specific OA subsets for specific years, and are thus not directly comparable. However, despite this inability to compare our estimates directly with earlier studies because of methodological incompatibilities, all the results nevertheless speak for the notion of a strong longitudinal growth for OA, particularly so for the biomedical research field.

The results, in particular the finding that approximately 17% of scholarly journal articles are already now made openly available on the Web within a year by the publishers, should be an important input for the policy discussions on OA in venues like the US Congress, the European Commission and the UK Finch Committee that recently published its report with OA-guidelines for British research funders [6]. This study also sheds new light on the relative contributions of the two complementary routes for achieving OA, the publisher-provided gold route and the author-provided green route, indicating that the contribution of gold (both immediate and articles withheld for short embargo periods) is much larger than many earlier estimates. The results should also be considered together with two other recent studies [3,9]. These studies suggest that the level of article-processing charges paid is on average around 900 USD, which is lower than generally believed, and that the scientific impact of OA journals founded in the last decade, and in particular in biomedicine, is on par with similar subscription journals, as measured by average number of citations.

It no longer seems to be a question whether OA is a viable alternative to the traditional subscription model for scholarly journal publishing; the question is rather when OA publishing will become the mainstream model. What remains to be seen is whether the growth will continue at a similar rate as measured during the last few years, or if it will accelerate to an even steeper part of the S-shaped adoption pattern typical of many innovations [29]. As in many other markets where the Internet has thoroughly rewritten the rules of the game, an interesting question is whether new entrants, like Public Library of Science and BioMed Central, will take over the market or whether the old established actors, commercial and society publishers with subscription-based revenue models, will be able to adapt their business models and regain the ground they have so far lost. Future studies on the internal structure of OA publishing are likely to witness the anatomy transforming yet again. Most of the major internal shifts in OA journal publishing have only happened fairly recently during the last few years and, judging by the momentum at which things are moving, it is hard to imagine the internal dynamics settling down any time soon.
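For readers unfamiliar with the S-shaped adoption pattern mentioned above, a logistic curve is one common way to model it. The sketch below is purely illustrative: the midpoint year, growth rate and ceiling are hypothetical parameters chosen for the example, not values fitted to the study's data, and the study itself does not specify a functional form.

```python
import math


def logistic_share(year: float, midpoint_year: float, growth_rate: float, ceiling: float = 1.0) -> float:
    """Adoption share in a given year under a logistic (S-shaped) curve.

    midpoint_year: year at which adoption reaches half of the ceiling.
    growth_rate:   steepness of the curve (larger means faster adoption).
    ceiling:       long-run maximum share.
    """
    return ceiling / (1 + math.exp(-growth_rate * (year - midpoint_year)))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for year in range(2005, 2036, 5):
        share = logistic_share(year, midpoint_year=2020, growth_rate=0.3)
        print(year, f"{share:.1%}")
```

On such a curve, growth looks nearly exponential at first, is steepest around the midpoint, and flattens as the share approaches its ceiling, which is why it is hard to tell from early data alone where on the curve a technology currently sits.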

Competing interests

The authors declare that they have no competing financial interests. B-CB founded an OA journal in the 1990s and is emeritus Editor-in-Chief. B-CB is a current board member of the Open Access Scholarly Publishers Association.

Authors’ contributions

ML and B-CB conceived, designed and coordinated the study. ML handled most of the data collection and analysis. Both authors participated equally in the interpretation of the results and the writing of the manuscript. Both authors have read and approved the final manuscript.

Authors’ information

ML is a doctoral student in Information Systems Science at the Hanken School of Economics, Helsinki, Finland. B-CB is professor of Information Systems Science at the Hanken School of Economics, Helsinki, Finland.

References

  1. Suber P: Open Access. Cambridge: MIT Press; 2012.
  2. Willinsky J: The Access Principle – the Case for Open Access to Research and Scholarship. Cambridge: MIT Press; 2005.
  3. Solomon DJ, Björk B-C: A study of open access journals using article processing charges. J Am Soc Inf Sci Technol 2012, 63:1485-1495.
  4. Knight LV, Steinbach TA: Selecting an appropriate publication outlet: a comprehensive model of journal selection criteria for researchers in a broad range of academic disciplines. International Journal of Doctoral Studies 2008, 3:59-79.
  5. RCUK Announces New Open Access Policy, press release [http://www.rcuk.ac.uk/media/news/2012news/Pages/120716.asp] 2012.
  6. Finch J: Accessibility, sustainability, excellence: how to expand access to research publications. [http://www.researchinfonet.org/publish/finch/] Report of the Working Group on Expanding Access to Published Research Findings. Research Information Network 2012.
  7. Sciencewatch – Top Ten Most-Cited Journals (All Fields), 1999-2009 [http://sciencewatch.com/dr/sci/09/aug2-09_2/]
  8. Björk B-C: The hybrid model for open access publication of scholarly articles: a failed experiment? J Am Soc Inf Sci Technol 2012, 63:1496-1504.
  9. Björk B-C, Solomon DJ: Open access versus subscription journals – a comparison of scientific impact. BMC Med 2012, 10:73.
  10. Davis PM, Walters WH: The impact of free access to the scientific literature: a review of recent research. J Med Libr Assoc 2011, 99:208-217.
  11. Wagner A: Open access citation advantage: an annotated bibliography. Issues in Science and Technology Librarianship 2010, 60. doi:10.5062/F4Q81B0W [http://www.istl.org/10-winter/article2.html]
  12. Craig ID, Plume AM, McVeigh ME, Pringle J, Amin M: Do open access articles have greater citation impact? A critical review of the literature. Journal of Informetrics 2007, 1:239-248.
  13. Laakso M, Welling P, Bukvova H, Nyman L, Björk B-C, Hedlund T: The development of open access journal publishing from 1993 to 2009. PLoS One 2011, 6:e20961.
  14. Björk B-C, Roos A, Lauri M: Scientific journal publishing: yearly volume and open access availability. Information Research 2009, 14:e391.
  15. Crawford W: Free electronic refereed journals: getting past the arc of enthusiasm. Learned Publishing 2002, 15:117-123.
  16. Gargouri Y, Larivière V, Gingras Y, Carr L, Harnad S: Green and gold open access percentages and growth, by discipline. [http://arxiv.org/abs/1206.3664]
  17. Matsubayashi M, Kurata K, Sakai Y, Morioka T, Kato S, Mine S, Ueda S: Status of open access in the biomedical field in 2005. J Med Libr Assoc 2009, 97:4-11.
  18. McVeigh M: Open Access Journals in the ISI Citation Databases: Analysis of Impact Factors and Citation Patterns. [http://science.thomsonreuters.com/m/pdfs/openaccesscitations2.pdf] 2004.
  19. UlrichsWeb Serials Solutions [http://ulrichsweb.serialssolutions.com/]
  20. DOAJ – Directory of Open Access Journals [http://www.doaj.org]
  21. Björk B-C, Welling P, Laakso M, Majlender P, Hedlund T, Guðnason G: Open access to the scientific journal literature: situation 2009. PLoS One 2010, 5:e11273.
  22. Dallmeier-Tiessen S, Goerner B, Darby R, Hyppoelae J, Igo-Kemenes P, Kahn D, Lambert S, Lengenfelder A, Leonard C, Mele S, Polydoratou P, Ross D, Ruiz-Perez S, Schimmer R, Swaisland M, van der Stelt: Open access publishing – models and attributes. [http://edoc.mpg.de/478647] Max Planck Digital Library/Informationsversorgung 2010.
  23. SCImago – SCImago Journal & Country Rank [http://www.scimagojr.com]
  24. Thomson Reuters Web of Knowledge [http://apps.webofknowledge.com]
  25. Scopus [http://www.scopus.com]
  26. SciELO [http://www.scielo.org/]
  27. Redalyc [http://redalyc.uaemex.mx/]
  28. J-Stage [http://www.jstage.jst.go.jp/]
  29. Rogers E: Diffusion of Innovations. New York: Free Press; 1995.

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1741-7015/10/124/prepub

SOURCE:

http://www.biomedcentral.com/1741-7015/10/124

 
