Measuring Research Impact
Larry H. Bernstein, MD, FCAP, Curator
LPBI
Measuring the influence of scientific research has long rested on the journal impact factor, which has been the subject of much discussion. Impact-factor-based assessment weights both the number of publications and whether they appear in high-impact journals. The problem arises for young investigators who are not connected with highly accomplished scientists and who do not work in a highly productive environment. Another problem is that much research today is carried out by multicenter teams at several highly funded universities, resulting in 10 to 30 coauthors whose contributions are simply listed. A further factor that is not counted, but might be factored in, is the rank of each author on the submitted manuscript, even in a blinded peer review.
A recent paper tries to address these issues.
Capturing the Influence https://www.genomeweb.com/scan/capturing-influence
Many researchers bemoan the use of journal impact factors as a means of assessing the influence of scientific articles, Nature‘s Mollie Bloudoff-Indelicato writes. In response to this, the US National Institutes of Health has developed a new metric, dubbed the Relative Citation Ratio, but this approach, too, has drawn criticism, Bloudoff-Indelicato adds.
In a paper posted at bioRxiv, an NIH team led by George Santangelo describes the RCR as an “article-level and field-independent” way to quantify scientific accomplishment. An article’s RCR is calculated by dividing its citation rate by the average citation rate of articles in the field. The RCR is then compared to a benchmark set of NIH-funded papers.
The team applied the metric to nearly 89,000 articles published between 2003 and 2010, and found that the values they generated tracked with what subject matter experts thought.
According to Nature, Stefano Bertuzzi of the American Society for Cell Biology calls the new metric “stunning” in a blog post, but Ludo Waltman of Leiden University says in his own post that it “doesn’t live up to expectations.” Further, Waltman says that its complexity and lack of transparency will likely be an impediment to its wider adoption.
“We don’t suggest [the RCR] is the final answer to measuring,” Santangelo adds. “This is just a tool. No one metric is ever going to be used in isolation by the NIH.”
Relative Citation Ratio (RCR): A new metric that uses citation rates to measure influence at the article level
[Figure from the preprint: comparison of log values, 2003–2006 vs. 2007–2010, for articles appearing in both 4-year intervals.]
What if I told you that nearly 90 percent of the publications which have profoundly influenced the life sciences did not appear in a high-impact factor journal? If you signed the San Francisco Declaration on Research Assessment, you probably aren’t surprised. If you haven’t signed DORA, it may be time for you to reconsider the connection between true breakthrough papers and so-called journal impact factors (JIFs).
Today we received strong evidence that significant scientific impact is not tied to the publishing journal’s JIF score. First results from a new analytical method that the National Institutes of Health (NIH) is calling the Relative Citation Ratio (RCR) reveal that almost 90% of breakthrough papers first appeared in journals with relatively modest journal impact factors. According to the RCR, these papers exerted major influence within their fields yet their impact was overlooked, not because of their irrelevance, but because of the widespread use of the wrong metrics to rank science.
In the initial RCR analysis carried out by NIH, high impact factor journals (JIF ≥ 28) account for only 11% of papers that have high RCR (3 or above). Here is hard evidence for what DORA supporters have been saying since 2012. Using the JIF to credit influential work means overlooking 89% of similarly influential papers published in less prestigious venues.
The RCR is the creation of an NIH working group led by George Santangelo in the Office of the NIH Director. Santangelo has just uploaded an article to the bioRxiv preprint repository (hosted by Cold Spring Harbor Laboratory) describing the RCR metric. I believe that the Santangelo proposal would significantly advance research assessment.
This marks a significant change in my own thinking. I am firmly convinced that no single metric can serve all purposes. There is no silver bullet in research evaluation; qualitative review by experts remains the gold standard for assessment. And yet I would bet that this new metric will gain currency, contributing to a new and better understanding of impact in science. The RCR provides us with a new, sophisticated analytical tool, which I hope will put another nail into the coffin of the phony metric of the journal impact factor. As I and many others have said many times, the JIF is the wrong way of assessing article-level impact or, even worse, individual productivity.
So, what is this new RCR metric? The Relative Citation Ratio may seem complicated at first glance, but the concept is simple and very clever. Key to the RCR is the concept of a co-citation network. In essence, this new metric compares the citations an article receives to a custom-built citation network that is relevant to that particular paper. The relevant network is defined by the entire collection of papers referenced in the papers that cite the article of interest. The citation rates in this network constitute the denominator of the RCR, while the numerator is simply the citations received by the article itself.
The values used to calculate the denominator over the defined citation network are based on the journal citation rate (JCR), which is also used to compute the journal impact factor. But it is important to note that the RCR, besides being based on the co-citation network, places the journal metric, the JCR, in the denominator, not the numerator. This makes the new RCR a robust article-level metric, normalized to the citations in the custom-built field of relevance and to the citations that journals in that network are expected to receive. The RCR is then benchmarked to make comparisons easier; to do so, the authors use the cohort of NIH R01-funded papers as the benchmark set.
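To make the arithmetic concrete, here is a minimal sketch in Python of the simplified calculation described above. The function names and the toy numbers are illustrative assumptions; the actual NIH implementation additionally applies a regression-based benchmark against NIH R01-funded papers and an adjustment for article age.

```python
# Illustrative sketch of the *simplified* RCR calculation described above.
# Names and data are hypothetical; the published method also benchmarks
# against NIH R01-funded papers via a regression model.

def field_citation_rate(cocited_journal_rates):
    """Average journal citation rate (JCR) of the journals of the papers
    co-cited with the article of interest (its co-citation network)."""
    return sum(cocited_journal_rates) / len(cocited_journal_rates)


def relative_citation_ratio(article_citation_rate, cocited_journal_rates):
    """Simplified RCR: the article's own citation rate divided by the
    citation rate expected in its co-citation-defined field."""
    return article_citation_rate / field_citation_rate(cocited_journal_rates)


# Toy example: an article cited 6 times per year whose co-cited papers
# appeared in journals with JCRs of 2, 2, 3, 4 and 5.
print(relative_citation_ratio(6.0, [2, 2, 3, 4, 5]))  # 6 / 3.2 = 1.875
```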
NIH will provide full access to the algorithms and data to calculate the RCR, making this a highly transparent and accessible tool for the whole scientific community. This is a fundamental change in assessment and it is incredibly exciting.
I am not a bibliometrician, so I don’t pretend to have all the skills to evaluate the metric’s algorithm in detail. But I am familiar with research evaluation and, after reading this paper carefully, I am convinced it adds something important to our toolbox in the thorny field of research assessment. I am reminded of something that the legendary Nobel laureate Renato Dulbecco once told me: based on the JIF metric, not only would Dulbecco never have been awarded the Nobel Prize, he probably would never have gotten a job, since his landmark papers were published in rather obscure journals.
This is exactly the point underscored by this new RCR analysis. Often highly innovative ideas, such as new concepts, technologies, or methods, may be of immediate interest only to a very small group of scientists within a highly specialized area. These seemingly arcane advances attract little notice outside that subfield. Yet on the meandering roads of research, an obscure breakthrough with seemingly little relevance to outsiders may reorient the field. What began as a curiosity-driven observation reported in an obscure journal may roll on to become a landmark discovery. I believe the RCR addresses this problem. My concerns about miracle metrics were assuaged by NIH’s careful benchmarking of the RCR: expert qualitative review of RCR-scored papers reported strong concordance with the metric.
After years of blasting one-size-fits-all metrics, I find myself in the uncomfortable position of cheering for a new one. Yes, the RCR must be road tested further. It must be tried in multiple fields and situations, and modified, if necessary, to address blind spots. And I still hold that qualitative review by experts remains the gold standard for individual assessment. But from this early report by the Santangelo group, I am convinced that here is a metric that reflects how science really evolves in laboratories, scientific meetings, and in obscure journals. It evaluates science by putting discoveries into a meaningful context. I believe that the RCR is a road out of the JIF swamp.
Well done, NIH.
NIH’s new citation metric: A step forward in quantifying scientific impact?
Quantifying the scientific impact of publications based on the citations they receive is one of the core problems of evaluative bibliometrics. The problem is especially challenging when the impact of publications from different scientific fields needs to be compared. This requires indicators that correct for differences in citation behavior between fields, and bibliometricians have put a lot of effort into developing such field-normalized indicators. In a recent paper uploaded to bioRxiv, a new indicator is proposed: the Relative Citation Ratio (RCR). The paper is authored by a team affiliated with the US National Institutes of Health (NIH), who claim that the RCR metric satisfies a number of criteria that are not met by existing indicators.
Relative Citation Ratio (RCR): A new metric that uses citation rates to measure influence at the article level
iCite is a tool to access a dashboard of bibliometrics for papers associated with a portfolio. Users upload the PubMed IDs of articles of interest (from SPIRES or PubMed), optionally grouping them for comparison. iCite then displays the number of articles, articles per year, citations per year, and Relative Citation Ratio (a field-normalized metric that shows the citation impact of one or more articles relative to the average NIH-funded paper). A range of years can be selected, as well as article type (all, or only research articles), and individual articles can be toggled on and off. Users can download a report table with the article-level detail for later use or further visualization.
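For those who prefer scripting to the dashboard, iCite also exposes a public REST API. The sketch below assumes the /api/pubs endpoint and the relative_citation_ratio JSON field as documented on the iCite site; check the current documentation before relying on either, and note that the PubMed IDs shown are arbitrary placeholders.

```python
import requests


def fetch_rcr(pmids):
    """Return {pmid: relative_citation_ratio} for the given PubMed IDs,
    using iCite's public REST API (endpoint and field names assumed here;
    verify against the current iCite documentation)."""
    resp = requests.get(
        "https://icite.od.nih.gov/api/pubs",
        params={"pmids": ",".join(str(p) for p in pmids)},
        timeout=30,
    )
    resp.raise_for_status()
    return {rec["pmid"]: rec.get("relative_citation_ratio")
            for rec in resp.json().get("data", [])}


# Arbitrary placeholder PubMed IDs, for illustration only.
print(fetch_rcr([23456789, 27599104]))
```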
A New and Stunning Metric from NIH Reveals the Real Nature of Scientific Impact
In a simplified form, the idea of the RCR metric can be summarized as follows. To quantify the impact of a publication X, all publications co-cited with publication X are identified. A publication Y is co-cited with publication X if there is another publication in which publications X and Y are both cited. The publications co-cited with publication X are considered to represent the field of publication X. For each publication Y belonging to the field of publication X, a journal impact indicator is calculated, the so-called journal citation rate, which is based on the citations received by all publications that have appeared in the same journal as publication Y. Essentially, the RCR of publication X is obtained by dividing the number of citations received by publication X by the field citation rate of publication X, which is defined as the average journal citation rate of the publications belonging to publication X’s field. By comparing publication X’s number of citations received with its field citation rate, the idea is that a field-normalized indicator of scientific impact is obtained. This enables impact comparisons between publications from different scientific fields.
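In symbols, the simplified calculation described above amounts to the following (the notation is mine, for the simplified version only; the published method adds a regression-based benchmark and an adjustment for publication age):

```latex
% Simplified RCR of publication X (notation introduced here for illustration)
\mathrm{RCR}(X) = \frac{c_X}{\mathrm{FCR}(X)},
\qquad
\mathrm{FCR}(X) = \frac{1}{|N(X)|} \sum_{Y \in N(X)} \mathrm{JCR}\bigl(j(Y)\bigr)
```

Here c_X is the number of citations received by publication X, N(X) is the set of publications co-cited with X, j(Y) is the journal in which Y appeared, and JCR(j(Y)) is that journal's citation rate.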
According to the NIH team, “citation metrics must be article-level, field-normalized in a way that is scalable from small to large portfolios without introducing significant bias at any level, benchmarked to peer performance in order to be interpretable, and correlated with expert opinion. In addition, metrics should be freely accessible and calculated in a transparent way.” The NIH team claims that the RCR metric meets each of these criteria, while other indicators proposed in the bibliometric literature always violate at least one of the criteria. If the NIH team were right, this would represent a major step forward in the development of bibliometric indicators of scientific impact. However, the NIH team significantly overstates the value of the RCR metric.
The most significant weakness of the RCR metric is the lack of a theoretical model for why the metric should provide properly field-normalized statistics. In fact, it is not difficult to cast doubt on the theoretical soundness of the RCR metric. The metric for instance has the highly undesirable property that receiving additional citations may cause the RCR of a publication to decrease rather than increase.
Imagine a situation in which we have two fields, economics and biology, and in which journals in economics all have a journal citation rate of 2 while journals in biology all have a journal citation rate of 8. Consider a publication in economics that has received 5 citations. These citations originate from other economics publications, and these citing publications refer only to economics journals. The field citation rate of our publication of interest then equals 2, and consequently we obtain an RCR of 5 / 2 = 2.5. Now suppose that our publication of interest also starts to receive attention outside economics. A biologist decides to cite it in one of his own publications. Apart from this single economics publication, the biologist refers only to biology journals in his publication. Because biology journals have a much higher journal citation rate than economics journals, the field citation rate of our publication of interest will now increase from 2 to, for instance, (5 × 2 + 1 × 8) / 6 = 3 (obtained by assuming that 5/6th of the publications co-cited with our publication of interest are in economics and that 1/6th are in biology). The RCR of our publication of interest will then decrease from 5 / 2 = 2.5 to 6 / 3 = 2. This example shows that receiving additional citations may cause a decrease in the RCR of a publication. Interdisciplinary citations, received from publications in fields characterized by different citation practices, are especially likely to have this effect, so publications may be penalized rather than rewarded for receiving them.
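The arithmetic in this thought experiment is easy to verify. Below is a minimal Python sketch of the simplified RCR (citations divided by the average journal citation rate of the co-cited papers), using the hypothetical journal citation rates of 2 and 8 from the example.

```python
# Reproduces the two-field thought experiment above with the simplified RCR.
# The journal citation rates of 2 (economics) and 8 (biology) are the
# hypothetical values from the example, not real data.

def simplified_rcr(citations, cocited_journal_rates):
    """Simplified RCR: citations received divided by the average journal
    citation rate (JCR) of the papers co-cited with the article."""
    field_citation_rate = sum(cocited_journal_rates) / len(cocited_journal_rates)
    return citations / field_citation_rate


# Before: 5 citations, and every co-cited paper is in an economics journal
# with a JCR of 2.
before = simplified_rcr(5, [2, 2, 2, 2, 2])

# After: one extra citation from a biologist whose other references are in
# biology journals, so 1/6 of the co-citation network now carries a JCR of 8.
after = simplified_rcr(6, [2, 2, 2, 2, 2, 8])

print(before, after)  # 2.5 2.0 -- the extra citation *lowers* the RCR
```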
Many more comments can be made on the theoretical soundness of the RCR metric. For instance, one could criticize the use of journal citation rates in the calculation of a publication’s field citation rate. If a publication is co-cited with a publication in Science, its field citation rate will depend on the journal citation rate of Science, which in turn depends on the citations received by a highly heterogeneous set of publications, since Science publishes work from many different research areas. It then becomes questionable whether a meaningful field citation rate is obtained. However, rather than pursuing a further technical discussion of the RCR metric, I will focus on two other claims made by the NIH team.
First, the NIH team claims that “RCR values are well correlated with reviewers’ judgments”. Although the NIH team has put an admirable amount of effort into validating the RCR metric with expert opinion, this claim needs to be assessed critically. The NIH team has performed an extensive analysis of the correlation of RCR values with expert judgments, but it hasn’t performed a comparison with similar correlations obtained for other metrics. Therefore we still don’t know whether the RCR metric correlates more favorably with expert opinion than other metrics do. Given the theoretical problems of the RCR metric, I in fact don’t expect such a favorable outcome.
Second, the NIH team claims that a strength of the RCR metric relative to other metrics is the transparency of its calculation. This is highly contestable. The calculation of the RCR metric as explained above is fairly complex, and it is in fact a simplified version of the actual calculation, which is even more complex: the full method involves, for instance, a regression model and a correction for the age of publications. Comparing the RCR metric with other metrics proposed in the bibliometric literature, I would consider transparency to be a weakness rather than a strength of the RCR metric.
Does the RCR metric represent a significant step forward in quantifying scientific impact? Even though the metric is based on some interesting ideas (e.g., the use of co-citations to define the field of a publication), the answer to this question must be negative. The RCR metric doesn’t fulfill the various claims made by the NIH team. Given the questionable theoretical properties of the RCR metric, claiming unbiased field normalization is not justified. Correlation with expert opinion has been investigated, but because no other metrics have been included in the analysis, a proper benchmark is missing. Claiming transparency is problematic given the high complexity of the calculation of the RCR metric.
During recent years, various sophisticated field-normalized indicators have been proposed in the bibliometric literature. Examples include so-called ‘source-normalized’ indicators (exemplified by the SNIP journal impact indicator provided in the Elsevier Scopus database), indicators that perform field normalization based on a large number of algorithmically defined fields (used in the CWTS Leiden Ranking), and an interesting indicator proposed in a recent paper by the Swedish bibliometrician Cristian Colliander. None of these indicators meets all of the criteria suggested by the NIH team, and none of them offers a fully satisfactory solution to the problem of quantifying scientific impact. Yet, I consider these indicators preferable over the RCR metric in terms of both theoretical soundness and transparency of calculation. Given the sometimes contradictory objectives in quantifying scientific impact (e.g., the trade-off between accuracy and transparency), a perfect indicator of scientific impact probably will never be found. However, even when this is taken into account, the RCR metric doesn’t live up to expectations.