
Impact Factors and Achievement

Larry H. Bernstein, MD, FCAP, Curator


Tired of Impact Factors? Try the SJR indicator

By Martin Fenner, January 2008

Picking the right journal is one of the most important decisions when you start to work on a paper. You probably have a gut feeling for the journals best suited to your paper in progress. To make this decision more objective, you can rely on the Impact Factor of a journal. The Impact Factor is roughly the average number of citations per paper in a given journal and is published by Thomson Scientific. Higher Impact Factors mean more prestigious journals. This information is also frequently used for job or grant applications.

Impact factors have been around for more than 40 years, and they have generally been very helpful. But there are two big problems:

Impact Factors are published by one privately owned company
Given the importance of Impact Factors for many aspects of scientific publishing, it would be preferable if there were alternatives. And Impact Factors are not freely available, but must be purchased from Thomson Scientific.

Impact Factors might not be the best tool to measure scientific quality
Impact factors have several shortcomings. Because they are a convenient way to judge the scientific output of a person, organization, journal or country, they are often overused. For example, they should not be used to compare journals in different fields, such as cell biology and particle physics. Measures like the Hirsch index might be a better tool to measure the scientific output of an individual scientist. And sometimes the judgement of your peers in the field is more important than simple numbers.

The SCImago Journal Rank indicator tries to overcome these two shortcomings. The index was created by the SCImago Research Group, located at several Spanish universities. The index uses information from the Scopus abstract and citation database of research literature owned by Elsevier.

In contrast to the Impact Factor, the SJR indicator does not simply measure the number of citations per paper. Citations from a journal with a higher SJR indicator lead to a higher SJR indicator for the cited journal (more details here). This approach is similar to PageRank (described in this paper), the algorithm for web searches by Sergey Brin and Lawrence Page that made Google what it is today. Eigenfactor is another scientific ranking tool that uses a PageRank algorithm.

Most of the time, journals with high Impact Factors have high SJR indicators. Nature and Science are still head to head. We will find unexpected results and discrepancies between the two over time. In my field of oncology, both the Journal of the NCI and Cancer Research are ranked higher than the Journal of Clinical Oncology.

You can read more about the SJR indicator in this Nature News article.

Goodbye PLOS Blogs, Welcome Github Pages

By Martin Fenner     June 15, 2013

This is the last Gobbledygook post on PLOS Blogs, and at the same time the first post at the new Github blog location. I have been blogging at PLOS Blogs since the PLOS Blogs Network was launched in September 2010, so this step wasn’t easy. But I have two good reasons.

In May 2012 I started to work as technical lead for the PLOS Article-Level Metrics project. Although this is contract work, and I also do other things – including spending 5% of my time as clinical researcher at Hannover Medical School – this created the awkward situation that I was never quite sure whether I was blogging as Martin Fenner or as someone working for PLOS. This was all in my head, as I never had any restrictions in my blogging from PLOS. With the recent launch of the PLOS Tech Blog there is now a good venue for the kind of topics I like to write about, and I have started to work on two posts for this new blog.

There will always be topics for which the PLOS Tech Blog is not a good fit, and for these posts I have launched the new personal blog at Github. But the main reason for this new blog is a technical one: I’m moving away from blogging on WordPress to writing my posts in markdown (a lightweight markup language) that are then transformed into static HTML pages using Jekyll and Pandoc. Last weekend I co-organized the workshop Scholarly Markdown together with Stian Haklev. A full workshop report will follow in another post, but the discussions before, at and after the workshop convinced me that Scholarly Markdown has a bright future and that it is time to move more of my writing to markdown. At the end of the workshop each participant suggested a todo item that he/she would be working on, and my todo item was “Think about document type where MD shines”. Markdown might be good for writing scientific papers, but I think it really shines in shorter scientific documents that can easily be shared with others. And blog posts are a perfect fit.

The new site is a work in progress. Over time I will copy over all the old blog posts from PLOS Blogs, and will work on the layout as well as additional features. Special thanks to Carl Boettiger for helping me get started with Jekyll and Github pages.

re3data.org: registry of research data repositories launched

By Martin Fenner
Posted: June 1, 2013

Earlier this week re3data.org – the Registry of Research Data Repositories – officially launched. The registry is nicely described in a preprint also published this week.

re3data.org offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. Information icons help researchers identify an adequate repository for the storage and reuse of their data.

The Shape of Science

The Shape of Science is a new graphical interface designed to access the bibliometric indicators database of the SCImago Journal & Country Rank portal (based on 2012 data).

Shape of Science – SCImago


The SCImago Journal & Country Rank is a portal that includes the journals and country scientific indicators developed from the information contained in the Scopus® database (Elsevier B.V.). These indicators can be used to assess and analyze scientific domains.

This platform takes its name from the SCImago Journal Rank (SJR) indicator, developed by SCImago from the widely known algorithm Google PageRank™. This indicator shows the visibility of the journals contained in the Scopus® database since 1996.

SCImago is a research group from the Consejo Superior de Investigaciones Científicas (CSIC) and the Universities of Granada, Extremadura, Carlos III (Madrid) and Alcalá de Henares, dedicated to information analysis, representation and retrieval by means of visualisation techniques.

As well as SJR Portal, SCImago has developed The Atlas of Science project, which proposes the creation of an information system whose major aim is to achieve a graphic representation of Ibero American Science Research. Such representation is conceived as a collection of interactive maps, allowing navigation functions throughout the semantic spaces formed by the maps.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Sergey Brin and Lawrence Page

{sergey, page}@cs.stanford.edu

Computer Science Department, Stanford University, Stanford, CA 94305


In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/
To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine — the first such detailed public description we know of to date.
Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

 Keywords: World Wide Web, Search Engines, Information Retrieval, PageRank, Google

  1. Introduction

(Note: There are two versions of this paper — a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.)
The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, as well as the number of new users inexperienced in the art of web research. People are likely to surf the web using its link graph, often starting with high quality human maintained indices such as Yahoo! or with search engines. Human maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines that rely on keyword matching usually return too many low quality matches. To make matters worse, some advertisers attempt to gain people’s attention by taking measures meant to mislead automated search engines. We have built a large-scale search engine which addresses many of the problems of existing systems. It makes especially heavy use of the additional structure present in hypertext to provide much higher quality search results. We chose our system name, Google, because it is a common spelling of googol, or 10^100, and fits well with our goal of building very large-scale search engines.

1.1 Web Search Engines — Scaling Up: 1994 – 2000

Search engine technology has had to scale dramatically to keep up with the growth of the web. In 1994, one of the first web search engines, the World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) to 100 million web documents (from Search Engine Watch). It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. At the same time, the number of queries search engines handle has grown incredibly too. In March and April 1994, the World Wide Web Worm received an average of about 1500 queries per day. In November 1997, Altavista claimed it handled roughly 20 million queries per day. With the increasing number of users on the web, and automated systems which query search engines, it is likely that top search engines will handle hundreds of millions of queries per day by the year 2000. The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.

  2. System Features

The Google search engine has two important features that help it produce high precision results. First, it makes use of the link structure of the Web to calculate a quality ranking for each web page. This ranking is called PageRank and is described in detail in [Page 98]. Second, Google utilizes link text to improve search results.

2.1 PageRank: Bringing Order to the Web

The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page’s “PageRank”, an objective measure of its citation importance that corresponds well with people’s subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results (demo available at google.stanford.edu). For the type of full text searches in the main Google system, PageRank also helps a great deal.

2.1.1 Description of PageRank Calculation

Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.
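As a concrete illustration, the iterative calculation can be sketched in a few lines of Python. This is a minimal sketch of the formula above, not the production system described in the paper, and the four-page link graph is invented for the example:

```python
# Minimal iterative PageRank, following PR(A) = (1-d) + d*(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).
# The four-page link graph below is a made-up example.
links = {
    "A": ["B", "C"],  # page A links out to B and C
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
d = 0.85  # damping factor, as in the paper
pr = {page: 1.0 for page in links}  # initial guess

for _ in range(50):  # iterate until (approximately) converged
    new_pr = {}
    for page in links:
        # sum PR(T)/C(T) over the pages T that point to this page
        incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
        new_pr[page] = (1 - d) + d * incoming
    pr = new_pr

for page, rank in sorted(pr.items(), key=lambda kv: -kv[1]):
    print(page, round(rank, 3))
```

In this toy graph, page C ends up with the highest PageRank because three of the four pages point to it, while page D, which nothing links to, bottoms out at 1 - d = 0.15.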

2.1.2 Intuitive Justification

PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back”, but who eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And the damping factor d is the probability, at each page, that the “random surfer” will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see [Page 98].

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo’s homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.
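The random-surfer model can also be checked by direct simulation. The sketch below (a toy illustration with an invented four-page graph, not code from the paper) counts where a simulated surfer spends its time; the visit frequencies approximate the normalized PageRank distribution:

```python
import random

# Monte Carlo "random surfer": with probability d follow a random outgoing
# link, otherwise get bored and jump to a page chosen uniformly at random.
# The four-page link graph is invented for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
d = 0.85

random.seed(0)  # deterministic run for reproducibility
visits = {p: 0 for p in pages}
page = random.choice(pages)
steps = 200_000
for _ in range(steps):
    visits[page] += 1
    if random.random() < d:
        page = random.choice(links[page])  # follow a link
    else:
        page = random.choice(pages)        # bored: restart anywhere

# Visit frequencies approximate PageRank (normalized to sum to one).
for p in pages:
    print(p, round(visits[p] / steps, 3))
```

The well-cited page C is visited most often, and the page with no inbound links (D) is visited only on random restarts, matching the intuition described above.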

In Science, It Matters That Women Come Last



People tell me that, as a female scientist, I need to stand up for myself if I want to succeed: Lean in, close the confidence gap, fight for tenure. Being a woman in science means knowing that the odds are both against you being there in the first place and against you staying there. Some of this is due to bias; women are less likely to be hired by science faculty, to be chosen for mathematical tasks and to have their papers deemed high quality. But there are also other barriers to success. Female scientists spend more time rearing children and work at institutions with fewer resources.

One measure of how female scientists are faring is how many papers they write. Papers are the coin of academic science, like court victories to lawyers or hits to baseball players. A widely read paper could earn a scientist tenure or a grant. Papers map money, power and professional connections, and that means we can use them to map where female scientists are succeeding and where inequality prevails.

To this end, I downloaded and statistically analyzed 938,301 scientific papers from the arXiv, a website where physicists, mathematicians and other scientists often post their papers. I inferred the authors’ gender from their first names, using a names list of 40,000 international names classified by native speakers.1 Women’s representation on the arXiv has increased significantly over the 23 years my data set covers:
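Name-based gender inference of the kind described above can be sketched as a simple lookup against a classified names list. In the sketch below, the tiny `NAMES` dictionary is a made-up stand-in for the 40,000-name list the analysis used; a real list must handle names that are ambiguous across languages, which is why it was classified by native speakers:

```python
# Toy first-name -> gender lookup; a stand-in for the classified names list.
NAMES = {"maria": "F", "emma": "F", "sergey": "M", "lawrence": "M", "martin": "M"}

def infer_gender(author: str) -> str:
    """Return 'F', 'M', or 'unknown' from the author's first name."""
    first = author.split()[0].lower().rstrip(".")
    # Bare initials like "J. Smith" carry no gender information.
    if len(first) <= 1:
        return "unknown"
    return NAMES.get(first, "unknown")

print(infer_gender("Maria Mateen"))  # F
print(infer_gender("J. Smith"))      # unknown
```

Authors who publish under initials, or whose names are absent from the list, are excluded from gendered counts, which is one reason only a subset of the 938,301 papers enters the authorship comparisons below.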



But I wanted to see not only how many papers women wrote, but also how often they earned the coveted positions of first author (indicating the scientist primarily responsible for the paper) and last author (indicating the senior scientist who supervised the work). The news is both good and bad. When a female scientist writes a paper, she is more likely to be first author than the average author on that paper. But she is less likely to be last author, writes far fewer papers and is especially unlikely to publish papers on her own. Because she writes fewer papers, she ends up more isolated in the network of scientists, with additional consequences for her career.

The average male scientist authors 45 percent more papers than the average female scientist;2 he authors more than twice as many solo papers, on which he is the only author. (Solo papers can look particularly impressive because the scientist gets all the credit for the work.) Sixty times as many multi-author papers with identifiable gender for all authors will have all male authors as all female authors; twice as many will have all male authors as any female author.3



As a consequence, women end up at the fringes of the scientific world. We can consider two scientists “connected” if they’ve collaborated on a paper, but even though women tend to work on papers with more authors, they have significantly fewer collaborators and are significantly less central to the overall community of people publishing scientific papers.4 This social isolation matters because of nepotism: Being friends with a scientist who reviews a paper, grant or job application can provide a crucial bonus.

One female scientist I spoke with suggested that women may appear on fewer papers because their contributions are often ignored. “Some men get added to papers even if their contribution was cosmetic, yet women who contributed ideas (and perhaps even writing or data) are left out,” said the woman, who blogs pseudonymously as Female Science Professor.

Maria Mateen, a friend of mine and a psychology researcher at Stanford, offered another explanation for why men write more papers: They are more likely to be “principal investigators” (PIs), senior researchers who run their own labs. In many fields, PIs get their names on papers by default, usually as last author, because they provide funding or resources for the scientists who do most of the work. When I identified PIs in my data set (scientists who were last authors on at least three papers with four or more authors), they were indeed less likely to be women: 12 percent of PIs were women, as opposed to 17 percent of scientists overall. And these PIs wrote far more papers and more first-author papers as well. But though this effect may partially explain the gender discrepancy in publication counts, it probably does not fully explain it: When we compare male PIs to female ones, or male non-PIs to female non-PIs, the men still have more papers.

Women might compensate for writing fewer papers by more frequently ending up as first author on the papers they do write. Of the 938,301 papers, 200,485 had multiple authors whose gender I could discern, and of these 56,765, or 28.3 percent, had at least one female author.5 Knowing that women are often less assertive and less inclined to negotiate, I expected to find that they would be pushed out of first authorship. But I found the opposite. After I discarded all papers with only a single author (for which it makes little sense to talk about first authorship) and all papers with authors listed in alphabetical order (to account for the fact that, in fields like mathematics where author order is alphabetical, being first author is no longer prestigious), I was left with 74,829 papers. Had male and female authors been equally likely to come first, there would be 9,683 papers with female first authors; instead, there are 10,941, or 13 percent more than expected.6 (This difference, like all differences described, is statistically significant.)
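The 13 percent figure is simply the ratio of observed to expected counts; as a quick check using the numbers reported above:

```python
# Observed vs. expected counts of female first authors, from the article.
expected = 9_683    # female-first papers expected if author order were gender-blind
observed = 10_941   # female-first papers actually observed

overrepresentation = observed / expected - 1
print(f"{overrepresentation:.1%}")  # -> 13.0%
```

The significance claim in the text could then be checked with a binomial test against the expected proportion, though the article does not spell out which test was used.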

But remember that another coveted position on a scientific paper is last author. This often indicates the senior scientist who supervised the work. In the arXiv data set, women are 13 percent less likely to be last authors, possibly because, as noted above, they are less likely to be principal investigators in both the arXiv data set and in previous analyses.



There’s a chance that women are overrepresented as first authors only because they’re underrepresented as last authors. To address this, I looked at all papers with three or more authors and compared how often women were first author to how often they were middle author, and how often they were last author to how often they were middle author. This prevents first authorship from affecting last authorship or vice versa. The results were largely in line with what I found for the entire set: Women were overrepresented in first author positions (relative to middle author) by 8.9 percent and underrepresented in last author positions (relative to middle author) by 10.5 percent.

Women are more likely to be first authors in fields in which they are better represented. A paper written in a field with more female authors is more likely to have a female first author, even when we control for how many authors on the paper are women.7 This effect is like one I observed when studying how women performed in online classes: The more women in a class, the higher the grades they earned relative to men.

This doesn’t necessarily mean that women are benefiting directly from interactions with other women, however. Perhaps the fields with the most women are somehow friendlier to women, making it easier for women to excel and end up as first author. On the other hand, I found evidence that women tend to work together. If a paper has one female author, the other authors on the paper are 35 percent more likely to be female given the share of female authors in the field overall.8 A different study found that female scientists tended to hire more women than male scientists did (and that the gap between whom elite male and female scientists hired was particularly large).

The arXiv data set goes back only 23 years and does not contain every paper in every field. Most papers on the arXiv are in math or physics, and some are in computer science, finance and biology. But there are no papers in the social sciences, and some scientists may not post papers on the arXiv.9 Still, my conclusions are consistent with previous analyses, which have found that female academics publish fewer papers and tend to publish with other women. (One study also found that women’s papers receive fewer citations, data I did not have for the arXiv.)

I also spoke to Jevin West, a professor at the University of Washington, who studies scholarly publication and conducted a similar analysis of gender and authorship using the JSTOR archives. JSTOR, which is not freely available, contains papers back to 1545 and also includes papers from the social sciences and humanities. West said he thought the arXiv contained a fairly comprehensive collection of papers in the fields it focuses on, and our analyses agreed on several points: Women published fewer papers in the JSTOR data set as well, and they were less likely to be last authors. Curiously, the overrepresentation of women in first-author positions may be specific to the hard sciences. Although women were more likely to be first authors in fields such as ecology and molecular biology in the JSTOR data set, they were not in law or sociology.

Once we’ve identified the gender gaps, the next step is to explain them. How much of women’s underrepresentation is due to bias and how much to other factors? While it’s clear that gender bias in science exists, it’s hard to prove merely by examining publication data (though some convincing cases have been made). Other studies have shown that female scientists spend more time on non-research activities, like child-rearing and teaching, tend to work at institutions that emphasize teaching over research and are more likely to leave the workforce for family reasons. Social dynamics with male scientists may also affect female scientists detrimentally. Women also tend to cite themselves less, self-promote less, negotiate less and see smaller performance gains from competition. Wendy Cieslak, the former principal program director for nuclear weapons science and technology at Sandia National Laboratories, emphasized the importance of the confidence gap. “We often don’t recognize and accept that [it] is holding us back until much later in life, when we look back,” she said.

The intense competition in academic science, combined with the gender gap — and uncertainty about how much of that gap is due to bias — is enough to drive a female scientist a little crazy. Was I left off a paper because I’m not smart enough or because I’m female? Do I need to negotiate more forcefully to keep up with my male peers, or will doing so backfire? I once applied for a fellowship and was told, “While clearly a very smart student, applicant’s ‘confidence’ comes across as arrogance.” I wondered whether the reviewer would have written that had I been a man.

The data gives us two causes for hope. The first is that while we are far from gender parity in the sciences, we’re getting closer. As the first chart above shows, women are gaining ground in papers published and posted to the arXiv, and their representation has also increased in the JSTOR data set.

The second is that the rise of big data has made it far easier to study gender inequality. A recent Wharton School study showing that professors are less likely to respond to students who are women or minorities was made far easier by the ability to email 6,548 professors rather than enlist an army of stamp-licking graduate students. Anyone with a computer can analyze a data set in an effort to find signs of inequality. I hope that the next time I look at the arXiv, I will find more of these analyses. And I hope that more of them will be by lone women.

[Charts: chance of success index, shown as a function of previously authored PubMed papers and as a function of previous first-authored papers in PubMed]
