Posts Tagged ‘Google’

Google and Verily Use AI to Screen for Diabetic Retinopathy

Reporter: Irina Robu, PhD

Google and Verily, the life science research organization under Alphabet, have designed a machine learning algorithm to better screen for diabetes and its associated eye diseases. Google and Verily believe the algorithm can be beneficial in areas lacking optometrists.

The algorithm is being integrated for the first time in a clinical setting at Aravind Eye Hospital in Madurai, India where it is designed to screen for diabetic retinopathy and diabetic macular edema. After a patient is imaged by trained staff using a fundus camera, the image is uploaded to the screening algorithm through management software. The algorithm then analyzes the images for the diabetic eye diseases before returning the results.

Numerous AI-driven approaches have lately been effective in detecting diabetic retinopathy with high accuracy. An AI-based grading system was able to effectively diagnose two patients with the disease. Furthermore, an AI-driven approach for detecting an early sign of diabetic retinopathy attained an accuracy rate of more than 98 percent.

According to Dr. R. Usha Kim, chief of retina services at the Aravind Eye Hospital, the algorithm permits physicians to work closely with patients on treatment and management of their disease while increasing the volume of screenings performed. Automated grading of diabetic retinopathy has potential benefits such as increasing the efficiency, reproducibility, and coverage of screening programs, and improving patient outcomes through earlier detection and treatment.

Even though the technology sounds promising, current research shows there is a long way to go before it can transfer directly from the lab to the clinic.



Read Full Post »

Best Big Data?

Larry H. Bernstein, MD, FCAP, Curator




What’s The Big Data?

Google’s RankBrain Outranks the Best Brains in the Industry

Bloomberg recently broke the news that Google is “turning its lucrative Web search over to AI machines.” Google revealed to the reporter that for the past few months, a very large fraction of the millions of search queries Google responds to every second have been “interpreted by an artificial intelligence system, nicknamed RankBrain.”

The company that has tried hard to automate its mission to organize the world’s information was happy to report that its machines have again triumphed over humans. When Google search engineers “were asked to eyeball some pages and guess which they thought Google’s search engine technology would rank on top,” RankBrain had an 80% success rate compared to “the humans [who] guessed correctly 70 percent of the time.”

There you have it. Google’s AI machine RankBrain, after only a few months on the job, already outranks the best brains in the industry, the elite engineers that Google typically hires.

Or maybe not. Is RankBrain really “smarter than your average engineer” and already “living up to its AI hype,” as the Bloomberg article informs us, or is this all just, well, hype?

Desperate to find out how far our future machine overlords are already ahead of the best and the brightest (certainly not “average”), I asked Google to shed more light on the test, e.g., how do they determine the “success rate”?

“That test was fairly informal, but it was some of our top search engineers looking at search queries and potential search results and guessing which would be favored by users. (We don’t have more detail to share on how that’s determined; our evaluations are a pretty complex process).”

I guess both RankBrain and Google search engineers were given possible search results to a given query and RankBrain outperformed humans in guessing which are the “better” results, according to some undisclosed criteria.

I don’t know about you, but my TinyBrain is still confused. Wouldn’t Google’s search engine, with or without RankBrain, outperform any human being, including the smartest people on earth, in “guessing” which search results “would be favored by users”? Hasn’t Google been mining the entire corpus of human knowledge for more than fifteen years and, by definition, produced a search engine that “understands” relevance better than any individual human being?

The key to the competition, I guess, is that the “search queries” used in it were not just any search queries but complex queries containing words that have different meanings in different contexts. These are the kinds of queries that will stump most human beings, and it’s quite surprising that Google engineers scored 70% on search queries that presumably require deep domain knowledge in all human endeavors, in addition to search expertise.

The only example of a complex query given in the Bloomberg article is “What’s the title of the consumer at the highest level of a food chain?” The word “consumer” in this context is a scientific term for something that consumes food, and the label (the “title”) at the highest level of the food chain is “predator.”

This explanation comes from search guru Danny Sullivan who has come to the rescue of perplexed humans like me, providing a detailed RankBrain FAQ, up to the limits imposed by Google’s legitimate reluctance to fully share its secrets. Sullivan: “From emailing with Google, I gather RankBrain is mainly used as a way to interpret the searches that people submit to find pages that might not have the exact words that were searched for.”

Sullivan points out that a lot of work done by humans is behind Google’s outstanding search results (e.g., creating a synonym list or a database with connections between “entities” such as places, people, ideas and objects). But Google now needs to respond to some 450 million new queries per day, queries that have never before been entered into its search engine.

RankBrain “can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other,” writes Sullivan. In addition, “RankBrain might be able to better summarize what a page is about than Google’s existing systems have done.”

Finding out the “unknown unknowns,” discovering previously unknown (to humans) links between words and concepts is the marriage of search technology with the hottest trend in big data analysis—deep learning. The real news about RankBrain is that it is the first time Google applied deep learning, the latest incarnation of “neural networks” and a specific type of machine learning, to its most prized asset—its search engine.

Google has been doing machine learning since its inception. The first published paper listed in the AI and machine learning section of its research page is from 2001, and, to use just one example, Gmail is so good at detecting spam because of machine learning. But Google hasn’t applied machine learning to search. That there has been internal opposition to doing so we learn from a summary of a 2008 conversation between Anand Rajaraman and Peter Norvig, co-author of the most popular AI textbook and leader of Google search R&D since 2001. Here’s the most relevant excerpt:

The big surprise is that Google still uses the manually-crafted formula for its search results. They haven’t cut over to the machine learned model yet. Peter suggests two reasons for this. The first is hubris: the human experts who created the algorithm believe they can do better than a machine-learned model. The second reason is more interesting. Google’s search team worries that machine-learned models may be susceptible to catastrophic errors on searches that look very different from the training data. They believe the manually crafted model is less susceptible to such catastrophic errors on unforeseen query types.

This was written three years after Microsoft had applied machine learning to its search technology. But now, Google has gotten over its hubris. 450 million unforeseen query types per day are probably too many for “manually crafted models,” and Google has decided that a “deep learning” system such as RankBrain provides good enough protection against “catastrophic errors.”

Read Full Post »

Twitter, Google, LinkedIn Enter in the Curation Foray: What’s Up With That?


Reporter: Stephen J. Williams, Ph.D.

Recently Twitter announced a new feature which it hopes will increase engagement on its platform. Originally dubbed Project Lightning and now called Moments, the feature relies on human curators who aggregate and curate tweets surrounding individual live events (which used to appear under #Live).

As Madhu Muthukumar (@justmadhu), Twitter’s product manager, said in a blog post describing Moments:

“Every day, people share hundreds of millions of tweets. Among them are things you can’t experience anywhere but on Twitter: conversations between world leaders and celebrities, citizens reporting events as they happen, cultural memes, live commentary on the night’s big game, and many more,” the blog post noted. “We know finding these only-on-Twitter moments can be a challenge, especially if you haven’t followed certain accounts. But it doesn’t have to be.”

Please see more about Moments on his blog here.

Moments is a new tab on Twitter’s mobile and desktop home screens where the company will curate trending topics as they’re unfolding in real-time — from citizen-reported news to cultural memes to sports events and more. Moments will fall into five total categories, including “Today,” “News,” “Sports,” “Entertainment” and “Fun.” (Source: Fox)

Now It’s Google’s Turn


As Dana Blankenhorn wrote in his SeekingAlpha article Twitter, Google Try It Buzzfeed’s Way With Curation:

What’s a challenge for Google is a direct threat to Twitter’s existence.

For all the talk about what doesn’t work in journalism, curation works. Following the news, collecting it and commenting, and encouraging discussion, is the “secret sauce” for companies like Buzzfeed, Vox, Vice and The Huffington Post, which often wind up getting more traffic from a story at, say The New York Times (NYSE:NYT), than the Times does as a result.

Curation is, in some ways, a throwback to the pre-Internet era. It’s done by people. (At least I think I’m a people.) So as odd as it is for Twitter (NYSE:TWTR) to announce it will curate live events it’s even odder to see Google (NASDAQ:GOOG) (NASDAQ:GOOGL) doing it in a project called YouTube Newswire.

Like BuzzFeed, Google’s content curation platform, made for desktop as well as a mobile app, allows sharing of curated news and viral videos.

The feel of both Twitter’s and Google’s content curation will be like that of a newspaper, with an army of human content curators determining the trendiest news to read or videos to watch.

BuzzFeed articles, or at least the headlines, can easily be mined from any social network, but reading the whole article still requires that you open the link within the app or in a mobile web browser. Loading takes some time, a few seconds longer. Try browsing the BuzzFeed feed in the app and you’ll notice the obvious difference.

However, earlier this summer a Forbes article, Why Apple, Snapchat and Twitter are betting on human editors, but Facebook and Google aren’t, reported that Apple, Snapchat and Twitter, as well as LinkedIn Pulse and Instagram, were going to use human editors and curators, while Facebook and Google were going to rely on their powerful algorithms. Google (now Alphabet) executive chairman Eric Schmidt has even called Apple’s human-curated playlists “elitist,” although Google Play has human-curated playlists.

Maybe Google is responding to views on its Google News like this review in VentureBeat:

Google News: Less focused on social signals than textual ones, Google News uses its analytic tools to group together related stories and highlight the biggest ones. Unlike Techmeme, it’s entirely driven by algorithms, and that means it often makes weird choices. I’ve heard that Google uses social sharing signals from Google+ to help determine which stories appear on Google News, but have never heard definitive confirmation of that — and now that Google+ is all but dead, it’s mostly moot. I find Google News an unsatisfying home page, but it is a good place to search for news once you’ve found it.

Now WordPress Too!


WordPress also has announced its curation plugin called Curation Traffic.

According to WordPress

You Own the Platform, You Benefit from the Traffic

“The Curation Traffic™ System is a complete WordPress based content curation solution. Giving you all the tools and strategies you need to put content curation into action.

It is push-button simple and seamlessly integrates with any WordPress site or blog.

With Curation Traffic™, curating your first post is as easy as clicking “Curate,” and the same post that may originally have been sent only to Facebook or Twitter is now sent to your own site, which you control and benefit from, and still goes across all of your social sites.”

The theory: the more you share on your own platform, the more engagement and the better the marketing experience. And with all the WordPress users out there, they already have an army of human curators.

So That’s Great For News But What About Science and Medicine?


The news and trendy topics such as fashion and music are part of most people’s experience. However, more technical areas such as science, medicine and engineering are not in most people’s domain, so aggregation of content needs a process of peer review to sort fact from fiction. On social media this is extremely important, as sensational stories of breakthroughs can spread virally without proper vetting and even influence patients’ decisions about their own care.

Expertise Depends on Experience

In steps the human experience. On this site we attempt to do just this. A consortium of M.D.s, Ph.D.s and other medical professionals spend their own time not only to aggregate topics of interest but also to curate specific topics, adding insight from reputable sources across the web.

In Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison, Dr. Larry Bernstein compares a museum or music curator to the curation of scientific findings and literature and draws a similar conclusion from each: that curation can be a tool to gain new insights previously unseen by an observer, a way of stepping back to see a different picture or hear a different song.


For instance, using the Twitter platform, we curate #live meeting notes and tweets from meeting attendees (please see links below and links within) to give live conference coverage, and that curation and analysis give rise not only to meeting engagement but to unique insights into presentations.


In addition, the use of a WordPress platform allows easy sharing among many different social platforms including Twitter, Google+, LinkedIn, Pinterest etc.

Hopefully, this will catch on with the big powers of Twitter, Google and Facebook, who may realize there exist armies of niche curation communities they can draw on for expert curation in the biosciences.

Other posts on this site on Curation include:


Inevitability of Curation: Scientific Publishing moves to embrace Open Data, Libraries and Researchers are trying to keep up

The Methodology of Curation for Scientific Research Findings

Scientific Curation Fostering Expert Networks and Open Innovation: Lessons from Clive Thompson and others

The growing importance of content curation

Data Curation is for Big Data what Data Integration is for Small Data

Stem Cells and Cardiac Repair: Content Curation & Scientific Reporting

Cardiovascular Diseases and Pharmacological Therapy: Curations

Power of Analogy: Curation in Music, Music Critique as a Curation and Curation of Medical Research Findings – A Comparison








Read Full Post »

Impact Factors and Achievement

Larry H. Bernstein, MD, FCAP, Curator


Tired of Impact Factors? Try the SJR indicator

By Martin Fenner, January 2008

Picking the right journal is one of the most important decisions when you start to work on a paper. You probably have a gut feeling about the journals best suited for your paper in progress. To make this decision more objective, you can rely on the Impact Factor of a journal. The Impact Factor is roughly the average number of citations per paper in a given journal and is published by Thomson Scientific. Higher Impact Factors mean more prestigious journals. This information is also frequently used for job or grant applications.
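As a rough sketch of the calculation, assuming the commonly used two-year citation window (the numbers below are made up, not real journal data):

```python
# The 2-year Impact Factor is (roughly) citations received this year to
# items the journal published in the two preceding years, divided by the
# number of citable items published in those two years.
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    return citations_to_prev_two_years / citable_items_prev_two_years

impact_factor(1200, 400)  # → 3.0
```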

Impact Factors have been around for more than 40 years and they have generally been very helpful. But there are two big problems:

Impact Factors are published by one privately owned company
Given the importance of Impact Factors for many aspects of scientific publishing, it would be preferable if there were alternatives. Moreover, Impact Factors are not freely available; they must be purchased from Thomson Scientific.

Impact Factors might not be the best tool to measure scientific quality
Impact Factors have several shortcomings. Because they are a convenient way to judge the scientific output of a person, organization, journal or country, they are often overused. They should, for example, not be used to compare journals in different fields, e.g. cell biology and particle physics. Measures like the Hirsch Number might be a better tool to measure the scientific output of an individual scientist. And sometimes the judgement of your peers in the field is more important than simple numbers.
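The Hirsch Number mentioned above is simple to compute; a minimal sketch with made-up citation counts:

```python
# Hirsch Number (h-index): the largest h such that an author has h papers
# with at least h citations each. The citation counts below are invented.
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(ranked) if c >= i + 1)

h_index([10, 8, 5, 4, 3])  # → 4 (four papers with at least 4 citations each)
```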

The SCImago Journal Rank indicator tries to overcome these two shortcomings. The index was created by the SCImago Research Group, located at several Spanish universities. The index uses information from the Scopus abstract and citation database of research literature owned by Elsevier.

In contrast to the Impact Factor, the SJR indicator does not simply measure the number of citations per paper. Citations from a journal with a higher SJR indicator lead to a higher SJR indicator for the cited journal (more details here). This approach is similar to PageRank (described in this paper), the web search algorithm by Sergey Brin and Lawrence Page that made Google what it is today. Eigenfactor is another scientific ranking tool that uses a PageRank algorithm.
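The prestige-weighting idea can be sketched on a toy journal citation graph. This is only an illustration of the general principle; the real SJR computation involves additional corrections and a three-year citation window:

```python
# Toy prestige-weighted ranking in the spirit of SJR: a citation transfers
# prestige in proportion to the citing journal's own score. The citation
# counts below are invented. cites[a][b] = citations from journal a to b.
cites = {
    "J1": {"J2": 30, "J3": 10},
    "J2": {"J1": 20},
    "J3": {"J1": 5, "J2": 5},
}

def prestige(cites, d=0.85, rounds=100):
    journals = list(cites)
    score = {j: 1.0 / len(journals) for j in journals}
    for _ in range(rounds):
        new = {}
        for j in journals:
            # Inflow: each citing journal passes on a share of its score
            # proportional to its share of citations pointing at j.
            inflow = sum(score[k] * cites[k].get(j, 0) / sum(cites[k].values())
                         for k in journals)
            new[j] = (1 - d) / len(journals) + d * inflow
        score = new
    return score

scores = prestige(cites)
```

Under this scheme J1 ends up ranked highest: it receives all of J2’s citations and half of J3’s, so citations from well-cited journals count for more than raw citation totals would suggest.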

Most of the time, journals with high Impact Factors have high SJR indicators. Nature and Science are still head to head. We will find unexpected results and discrepancies between the two over time. In my field of oncology, both the Journal of the NCI and Cancer Research are ranked higher than the Journal of Clinical Oncology.

You can read more about the SJR indicator in this Nature News article.

Goodbye PLOS Blogs, Welcome Github Pages

By Martin Fenner, June 15, 2013

This is the last Gobbledygook post on PLOS Blogs, and at the same time the first post at the new Github blog location. I have been blogging at PLOS Blogs since the PLOS Blogs Network was launched in September 2010, so this step wasn’t easy. But I have two good reasons.

In May 2012 I started to work as technical lead for the PLOS Article-Level Metrics project. Although this is contract work, and I also do other things – including spending 5% of my time as clinical researcher at Hannover Medical School – this created the awkward situation that I was never quite sure whether I was blogging as Martin Fenner or as someone working for PLOS. This was all in my head, as I never had any restrictions in my blogging from PLOS. With the recent launch of the PLOS Tech Blog there is now a good venue for the kind of topics I like to write about, and I have started to work on two posts for this new blog.

There will always be topics for which the PLOS Tech Blog is not a good fit, and for these posts I have launched the new personal blog at Github. But the main reason for this new blog is a technical one: I’m moving away from blogging on WordPress to writing my posts in Markdown (a lightweight markup language), which are then transformed into static HTML pages using Jekyll and Pandoc. Last weekend I co-organized the workshop Scholarly Markdown together with Stian Haklev. A full workshop report will follow in another post, but the discussions before, at and after the workshop convinced me that Scholarly Markdown has a bright future and that it is time to move more of my writing to Markdown. At the end of the workshop each participant suggested a todo item that he/she would be working on, and my todo item was “Think about document type where MD shines”. Markdown might be good for writing scientific papers, but I think it really shines in shorter scientific documents that can easily be shared with others. And blog posts are a perfect fit.

The new site is a work in progress. Over time I will copy over all old blog posts from PLOS Blogs, and will work on the layout as well as additional features. Special thanks to Carl Boettiger for helping me get started with Jekyll and Github Pages.

re3data – registry of research data repositories launched

By Martin Fenner
Posted: June 1, 2013

Earlier this week,, the Registry of Research Data Repositories, officially launched. The registry is nicely described in a preprint also published this week. offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape. Information icons help researchers identify an adequate repository for the storage and reuse of their data.

The Shape of Science

The Shape of Science is a new graphical interface designed to access the bibliometric indicators database of the SCImago Journal & Country Rank portal (based on 2012 data).

[Image: The Shape of Science – SCImago]

The SCImago Journal & Country Rank is a portal that includes the journals and country scientific indicators developed from the information contained in the Scopus® database (Elsevier B.V.). These indicators can be used to assess and analyze scientific domains.

This platform takes its name from the SCImago Journal Rank (SJR) indicator, developed by SCImago from the widely known Google PageRank™ algorithm. This indicator shows the visibility of the journals contained in the Scopus® database since 1996.

SCImago is a research group from the Consejo Superior de Investigaciones Científicas (CSIC) and the universities of Granada, Extremadura, Carlos III (Madrid) and Alcalá de Henares, dedicated to information analysis, representation and retrieval by means of visualisation techniques.

As well as the SJR portal, SCImago has developed the Atlas of Science project, which proposes the creation of an information system whose major aim is to achieve a graphic representation of Ibero-American science research. Such a representation is conceived as a collection of interactive maps, allowing navigation throughout the semantic spaces formed by the maps.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Sergey Brin and Lawrence Page

{sergey, page}

Computer Science Department, Stanford University, Stanford, CA 94305


In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at
To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine — the first such detailed public description we know of to date.
Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

 Keywords: World Wide Web, Search Engines, Information Retrieval, PageRank, Google

  1. Introduction

(Note: There are two versions of this paper — a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.)
The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, as well as the number of new users inexperienced in the art of web research. People are likely to surf the web using its link graph, often starting with high quality human maintained indices such as Yahoo! or with search engines. Human maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines that rely on keyword matching usually return too many low quality matches. To make matters worse, some advertisers attempt to gain people’s attention by taking measures meant to mislead automated search engines. We have built a large-scale search engine which addresses many of the problems of existing systems. It makes especially heavy use of the additional structure present in hypertext to provide much higher quality search results. We chose our system name, Google, because it is a common spelling of googol, or 10^100, and fits well with our goal of building very large-scale search engines.

1.1 Web Search Engines — Scaling Up: 1994 – 2000

Search engine technology has had to scale dramatically to keep up with the growth of the web. In 1994, one of the first web search engines, the World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) to 100 million web documents (from Search Engine Watch). It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. At the same time, the number of queries search engines handle has grown incredibly too. In March and April 1994, the World Wide Web Worm received an average of about 1500 queries per day. In November 1997, Altavista claimed it handled roughly 20 million queries per day. With the increasing number of users on the web, and automated systems which query search engines, it is likely that top search engines will handle hundreds of millions of queries per day by the year 2000. The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.

  2. System Features

The Google search engine has two important features that help it produce high precision results. First, it makes use of the link structure of the Web to calculate a quality ranking for each web page. This ranking is called PageRank and is described in detail in [Page 98]. Second, Google utilizes link text to improve search results.

2.1 PageRank: Bringing Order to the Web

The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page’s “PageRank”, an objective measure of its citation importance that corresponds well with people’s subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results (demo available at For the type of full text searches in the main Google system, PageRank also helps a great deal.

2.1.1 Description of PageRank Calculation

Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:

We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.
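The iterative calculation can be sketched directly from the formula above. The three-page link graph here is a made-up toy example, not something from the paper:

```python
# Iterative PageRank sketch using the paper's formula:
#   PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[t] / len(links[t])
                                   for t in pages if p in links[t])
              for p in pages}
    return pr

# Toy graph: A is cited by B and C; B is cited by A; C only by B.
ranks = pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]})
```

With this formulation the ranks sum to the number of pages rather than to one; dividing by the page count recovers the probability-distribution view described in the text.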

2.1.2 Intuitive Justification

PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the “random surfer” will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see [Page 98].

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo’s homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.
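The random surfer model described above can also be simulated directly: over enough steps, the fraction of time the surfer spends on each page approximates that page’s normalized PageRank. The graph below is an invented toy example:

```python
import random

# Monte Carlo "random surfer" on a made-up three-page graph: with
# probability d follow a random outgoing link; otherwise get bored and
# jump to a uniformly random page.
def surf(links, d=0.85, steps=200_000, seed=42):
    rng = random.Random(seed)
    pages = list(links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < d and links[page]:
            page = rng.choice(links[page])
        else:
            page = rng.choice(pages)  # bored: restart at a random page
    return {p: visits[p] / steps for p in pages}

freq = surf({"A": ["B"], "B": ["A", "C"], "C": ["A"]})
```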

In Science, It Matters That Women Come Last


People tell me that, as a female scientist, I need to stand up for myself if I want to succeed: Lean in, close the confidence gap, fight for tenure. Being a woman in science means knowing that the odds are both against you being there in the first place and against you staying there. Some of this is due to bias; women are less likely to be hired by science faculty, to be chosen for mathematical tasks and to have their papers deemed high quality. But there are also other barriers to success. Female scientists spend more time rearing children and work at institutions with fewer resources.

One measure of how female scientists are faring is how many papers they write. Papers are the coin of academic science, like court victories to lawyers or hits to baseball players. A widely read paper could earn a scientist tenure or a grant. Papers map money, power and professional connections, and that means we can use them to map where female scientists are succeeding and where inequality prevails.

To this end, I downloaded and statistically analyzed 938,301 scientific papers from the arXiv, a website where physicists, mathematicians and other scientists often post their papers. I inferred the authors’ gender from their first names, using a names list of 40,000 international names classified by native speakers. Women’s representation on the arXiv has increased significantly over the 23 years my data set covers.
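The name-based inference step might be sketched like this. The tiny dictionary is a hypothetical stand-in for the 40,000-name list, not the author’s actual data:

```python
# Toy stand-in for a classified names list: map lowercased first names to
# a gender label; ambiguous or unknown names are left unclassified (None).
NAME_GENDERS = {"maria": "female", "john": "male", "andrea": "ambiguous"}

def infer_gender(author_name):
    first = author_name.split()[0].lower()
    label = NAME_GENDERS.get(first)
    return label if label in ("male", "female") else None

infer_gender("Maria Rossi")  # → "female"
infer_gender("A. Smith")     # → None (initial only, cannot classify)
```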


But I wanted to see not only how many papers women wrote, but also on how many they earned the coveted positions of first author (indicating the scientist primarily responsible for the paper) and last author (indicating the senior scientist who supervised the work). The news is both good and bad. When a female scientist writes a paper, she is more likely to be first author than the average author on that paper. But she is less likely to be last author, writes far fewer papers and is especially unlikely to publish papers on her own. Because she writes fewer papers, she ends up more isolated in the network of scientists, with additional consequences for her career.

The average male scientist authors 45 percent more papers than the average female scientist;2 he authors more than twice as many solo papers, on which he is the only author. (Solo papers can look particularly impressive because the scientist gets all the credit for the work.) Sixty times as many multi-author papers with identifiable gender for all authors will have all male authors as all female authors; twice as many will have all male authors as any female author.3


As a consequence, women end up at the fringes of the scientific world. We can consider two scientists “connected” if they’ve collaborated on a paper, but even though women tend to work on papers with more authors, they have significantly fewer collaborators and are significantly less central to the overall community of people publishing scientific papers.4 This social isolation matters because of nepotism: Being friends with a scientist who reviews a paper, grant or job application can provide a crucial bonus.

One female scientist I spoke with suggested that women may appear on fewer papers because their contributions are often ignored. “Some men get added to papers even if their contribution was cosmetic, yet women who contributed ideas (and perhaps even writing or data) are left out,” said the woman, who blogs pseudonymously as Female Science Professor.

Maria Mateen, a friend of mine and a psychology researcher at Stanford, offered another explanation for why men write more papers: They are more likely to be “principal investigators” (PIs), senior researchers who run their own labs. In many fields, PIs get their names on papers by default, usually as last author, because they provide funding or resources for the scientists who do most of the work. When I identified PIs in my data set (scientists who were last authors on at least three papers with four or more authors), they were indeed less likely to be women: 12 percent of PIs were women, as opposed to 17 percent of scientists overall. And these PIs wrote far more papers and more first-author papers as well. But though this effect may partially explain the gender discrepancy in publication counts, it probably does not fully explain it: When we compare male PIs to female ones, or male non-PIs to female non-PIs, the men still have more papers.
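The PI heuristic described here, a scientist who is last author on at least three papers that each have four or more authors, is simple to make concrete. A minimal sketch:

```python
from collections import Counter

def find_pis(papers, min_papers=3, min_authors=4):
    """Identify likely principal investigators from author lists.

    `papers` is a list of author lists, in the order printed on each
    paper. A PI is anyone who appears as last author on at least
    `min_papers` papers that each have `min_authors` or more authors.
    """
    last_author_counts = Counter(
        authors[-1] for authors in papers if len(authors) >= min_authors
    )
    return {name for name, n in last_author_counts.items() if n >= min_papers}
```

The author-count floor matters: on small papers the last slot is less reliably the supervising senior scientist, so short author lists are excluded before counting.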

Women might compensate for writing fewer papers by more frequently ending up as first author on the papers they do write. Of the 938,301 papers, 200,485 had multiple authors whose gender I could discern, and of these 56,765, or 28.3 percent, had at least one female author.5 Knowing that women are often less assertive and less inclined to negotiate, I expected to find that they would be pushed out of first authorship. But I found the opposite. After I discarded all papers with only a single author (for which it makes little sense to talk about first authorship) and all papers with authors listed in alphabetical order (to account for the fact that, in fields like mathematics where author order is alphabetical, being first author is no longer prestigious), I was left with 74,829 papers. Had male and female authors been equally likely to come first, there would be 9,683 papers with female first authors; instead, there are 10,941 — 13 percent more than expected.6 (This difference, like all differences described, is statistically significant.)
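The expected count works paper by paper: under the null hypothesis that every author on a paper is equally likely to be listed first, a paper with f female authors out of n contributes f/n expected female-first papers, and summing that over all 74,829 papers yields a figure like the 9,683 above. A toy sketch of the arithmetic, plus the observed excess:

```python
def expected_female_first(papers):
    """Expected number of female-first-author papers under the null.

    `papers` is a list of (female_authors, total_authors) pairs; a
    paper with f of n female authors contributes f/n to the total.
    """
    return sum(f / n for f, n in papers)

# A toy sample of three papers: (female authors, total authors).
expected = expected_female_first([(1, 3), (2, 4), (0, 2)])  # 1/3 + 1/2 + 0

# With the article's totals, the observed excess over expectation is:
excess = (10941 / 9683 - 1) * 100  # roughly 13 percent more than expected
```

The same comparison, run against the middle-author baseline as described below, guards against first and last authorship confounding each other.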

But remember that another coveted position on a scientific paper is last author. This often indicates the senior scientist who supervised the work. In the arXiv data set, women are 13 percent less likely to be last authors, possibly because, as noted above, they are less likely to be principal investigators in both the arXiv data set and in previous analyses.


There’s a chance that women are overrepresented as first authors only because they’re underrepresented as last authors. To address this, I looked at all papers with three or more authors and compared how often women were first author to how often they were middle author, and how often they were last author to how often they were middle author. This prevents first authorship from affecting last authorship or vice versa. The results were largely in line with what I found for the entire set: Women were overrepresented in first author positions (relative to middle author) by 8.9 percent and underrepresented in last author positions (relative to middle author) by 10.5 percent.

Women are more likely to be first authors in fields in which they are better represented. A paper written in a field with more female authors is more likely to have a female first author, even when we control for how many authors on the paper are women.7 This effect is like one I observed when studying how women performed in online classes: The more women in a class, the higher the grades they earned relative to men.

This doesn’t necessarily mean that women are benefiting directly from interactions with other women, however. Perhaps the fields with the most women are somehow friendlier to women, making it easier for women to excel and end up as first author. On the other hand, I found evidence that women tend to work together. If a paper has one female author, the other authors on the paper are 35 percent more likely to be female given the share of female authors in the field overall.8 A different study found that female scientists tended to hire more women than male scientists did (and that the gap between whom elite male and female scientists hired was particularly large).

The arXiv data set goes back only 23 years and does not contain every paper in every field. Most papers on the arXiv are in math or physics, and some are in computer science, finance and biology. But there are no papers in the social sciences, and some scientists may not post papers on the arXiv.9 Still, my conclusions are consistent with previous analyses, which have found that female academics publish fewer papers and tend to publish with other women. (One study also found that women’s papers receive fewer citations, data I did not have for the arXiv.)

I also spoke to Jevin West, a professor at the University of Washington, who studies scholarly publication and conducted a similar analysis of gender and authorship using the JSTOR archives. JSTOR, which is not freely available, contains papers back to 1545 and also includes papers from the social sciences and humanities. West said he thought the arXiv contained a fairly comprehensive collection of papers in the fields it focuses on, and our analyses agreed on several points: Women published fewer papers in the JSTOR data set as well, and they were less likely to be last authors. Curiously, the overrepresentation of women in first-author positions may be specific to the hard sciences. Although women were more likely to be first authors in fields such as ecology and molecular biology in the JSTOR data set, they were not in law or sociology.

Once we’ve identified the gender gaps, the next step is to explain them. How much of women’s underrepresentation is due to bias and how much to other factors? While it’s clear that gender bias in science exists, it’s hard to prove merely by examining publication data (though some convincing cases have been made). Other studies have shown that female scientists spend more time on non-research activities, like child-rearing and teaching, tend to work at institutions that emphasize teaching over research and are more likely to leave the workforce for family reasons. Social dynamics with male scientists may also affect female scientists detrimentally. Women also tend to cite themselves less, self-promote less, negotiate less and see smaller performance gains from competition. Wendy Cieslak, the former principal program director for nuclear weapons science and technology at Sandia National Laboratories, emphasized the importance of the confidence gap. “We often don’t recognize and accept that [it] is holding us back until much later in life, when we look back,” she said.

The intense competition in academic science, combined with the gender gap — and uncertainty about how much of that gap is due to bias — is enough to drive a female scientist a little crazy. Was I left off a paper because I’m not smart enough or because I’m female? Do I need to negotiate more forcefully to keep up with my male peers, or will doing so backfire? I once applied for a fellowship and was told, “While clearly a very smart student, applicant’s ‘confidence’ comes across as arrogance.” I wondered whether the reviewer would have written that had I been a man.

The data gives us two causes for hope. The first is that while we are far from gender parity in the sciences, we’re getting closer. As the first chart above shows, women are gaining ground in papers published and posted to the arXiv, and their representation has also increased in the JSTOR data set.

The second is that the rise of big data has made it far easier to study gender inequality. A recent Wharton School study showing that professors are less likely to respond to students who are women or minorities was made far easier by the ability to email 6,548 professors rather than enlist an army of stamp-licking graduate students. Anyone with a computer can analyze a data set in an effort to find signs of inequality. I hope that the next time I look at the arXiv, I will find more of these analyses. And I hope that more of them will be by lone women.

[Figures: chance of success index, as a function of previously authored PubMed papers and of previous first-authored papers in PubMed]

Read Full Post »

Searchable Genome for Drug Development

Reporter: Aviva Lev-Ari, PhD, RN

The Druggable Genome Is Now Googleable

By Aaron Krol

November 22, 2013 | Relationships between human genetic variation and drug responses are being documented at an accelerating rate, and have become some of the most promising avenues of research for understanding the molecular pathways of diseases and pharmaceuticals alike. Drug-gene interactions are a cornerstone of personalized medicine, and learning about the drugs that mediate gene expression can point the way toward new therapeutics with more targeted effects, or novel disease targets for existing drugs. So it may seem surprising that, until October of this year, a researcher interested in pharmacogenetics generally needed the help of a dedicated bioinformatician just to access the known background on a gene’s drug associations.

Obi and Malachi Griffith are particularly dedicated bioinformaticians, who specialize in applying data analytics to cancer research, a rich field for drug-gene information. Like many professionals in their budding field, the Griffiths pursued doctoral research in bioinformatics applications at a time when this was not quite recognized as a distinct discipline, and quickly found their data-mining talents in hot demand. “We found ourselves answering the same questions over and over again,” says Malachi. “A clinician or researcher, who perhaps wasn’t a bioinformatician, would have a list of genes, and would ask, ‘Well, which of these genes are kinases? Which of these genes has a known drug or is potentially druggable?’ And we would spend time writing custom scripts and doing ad hoc analyses, and eventually decided that you really shouldn’t need a bioinformatics expert to answer this question for you.”
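The question the Griffiths kept answering by hand reduces to a join between a gene list and a curated interaction table, which is essentially what DGIdb automates. A toy sketch (the table below contains a few well-known drug-gene pairs chosen for illustration, not DGIdb's actual contents):

```python
# Toy interaction table keyed by gene symbol. DGIdb curates tables
# like this from sources such as DrugBank and PharmGKB.
INTERACTIONS = {
    "EGFR": ["erlotinib", "gefitinib"],
    "BRAF": ["vemurafenib"],
}

def druggable(genes):
    """Map each queried gene to its known drugs (empty list if none)."""
    return {g: INTERACTIONS.get(g, []) for g in genes}
```

A clinician's gene panel goes in, and a per-gene answer to "is this druggable, and by what?" comes out, with no custom scripting required.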

The Griffiths – identical twin brothers, though Malachi helpfully sports a beard – had by this time joined each other at one of the world’s premier genomic research centers, the Genome Institute at Washington University in St. Louis, and figured they had the resources to improve this state of affairs. The Genome Institute is generously funded by the NIH and was a major contributor to the Human Genome Project; the Griffiths had congregated there deliberately after completing post-doctoral fellowships at the Lawrence Berkeley National Laboratory in California (Obi) and the Michael Smith Genome Sciences Centre in Vancouver (Malachi). “When we finished our PhDs, we knew we would like to set up a lab together,” says Obi. At the Genome Institute, they pitched the idea of building a free, searchable online database of drug-gene associations, and soon the Drug Gene Interaction Database (DGIdb) was under development.

In Search of the Druggable Genome

Existing public databases, like DrugBank, the Therapeutic Target Database, and PharmGKB, were the first ports of call, where a wealth of information was waiting to be re-aggregated in a searchable format. “For their use cases [these databases] are quite powerful,” says Obi. “They were just missing that final component, which is user accessibility for the non-informatics expert.” Getting all this data into DGIdb was and remains the most labor-intensive part of the project. At least two steps removed from the original sources establishing each interaction, the Griffiths felt they had to reexamine each data point, tracing it back to publication and scrutinizing its reliability. “It’s sort of become a rite of passage in our group,” says Malachi. “When new people join the lab, they have to really dig into this resource, learn what it’s all about, and then contribute some of their time toward manual curation.”

The website’s main innovation, however, is its user interface, which presents itself like Google but returns results a little more like a good medical records system. The homepage lets you enter a gene or panel of genes into a search box, and if desired, add a few basic filters. Entering search terms brings up a chart that quickly summarizes any known drug interactions, which can then be further filtered or tracked back to the original sources. The emphasis is not on a detailed breakdown of publications or molecular behavior, but on immediately viewing which drugs affect a given gene’s expression and how. “We did try to place quite a bit of emphasis on creating something that was intuitive and easy to use,” says Malachi. Beta testing involved watching unfamiliar users navigate the website and taking notes on how they interacted with the platform.

DGIdb went live in February of this year, followed by a publication in Nature Methods this October, and the database is now readily accessible online. The code is open source and can be modified for any specific use case, using the Perl, Ruby, Shell, or Python programming languages, and the Genome Institute has also made available their internal API for users who want to run documents through the database automatically, or perform more sophisticated search functions. User response will be key to sustaining and expanding the project, and the Griffiths are looking forward to an update that draws on outside researchers’ knowledge. “A lot of this information [on drug-gene interactions] really resides in the minds of experts,” says Malachi, “and isn’t in a form that we can easily aggregate it from… We’re really motivated to have a crowdsourcing element, so that we can start to harness all of that information.” In the meantime, the bright orange “Feedback” button on every page of the site is being bombarded with requests to add specific interactions to the database.
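A programmatic call against the API mentioned above might look like the sketch below. The endpoint path and parameter name are assumptions rather than details from the article, so verify them against DGIdb's current documentation before relying on them:

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the query

# Hypothetical endpoint; check the current DGIdb docs for the real path.
BASE = "https://dgidb.org/api/v2/interactions.json"

def build_query(genes):
    """Build an interactions query URL for a list of gene symbols."""
    return BASE + "?" + urlencode({"genes": ",".join(genes)})

url = build_query(["EGFR", "BRAF"])
# urlopen(url) would return JSON listing known drug interactions per gene.
```

Batch-querying this way is exactly the "run documents through the database automatically" use case the internal API was released for.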

Not all these interactions are easy to validate. “Another area that we’re really actively trying to pursue,” adds Malachi, “is getting information out of sources where text mining is required, where information is really not in a form where the interaction between genes and drugs is laid out quickly.” He cites the example of the U.S. clinical trials registry, where the results of all registered clinical trials in the United States are made available online. This surely includes untapped material on drug-gene interactions, but nowhere are those results neatly summarized. “You either have a huge manual curation problem on your hands – there’s literally hundreds of thousands of clinical trial records – or you have to come up with some kind of machine learning, text-mining approach.” So far, the Genome Institute has been limited to manual curation for this kind of scenario, but with a resource as large as the clinical trials registry, the Griffiths hope to bring their programming savvy to bear on a more efficient attack.

In the meantime, new resources are continuously being brought into the database, rising from eleven data sources on launch to sixteen now, with more in the curation pipeline. DGIdb is already regularly incorporated in the Genome Institute’s research. Every cancer patient sequenced at Washington University has her genetic data run first through an analytics pipeline to find genes with unusual variants or levels of expression, and then through DGIdb to see whether any of these genes are known to be druggable. This is an ideal use case for the database, which is presently biased toward cancer-related interactions, the Griffiths’ own area of research.

The twins have a personal investment in advancing cancer therapeutics. Their mother died in her forties from an aggressive case of breast cancer, while Obi and Malachi were still in high school, and their family has continued to suffer disproportionately from cancer ever since. Says Obi, “We’ve had the opportunity to see [everything from] terrible, tragic outcomes… to the other end of the spectrum, where advances in the way cancer is treated were able to really make a huge difference to both our cousin and our brother,” both in remission after life-threatening cases of childhood leukemia and Ewing’s sarcoma, respectively. “Everyone can tell these stories,” Malachi adds, “but we’ve had a little more than our fair share.”

DGIdb can’t influence cancer care directly – most of the data available on drug-gene interactions is too tentative for clinical use – but it can spur research into more personalized treatments for genetically distinct cancers, and increasingly for other diseases as more information is brought inside. Meanwhile, companies like Foundation Medicine and MolecularHealth are drawing on similar drug-gene datasets, narrowed down to the most actionable information, to tailor clinical action to individual cancer patients. The Griffiths are cautiously optimistic that research like the Genome Institute’s is approaching the crucial tipping point where finely tuned clinical decisions could be made based on a patient’s genetic profile. “We’re still firmly on the academic research side,” says Malachi, but “we’re definitely at the stage where this idea needs to be pursued aggressively.”


Read Full Post »

Google Glass in the Medical Field

Reporter: Aviva Lev-Ari, PhD, RN


Technology and medicine: Applying Google Glass in the medical field


Rosemary Sparacio
Friday, November 15, 2013
Every day, new strides in technology make headlines in all kinds of areas. Nowhere is it more prevalent or exciting than in the medical field. And one of the most talked about new tech “gadgets” to come onto the scene and into the consciousness of just about everyone who follows the news is Google Glass.

The last few months have seen story after story about Google Glass being used by physicians. But as far back as a year ago, when Pelu Tran, a third-year medical student at Stanford, and Ian Shakil, a consultant at a West Coast start-up, saw and tried out Google Glass, they realized that the implications in medicine alone would be compelling. So much so that they founded a startup exclusively to investigate the use of Glass for medicine.

Basically, Google Glass is a small hands-free computer, head-mounted as a small glass block, on a conventional glass frame, that can have Wi-Fi, Bluetooth and a camera and voice activation. Proponents see the potential for the device’s use over a wide range of medical applications, from cutting down the time a physician has to do paperwork — thus giving the physician more time to focus on the patient’s problem — to assisting in surgery.

Dr. Pierre Theodore, M.D., was the first surgeon to receive permission to utilize Google Glass as an auxiliary surgical tool, while performing thoracic surgery in October. He was able to preload his patient’s information, like CT scans and X-rays, so that access to it would be right there on the “screen,” and he would not have to turn away during the surgery.

Theodore described it as similar to looking at the rearview mirror of your car: “That rearview is always there when I need it, but it’s not there when I don’t.”


Using Google Glass to consult with a distant colleague, Dr. Christopher Kaeding, a surgeon at the Ohio State University Medical Center, streamed a live, point-of-view video while he operated to repair a torn ACL in August. At the same time across town, medical students at OSU College of Medicine were able to view the surgery real-time.

One of Kaeding’s colleagues watched the surgery sitting in his office. According to Kaeding: “Once we got into the surgery, I often forgot the device was there. It just seemed very intuitive and fit seamlessly.”

In early 2012, a surgical team at the University of Alabama at Birmingham (UAB) performed the first surgery using a technology called VIPAAR partnered with Google Glass. VIPAAR, which stands for virtual interactive presence in augmented reality, is a technology developed by UAB in 2003, which provides real-time, two-way interactive video conferencing to enhance surgery.

In this surgery, UAB orthopedic surgeon Brent Ponce, M.D., performed shoulder replacement surgery using Google Glass during the operation. At the same time, Dr. P. Danturuli, a surgeon sitting in his Atlanta office, was interacting with Ponce. The built-in camera in Google Glass transmitted the image of the surgical field to Atlanta. VIPAAR allowed Danturuli to introduce his hands into the virtual operating room. At the same time, Ponce saw Danturuli’s hands as a ghostly image in his heads-up display.

“It’s real-time, real life, right there, as opposed to a Skype or video conference call which allows for dialogue back and forth, but is not really interactive,” Ponce said.

This technology can revolutionize telemedicine. What used to be a telephone call between two physicians, now has the potential for a small regional hospital to get hands and instruments into the field of a surgeon who has the skill but has perhaps performed the surgery only a few times. Obviously, adjustments will need to be made to fine tune VIPAAR and Google Glass.

And, of course, the potential goes beyond medicine. Imagine having the potential to connect to an expert in any field and have that expert be able to reach in to show you how to solve a problem.

Right now, Google Glass is in the hands of only about 10,000 people (about 1,000 of them in medical fields) who are using and experimenting with the technology. The thinking is that in less than five years, this kind of innovation has the potential to improve many fields and afford greater teaching opportunities through better high-tech access to information.

Share this article

About the Author

Rosemary Sparacio

Rosemary Sparacio is a freelance medical and technical writer, and she substitute teaches in her current home in South Carolina. Rosemary has always been involved in healthcare and education, starting out in the lab as a med tech and in R&D. Her career led her to teaching microbiology at a community college, while working in the pharmaceutical industry for Pfizer.



Read Full Post »

Business Intelligence Application for Pharmaceutical and Biotech Professionals

Submitted by

Dr Stephen Breslin

Chief Executive | Glasgow Science Centre

50 Pacific Quay | Glasgow | G51 1EA

Download the FREE Software Application 

The Sophie Pharma & Biotech App is a powerful personal business and technology intelligence tool to increase your productivity. Sophie will work on your iPad, iPhone, Android tablet or smartphone.

You can download it directly from the Apple and Google app stores.

Read Full Post »

Older Posts »