EPrints IRStats versus Google Analytics

January 12, 2012 in Uncategorized by Bernard Scaife

I am being asked about statistics, which is never a good thing. We run the traditional IRStats package and also use Google Analytics (GA) on our repository. However, the two appear to give different results, and we’d like to know why so that we can better understand what is being counted.

GA has been tracking PDF downloads using its asynchronous tracking code since at least 1 Oct 2011. From GA, we exported a list of all PDFs downloaded between 1 Oct 2011 and 18 Dec 2011. From IRStats, I downloaded the monthly download counts of papers for the same period.

Starting with the EPrints top 10

The EPrints monthly download counts (screenshot) show the top 10 papers for the period.

Alongside them, here are the equivalent GA downloads for the top 3:

| EPrints ID | EPrints downloads | GA downloads | GA (all) |
|---|---|---|---|
| 58 (top paper) | 2130 | 9 | 155 |
| 2819 | 900 | 1 | 4 |
| 2050 | 880 | 13 | 21 |

Now the top 3 downloads from the GA top 10:

| EPrints ID | EPrints downloads | GA downloads | GA (all) |
|---|---|---|---|
| 1134 (top paper) | 0 | 77 | 292 |
| 6091 | 1 | 75 | 271 |
| 6587 | 3 | 30 | 34 |
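To put a number on how far the two systems disagree, the six rows above can be compared by rank. The following is only a rough Python sketch: the figures are copied from the two tables, while the helper names `rank` and `spearman` are my own and not part of IRStats or GA. Because each trio of papers was picked as the top of one system's list, six rows cannot say much on their own; the same function would be more informative when run over the full exports.

```python
# Figures copied from the two tables above:
# (EPrints ID, EPrints downloads, GA downloads, GA (all))
rows = [
    (58, 2130, 9, 155),
    (2819, 900, 1, 4),
    (2050, 880, 13, 21),
    (1134, 0, 77, 292),
    (6091, 1, 75, 271),
    (6587, 3, 30, 34),
]


def rank(values):
    """Return 1-based ranks, largest value first (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data of equal length."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


eprints_downloads = [r[1] for r in rows]
ga_downloads = [r[2] for r in rows]

print(round(spearman(eprints_downloads, ga_downloads), 2))  # -0.94
```

A value near +1 would mean the two systems rank papers in the same order. The strongly negative value here mostly reflects how the six papers were chosen, so it should be read as a demonstration of the method rather than a result.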

So there is no direct correlation between the two systems. Presumably, IRStats should be the more comprehensive, since it works server-side and can therefore count requests for PDFs that are downloaded directly from a search engine result list. As far as I can see, GA cannot do this, although in many cases users will open the full EPrints record first and *then* download (which will be recorded). However, why do we have cases where GA records more PDF downloads than EPrints? All comments/theories welcome.
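One way to test the first half of that theory would be to split the PDF requests in the web server log by referrer: requests arriving from a page on the repository itself are the ones GA has a chance of seeing, while requests arriving from a search engine (or with no referrer) bypass the GA code entirely. Below is a minimal Python sketch, assuming the server writes the Apache "combined" log format. The host name `eprints.example.org`, the function name `classify_pdf_hits` and the sample log lines are all made up for illustration and are not taken from our repository.

```python
import re
from collections import Counter
from urllib.parse import urlparse

REPO_HOST = "eprints.example.org"  # hypothetical repository host name

# Apache "combined" log format (assumed):
# host ident user [time] "METHOD path protocol" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)


def classify_pdf_hits(lines, repo_host=REPO_HOST):
    """Count successful PDF requests by where the visitor came from."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        path = m["path"].split("?", 1)[0]
        if not path.lower().endswith(".pdf"):
            continue
        # hostname is None for an empty or "-" referrer
        if urlparse(m["referrer"]).hostname == repo_host:
            counts["via_repository_page"] += 1  # GA could have seen the click
        else:
            counts["direct_or_external"] += 1   # invisible to GA
    return counts


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [01/Oct/2011:10:00:00 +0000] "GET /58/1/paper.pdf HTTP/1.1" 200 1234 "http://www.google.com/search?q=x" "Mozilla/5.0"',
    '192.0.2.2 - - [01/Oct/2011:10:01:00 +0000] "GET /58/1/paper.pdf HTTP/1.1" 200 1234 "http://eprints.example.org/58/" "Mozilla/5.0"',
    '192.0.2.3 - - [01/Oct/2011:10:02:00 +0000] "GET /58/ HTTP/1.1" 200 5678 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [01/Oct/2011:10:03:00 +0000] "GET /2819/1/thesis.pdf HTTP/1.1" 200 4321 "-" "Mozilla/5.0"',
]

print(classify_pdf_hits(sample))
```

If most hits for a paper such as 58 fall into `direct_or_external`, that would account for IRStats reporting far more downloads than GA. It would not explain the opposite cases, where GA reports more than EPrints.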