EPrints IRStats versus Google Analytics
January 12, 2012 in Uncategorized by Bernard Scaife
I am being asked about statistics, which is never a good thing. We have both the traditional IRStats package and Google Analytics (GA) running on our repository. However, they appear to give different results, and we'd like to know why so that we can better understand what is being counted.
GA has been tracking PDF downloads using its asynchronous tracking code since at least 1 Oct 2011. From GA, we exported a list of all PDFs downloaded between 1 Oct 2011 and 18 Dec 2011. From IRStats, I exported the monthly download counts of papers for the same period.
Starting with the EPrints top 10
The EPrints monthly download counts (screenshot) show the top 10 papers for the period.
Alongside them, here are the equivalent GA downloads for the top 3:
| EPrints ID | EPrints downloads | GA downloads | GA (all) |
|---|---|---|---|
| 58 (top paper) | 2130 | 9 | 155 |
| 2819 | 900 | 1 | 4 |
| 2050 | 880 | 13 | 21 |
Now the reverse: the top papers in the GA top 10, with their EPrints counts
| EPrints ID | EPrints downloads | GA downloads | GA (all) |
|---|---|---|---|
| 1134 (top paper) | 0 | 77 | 292 |
| 6091 | 1 | 75 | 271 |
| 6587 | 3 | 30 | 34 |
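As a quick sanity check, the six rows from the two tables can be combined and the GA-to-EPrints ratio worked out per paper. This is a minimal Python sketch; the names `Row`, `ga_share` and `ga_exceeds_eprints` are illustrative only, and it compares just the "GA downloads" column, since exactly what "GA (all)" counts is not spelled out.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    eprint_id: int
    eprints_downloads: int  # IRStats count, 1 Oct to 18 Dec 2011
    ga_downloads: int       # "GA downloads" column
    ga_all: int             # "GA (all)" column, kept but not used below


# Figures copied from the two tables (EPrints top 3, then GA top 3).
ROWS: List[Row] = [
    Row(58, 2130, 9, 155),
    Row(2819, 900, 1, 4),
    Row(2050, 880, 13, 21),
    Row(1134, 0, 77, 292),
    Row(6091, 1, 75, 271),
    Row(6587, 3, 30, 34),
]


def ga_share(row: Row) -> Optional[float]:
    """GA downloads as a fraction of EPrints downloads (None if EPrints counted zero)."""
    if row.eprints_downloads == 0:
        return None
    return row.ga_downloads / row.eprints_downloads


def ga_exceeds_eprints(rows: List[Row]) -> List[int]:
    """IDs of papers where GA records more downloads than EPrints."""
    return [r.eprint_id for r in rows if r.ga_downloads > r.eprints_downloads]


if __name__ == "__main__":
    for r in ROWS:
        share = ga_share(r)
        label = "n/a" if share is None else f"{share:.1%}"
        print(f"{r.eprint_id:>5}  EPrints={r.eprints_downloads:>5}  "
              f"GA={r.ga_downloads:>3}  GA/EPrints={label}")
    print("GA > EPrints for:", ga_exceeds_eprints(ROWS))
```

For the EPrints top 3, GA sees only a small fraction of the EPrints count; for the GA top 3, GA records more downloads than EPrints in every case.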
So there is no obvious correlation between the two systems. Presumably IRStats should be the more comprehensive, as it works on the server side and can therefore count requests for PDFs that are downloaded directly from a search engine results list. As far as I can see, GA cannot do this, although in many cases users will click through to the full EPrints record first and *then* download the PDF (which GA will record). So why do we have cases where GA records more PDF downloads than EPrints? All comments/theories welcome.
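To illustrate the server-side point, here is a minimal Python sketch that counts PDF requests per eprint straight from a web server access log, and separately counts those whose referrer is not one of the repository's own pages (for example a search engine result or no referrer at all). Those are the hits GA's JavaScript never gets a chance to see, because no HTML page is loaded. It assumes Apache "combined" log format and EPrints 3 style document URLs (`/<eprintid>/<pos>/<filename>.pdf`); the host name `repository.example.ac.uk`, the sample log lines and the function name `count_pdf_requests` are hypothetical, and whatever filtering IRStats itself applies is not reproduced.

```python
import re
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit

# Assumption: Apache/NCSA "combined" log format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"'
)
# Assumption: EPrints 3 style document URLs, /<eprintid>/<pos>/<filename>.pdf
PDF_PATH = re.compile(r"^/(?P<eprint_id>\d+)/\d+/[^/?]+\.pdf(?:\?.*)?$", re.IGNORECASE)


def count_pdf_requests(lines: Iterable[str], own_host: str) -> Tuple[Counter, Counter]:
    """Return (all PDF hits, hits not referred by our own pages), keyed by eprint ID."""
    all_hits: Counter = Counter()
    not_from_own_pages: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("status") != "200":
            continue  # unparsed line, error, or partial (e.g. 206) response
        p = PDF_PATH.match(m.group("path"))
        if p is None:
            continue  # not a PDF document request
        eprint_id = int(p.group("eprint_id"))
        all_hits[eprint_id] += 1
        # A "-" referrer parses to hostname None, so it counts as external.
        if urlsplit(m.group("referer")).hostname != own_host.lower():
            not_from_own_pages[eprint_id] += 1
    return all_hits, not_from_own_pages


# Hypothetical sample log lines for illustration only.
SAMPLE_LINES = [
    '1.2.3.4 - - [01/Oct/2011:10:00:00 +0100] "GET /58/1/paper.pdf HTTP/1.1" 200 12345 '
    '"http://www.google.com/search?q=x" "Mozilla/5.0"',
    '1.2.3.5 - - [01/Oct/2011:10:05:00 +0100] "GET /58/1/paper.pdf HTTP/1.1" 200 12345 '
    '"http://repository.example.ac.uk/58/" "Mozilla/5.0"',
    '1.2.3.6 - - [01/Oct/2011:10:06:00 +0100] "GET /58/ HTTP/1.1" 200 2345 "-" "Mozilla/5.0"',
    '1.2.3.7 - - [01/Oct/2011:10:07:00 +0100] "GET /2050/2/other.pdf HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    hits, external = count_pdf_requests(SAMPLE_LINES, "repository.example.ac.uk")
    print("all PDF hits:", dict(hits))
    print("not from our own pages:", dict(external))
```

With the sample lines, eprint 58 gets two counted PDF requests, one of which came from outside the repository's own pages; the HTML record page and the 404 are not counted. Counting only status 200 is itself a choice: a real counter also has to decide how to treat partial (byte-range) responses, robots and repeat requests.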

