
EPrints IRStats versus Google Analytics

6:09 pm in Uncategorized by Bernard Scaife

I am being asked about statistics, which is never a good thing. We have both the traditional IRStats package and Google Analytics (GA) running on our repository. However, they appear to give different results, and we’d like to know why so that we can better understand what is being counted.

GA has been tracking PDF downloads using its asynchronous tracking code since at least 1 Oct 2011. From GA, we exported a list of all PDFs downloaded between 1 Oct 2011 and 18 Dec 2011. From IRStats, I exported monthly download counts for papers over the same period, 1 Oct 2011 to 18 Dec 2011.

Starting with the EPrints top 10

EPrints monthly download counts (screenshot): the top 10 papers for the period.

Alongside them, here are the equivalent GA downloads for the top 3:

EPrints id | EPrints downloads | GA downloads | GA (all)
58 (top paper) | 2130 | 9 | 155
2819 | 900 | 1 | 4
2050 | 880 | 13 | 21

Now the other way round: the top papers in the GA top 10, with their EPrints equivalents:

EPrints id | EPrints downloads | GA downloads | GA (all)
1134 (top paper) | 0 | 77 | 292
6091 | 1 | 75 | 271
6587 | 3 | 30 | 34
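To make the mismatch easier to eyeball, here is a small Python sketch that lines up the six papers from the two tables and flags the ones where GA counted more downloads than EPrints. It assumes the "GA downloads" column covers the same 1 Oct to 18 Dec 2011 window as the EPrints figures, and it leaves the "GA (all)" column out. It only restates the numbers above; it doesn’t explain the gap.

```python
# Download counts copied from the two tables above (1 Oct to 18 Dec 2011).
# Keys are EPrints ids; values are (EPrints downloads, GA downloads).
# Assumption: the "GA downloads" column covers the same period as EPrints.
COUNTS = {
    58: (2130, 9),
    2819: (900, 1),
    2050: (880, 13),
    1134: (0, 77),
    6091: (1, 75),
    6587: (3, 30),
}


def compare(counts):
    """Return (eprint_id, eprints, ga, ga_share) rows, largest EPrints count first.

    ga_share is GA downloads as a fraction of EPrints downloads,
    or None when EPrints recorded no downloads at all.
    """
    rows = []
    for eprint_id, (eprints, ga) in counts.items():
        share = ga / eprints if eprints else None
        rows.append((eprint_id, eprints, ga, share))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def ga_exceeds_eprints(counts):
    """Ids of papers where GA counted more downloads than EPrints did."""
    return sorted(i for i, (eprints, ga) in counts.items() if ga > eprints)


if __name__ == "__main__":
    for eprint_id, eprints, ga, share in compare(COUNTS):
        ratio = "n/a" if share is None else f"{share:.1%}"
        print(f"{eprint_id:>5}  EPrints={eprints:>5}  GA={ga:>3}  GA/EPrints={ratio}")
    print("GA > EPrints for:", ga_exceeds_eprints(COUNTS))
```

Every paper from the GA list comes out with more GA downloads than EPrints downloads, while the EPrints top papers show GA capturing well under 2% of the EPrints count.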

So there is no direct correlation between the two systems. Presumably IRStats should be the more comprehensive, as it works on the server side and can therefore count requests for PDFs downloaded directly from a search engine results list. As far as I can see, GA cannot do this, although in many cases users will open the full EPrints record first and *then* download the PDF (which GA will record). So why do we have cases where GA records more PDF downloads than EPrints? All comments and theories welcome.

Statistically relevant

11:47 am in Ars Technica by Rory McNicholl

Over the last year or so we’ve installed and configured (in some cases reconfigured) the IRStats package for several of the LEAP repositories, including those hosted by ULCC. It seemed a good moment to share a few thoughts about the process of getting “all statted up” with EPrints.

By default, and without any further action, IRStats provides a kind of smorgasbord control panel, demonstrating the many optional graphs, charts and lists available. You can see an example on our own ULCC Publications repository.

More recently we’ve seen growing demand among repository managers to share download data with both their depositors and users at large. It’s really important for repository managers to choose carefully which statistics views they actually want or need to display; we can only suggest things we think might work. Once you’ve decided on the views you want, we can look at the most effective ways to display them, and this is why I’ve been having fun souping up some of the displays IRStats already offers.

The first display we’ve been working on is the statistics digest. Digests are common enough, and we’ve used the UCL Discovery repository as the model for our work on both SAS-Space and the SOAS institutional repository.

The second approach has been to restyle the IRStats “dashboard” view so that the graphs are layered on top of each other, and then use some JavaScript to handle the tabbed navigation. This seemed more elegant than inserting lots of charts into the abstract page itself (as, for example, at ECS EPrints). I’ve used this technique to display statistics for individual eprints for the School of Pharmacy, as well as for SAS and SOAS.

IRStats on School of Pharmacy EPrints
The tabbed display of graphs and tables was also combined with a ‘modal box’ display that keeps the height of the page the same (for example, on this Abstract page at SOAS). At the bottom of the Abstract page I’ve added a statistics section showing the number of full-text downloads, plus a link that displays detailed stats in an overlaid box.

This method doesn’t just work for individual items; it can be used for other datasets too. For example, on SAS-Space we have added it to their Collection browse pages, so that the bottom of each Collection view offers the option to view download statistics for that collection as a whole.
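Behind a collection-level view, the figures boil down to rolling item downloads up to the collection they belong to. Here is a minimal Python sketch of that idea, assuming a hypothetical list with one eprint id per download event and a hypothetical `membership` dict mapping eprint ids to collection names. This is not how IRStats itself stores or queries its data.

```python
from collections import Counter


def collection_totals(download_ids, membership):
    """Total downloads per collection.

    download_ids: iterable with one eprint id per download event (hypothetical).
    membership:   dict mapping eprint id -> collection name (hypothetical).
    Downloads of items that belong to no collection are ignored.
    """
    totals = Counter()
    for eprint_id in download_ids:
        collection = membership.get(eprint_id)
        if collection is not None:
            totals[collection] += 1
    return totals


def top_items_in_collection(download_ids, membership, collection, n=10):
    """The n most-downloaded (eprint id, count) pairs within one collection."""
    counts = Counter(i for i in download_ids if membership.get(i) == collection)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up ids and collection names, purely for illustration.
    membership = {1: "Papers", 2: "Papers", 3: "Theses"}
    downloads = [1, 1, 2, 3, 3, 3, 4]  # eprint 4 is in no collection
    print(collection_totals(downloads, membership))
    print(top_items_in_collection(downloads, membership, "Papers"))
```

The same per-collection filter gives you the "Top 10 items downloaded" view for a single collection.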

Additionally, because SAS-Space is a repository for a number of discrete institutes, there was a requirement for institutional editors to have access to their own institute’s statistics. To achieve this, I gave editor-users who hold the appropriate editorial permissions for the institute in question access to a constrained version of the IRStats control panel. (Unless you are a SAS-Space editor, you won’t be able to access this.)
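The permission check behind a constrained control panel like this can be sketched as follows. The `User` class, its `editor_of` and `is_admin` fields, and the institute names are all hypothetical stand-ins rather than EPrints or IRStats API; a real check would use whatever editorial permissions the repository already records.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    # Hypothetical: the institutes this user holds editorial rights for.
    editor_of: set = field(default_factory=set)
    # Hypothetical: repository administrators see every institute.
    is_admin: bool = False


def allowed_institutes(user, all_institutes):
    """Institutes whose statistics this user may see in the control panel."""
    if user.is_admin:
        return set(all_institutes)
    # Editors only see institutes that both exist and are in their remit.
    return set(all_institutes) & user.editor_of


def can_view_stats(user, institute, all_institutes):
    """True if the user may open statistics for the given institute."""
    return institute in allowed_institutes(user, all_institutes)


if __name__ == "__main__":
    institutes = {"institute-a", "institute-b", "institute-c"}
    editor = User("editor1", editor_of={"institute-a"})
    print(allowed_institutes(editor, institutes))
    print(can_view_stats(editor, "institute-b", institutes))
```

Filtering the list of institutes up front, rather than checking each chart, keeps the constrained panel from ever offering views the editor cannot open.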

Which statistics views to insert as tabs is the decision of the repository manager. Views we’ve used include:

  • Monthly downloads
  • Daily downloads
  • Unique visitors
  • Referrers
  • Search engines
  • Top 10 items downloaded (only for a Collection, Repository or Division)
  • Top 10 search terms
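As a rough illustration of what the first few views count, here is a Python sketch that derives daily or monthly downloads and monthly unique visitors from a simplified download log. The `(timestamp, visitor_id, eprint_id)` record format is an assumption, as is treating each distinct visitor id (say, an IP address) as one unique visitor per month; IRStats applies its own filtering and definitions, which may differ.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed log format: (timestamp, visitor_id, eprint_id) tuples.
# Pass a list, not a one-shot generator, if you call several functions on it.

PERIOD_KEYS = {
    "month": lambda ts: (ts.year, ts.month),
    "day": lambda ts: ts.date(),
}


def downloads_per_period(records, period="month"):
    """Download count per month ((year, month) keys) or per day (date keys)."""
    key = PERIOD_KEYS[period]  # KeyError for an unsupported period
    return Counter(key(ts) for ts, _visitor, _eprint in records)


def monthly_unique_visitors(records):
    """Distinct visitor ids per (year, month); each visitor counts once a month."""
    seen = defaultdict(set)
    for ts, visitor, _eprint in records:
        seen[(ts.year, ts.month)].add(visitor)
    return {month: len(visitors) for month, visitors in seen.items()}


if __name__ == "__main__":
    # Made-up records using documentation-range IP addresses.
    log = [
        (datetime(2011, 10, 3, 9, 15), "192.0.2.1", 58),
        (datetime(2011, 10, 3, 9, 16), "192.0.2.1", 58),
        (datetime(2011, 10, 9, 14, 0), "192.0.2.7", 2050),
        (datetime(2011, 11, 1, 8, 30), "192.0.2.1", 58),
    ]
    print(downloads_per_period(log))
    print(downloads_per_period(log, "day"))
    print(monthly_unique_visitors(log))
```

Note how the two quick repeat downloads by one visitor count twice as downloads but once as a unique visitor; deciding how to treat repeats like these is one of the choices that can make two statistics packages disagree.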

From a technical point of view, we will have to review these configurations when we upgrade to EPrints 3.3, possibly later in the year (if it’s released!!), alongside our VM infrastructure migration, and start working with EPStats rather than IRStats. But we now have an effective framework for adding statistics quickly to any EPrints installation.