EPrints IRStats versus Google Analytics

January 12, 2012 in Uncategorized by Bernard Scaife

I am being asked about statistics, which is never a good thing. We both run the traditional IRStats package and use Google Analytics on our repository. However, they appear to give different results and we’d like to know why so we can get a better understanding of what is being counted.

GA has been tracking PDF downloads using its asynchronous code from at least 1 Oct 2011. In GA, we downloaded a list of all PDFs downloaded between 1 Oct 2011 and 18 Dec 2011. In IRStats, I downloaded monthly download counts of papers for the same period.

Starting with the EPrints top 10

The EPrints monthly download counts (screenshot) show the top 10 papers for the period.

And alongside them, here are the equivalent downloads in GA for the top 3:

EPrints ID         EPrints downloads   GA downloads   GA (all)
58 (top paper)     2130                9              155
2819               900                 1              4
2050               880                 13             21

Now the top downloads from the GA top 10:

EPrints ID         EPrints downloads   GA downloads   GA (all)
1134 (top paper)   0                   77             292
6091               1                   75             271
6587               3                   30             34

So there is no direct correlation between the two systems. Presumably, IRStats should be the more comprehensive, as it works on the server side and so can track requests for PDFs that are downloaded directly from a search engine result list. GA cannot do this as far as I can see, although in many cases users will click through to a full EPrints record first and *then* download (which will be recorded). However, why do we have situations where GA records more PDFs being downloaded than EPrints? All comments/theories welcome.
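
For anyone wanting to repeat the comparison, here is a minimal sketch in Python. The figures are copied from the two tables above; the function name and the idea of comparing rank orders are illustrative assumptions, not something IRStats or GA provides.

```python
# Download counts copied from the two tables above:
# eprint id -> (EPrints/IRStats downloads, GA downloads, GA (all))
counts = {
    58: (2130, 9, 155),
    2819: (900, 1, 4),
    2050: (880, 13, 21),
    1134: (0, 77, 292),
    6091: (1, 75, 271),
    6587: (3, 30, 34),
}


def ga_exceeds_irstats(counts):
    """Return the ids where GA recorded more downloads than IRStats."""
    return sorted(
        eid for eid, (irstats, ga, _ga_all) in counts.items() if ga > irstats
    )


# Order the same six papers by each system's count, highest first.
by_irstats = sorted(counts, key=lambda eid: counts[eid][0], reverse=True)
by_ga = sorted(counts, key=lambda eid: counts[eid][1], reverse=True)

print("Ranked by IRStats:", by_irstats)
print("Ranked by GA:     ", by_ga)
print("GA > IRStats for: ", ga_exceeds_irstats(counts))
```

With only six papers this proves nothing statistically, but it makes the pattern easy to see: the two rankings are close to being the reverse of each other.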

Queen Mary Research Online: launched at last!

July 4, 2011 in Uncategorized by Sarah Molloy

It’s been a long time coming, and I have to admit that sometimes I didn’t think we’d get there; but at last Queen Mary (UoL) finally has a publicly accessible institutional repository, Queen Mary Research Online.  Hurrah!

Our collections are somewhat modest so far (but 100% full content), but now that I have something to demonstrate, I am hopeful that things will start to appear.  We currently have approximately 1000 items somewhere in the repository (some embargoed, some in the workflow, some openly available; including eTheses), not bad for a year’s graft.

Not one for resting on my laurels, I am now investigating ways to improve searching and browsing, and to offer more ways to reuse and link the data.  We might also make a few modest tweaks to the interface on the admin side so that managing items is easier.  If you’re a DSpacer and have made improvements to the workflow or reporting, please do tell!

The curious case of duplicate links in the Collection index

June 20, 2011 in Uncategorized by Sarah Molloy

I thought I’d post an update on the state of play with QMRO.

We’re currently working on a bug that is causing items to be assigned to the same Collection in DSpace twice.  The result is that a duplicate link, pointing to the same URL, appears in the Collections index for the item.  Not only is it very untidy (I’m a librarian, these things should be neat and tidy), it also makes browsing to see what we have, which should still be possible given our number of records, quite impossible.  Items can legitimately belong to more than one collection in DSpace, but they are not supposed to be able to belong to the same collection more than once.

Needless to say, our IT Services people are working on it, but it is still delaying the public launch of the service, and this is becoming a source of frustration, not just for me but for our senior executive, who are very keen to see it go live.

Any thoughts people may have are very welcome!

In the meantime, I am working on my academic colleagues; encouraging them to put content in and to start thinking about Open Access more widely.  So far, they seem to be quite receptive, although this might be so that I’ll stop banging on about it and leave them in peace!

Thoughts, useful contacts and general moral support all gratefully received

SarahM, Queen Mary

London conference on OA publishing in the arts and humanities

May 25, 2011 in Uncategorized by Peter Webster

The School of Advanced Study is staging a one-day conference on this topic, in London, on Friday July 15th.
This symposium brings together academics, journal editors, publishers, librarians, funding bodies and repository practitioners to consider issues of particular concern in the arts and humanities. It will examine the economic and public policy aspects of humanities OA, as well as the different modes in which OA is currently delivered for scholars in the humanities.

Confirmed speakers include Professor Shearer West (AHRC), Neil Jacobs (JISC), Dr Paul Ayris (Director of Library Services, UCL) and Tessa Harvey (Wiley-Blackwell).

There will also be presentations on OA journals produced by commercial publishers and on campus, and from specialist humanities repositories, including SAS-Space.

The conference is free to attend, with lunch provided. For further details, and to reserve a place, please contact Dr Peter Webster (Peter.Webster@sas.ac.uk)

A provisional programme is available at: http://sas-space.blogspot.com/2011/05/conference-open-access-publishing-in.html

RSP Survey & Cerify

May 13, 2011 in CRIS by Richard Davis

Two bits of news just passed through my Tweetdeck that I thought worth sharing, in case any of you missed them.

  • The JISC Repositories Support Project is surveying the UK research repository scene, and has an online questionnaire aimed at repository managers. You can find out more about it on the RSP Blog (and maybe even win an iPod shuffle if you enter before July 31st). The results should make interesting reading as a snapshot of the repository landscape, and no doubt help us all determine what we are doing well, and what we could do better.
  • For anyone with a current interest in Research Information Management systems, the work of UKOLN’s Cerify project looks to be particularly valuable. A closed “data surgery” event for the Cerify project partners is being held next week; but it looks like there will be public outputs, and, sooner or later, practical advice on the process of implementing the CERIF research information management standard. (The JISC RIM programme is managed by our good friend and former LEAP colleague Josh Brown.)

Statistically relevant

April 27, 2011 in Ars Technica by Rory McNicholl

Over the last year or so we’ve installed and configured (in some cases reconfigured) the IRStats package for several of the LEAP repositories, including those hosted by ULCC. It seemed a good moment to share a few thoughts about the process of getting “all statted up” with EPrints.

By default, and without any further action, IRStats provides a kind of smorgasbord control panel, demonstrating the many optional graphs, charts and lists available. You can see an example on our own ULCC Publications repository.

More recently we’ve seen growing demand among repository managers to share data on downloads with both their depositors and users at large. It’s really important for repository managers to select carefully which statistics views they actually want or need to display – we can only suggest things we think might work. Once you’ve decided on the views you want, we can look at the most effective ways to display them: and this is why I’ve been having fun souping up some of the displays already offered by IRStats.

The first display we’ve been working on is the Statistics digest. These are common enough, and we’ve used the example of the UCL Discovery repository as the basis of work for both SAS-Space and the SOAS institutional repository.

The second approach has been to re-style the IRStats “dashboard” view to lay the graphs on top of each other and then use some JavaScript to handle the tabbed navigation. This seemed a more elegant approach than inserting lots of charts in the abstract page itself (as, for example, at ECS EPrints). I’ve used this technique to display statistics for individual eprints for the School of Pharmacy, as well as SAS and SOAS.

(Screenshot: IRStats on School of Pharmacy EPrints)

The tabbed display of graphs and tables was also combined with a ‘modal box’ display that keeps the height of the page the same (for example, on this Abstract page at SOAS). At the bottom of the Abstract page I’ve added a statistics section showing the number of full-text downloads, and a link that displays detailed stats in an overlaid box.

This method doesn’t just work for individual items, but can be used on other datasets too. For example, on SAS-Space we have added it to the bottom of their Collection browse pages, so that at the bottom of each Collection view there is an opportunity to view download statistics for that collection as a whole.

Additionally in SAS-Space, since it is a repository for a number of discrete institutes, there was a requirement for institutional editors to have access to their own institute’s statistics. To achieve this, I allowed access to a constrained version of the IRStats control panel for editor-users who had the appropriate editorial permissions for the institute in question. (Unless you are a SAS-Space editor, you won’t be able to access this.)

Which statistics views to insert as tabs is the decision of the repository manager. Views we’ve used include:

  • Monthly downloads
  • Daily downloads
  • Unique visitors
  • Referrers
  • Search Engines
  • Top 10 items downloaded (only for a Collection, Repository or Division)
  • Top 10 search terms

From a technical point-of-view, we will have to review these configurations when we upgrade to EPrints version 3.3, possibly later in the year (if it’s released!!), in conjunction with our VM infrastructure migration, and start doing things with EPStats rather than IRStats. But we now have an effective framework for adding statistics quickly to any EPrints installation.

Importing to EPrints from SirsiDynix Symphony LMS

March 14, 2011 in Uncategorized by Bernard Scaife

We need to do this for our new DERA repository because a lot of the records of e-only official publications, complete with their often broken links :), are already catalogued there. I can export the records in MARC exchange format from the system, use the fantastic MarcEdit tool to convert them to MARC21XML, and then use a custom PHP script to convert this to EP3XML for importing. I’ve done this successfully in our “traditional” EPrints repository, for example to import theses records where we now have some digitised abstracts to link in.

It all works fairly smoothly, although there is bound to be a simpler way. I would be interested to know what others do.
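
The PHP script itself isn't reproduced here, but the MARC21XML-to-EP3XML step can be sketched in a few lines of Python as a stand-in. Treat everything below as an assumption to check against your own data: the field mapping (245$a to title, 856$u to official_url) is purely illustrative, and the EPrints XML namespace should be confirmed against an XML export from your own repository.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"   # MARC21 XML namespace
EP_NS = "http://eprints.org/ep2/data/2.0"    # EPrints 3 XML namespace (verify locally)

# Illustrative mapping only: (MARC tag, subfield code, EPrints field name)
FIELD_MAP = [
    ("245", "a", "title"),
    ("856", "u", "official_url"),
]


def subfield(record, tag, code):
    """Return the text of the first matching MARC subfield, or None."""
    path = (
        f"{{{MARC_NS}}}datafield[@tag='{tag}']/"
        f"{{{MARC_NS}}}subfield[@code='{code}']"
    )
    node = record.find(path)
    return node.text.strip() if node is not None and node.text else None


def marcxml_to_ep3xml(marcxml):
    """Convert a MARC21XML string into a minimal EPrints XML string."""
    ET.register_namespace("", EP_NS)  # serialise without an ns0: prefix
    root = ET.fromstring(marcxml)
    eprints = ET.Element(f"{{{EP_NS}}}eprints")
    for record in root.iter(f"{{{MARC_NS}}}record"):
        eprint = ET.SubElement(eprints, f"{{{EP_NS}}}eprint")
        for tag, code, field in FIELD_MAP:
            value = subfield(record, tag, code)
            if value:
                ET.SubElement(eprint, f"{{{EP_NS}}}{field}").text = value
    return ET.tostring(eprints, encoding="unicode")
```

A real conversion would need many more fields, plus some tidying of MARC punctuation (such as the trailing " /" that often ends a 245$a), before the output is fit to import.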

Handy Hints: MIME-Types

March 11, 2011 in Ars Technica by Rory McNicholl

Some repositories have reported issues with Microsoft “DOCX” files, which IE8 in particular may treat as a ZIP file. This is a potential problem with all the current slew of MS file types. The solution is to add the following entries to your web server configuration.

Extension   MIME Type
.xlsx       application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xltx       application/vnd.openxmlformats-officedocument.spreadsheetml.template
.potx       application/vnd.openxmlformats-officedocument.presentationml.template
.ppsx       application/vnd.openxmlformats-officedocument.presentationml.slideshow
.pptx       application/vnd.openxmlformats-officedocument.presentationml.presentation
.sldx       application/vnd.openxmlformats-officedocument.presentationml.slide
.docx       application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dotx       application/vnd.openxmlformats-officedocument.wordprocessingml.template
.xlam       application/vnd.ms-excel.addin.macroEnabled.12
.xlsb       application/vnd.ms-excel.sheet.binary.macroEnabled.12

Exactly how you (or more likely your system manager) achieve this depends on your Web platform (e.g. Apache, Tomcat, IIS) but whoever runs it should be able to make the necessary changes, and once the Web server is restarted, the new types should be picked up. (We’ve just done this for the ULCC-hosted repositories.)
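
If your platform is Apache, one way to avoid typing errors is to generate the configuration lines from the table above. The sketch below assumes Apache's `AddType` directive (part of mod_mime); the function name is just illustrative, and other servers such as Tomcat or IIS need their own syntax.

```python
# Extension -> MIME type, copied from the table above.
MIME_TYPES = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
    ".potx": "application/vnd.openxmlformats-officedocument.presentationml.template",
    ".ppsx": "application/vnd.openxmlformats-officedocument.presentationml.slideshow",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".sldx": "application/vnd.openxmlformats-officedocument.presentationml.slide",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    ".xlam": "application/vnd.ms-excel.addin.macroEnabled.12",
    ".xlsb": "application/vnd.ms-excel.sheet.binary.macroEnabled.12",
}


def apache_addtype_lines(mime_types):
    """Format one Apache `AddType <mime-type> <extension>` line per entry."""
    return [f"AddType {mime} {ext}" for ext, mime in sorted(mime_types.items())]


# Print the directives, ready to paste into the server configuration.
print("\n".join(apache_addtype_lines(MIME_TYPES)))
```

The output can go in the main server configuration or a virtual host section; as noted above, the server needs a restart or reload before the new types take effect.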

“MIME-Types” have a long and chequered history as a way of identifying file types to internet applications. To some extent IE8 is correct to infer (in the absence of better information from the Web server) that .docx files are ZIP files, because MS Office Open XML formats are packaged as ZIP archives. But in general what one really wants the browser to do is pass the file to an Office application, not WinZip.

Ironically, it seems other browsers do correctly infer MS OOXML file types.

Handy Hints: Using EPrints export plugins

March 8, 2011 in Ars Technica by Richard Davis

For dynamic integration with other systems, EPrints 3.x has a wide range of available export formats for metadata and data.

Depending on your templates and configuration, the many export options may or may not be explicitly shown. If they are, they are usually on the Browse View pages of your repository.

Even if they are not shown, we can often work out what their URLs might be.

For example in a typical View page from the SAS-Space repository, we find the dropdown box supporting options to export in Reference Manager format and EndNote format, among others. Try out any that are of interest and make a note of the URL it generates.

Compare also with the URL of the RSS feed, on the same page, and you will see that this is also part of the same suite of functions.

We can also do a similar thing with Latest Tool across the whole repository:

So if you have another library or CRIS system that needs dynamic interaction with an EPrints repository, find out if any of the existing export filters (as per the dropdown list on the View pages) might meet its needs. If the default filters aren’t enough, there are more available on the EPrints Files repository that your tech team can install, or you can develop new custom exporters that can be used in exactly the same way.
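
As a rough illustration of what the consuming system might do, here is a minimal Python sketch that fetches one of the URLs you noted and lists the items in an RSS feed. It assumes the feed is standard RSS 2.0; the function names are hypothetical, and the URL is a placeholder to be replaced with one copied from your own repository.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download the raw bytes behind an export or feed URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def rss_items(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Usage (placeholder URL: substitute the one you noted from the View page):
# EXPORT_URL = "PASTE-THE-RSS-URL-YOU-NOTED-HERE"
# for title, link in rss_items(fetch(EXPORT_URL)):
#     print(title, link)
```

The same `fetch` call works for the other export formats; only the parsing step changes with the format you choose.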

Buzzwords in the ‘Repository Community’

February 16, 2011 in Uncategorized by Sarah Molloy

I’ve always been fascinated how certain ideas will spread through a community (somewhat like a disease, but let’s put that to one side for now).  Having recently attended the RSP Winter School at Armathwaite Hall (beautiful venue as usual), I notice the current buzz word is ‘embedding’.

Whilst I realise that this is not a new idea (there have, after all, been at least two previous JISC-funded projects, EMBED¹ and Embedding repositories², with it in the title), this is the first time I’ve met people who are attempting to achieve it in real life.

How do you ‘embed’ a system or process into someone else’s work pattern?  How do you turn your system into something that is part of their routine?  Having not attempted this yet, I’d be interested to hear what others think, and what you’re doing to achieve this.  At the moment, I am envisaging brain washing techniques so if you’re beginning to see some successes from your efforts, please do share!

¹The Embed project wiki http://cclibweb-2.dmz.cranfield.ac.uk/embed/index.php/Embed_Wiki

²Linton, H (2008). Embedding Repositories in Research Management Systems and Processes [online]. http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007/reposrmsystems.aspx [accessed 02/08/2010]