As promised, this is a write-up of the Group mash project I participated in at Mashed Library 2009, which won team members Amazon vouchers – thanks UKOLN!
Something I learned from last year’s Mashed Library event was that if you have an idea, you should tell people about it before you forget it, because you may well get help bringing it to life. So this was the tweet (typo corrected!) that started the project:
#mashlib09 I want a tool that takes in feed of reading list and suggests new items to add to reading list. Any takers?
2:07 PM Jul 7th
I then explained the idea in more detail to the group. Our college has externally validated degrees, and this year its courses were revalidated by a new university. The courses passed, but one comment the validating body made was that the reading lists were a bit out of date and could do with revision (a story I’m guessing is familiar to many). So it looks likely that next term I’ll give academic staff a half-hour training session on revising reading lists, covering how to search the relevant literature indexes, current awareness services, etc., as we particularly want to encourage use of our journals. I thought it would be good to give them a starting point by analysing each course’s existing reading list and using its content to generate suggested searches.
I found it interesting to see how Owen approached it once we’d discussed the absolute basics: he started mapping out on paper the stages needed to build the tool, then we discussed which web services we could use to obtain and wrangle the data for each stage, and finally we divided up the workload between us.
- Input = reading list data – I knew I could get this in a rudimentary RSS format from my library’s catalogue, but hadn’t yet done the custom work required. However, I had already set up feeds for new items, including one for new books, so we chose to use that as a proof of concept.
- Output = we scaled this back – we originally planned to output the results of running the suggested searches, but decided instead to provide a list of suggested searches that the user can click on to run.
- Process = the tricky bit! We decided Yahoo Pipes would allow us to manipulate the data. We needed to take in the RSS reading list, extract the subject terms from each item, then count up the most common subject terms and output those. We’d then use those to build search URLs for an online resource, in this case RILM, an excellent abstracting service for music literature.
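For readers without Yahoo Pipes, the extract-and-count part of the process can be sketched roughly in Python. This is a minimal sketch under stated assumptions, not a copy of the pipe: it assumes an RSS 2.0 feed whose item descriptions contain a line like "Subject: Opera" (the `SUBJECT_RE` pattern is hypothetical, since the real regular expression was tailored to one catalogue), and it uses `collections.Counter` to stand in for the pipe's filter, sort and truncate modules.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical pattern: assumes each item's <description> contains text like
# "Subject: Opera". Adjust to however your catalogue's feed presents subjects.
SUBJECT_RE = re.compile(r"Subject:\s*([^<\n]+)")


def top_subjects(rss_xml: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common subject terms (with counts) in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        description = item.findtext("description", default="")
        match = SUBJECT_RE.search(description)
        if match:
            counts[match.group(1).strip()] += 1
    # most_common() sorts by count, descending, and keeps only the top n,
    # roughly what the pipe's sort-then-trim steps do.
    return counts.most_common(n)


feed = """<rss version="2.0"><channel><title>New books</title>
<item><title>A</title><description>Subject: Opera</description></item>
<item><title>B</title><description>Subject: Jazz</description></item>
<item><title>C</title><description>Subject: Opera</description></item>
</channel></rss>"""

print(top_subjects(feed))  # [('Opera', 2), ('Jazz', 1)]
```

Counting with a dictionary-like counter avoids a separate de-duplication step: each new occurrence of a subject simply increments its tally.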
I’ve published the two Pipes that make up the tool: ‘Extract keywords from JL book feed’ and ‘Create RILM search from keywords’ (which uses the keywords pipe). We built virtually all of the first pipe on the day, plus the groundwork for the second (figuring out the syntax of a RILM search URL). Afterwards I joined everything up and made a few tweaks, including ditching a plan to search different combinations of search terms.
A summary of the steps involved (you can ‘view source’ to see the modules in each pipe – Yahoo account required):
- Keywords pipe: Fetch the feed from the input URLs
- Use a regular expression to extract the subject
- Filter non-unique items, which adds a count of how many times each subject appears
- Sort descending by this count, then trim off everything but the top 3 subjects
- Strip out everything in the feed except the subject terms, and make them into a new RSS feed
- RILM pipe: Take in the output of the keywords pipe, then do a bit of data tidying – add ‘Search RILM for’ to the feed title, and set up the subjects for the search URL by inserting Boolean ANDs etc.
- Build the URL using the cleaned search terms, and add it as a link to the RSS feed
- Output – three-item RSS feed with the title ‘Search RILM for x’, linking to the search URL.
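The RILM pipe’s tidying and URL-building steps can likewise be sketched in Python. Treat this as an illustration under loudly stated assumptions: `SEARCH_BASE` and the `q` parameter are hypothetical placeholders (the real RILM/EBSCO permalink syntax is not reproduced here), and splitting subject headings on "--" assumes Library of Congress-style headings. Each subject becomes one RSS item titled ‘Search RILM for x’ that links to its search URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, NOT the real RILM/EBSCO syntax.
SEARCH_BASE = "https://search.example.org/rilm"


def to_boolean_query(subject: str) -> str:
    """Assumes '--'-separated headings: 'Opera -- Italy' -> 'Opera AND Italy'."""
    parts = [p.strip() for p in subject.split("--") if p.strip()]
    return " AND ".join(parts)


def build_search_feed(subjects: list[str]) -> str:
    """Build an RSS 2.0 feed with one 'Search RILM for x' item per subject."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Suggested RILM searches"
    ET.SubElement(channel, "link").text = SEARCH_BASE
    ET.SubElement(channel, "description").text = "Searches suggested from a reading list"
    for subject in subjects:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Search RILM for {subject}"
        # urlencode percent-encodes reserved characters and turns spaces into '+'
        query = urlencode({"q": to_boolean_query(subject)})
        ET.SubElement(item, "link").text = f"{SEARCH_BASE}?{query}"
    return ET.tostring(rss, encoding="unicode")


print(build_search_feed(["Opera -- Italy", "Jazz"]))
```

Building the query string with `urlencode` rather than by string concatenation means subject terms containing spaces, ampersands or other reserved characters still produce valid links.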
To try it out, open ‘Create RILM search from keywords’, copy a reading list RSS feed from my library catalogue’s RSS page, paste it into the ‘Enter RSS url’ box and submit (or just use the example feed provided); with luck you’ll get three suggested searches back. If you have IP/Athens/Shib access to RILM via Ebsco, feel free to click through and see how the search results look! Known bugs include some of our search options not working via the permalink, e.g. sorting by date (descending) and limiting to items published from 2000 onwards. There are likely to be some rather dud searches too, but I hope this can be addressed by pointing out that these are only suggested starting searches and will need tweaking.
Lots of the specifics are tailored to my needs (e.g. the regular expression for extracting the subjects, and the data cleaning and URL building), and the pipe I made myself is a bit clunky and could do with streamlining. Even so, these pipes could quite easily be adapted for other institutions or indeed different purposes, so please feel free to clone them and tinker away!