Code4lib 2006 Submissions

24 Jul 2007

The proposals as we get them.

Walking with Geeks: Real Life Misuses of XML 6 Dec 2005

Walking with Geeks: Real Life Misuses of XML
XML and related technologies are much talked about but little understood by many lowly library practitioners such as ourselves. But this has not stopped us from using and misusing them to meet mundane user needs such as creating dynamic web pages without a live database, feeding content to the campus portal, performing data conversions and extending the capabilities of our OpenURL resolver. This presentation will describe 3-4 simple uses of XML, XSLT and some basic scripting to provide content to our users.
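As a concrete illustration of the first use case (creating web pages without a live database), here is a minimal Python sketch that renders a small XML file of e-journal records as an HTML list; the record structure and field names are invented for illustration, not our actual files:

```python
import xml.etree.ElementTree as ET

# Hypothetical journal list of the kind a library might keep as flat XML.
JOURNALS_XML = """<journals>
  <journal><title>Heart</title><url>http://heart.example.org/</url></journal>
  <journal><title>Lancet</title><url>http://lancet.example.org/</url></journal>
</journals>"""

def journals_to_html(xml_text):
    """Render each <journal> element as an HTML list item."""
    root = ET.fromstring(xml_text)
    items = [
        '<li><a href="%s">%s</a></li>' % (j.findtext("url"), j.findtext("title"))
        for j in root.findall("journal")
    ]
    return "<ul>\n%s\n</ul>" % "\n".join(items)

html = journals_to_html(JOURNALS_XML)
print(html)
```

In practice the same transformation is often written as an XSLT stylesheet; the scripting version is shown here only to keep the sketch self-contained.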

Dianne Cmor and Thomas Hodge
Distributed eLibrary
Weill Cornell Medical College in Qatar

Hacking at Voyager with Python 6 Dec 2005

Name: Don Bright

Endeavor’s Voyager provides access to the bibliographic database via Microsoft Access routed through Oracle ODBC drivers on MS Windows. This presents barriers to installation and to programming. An alternative exists: a Python library that accesses Voyager’s database via JPype and an Oracle JDBC driver. This makes possible installation as a single .exe file and programming with free tools. Useful programs developed include a barcode lookup helper and a local database replicator for faster SQL experimentation.
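A rough sketch of the pattern, with an in-memory SQLite database standing in for the replicated local copy mentioned above; the table and column names are illustrative, not necessarily Voyager's actual schema:

```python
import sqlite3

# Illustrative join for a barcode-to-title lookup.
BARCODE_SQL = (
    "SELECT bib_text.title FROM item_barcode "
    "JOIN bib_item ON item_barcode.item_id = bib_item.item_id "
    "JOIN bib_text ON bib_item.bib_id = bib_text.bib_id "
    "WHERE item_barcode.item_barcode = ?"
)

def lookup_barcode(conn, barcode):
    """Barcode-to-title lookup over any DB-API style connection."""
    cur = conn.cursor()
    cur.execute(BARCODE_SQL, (barcode,))
    return [row[0] for row in cur.fetchall()]

# Demo against an in-memory SQLite stand-in for the replicated local database:
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_barcode (item_id INTEGER, item_barcode TEXT);
    CREATE TABLE bib_item (bib_id INTEGER, item_id INTEGER);
    CREATE TABLE bib_text (bib_id INTEGER, title TEXT);
    INSERT INTO item_barcode VALUES (1, '39000123');
    INSERT INTO bib_item VALUES (10, 1);
    INSERT INTO bib_text VALUES (10, 'Moby-Dick');
""")
print(lookup_barcode(conn, "39000123"))
# Against live Voyager, `conn` would come from JPype plus the Oracle JDBC
# driver instead (needs a JVM and the ojdbc jar, so it is not run here).
```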

Firefox Search Plugins: Searching Your Library in the Browser 6 Dec 2005

(Michael Sauers, BCR’s Internet Trainer)
The Firefox browser has a built-in search bar that lets users search sites such as Google and Wikipedia. What many don’t realize is that you can create customized searches and add them to Firefox. This session will walk you through the creation of a search plugin that, once installed, will let your patrons search your OPAC, or another open search engine, from within Firefox without having to visit the library’s site first.

Michael Sauers, Librarian/Trainer/Writer

Aurora, CO

XML Framework 6 Dec 2005

I would mainly like to come to the conference assuming my prof dev request goes through. However, I thought I might propose a talk since I am in the planning/design stages of a fairly large and somewhat innovative project.

I am currently designing what I am internally calling an “XML Framework”: our library’s primary access system for our entire collection of resources. The goal is for this XML Framework to contain our entire catalog of journals and books, as well as collections for our digital library, our digital repository, and possibly even our library’s website. The system will be built on the eXist XML database, which structures “collections” in a hierarchical system much like a filesystem. Collections carry permissions, just like a Unix filesystem, which allows control over management and access. The book and journal records would be generated in MARCXML format from our library catalog, and our digital library and digital repository content would be stored in the METS XML format. Having everything in a single “XML Framework” allows incredibly easy, Google-style searching: you simply type in your keywords and then decide what it is you are looking for, instead of first deciding whether you want a book or an article and using the appropriate search method.
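For a sense of how searching such a framework might look, here is a hedged sketch of querying eXist's REST interface, which accepts an XQuery in the `_query` parameter; the host, collection path, and query are hypothetical:

```python
from urllib.parse import urlencode

def exist_query_url(host, collection, xquery, howmany=10):
    """Build a GET URL for eXist's REST interface (/exist/rest/<collection>)."""
    params = urlencode({"_query": xquery, "_howmany": howmany})
    return "http://%s/exist/rest%s?%s" % (host, collection, params)

# Hypothetical collection of MARCXML records and a simple keyword predicate:
url = exist_query_url("localhost:8080", "/db/catalog/marcxml",
                      '//record[contains(., "whale")]')
print(url)
# An actual search would then be a single GET, e.g.:
#   from urllib.request import urlopen
#   results_xml = urlopen(url).read()
```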

I would like to describe my findings and how I plan to implement this system.

I am also going to implement the Metalib XServer into this plan as we currently have just launched our pilot application using the XServer. I would also be willing to discuss my development efforts in this area as well.

Stacks on the Tracks…More efficient library application development with Ruby on Rails 6 Dec 2005

Ruby on Rails is gaining ground in commercial web software development, but this web development framework is also ideal for many of the in-house projects library software developers find themselves struggling with. By simplifying many of the low-level functions and processes required in most web application development, Rails lets developers spend the bulk of their time working with users to determine, and deliver, the high-level functionality and features that make those users more efficient. In turn, the consistent Model-View-Controller application architecture, and indeed the Ruby language itself, helps lone developers on tight schedules produce, refactor, and modify their application code much more quickly than is often the case with other popular scripting languages.

Chris Stearns
Software Development
Auburn University Libraries


Library Text Mining 6 Dec 2005

Using the TeraGrid and the SRB DataGrid, we have sufficient
computational and storage facilities to run normally prohibitively
expensive processing tasks. By integrating text and data mining
tools within the Cheshire3 information architecture, we can
parse the natural language present in 20 million MARC records (the
University of California’s MELVYL collection) and extract information to
provide to search/retrieve applications. In this talk, we’ll discuss
the results of applying new techniques to ‘old’ data.
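The kind of field extraction that precedes such mining can be sketched in a few lines of Python; the MARCXML record below is a hand-made toy, not MELVYL data:

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"m": "http://www.loc.gov/MARC21/slim"}

# A minimal, hand-made MARCXML record for illustration.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245"><subfield code="a">Moby-Dick</subfield></datafield>
  <datafield tag="650"><subfield code="a">Whaling</subfield></datafield>
  <datafield tag="650"><subfield code="a">Sea stories</subfield></datafield>
</record>"""

def subjects(record_xml):
    """Pull 650$a subject headings out of a MARCXML record."""
    root = ET.fromstring(record_xml)
    return [
        sf.text
        for df in root.findall("m:datafield[@tag='650']", NS)
        for sf in df.findall("m:subfield[@code='a']", NS)
    ]

# Over millions of records, counting extracted terms is the first step
# toward the statistics that feed search/retrieve applications.
counts = Counter(subjects(RECORD))
print(counts.most_common())
```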


[May or may not be able to attend, but there’s a proposal :)]

Rob Sanderson

WikiD 7 Dec 2005

Ward Cunningham describes a wiki as “the simplest online database that
could possibly work”. The cost of this simplicity is that wikis are
generally limited to a single collection containing a single kind of
record (viz. WikiMarkupLanguage records). WikiD extends the Wiki model
to support multiple WikiCollections containing arbitrary schemas of XML
records with minimal additional complexity. Furthermore, displays and
services can be customized on a per-collection basis.
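The core idea, one store with many collections, each holding XML records of its own kind, can be sketched like this; this is an illustration of the model, not WikiD's actual implementation:

```python
import sqlite3
import xml.etree.ElementTree as ET

# One table holds every record, keyed by (collection, record id).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records "
           "(coll TEXT, id TEXT, xml TEXT, PRIMARY KEY (coll, id))")

def put(coll, rec_id, xml_text):
    ET.fromstring(xml_text)  # reject records that are not well-formed XML
    db.execute("INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
               (coll, rec_id, xml_text))

def get(coll, rec_id):
    row = db.execute("SELECT xml FROM records WHERE coll = ? AND id = ?",
                     (coll, rec_id)).fetchone()
    return row[0] if row else None

# A classic wiki page and a record of a different kind, side by side:
put("wiki", "HomePage", "<page>Welcome</page>")
put("reviews", "r1", "<review><title>Moby-Dick</title></review>")
print(get("reviews", "r1"))
```

Per-collection displays and services would then hang off the collection name, which is the part WikiD layers on top of this simple model.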

Project site:
Demo site:


P.S. This sounds like a great conference!

Jeffrey A. Young
Software Architect
Office of Research, Mail Code 710
OCLC Online Computer Library Center, Inc.
6565 Frantz Rd.
Dublin, OH 43017-3395

Voice: 614-764-4342
Voice: 800-848-5878, ext. 4342
Fax: 614-718-7477

EIMS and XML 30 Dec 2005

Proposal: EIMS is a web-accessible EPA catalog of projects and products. We have been accepting XML documents from a variety of sources into our relational database. Since our database schema is fixed but the incoming XML schemas vary widely, we have specified a Dublin Core-plus schema as the target for submission. We describe the design choices made in reaching this solution, and the actual problems and challenges experienced in our first large trial.
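A crosswalk of this kind reduces, at its simplest, to a field mapping; the source element names below are invented for illustration, not an actual EIMS source schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one incoming schema to Dublin Core-style fields.
FIELD_MAP = {
    "ProjectTitle": "dc:title",
    "Abstract": "dc:description",
    "Originator": "dc:creator",
}

def to_dc(source_xml):
    """Map recognized source elements to target Dublin Core fields."""
    root = ET.fromstring(source_xml)
    return {FIELD_MAP[el.tag]: el.text for el in root if el.tag in FIELD_MAP}

src = ("<Project><ProjectTitle>Air quality</ProjectTitle>"
       "<Originator>EPA</Originator></Project>")
dc = to_dc(src)
print(dc)
```

The hard part in practice is not the mapping table but the cases it cannot express (repeated fields, structure, vocabulary mismatches), which is where the design choices discussed in the talk come in.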

--Derek Lane
CSC contractor to EPA to4047 eims/nsdi support
(919) 380 4540

Capturing Bibliographic Metadata 30 Dec 2005

Meta-data could make proper attribution of web-based documents easy
for students and scholars alike. Why don’t more pages contain
bibliographic meta-data (like Dublin Core)? For that matter, how many
do? Why aren’t there tools to help the end-user capture and format
meta-data? This talk will examine these issues and report on an
ongoing survey of the web that is measuring the percentage of pages
containing meta-data. An open-source tool for transferring meta-data
to a bibliographic database will also be shown.
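Detecting such meta-data is straightforward to sketch; here is a minimal Python illustration of finding Dublin Core <meta> tags in a page (an illustration only, not the survey's actual tool):

```python
from html.parser import HTMLParser

class DCMetaFinder(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from an HTML page."""
    def __init__(self):
        super().__init__()
        self.dc = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = a.get("name", "")
            if name.lower().startswith("dc."):
                self.dc[name] = a.get("content", "")

page = ('<html><head><meta name="DC.title" content="Test Page">'
        '<meta name="DC.creator" content="Jorgensen"></head></html>')
finder = DCMetaFinder()
finder.feed(page)
print(finder.dc)
```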

Dr. Peter Jörgensen
Assistant Professor
College of Information
Florida State University
Tallahassee, FL 32306
850-644-4139 (work)
850-574-0776 (home)

Teaching the library and information community how to remix information 2 Jan 2006

I will articulate a framework that I am using to teach LIS students how to remix information with XML and web services. Because information remix comes across as a grab bag of techniques, students need a framework for learning a particular example of remix in depth so they can understand remixing in a broader context. In my talk, I will reflect on using Flickr as a paradigmatic example in elucidating remix to LIS students.
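Part of what makes Flickr a good teaching example is that a REST call is just a GET URL with method and key parameters; the API key below is a placeholder:

```python
from urllib.parse import urlencode

def flickr_call_url(method, **params):
    """Build a Flickr REST API URL; the api_key here is a placeholder."""
    params.update(method=method, api_key="YOUR_API_KEY")
    return ("http://api.flickr.com/services/rest/?"
            + urlencode(sorted(params.items())))

url = flickr_call_url("flickr.photos.search", tags="library", per_page=5)
print(url)
# Fetching this URL (with a real key) returns an XML payload that students
# can then remix with XSLT or a few lines of scripting.
```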

Raymond Yee
Technology Architect, Interactive University Project
UC Berkeley
2195 Hearst (250-22)
Berkeley, CA 94720-3810
510-642-0476 (work) 413-541-5683 (fax)

AHAH: When Good is Better than Best 2 Jan 2006

It can be difficult to enhance, fix or extend legacy/closed-source web applications such as online catalogs without being able to alter the web application directly.

I will discuss using AHAH (Asynchronous HTTPRequest and HTML) as a technique for doing so and compare it to AJAX, proxying and SSI. Examples from the Seattle Public Library’s next generation online catalog will be presented. Performance and scalability concerns will also be covered, time permitting.

Casey Durfee

Building an International Network of Shared Metadata 2 Jan 2006

Libraries tend to avoid collecting any sort of long-term data about patron library usage or preferences wherever possible. Although this is done to protect patron privacy, it also greatly restricts the range of social features (such as suggestions based upon a patron’s checkout history or patron-supplied reviews and comments) that library catalogs can offer.

I will discuss cryptographic techniques for securely and anonymously collecting and sharing user-supplied metadata across library systems.
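One simple technique in this family is replacing patron identifiers with a keyed hash, so usage pairs can be aggregated without exposing who borrowed what; this sketch illustrates the idea and is not necessarily the talk's actual protocol:

```python
import hashlib
import hmac

# The secret key never leaves the library; without it, tokens cannot be
# linked back to patrons.
LIBRARY_SECRET = b"keep-this-off-the-shared-network"

def anonymize(patron_id):
    """Replace a patron ID with a stable, opaque HMAC-SHA256 token."""
    return hmac.new(LIBRARY_SECRET, patron_id.encode(),
                    hashlib.sha256).hexdigest()

# The same patron always maps to the same token, so co-checkout statistics
# still work across records, but the mapping cannot be reversed.
print(anonymize("patron-12345"))
```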

Casey Durfee
Systems Analyst
The Seattle Public Library
1000 4th Ave
Seattle WA 98101

What Blog Applications Can Teach Us About Library Software Architecture 2 Jan 2006

The number of programmers in the library world is growing and our individual efforts have shown great promise, but they exist largely as a spectacle that few libraries can enjoy. We need better means to aggregate our efforts and share solutions that can be employed by libraries without programming staff.

Looking outside libraries, we see some interesting examples in the blog world. The blog world is growing with new bloggers every day, but the most interesting aspect is how many people with limited technical skills are using (maintaining and configuring) blog applications like WordPress or Movable Type, and how quickly the contributions of the many plugin and theme developers are implemented on those blogs. What lessons can we learn from this and how might a library application built from those lessons work? Are some software architectures better at leveraging the network effects of the growing number of developers in our community than others?

I’m working on a project that attempts to answer those questions and I hope to release a public beta shortly. I’d like to demo it and ask for participation.

Casey Bisson

E-Learning Application Developer
Plymouth State University
Plymouth, New Hampshire
ph: 603-535-2256

Accelerated acceleration for the XMLFile implementation of OAI-PMH 3 Jan 2006

EIMS is a web-accessible EPA catalog of projects
and products. We provide access to our records via OAI-PMH 2.0, mostly to
the National Science Digital Library (NSDL).
We needed a pre-tested implementation of the protocol, and chose Hussein
Suleman’s XMLFile, a Perl implementation.
While exploring it, we found that large numbers (tens of
thousands) of records could only be published very slowly. We show
how to accelerate this implementation roughly 20-fold in our environment,
with some real-world trade-offs.
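One generic way to get that kind of speed-up is to stop rescanning the archive on every request and build a one-time index from identifier to record file. A Python sketch of the idea (XMLFile itself is Perl, and its actual optimization may differ):

```python
import os
import tempfile

def build_index(archive_dir):
    """Map each record's identifier (here, its bare filename) to its path,
    so later requests are a dictionary lookup rather than a directory walk."""
    index = {}
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            if name.endswith(".xml"):
                index[name[:-4]] = os.path.join(root, name)
    return index

# Demo with a throwaway directory standing in for the record archive:
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "rec001.xml"), "w") as f:
    f.write("<r/>")
idx = build_index(tmp)
print(idx["rec001"])
```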

—Derek Lane
CSC contractor to EPA to4047 eims/nsdi support
(919) 380 4540

“Quality Metrics” 3 Jan 2006

This talk will discuss the core development activities of the “Quality
Metrics” project at Emory’s Woodruff Library. This project is being
conducted under an IMLS grant to research requirements for and build
a working prototype digital library search system.

What this project is doing that is new is truly generalizing and
integrating the explicit and latent quality indicators that allow
users to ascertain the fitness of digital library resources. Most
search engine components have only one indicator: content-query
similarity (“relevance”). Google has only two, adding PageRank to
content-query similarity. Our system, QM-search, will have an
unlimited number of indicators, customizable by the digital librarian
for the target community and collections, and even customizable from
user to user or search to search.

Some basic examples of quality indicators that digital libraries might
be able to exploit would be activations (views online or check-outs in
circulation), selection (compilation in “bookmark” lists online or
additions to course reserves lists), extent of review (from a peer-
reviewed journal, conference, or not?), or citation-based metrics.
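Combining such indicators might, in the simplest case, look like a librarian-configurable weighted sum; the indicator names and weights below are illustrative, not QM-search's actual configuration:

```python
# Hypothetical per-collection configuration a digital librarian might tune.
WEIGHTS = {"relevance": 0.5, "activations": 0.2,
           "selections": 0.2, "peer_reviewed": 0.1}

def quality_score(indicators, weights=WEIGHTS):
    """Weighted sum of quality indicators, each pre-normalized to [0, 1].
    Missing indicators simply contribute nothing."""
    return sum(w * indicators.get(name, 0.0) for name, w in weights.items())

doc = {"relevance": 0.8, "activations": 0.5, "peer_reviewed": 1.0}
print(round(quality_score(doc), 2))
```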

The output of QM-search will be in a completely generalized XML format,
with the search results represented as a structure based on the
structure specified in the input “organization spec”. This XML output
can be transformed into presentation HTML resembling anything from a
“linear” Google-like search results list to an A9-like column display to
more exotic groupings and breakdowns.

Requirements for QM-search are being gleaned from focus groups being
conducted at Emory (preliminary results will be shared), and development
is being conducted as a high-level layer atop the excellent Lucene open
source search engine project.

—Aaron Krowne
Head of Digital Library Research
Emory University General Libraries
President and Founder,
Office: 404-712-2810 Cell: 404-405-5766

SRW/U and You 6 Jan 2006

I’d like a chance to talk about SRW/U. I can extend it to talk about
OpenSearch and MXG as well. We’ve got open source code to share, and
experience with using that code in a cluster environment to search large
collections.
I can talk about any or all of that.


Ralph LeVan

ResCarta Digital Library 6 Jan 2006

Using open source software tools, the ResCarta Foundation has
developed a means to store, retrieve, and display digital
collections that does not utilize proprietary file formats or
viewing systems.

Rather, this system uses industry standard TIFF files with
metadata contained in METS/MODS files along with a search
engine to provide a friendly means to integrate a complete
digital collection.

Our presentation will demonstrate this open source software
suite as well as the ResCarta Standards.

Linkr8r (alligator) 6 Jan 2006

I have a simple “purl-izer / link localizer” at
IE compatible for now
What it does should be self-explanatory.
It’s very simple code.

It is a singular response to the growing realization that digital libraries, academic libraries, and database vendors are failing their users when users expect to copy research links into bibliographies, Blackboard, etc. (as you know). Maybe it’s worthy of
10 minutes or the “lightning round”?

Just let me know.
Either way I’ll attend and look forward to meeting some ubergeeks.
(That’s a compliment here)

Charles Lockwood
Digital Librarian
Loyola Notre Dame Library
Baltimore MD 21212

Connecting Everything with unAPI and OPA 6 Jan 2006

unAPI is a simple-to-use, simple-to-implement API for web sites that allows rich object access and can be easily layered over existing services like Atom, OpenSearch, OAI-PMH, or SRU. OPA is a general-purpose identifier resolver that wraps API calls to heavily-used but incompatible web services like those from Amazon, Flickr, and PubMed.
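unAPI's surface area is small: a request with only an id returns a list of available formats, and adding a format parameter returns the object itself. Here is a sketch of building a formats response of the general shape the draft spec describes; treat element and attribute names as illustrative:

```python
def formats_response(obj_id, formats):
    """Build the XML a unAPI server might return for ?id=<obj_id>
    (no format parameter): the list of formats available for that object."""
    lines = ['<formats id="%s">' % obj_id]
    for name, mime in formats:
        lines.append('  <format name="%s" type="%s"/>' % (name, mime))
    lines.append("</formats>")
    return "\n".join(lines)

resp = formats_response("http://example.org/item/1",
                        [("oai_dc", "application/xml"),
                         ("mods", "application/xml")])
print(resp)
```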

Together they will do the same thing we do every code4libcon – try to take over the world!

-Dan Chudnov

OAISRB 6 Jan 2006

Michael Witt
Jigar Kadakia

The Purdue University Libraries has developed an interface, OAISRB, for the Open
Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to data grid resources
served by the Storage Resource Broker (SRB). OAI-PMH defines a protocol for
exposing and harvesting metadata from networked repositories. Developed by the San
Diego Supercomputer Center, SRB provides a uniform interface to heterogeneous storage
over a network. This presentation will briefly introduce both technologies and
demonstrate their interaction using OAISRB.

Two Paths to Interoperable Metadata 6 Jan 2006

“Two Paths to Interoperable Metadata” [1] proposed a model for metadata
translation that offers substantial gains over models based on the
current community standard, which usually involves an XSLT
implementation. In this presentation, I will discuss implementation
issues with the Semantic Equivalence Expression Language (Seel), our
alternative to XSLT [2]. I will show how Seel eases the complex task of
change management because it represents a more faithful computational
model of the metadata translation problem.


—Devon Smith
Senior Software Engineer, Office of Research
OCLC Online Computer Library Center, Inc.

Standards, Reusability, and the Mating Habits of Learning Content 6 Jan 2006

Digital libraries are supposed to foster reuse of digital content, but it is hard to combine content from different sources. We are building prototype software that (1) converts different types of courseware to an XML interchange format based on OpenDocument and other specs/standards, (2) enables the content to be disaggregated, recombined, re-styled, and endowed with SCORM reporting behaviors, and (3) realizes instructional design through the use of SCORM (or IMS) Simple Sequencing. I will demo and discuss, and am happy to talk about the bigger picture of reusability in educational digital libraries and standards if given a longer slot.

Name: Robby Robson

Address: Somewhere in Corvallis (use

Practical Aspects of Implementing Open Source in Armenia 7 Jan 2006

A look at Open Source from outside of North America. What is the situation on Open Source in Armenia? What actions will be implemented at Yerevan State University library concerning Open Source? What are problems facing Armenian libraries, as well as those in Georgia and Azerbaijan, in creating digital repositories?

(Forwarded on behalf of Tigran Zargaryan, Head of Automation at Yerevan State University library in Armenia by Art Rhyno)

ERP Options in an OSS World 7 Jan 2006

Enterprise Resource Planning (ERP) applications are considered some of the largest and most complex systems ever written, and support many of the functions that libraries associate with the acquisitions and processing side of their operations. The information retrieval layers of library systems receive a lot of attention with good reason, but there’s also a body of standards and best practices for back office systems which libraries could benefit from as well. Open Source ERP systems offer options for libraries to take advantage of OMG standards and workflow engines, and this session will give an overview of some currently available ERP options.

Art Rhyno

Lipstick on a pig: 7 ways to improve the sex life of your OPAC 8 Jan 2006

Jim Robertson will show how NJIT has used a variety of tools (but largely ColdFusion) to extend the library’s OPAC to engage today’s Millennial students (raised in the “Googlezon” Web 2.0 environment): (1) book covers; (2) book reviews; (3) live circulation usage history; (4) a recommendation engine; (5) RSS feeds of journal tables of contents; (6) live librarian support; (7) shortcut, durable links (PURLs) to specific items.

--Jim Robertson, Assistant University Librarian
   New Jersey Institute of Technology    973-596-5798

Web Applications with Map Interfaces 8 Jan 2006

We have been working on software tools that
automatically generate Web applications with map interfaces.
Our tools support the following unique features,
which are not yet supported by commercial applications.

1) Geographical features, e.g., the areas related to
maps, surveys, and research documents, can be
entered, searched, updated, and deleted from
standard Web browsers.

2) An application can be created with little programming.
The look and behavior of the map interface
can be customized with configuration files,
and Web scripts for data access can be automatically
generated from a database schema.

3) Different areas in the world can be covered with
different map projections. The projection used is
automatically switched when a new area using a different
projection is displayed.

Toshimi Minoura, Associate Professor
School of Electrical Engineering and Computer Science
Kelly 2077, Oregon State University
Corvallis, OR 97331-5501, phone (541)737-5580
fax (541)737-3014, e-mail

RefPole: A Knowledge Sharing and Statistical Analysis Tool for Reference 9 Jan 2006

RefPole is a home-grown Windows application that streamlines the processes of collecting and analyzing reference data and provides a means for searching and sharing reference expertise among librarians. RefPole was developed with Visual Studio C#, MS Access, and/or MS Data Access Components. It supports both client-server and single-desktop environments. Features are configurable to meet an individual library’s needs, allowing reference transaction data to serve as decision-support data and as a reference knowledge database.

Sarah G. Park,
Hong Gyu Han,
Frank Baudino,

Anatomy of aDORe 9 Jan 2006

The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file named an XMLtape. Second, ARC files, as introduced by the Internet Archive, are used to contain the constituent datastreams of the Digital Objects. The software was developed by the LANL Digital Library Research & Prototyping Team and is available under GNU LGPL license.
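The XMLtape idea in miniature: wrap per-object XML records in a single containing document so the whole tape stays valid XML. The element names here are illustrative, not the actual XMLtape schema:

```python
import xml.etree.ElementTree as ET

def make_tape(records):
    """Concatenate XML record strings into one valid XML 'tape' document."""
    body = "\n".join("  <tapeRecord>%s</tapeRecord>" % r for r in records)
    return "<tape>\n%s\n</tape>" % body

tape = make_tape(["<obj id='1'/>", "<obj id='2'/>"])
print(tape)
assert ET.fromstring(tape).tag == "tape"  # the tape itself stays well-formed
```

Write-once/read-many access then reduces to appending records and remembering byte offsets, with the ARC files playing the analogous role for the non-XML datastreams.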

Xiaoming Liu,
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library

The Case for Code4Lib 501(c)(3) 9 Jan 2006

Libraries face tremendous challenges to create effective and responsive institutions in a Googlezon world. But the type of leadership we need so far hasn’t materialized. If it isn’t going to come from the administrators, let it come from the coders. In this talk I will build a case for establishing Code4Lib as a nonprofit library software cooperative. A financial structure would allow us to put real resources—both financial and human—into bringing libraries into the 21st century.

Roy Tennant

Generating Recommendations in OPACS: Initial Results and Open Areas for Exploration 9 Jan 2006

In the context of a research and prototyping project, the California Digital Library is using catalog content indexed in XTF, along with over 9 million historical circulation transaction records and other external data, to generate recommendations for an academic audience. Early results are promising. This talk will focus on methods, challenges, and plans for further development.
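A common starting point for circulation-based recommendation is item co-occurrence ("patrons who borrowed X also borrowed Y"); here is a toy sketch over made-up transactions, not the CDL's actual method:

```python
from collections import defaultdict
from itertools import combinations

# Invented (patron, item) checkout transactions for illustration.
transactions = [("p1", "moby"), ("p1", "whales"), ("p2", "moby"),
                ("p2", "whales"), ("p2", "ships"), ("p3", "ships")]

def recommend(transactions, item, top=3):
    """Rank other items by how often they co-occur with `item`
    in the same patron's checkout history."""
    by_patron = defaultdict(set)
    for patron, it in transactions:
        by_patron[patron].add(it)
    cooc = defaultdict(int)
    for items in by_patron.values():
        for a, b in combinations(sorted(items), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    scores = {b: n for (a, b), n in cooc.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:top]

print(recommend(transactions, "moby"))
```

Real systems layer normalization and external data on top of this (popular items co-occur with everything), which is part of what makes the problem interesting at 9-million-transaction scale.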

For more information on the project:

Colleen Whitney
California Digital Library and
UC Berkeley School of Information Management and Systems