
Open standards and software for bibliographies and cataloging

This list is no longer maintained

If you'd like to take over maintenance of the list, email me — John Lee, August 2007


With the proliferation of robust and powerful open source software, the Internet, and standardization on XML data formats, there are unprecedented opportunities for the collection and management of information. Given the greatly increased access to information in the Internet era, metadata — in essence, information about information — becomes all the more essential. For scholars and researchers, among the most essential metadata is bibliographic. Being able to reliably store, find, use and communicate bibliographic data is a basic need of academic research. And yet the state of bibliographic software is largely stuck in the 1980s.

This list provides a quick overview of the landscape of open-source bibliographic software: where it has been and, more importantly, where it may yet go.

Currently, the emphasis is on the needs of individuals and small groups rather than libraries, but given the growing overlap in the interests of these groups, the list is likely to expand to some extent to cover more library software.


Standards

Bibliographic Data

BibX

A bibliographic DTD intended to be the basis for an XML-based rewrite of the ConTeXt bibliographic module (m-bib). It draws from RIS and BibTeX, but goes beyond them in terms of data richness.

Bibulus

Bibulus has goals and an approach very similar to those of BibX + m-bib v2: it defines a bibliographic DTD (though one more closely modeled on BibTeX) and uses Perl to format LaTeX documents.

Pybliographer 2 analysis / design documents

Pybliographer 2 will define a rich internal database schema, which will probably necessitate its own canonical XML data format.

Standards related to the various MARC standards

Some of these are simple reworkings of MARC in XML, some are "better MARCs", and some (such as MODS) are simpler than MARC, and related to efforts like BibX.

Library of Congress

The following links are to a collection of XML-related metadata efforts at the Library of Congress.

MARCXML

The Library of Congress' Network Development and MARC Standards Office is developing a framework for working with MARC data in an XML environment. This framework is intended to be flexible and extensible to allow users to work with MARC data in ways specific to their needs. The framework itself includes many components such as schemas, stylesheets, and software tools.
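
As a rough illustration of what working with MARC data in an XML environment can look like, the sketch below reads a small hand-written MARCXML fragment with Python's standard library and pulls out an author and a title. The sample record is made up for illustration and its leader is a placeholder; the namespace and the record / datafield / subfield element names are those of the MARCXML "slim" schema; the helper name subfield is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record, for illustration only (placeholder leader).
SAMPLE = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Lamport, Leslie.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">LaTeX :</subfield>
    <subfield code="b">a document preparation system.</subfield>
  </datafield>
</record>
"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


record = ET.fromstring(SAMPLE)
author = subfield(record, "100", "a")
title = subfield(record, "245", "a")
print(author, title)
```

The numeric tags (100, 245) and one-letter subfield codes are carried over from MARC unchanged, which is what makes this a literal XML rendering rather than a redesign.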

MODS

As an XML schema, the "Metadata Object Description Schema" (MODS) is intended to be able to carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than numeric ones, in some cases regrouping elements from the MARC 21 bibliographic format.
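
The difference between numeric MARC tags and the language-based tags of MODS can be sketched in a few lines. The mapping table below is a small illustrative subset chosen for this example, not the official Library of Congress crosswalk, and the function name marc_to_mods is hypothetical; the MODS namespace and the titleInfo / name / originInfo element names come from the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# A few MARC 21 tag/subfield pairs and the MODS elements that carry
# comparable data. Illustrative subset only, not the official mapping.
MARC_TO_MODS = {
    ("245", "a"): ("titleInfo", "title"),
    ("100", "a"): ("name", "namePart"),
    ("260", "b"): ("originInfo", "publisher"),
}


def marc_to_mods(fields):
    """Build a MODS element from {(tag, code): value} MARC-style data."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for key, value in fields.items():
        if key not in MARC_TO_MODS:
            continue  # unmapped fields are simply skipped in this sketch
        parent_name, child_name = MARC_TO_MODS[key]
        parent = ET.SubElement(mods, f"{{{MODS_NS}}}{parent_name}")
        child = ET.SubElement(parent, f"{{{MODS_NS}}}{child_name}")
        child.text = value
    return mods


mods = marc_to_mods({
    ("245", "a"): "The TeXbook",
    ("100", "a"): "Knuth, Donald E.",
    ("260", "b"): "Addison-Wesley",
    ("999", "z"): "local data with no mapping",
})
xml_text = ET.tostring(mods, encoding="unicode")
print(xml_text)
```

Skipping unmapped fields mirrors the point that MODS carries only selected MARC data; a real converter would need a policy for everything left over.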

METS

The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.

METS has a modular structure that allows, for example, encapsulation of MODS data.
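
A minimal sketch of that encapsulation, assuming the usual METS pattern of a descriptive metadata section (dmdSec) containing an mdWrap with MDTYPE="MODS" and an xmlData child. The function name and the section ID are hypothetical, and the result is only a fragment: a complete METS document needs further sections (such as a structural map) that are omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods_in_mets(mods_element, section_id="DMD1"):
    """Embed a MODS element in a METS descriptive metadata section."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(mods_element)  # the MODS record is carried unchanged
    return mets


# A tiny MODS record with a placeholder title.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "A sample title"

mets = wrap_mods_in_mets(mods)
print(ET.tostring(mets, encoding="unicode"))
```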

See also SRW.

XOBIS

XOBIS has some of the same goals as the MARCXML/MODS/METS combination, but also rethinks basic principles of bibliographic representation, and has as its goal to present an "ideal" representation. It is focused on the needs of libraries.

There are many XML schemas available for modeling MARC data. Most take a literal approach, naming elements and attributes after their corresponding MARC fields, subfields, and indicators. Others represent only a small subset of the data libraries use to describe resources. XOBIS attempts to walk the middle path: describe the full set of library information, but reorganize this information into a structure that empowers the use of library data as just one more information resource available in the digital domain.

BiblioML

BiblioML and AuthoritiesML are XML-based formats for the interchange of UNIMARC bibliographic and authority records between applications.

BibTeX in XML

Several XML representations for BibTeX data exist. The idea, of course, is to retain the mass of BibTeX data and formatting styles while getting the benefit of general purpose XML processing tools.
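
To give a feel for the idea, here is a minimal sketch that turns one simple BibTeX entry into XML. The element and attribute names (entry, id, type, and one element per field) are made up for this illustration and do not follow BibTeXML or any of the other formats listed below. The parser is deliberately naive: it handles only brace-delimited values without nested braces.

```python
import re
import xml.etree.ElementTree as ET

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def bibtex_entry_to_xml(source):
    """Convert one simple BibTeX entry to an XML element.

    Only brace-delimited values without nested braces are recognized.
    """
    match = ENTRY_RE.match(source.strip())
    if match is None:
        raise ValueError("not a recognizable BibTeX entry")
    entry_type, key, body = match.groups()
    entry = ET.Element("entry", {"id": key, "type": entry_type.lower()})
    for name, value in FIELD_RE.findall(body):
        field = ET.SubElement(entry, name.lower())
        field.text = " ".join(value.split())  # normalize whitespace
    return entry


SAMPLE = """
@Book{knuth84,
  author    = {Donald E. Knuth},
  title     = {The TeXbook},
  publisher = {Addison-Wesley},
  year      = {1984}
}
"""

entry = bibtex_entry_to_xml(SAMPLE)
print(ET.tostring(entry, encoding="unicode"))
```

Once the data is in this shape, general-purpose tools such as XSLT processors or XPath queries can be applied to it, which is the benefit these formats are after.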

These formats seem to have proliferated for no obvious reason. There has been some consolidation around the BibTeXML standard (listed first below), but a couple of other, presumably now obsolete, standards still seem to be floating around. (Note that bib2xml's format is not a separate standard, but rather a possibly outdated version of BibTeXML.)

BibTeXML

XML DTD and schema. This is the result of a merger of three competing BibTeX-in-XML DTDs (Hendrikse, Kuhlmann, and Gundersen).

From the site:

BibTeXML is shipped with tools to uptranslate native TeX-syntax BibTeX bibliographies to XML, and translating this into any markup scheme...

Our goal is to maintain a strict BibTeX schema and develop (and collect!) conversion tools that will help you tag your bibliographic data in XML and save typing time, or export it to HTML, DocBook or native BibTeX syntax.

Yet another BibTeXML?

Is this a separate standard? Obsolete?

The BibTeXML project also provides a Web-based solution (based on PHP and MySQL) for working with bibliographic data, and the implementation has its particular strong points in the extensibility of the bibliography schema, powerful import and export mechanisms, as well as pluggable XSLT style sheets which may be used to support custom export formats.

BibTeX-XML-HTML

Is this a separate standard? Obsolete?

The main goal of the BibTeX-XML-HTML Bibliography Project is to transform a BibTeX bibliography into an HTML file to facilitate the publication of the bibliography on the web.

Bibliographic Formatting

citestylex

From the RefDB project, the only attempt to represent in XML the binary and proprietary formatting style files from commercial products like Endnote and Reference Manager on one hand, and BibTeX on the other. The intention behind BibX was to couple it with citestylex to provide a complete XML replacement for BibTeX. Citestylex will be adapted to format BibX (and/or MODS) records, which would provide open XML representations of the two most significant aspects of bibliographic data: storage/exchange and formatting.

Pybliographer 2 analysis / design documents

Information Retrieval and Linking

SRW

Falling under the ZING umbrella, and related to the Library of Congress work discussed above, is SRW, which among other things is envisioned as a carrier of MODS data.

SRW is the "Search/Retrieve Web Service" protocol, which aims to integrate access to various networked resources, and to promote interoperability between distributed databases, by providing a common utilization framework. SRW is a web-service-based protocol whose underpinnings are formed by bringing together more than 20 years experience from the collective implementers of the Z39.50 Information Retrieval protocol with recent developments in the web technologies arena.

OpenURL

The proposed OpenURL standard is a syntax to create web-transportable packages of metadata and/or identifiers about an information object. Such packages are at the core of context-sensitive or open link technology. By standardizing this syntax, the OpenURL will enable many other innovative user-specific services.
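
A "web-transportable package of metadata" is, in practice, a URL whose query string describes the item. The sketch below assumes the key/encoded-value form of the Z39.88-2004 version of OpenURL, in which descriptive keys for the referenced item carry an "rft." prefix. The resolver address, the function name and the article metadata are all made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical link resolver address; a real one is institution-specific.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def journal_article_openurl(base, **metadata):
    """Build a key/encoded-value OpenURL describing a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Descriptive keys for the referent carry the "rft." prefix.
    params += [(f"rft.{key}", str(value)) for key, value in metadata.items()]
    return f"{base}?{urlencode(params)}"


url = journal_article_openurl(
    RESOLVER_BASE,
    atitle="An example article title",
    jtitle="Journal of Examples",
    volume=12,
    spage=34,
    date=2004,
)
print(url)

# A resolver would decode the same package on the receiving side.
decoded = parse_qs(urlsplit(url).query)
print(decoded["rft.atitle"])
```

Because the package is independent of any one resolver, the same query string can be appended to a different base address to get context-sensitive links for a different institution.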

DOI

Provides a unique id for any digital object. Used, for instance, by OpenURL to link to an electronic journal article.

RFC 2288: Using Existing Bibliographic Identifiers as Uniform Resource Names

ISBN, ISSN and SICI as URNs.
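
A small sketch of the idea for ISBNs and ISSNs: check the identifier's check digit, then prefix it with the URN namespace. It assumes 10-digit ISBNs (the form in use when the RFC was written) and leaves SICI out entirely; the function names are hypothetical.

```python
import re


def isbn10_is_valid(isbn):
    """Check the ISBN-10 check digit (weights 10..1, modulo 11)."""
    chars = isbn.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        return False
    values = [10 if c == "X" else int(c) for c in chars]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def issn_is_valid(issn):
    """Check the ISSN check digit (weights 8..1, modulo 11)."""
    chars = issn.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", chars):
        return False
    values = [10 if c == "X" else int(c) for c in chars]
    return sum(w * v for w, v in zip(range(8, 0, -1), values)) % 11 == 0


def to_urn(scheme, identifier):
    """Format a validated identifier as a URN such as URN:ISBN:..."""
    validators = {"ISBN": isbn10_is_valid, "ISSN": issn_is_valid}
    scheme = scheme.upper()
    if scheme not in validators:
        raise ValueError(f"unsupported scheme: {scheme}")
    if not validators[scheme](identifier):
        raise ValueError(f"invalid {scheme}: {identifier}")
    return f"URN:{scheme}:{identifier}"


isbn_urn = to_urn("isbn", "0-395-36341-1")
issn_urn = to_urn("issn", "1046-8188")
print(isbn_urn)
print(issn_urn)
```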

ZING

From the site:

ZING, "Z39.50-International: Next Generation", covers a number of initiatives by Z39.50 implementors to make the intellectual/semantic content of Z39.50 more broadly available and to make Z39.50 more attractive to information providers, developers, vendors, and users, by lowering the barriers to implementation while preserving the existing intellectual contributions of Z39.50 that have accumulated over nearly 20 years.

Current ZING initiatives are SRW (including SRU), CQL, ZOOM, ez3950, and ZeeRex. Some (for example, SRW/U) seek to evolve Z39.50 to a more mainstream protocol, while for others (e.g. ZOOM) the purpose is to preserve the existing protocol but hide its complexity.
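
To make the "lower barrier" point concrete, here is a sketch of an SRU request: a plain HTTP GET whose query parameter holds a CQL expression. The server address is made up, the function names are hypothetical, and whether a given server supports the dc.title and dc.creator indexes or a record schema called "mods" is an assumption; operation, version, query, maximumRecords and recordSchema are SRU 1.1 request parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server address, for illustration only.
SRU_BASE = "https://sru.example.org/catalog"


def cql_and(clauses):
    """Join (index, term) pairs into a CQL query using boolean `and`."""
    parts = []
    for index, term in clauses:
        # Backslash-escape backslashes and double quotes inside the term.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{index} = "{escaped}"')
    return " and ".join(parts)


def sru_search_url(base, query, maximum_records=10, record_schema="mods"):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,  # server-dependent short name
    }
    return f"{base}?{urlencode(params)}"


query = cql_and([("dc.title", "document preparation"), ("dc.creator", "Lamport")])
url = sru_search_url(SRU_BASE, query)
print(url)

# Decoding the URL again shows what the server would receive.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

Compared with a native Z39.50 session, nothing here needs more than a URL library and an XML parser for the response, which is the sense in which SRW/U moves the protocol toward the mainstream.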

SLinkS

Inter-publisher reference linking.

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)

Amazon API

Web service for searching Amazon's extensive catalog. See below for applications making use of it.

Getting jake data

Access to jake, the volunteer-run journal metadata database.

Dublin Core

Software

Building Blocks

BibTeX and LaTeX

LaTeX is the traditional scientific document markup and typesetting system, still widely used today. BibTeX is LaTeX's bibliographic partner, which allows you to keep your bibliographic database(s) separate from your documents. Almost everything LaTeX- and BibTeX-related can be found on CTAN.

BibTeX can be entirely replaced with newer formatting engines, but the large number of .bst files (which specify formatting styles) available for BibTeX means that many people prefer to keep BibTeX, and replace only the front-end (with a graphical reference management application) and/or the data format (with an XML format).

m-bib and ConTeXt

ConTeXt is a TeX macro package similar to LaTeX, and has a built-in XML parser. The bibliographic module (m-bib) is a TeX-based replacement (for the most part) of BibTeX. Version 2 is intended to be rewritten to use XML data and formatting files.

ibibproc

From the site:

ibibproc is a bibliography processor similar to BibTeX. Its primary distinguishing features are:

  • Internationalised. References can be multilingual and multiscript.
  • Customisable and extendible. Styles, database formats, front-ends, and back-ends can be tailored and extended.
  • Multiple front- and back-ends. Citations and references can be formatted for use with several document processing systems.

RefDB

Currently supporting PostgreSQL, MySQL, and SQLite:

RefDB is a reference database and bibliography tool for SGML, XML, and LaTeX/BibTeX documents. It allows users to share databases over a network. It is lightweight and portable to basically all platforms with a decent C compiler. And it's released under the GNU General Public License.

Pybliographer

Though Pybliographer is most frequently used as a graphical reference manager application (see below), it was designed as a general-purpose library: ...a simple framework that provides easy to use python classes and functions, and therefore can be extended to many uses (generating HTML pages according to bibliographic searches, etc).

bp

An elderly Perl 4 bibliographic formatting library. Input and output to many formats, character set conversion, etc.

MIT

There is an impressive collection of efforts at MIT that could have profound implications for bibliographic applications and data. See also Haystack.

DSpace

DSpace is a newly developed digital repository created to capture, distribute and preserve the intellectual output of MIT.

DSpace currently uses qualified Dublin Core to store metadata, and PostgreSQL as its storage engine, but support of additional metadata standards (including MODS) is likely to grow as a result of the SIMILE research project.

SIMILE

Simile will leverage and extend DSpace, enhancing its support for arbitrary schemas and metadata, primarily though the application of RDF and semantic web techniques. The project also aims to implement a digital asset dissemination architecture based upon web standards. The dissemination architecture will provide a mechanism to add useful "views" to a particular digital artifact (i.e. asset, schema, or metadata instance), and bind those views to consuming services.

Cheshire II project

Note that parts of the web site are out-of-date. The current version of the system is available from the Cheshire ftp site.

Work is underway on the development of Cheshire III.

The Cheshire II project is developing a next-generation online catalog and full-text information retrieval system using advanced IR techniques. ... The Cheshire II system was designed to overcome twin problems of topical searching in online catalogs, search failure and information overload as well as to provide a bridge between the purely bibliographic realm of previous generations of online catalogs and the rapidly expanding realm of full-text and multimedia information resources. The system incorporates a client/server architecture with implementations of current information retrieval standards including Z39.50 and SGML and XML.

Z39.50 software

Python Z39.50 and ZOOM implementations.

Some useful Amazon API links

Anybody have a good list of MARC software I can link to? This is certainly incomplete:

MARC21 Python module

MARC4J

Java MARC library with SAX- and DOM-like interfaces.

MARC/Perl

Perl MARC library.

MARC template library

C++ MARC library.

Applications

Semantic Blogging and Bibliographies - Requirements Specification

OK, not an application, but looks like an interesting requirements document for anybody who's writing one.

Haystack

Another MIT offering.

Our research seeks to bring modern information management and retrieval technologies to the average computer user in order to make computers a more compelling place for users to interact with their information. Haystack looks into the use of artificial intelligence techniques for analyzing unstructured information and providing more accurate retrieval. We also deal with the modeling, management, and display of user data in more natural and useful ways.

BibDesk

BibDesk is a GUI BibTeX bibliography manager, making extensive use of Mac OS X-native technologies — including integrated PDF viewing, a citation-completion Service, etc. Capable of publishing RSS feeds of reading lists.

Pybliographer

Pybliographer is a tool for managing bibliographic databases.

The current stable version is Pybliographer 1. Work is in progress on Pybliographer 2.

Pybliographer 1 allows retrieving, editing, searching and citing bibliographic records using a GNOME graphical interface. Input and output to BibTeX, ISI, Medline, Ovid and Refer formats is provided, plus citation to LaTeX, LyX, HTML and plain text formats. Particularly good support for BibTeX / LaTeX and LyX. Medline queries direct from the application. Customizable BibTeX templates, Language / charset support. Pybliographer 1 is limited to databases of a few thousand records (this will be fixed in Pybliographer 2).

Open Office Bibliographic Project

Scheduled for inclusion in the next major release of the open source office project Open Office, this will be an integrated bibliographic storage and formatting module. The OpenOffice technical committee on XML file format specification for office applications is also relevant.

RefDB web client

RefDB focuses its attention on a relational database backend and support for markup languages, but includes a simple web client. Supports a variety of structured markup documents (including TEI, DocBook and LaTeX).

Kaspaliste

Bibliography manager for KDE. The OpenOffice people have this to say about it:

Kaspaliste is a literature and knowledge database. It handles all kinds of books, articles, journals, web pages etc. But the database goes beyond simply storing bibliographical information. There is the possibility to create annotated links between pieces of information (like the content of a book chapter) and to group the links in categories. It is based on KDE and uses the Postgres relational database. It is a promising project but currently does not have import or export functions other than a BibTeX export. There are no links to other programs such as OpenOffice.

JabRef

JabRef is a GUI for managing BibTeX databases. ... JabRef works on all platforms and requires Java 1.4.2.

JBibtexManager

Import from ISI Web of Science, Medline/PubMed XML, Scifinder, OVID and INSPEC formats. Export to BibTeX, HTML and plain text. Searching, editing, sorting, duplicate detection, automatic key generation, customizable BibTeX templates, language support.

B3

Bibliography Base for Biologists, since I'm a biologist and only know about biology journals bibliography. Java 1.4. Has its own BibTeX-in-XML format (joy). Import from Endnote export format, RIS, BibTeX, PubMed and CSV. XSL-based export to BibTeX, HTML, CSV and OpenOffice via CSV or directly to a database.

JReferences

Reference database with file and MySQL backends, oriented towards DocBook. Seems to be dormant.

gBib

gBib is a user-friendly editor and browser for BibTeX databases. You can use it also to insert citations inside a LyX document.

zNote

From the site:

...a web-based bibliography-management tool built with Zope and XML.

zNote is intended to ultimately be a replacement for tools like EndNote, ProCite, and to a certain extent, BibTeX. It uses a hierarchical XML data format which is more flexible than flat data, and it works using a set of pretty simple DOM calls to format, edit, etc.

CMFBibliography

A basic BibTeX-centric bibliographic Plone module. Designed for scientific users (this is true of many of these applications, in fact).

biblioz

Plone-based personal book list and reference manager with support for journals and automatic retrieval of data from Amazon.

CitationManager

Sixpack

SIXPACK is a free BibTeX and Reference Manager.... The OpenOffice people have this to say about it (notes in square brackets are theirs):

Sixpack is a graphical and command-line bibliography database manager written in Perl/Tk. It interacts with the supplied package 'bp' (see below), which can import and export from a number of formats including BibTeX, Endnote, Medline, Procite, and many others. It can download references directly off the Web, and open articles using external viewers. It can also interface with Emacs/XEmacs and LyX [LaTeX with a GUI interface]. It also has instructions on how to interact with OpenOffice / StarOffice using CSV files and database import functions [I have used this].

GNU EPrints 2

EPrints is free software which creates online archives. The default configuration creates a research papers archive, but could be used for other purposes.

Tyrannio

Z39.50-capable book cataloging and information retrieval application with command-line, GUI and web front ends. Support for barcode scanning with the :CueCat scanner. PostgreSQL backend.

Bookcase

Book collection manager for KDE 3.x. File-based XML system: there's no database backend. There are plans for extension to other types of collections (CDs, for example), and for Z39.50 client capability. Has its own simple DTD.

Bookcase is a KDE application for keeping track of your book collection. Ultimately, I'd like it to be similar in capability to AVCataloger or Readerware, although it's still got a ways to go.

Koha

Made in New Zealand by Katipo Communications Ltd. and maintained by a team of volunteers from around the globe, the Koha system is a full catalog, OPAC, circulation and acquisitions system.

Greenstone

Greenstone is a suite of software for building and distributing digital library collections.

OpenBiblio

OpenBiblio is an easy to use, open source, automated library system written in PHP containing OPAC, circulation, cataloging, and staff administration functionality. The purpose of this project is to provide a cost effective library automation solution for private collections, clubs, churches, schools, or public libraries.

PhpMyBibli (in French)

Web-based library cataloging and circulation database, including administration and OPAC functionality. Apache/MySQL/PHP-based.

iVia

Web-based library cataloging database, with an emphasis on electronically-stored collections ('virtual library system'). In addition to human-created catalogs, it crawls the web to do automated catalog creation.

Links

oss4lib

Slashdot-like site covering open source systems for libraries. The projects page is well worth a look if you found some of the links here useful.

Brenda Chawner's Open Source Software and Libraries Bibliography

Open Archives Initiative

develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.

eprints.org

dedicated to opening access to the refereed research literature online through author/institution self-archiving.

Jake

Journal metadata database.

Library Information Interchange Standards

What looks like a rather comprehensive list from an old EC project.

CiteSeer

Interesting no-cost citation-indexed literature database.

ResearchBuzz

From the co-author of O'Reilly's "Google Hacks". Obsessed with search engines, databases, and various info-piles since 1998.

TEI

Text Encoding Initiative. International standard for text encoding aimed at the humanities and social sciences: "maximally expressive and minimally obsolescent".

bookpeople mailing list

People publishing free books on the internet (mostly by scanning old books in the public domain). Has also become a place for discussion of recent extensions of copyright.


Personal bibliographic software links

Overview of Personal Bibliographic Software (1999).

Comparisons of commercial Windows and Mac reference management applications.

OpenOffice.org bibliography project links page

Google directory: Bibliographic Utilities

ISI Researchsoft

The vendors of the most popular commercial bibliography management software packages: Endnote, ProCite, Reference Manager.

BookWhere

Commercial catalog search application.

Bruce D'Arcus and John J. Lee, April 2004.