This list is no longer maintained
If you'd like to take over maintenance of the list, email me — John Lee, August 2007
With the proliferation of robust and powerful open source software, the Internet, and standardization on XML data formats, there are unprecedented opportunities for the collection and management of information. Given the greatly increased access to information in the Internet era, metadata — in essence, information about information — becomes all the more essential. For scholars and researchers, among the most essential metadata is bibliographic. Being able to reliably store, find, use and communicate bibliographic data is a basic need of academic research. And yet the state of bibliographic software is largely stuck in the 1980s.
This list provides a quick overview of the landscape of open-source bibliographic software: where it has been and, more importantly, where it may yet go.
Currently, the emphasis is on the needs of individuals and small groups rather than libraries, but given the growing overlap in the interests of these groups, the list is likely to expand to some extent to cover more library software.
A bibliographic DTD intended to be the basis for an XML-based rewrite of the ConTeXt bibliographic module (m-bib). It draws from RIS and BibTeX, but goes beyond them in terms of data richness.
With goals and an approach very similar to those of BibX + m-bib v2, Bibulus defines a bibliographic DTD (though one more closely modeled on BibTeX) and uses Perl to format LaTeX documents.
Pybliographer 2 will define a rich internal database schema, which will probably necessitate its own canonical XML data format.
Some of these are simple reworkings of MARC in XML, some are "better MARCs", and some (such as MODS) are simpler than MARC, and related to efforts like BibX.
The following links are to a collection of XML-related metadata efforts at the Library of Congress.
The Library of Congress' Network Development and MARC Standards Office
is developing a framework for working with MARC data in an XML environment. This
framework is intended to be flexible and extensible to allow users to work with
MARC data in ways specific to their needs. The framework itself includes many
components such as schemas, stylesheets, and software tools.
As an XML schema, the "Metadata Object Description Schema" (MODS) is
intended to be able to carry selected data from existing MARC 21 records as
well as to enable the creation of original resource description records. It
includes a subset of MARC fields and uses language-based tags rather than
numeric ones, in some cases regrouping elements from the MARC 21 bibliographic
format.
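The difference between MARC's numeric tags and MODS's language-based tags can be sketched in a few lines of Python. This is only an illustration, not an official converter: the input dictionary, the sample values, and the function name `marc_to_mods` are assumptions made up for the example, and only two fields are handled (MARC 245 $a as the title and 260 $b as the publisher).

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Illustrative MARC-like input (made-up data): numeric tag -> subfield -> value.
marc_fields = {
    "245": {"a": "An Example Title"},
    "260": {"b": "Example Press"},
}


def marc_to_mods(fields):
    """Build a tiny MODS-style element from a dict of MARC-like fields.

    Hypothetical helper: it covers only 245 $a and 260 $b.
    """
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    # MARC 245 $a (title) becomes <titleInfo><title>.
    title = fields.get("245", {}).get("a")
    if title:
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # MARC 260 $b (publisher name) becomes <originInfo><publisher>.
    publisher = fields.get("260", {}).get("b")
    if publisher:
        origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
        ET.SubElement(origin, f"{{{MODS_NS}}}publisher").text = publisher

    return mods


mods_record = marc_to_mods(marc_fields)
print(ET.tostring(mods_record, encoding="unicode"))
```

The point of the sketch is the naming: a reader of the output sees `titleInfo` and `publisher` rather than `245` and `260`.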
The METS schema is a standard for encoding descriptive, administrative,
and structural metadata regarding objects within a digital library.
METS has a modular structure that allows, for example, encapsulation of MODS data.
XOBIS has some of the same goals as the MARCXML/MODS/METS combination, but also rethinks basic principles of bibliographic representation, and has as its goal to present an "ideal" representation. It is focused on the needs of libraries.
There are many XML schemas available for modeling MARC data. Most take a
literal approach, naming elements and attributes after their corresponding MARC
fields, subfields, and indicators. Others represent only a small subset of the
data libraries use to describe resources. XOBIS attempts to walk the middle
path: describe the full set of library information, but reorganize this
information into a structure that empowers the use of library data as just one
more information resource available in the digital domain.
BiblioML and AuthoritiesML are XML-based formats for the
interchange of UNIMARC bibliographic and authority records between
applications.
Several XML representations for BibTeX data exist. The idea, of course, is to retain the mass of BibTeX data and formatting styles while getting the benefit of general purpose XML processing tools.
These formats seem to have proliferated for no obvious reason. There has been some consolidation around the BibTeXML standard (listed first below), but a couple of other, presumably now obsolete, standards still seem to be floating around. (Note that bib2xml's format is not a separate standard, but rather a possibly outdated version of BibTeXML.)
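To make the general idea concrete, here is a minimal Python sketch that turns one simple BibTeX entry into XML. It does not follow any of the DTDs listed here: the element names (`entry`, then one child named after the entry type, then one element per field) are assumptions chosen for illustration, the sample entry is made up, and the regular expressions only cope with brace-delimited values that contain no nested braces.

```python
import re
import xml.etree.ElementTree as ET

# Matches the "@type{key," header of an entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches "name = {value}" pairs; nested braces are NOT supported.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def bibtex_entry_to_xml(source):
    """Convert a single simple BibTeX entry into an XML element.

    Hypothetical layout: <entry id="key"><type><field>value</field>...</type></entry>
    """
    header = ENTRY_RE.search(source)
    if header is None:
        raise ValueError("no BibTeX entry found")
    entry_type, key = header.group(1).lower(), header.group(2)

    entry = ET.Element("entry", id=key)
    typed = ET.SubElement(entry, entry_type)
    for name, value in FIELD_RE.findall(source[header.end():]):
        ET.SubElement(typed, name.lower()).text = value.strip()
    return entry


# Made-up sample entry, for illustration only.
sample = """@article{example2004,
  author = {A. Author},
  title = {An Example Title},
  year = {2004}
}"""

xml_entry = bibtex_entry_to_xml(sample)
print(ET.tostring(xml_entry, encoding="unicode"))
```

Once the data is in this form, general-purpose XML tools (XPath queries, XSLT style sheets) can work on it directly, which is the benefit these formats are after.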
XML DTD and schema. This is the result of a merger of three competing BibTeX-in-XML DTDs (Hendrikse, Kuhlmann, and Gundersen).
From the site:
BibTeXML is shipped with tools to uptranslate native TeX-syntax BibTeX bibliographies to XML, and translating this into any markup scheme...
Our goal is to maintain a strict BibTeX schema and develop (and collect!) conversion tools that will help you tag your bibliographic data in XML and save typing time, or export it to HTML, DocBook or native BibTeX syntax.
Is this a separate standard? Obsolete?
The BibTeXML project also provides a Web-based solution (based on PHP and MySQL) for working with bibliographic data,
and the implementation has its particular strong points in the extensibility of the bibliography schema, powerful import and
export mechanisms, as well as pluggable XSLT style sheets which may be used to support custom export formats.
Is this a separate standard? Obsolete?
The main goal of the BibTeX-XML-HTML Bibliography Project is
to transform a BibTeX bibliography into an HTML file to facilitate the
publication of the bibliography on the web.
From the RefDB project, the only attempt to represent in XML the binary and proprietary formatting style files from commercial products like Endnote and Reference Manager on one hand, and BibTeX on the other. The intention behind BibX was to couple it with citestylex to provide a complete XML replacement for BibTeX. Citestylex will be adapted to format BibX (and/or MODS) records, which would provide open XML representations of the two most significant aspects of bibliographic data: storage/exchange and formatting.
Falling under the ZING umbrella, and related to the Library of Congress work discussed above, is SRW, which among other things is envisioned as a carrier of MODS data.
SRW is the "Search/Retrieve Web Service" protocol, which aims to integrate
access to various networked resources, and to promote interoperability between
distributed databases, by providing a common utilization framework. SRW is a
web-service-based protocol whose underpinnings are formed by bringing together
more than 20 years experience from the collective implementers of the Z39.50
Information Retrieval protocol with recent developments in the web technologies
arena.
The proposed OpenURL standard is a syntax to create web-transportable
packages of metadata and/or identifiers about an information object. Such
packages are at the core of context-sensitive or open link technology. By
standardizing this syntax, the OpenURL will enable many other innovative
user-specific services.
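As a rough illustration of such a "web-transportable package", the Python sketch below encodes article metadata as key/value pairs in a URL query string and decodes it again. The resolver address and the sample values are placeholders invented for the example, and the key names (`genre`, `issn`, `volume`, `issue`, `spage`, `date`) follow the early key/value style of OpenURL; check the current specification before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical link resolver address, used only for this sketch.
RESOLVER = "http://resolver.example.org/openurl"


def build_openurl(metadata, base=RESOLVER):
    """Encode non-empty metadata fields as a query string on the resolver URL."""
    pairs = [(key, value) for key, value in metadata.items() if value]
    return f"{base}?{urlencode(pairs)}"


def read_openurl(url):
    """Recover the metadata package from an OpenURL-style link."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


# Made-up article description; the ISSN is a placeholder, not a real journal.
article = {
    "genre": "article",
    "issn": "0000-0000",
    "volume": "12",
    "issue": "3",
    "spage": "45",
    "date": "2004",
}

link = build_openurl(article)
print(link)
print(read_openurl(link))
```

Because the description travels inside the link itself, whichever resolver receives it can decide what to offer for that item, which is what makes the linking context-sensitive.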
Provides a unique id for any digital object. Used, for instance, by OpenURL to link to an electronic journal article.
From the site:
ZING, "Z39.50-International: Next Generation", covers a number of initiatives by Z39.50 implementors to make the intellectual/semantic content of Z39.50 more broadly available and to make Z39.50 more attractive to information providers, developers, vendors, and users, by lowering the barriers to implementation while preserving the existing intellectual contributions of Z39.50 that have accumulated over nearly 20 years.
Current ZING initiatives are SRW (including SRU), CQL, ZOOM, ez3950, and ZeeRex. Some (for example, SRW/U) seek to evolve Z39.50 to a more mainstream protocol, while for others (e.g. ZOOM) the purpose is to preserve the existing protocol but hide its complexity.
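For a sense of what a URL-based request in the SRU style looks like, here is a small Python sketch that assembles a `searchRetrieve` request carrying a CQL query. The server address is a placeholder and the helper name `build_sru_request` is invented for the example; the parameter names are those of SRU 1.1 as best recalled, so treat them as assumptions to verify against the specification and against what a given server supports.

```python
from urllib.parse import urlencode


def build_sru_request(base_url, cql_query, max_records=10, record_schema="mods"):
    """Assemble an SRU-style searchRetrieve URL (a sketch, not a full client)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,               # the search expressed in CQL
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,    # ask for records in MODS, for example
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute a real SRU server to try it.
url = build_sru_request("http://sru.example.org/catalog", 'dc.title = "bibliography"')
print(url)
```

The whole request fits in one URL that can be fetched with any HTTP client, which reflects the stated aim of lowering the barriers to implementation.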
Inter-publisher reference linking.
Web service for searching Amazon's extensive catalog. See below for applications making use of it.
Access to jake, the volunteer-run journal metadata database.
LaTeX is the traditional scientific document markup and typesetting system, still widely used today. BibTeX is LaTeX's bibliographic partner, which allows you to keep your bibliographic database(s) separate from your documents. Almost everything LaTeX- and BibTeX-related can be found on CTAN.
BibTeX can be entirely replaced with newer formatting engines, but the large number of .bst files (which specify formatting styles) available for BibTeX means that many people prefer to keep BibTeX, and replace only the front-end (with a graphical reference management application) and/or the data format (with an XML format).
ConTeXt is a TeX macro package similar to LaTeX, and has a built-in XML parser. The bibliographic module (m-bib) is a TeX-based replacement (for the most part) of BibTeX. Version 2 is intended to be rewritten to use XML data and formatting files.
From the site:
ibibproc is a bibliography processor similar to BibTEX. Its primary distinguishing features are:
- Internationalised. References can be multilingual and multiscript.
- Customisable and extendible. Styles, database formats, front-ends, and back-ends can be tailored and extended.
- Multiple front- and back-ends. Citations and references can be formatted for use with several document processing systems.
Currently supporting PostgreSQL, MySQL, and SQLite:
RefDB is a reference database and bibliography tool for SGML, XML, and
LaTeX/BibTeX documents. It allows users to share databases over a network. It
is lightweight and portable to basically all platforms with a decent C
compiler. And it's released under the GNU General Public License.
Though Pybliographer is most frequently used as a graphical reference
manager application (see below), it was designed as a
general-purpose library: ...a simple framework that provides easy to use
python classes and functions, and therefore can be extended to many uses
(generating HTML pages according to bibliographic searches, etc).
An elderly Perl 4 bibliographic formatting library. Input and output to many formats, character set conversion, etc.
There is an impressive collection of efforts at MIT that could have profound implications for bibliographic applications and data. See also Haystack.
DSpace is a newly developed digital repository created to capture,
distribute and preserve the intellectual output of MIT.
DSpace currently uses qualified Dublin Core to store metadata, and PostgreSQL as its storage engine, but support of additional metadata standards (including MODS) is likely to grow as a result of the SIMILE research project.
Simile will leverage and extend DSpace, enhancing its support for
arbitrary schemas and metadata, primarily through the application of RDF and
semantic web techniques. The project also aims to implement a digital asset
dissemination architecture based upon web standards. The dissemination
architecture will provide a mechanism to add useful "views" to a particular
digital artifact (i.e. asset, schema, or metadata instance), and bind those
views to consuming services.
Note that parts of the web site are out-of-date. The current version of the system is available from the Cheshire ftp site.
Work is underway on the development of Cheshire III.
The Cheshire II project is developing a next-generation online catalog and
full-text information retrieval system using advanced IR techniques. ... The
Cheshire II system was designed to overcome twin problems of topical searching
in online catalogs, search failure and information overload as well as to
provide a bridge between the purely bibliographic realm of previous generations
of online catalogs and the rapidly expanding realm of full-text and multimedia
information resources. The system incorporates a client/server architecture with
implementations of current information retrieval standards including Z39.50 and
SGML and XML.
Java MARC library with SAX- and DOM-like interfaces.
Perl MARC library.
C++ MARC library.
OK, not an application, but looks like an interesting requirements document for anybody who's writing one.
Another MIT offering.
Our research seeks to bring modern information management and
retrieval technologies to the average computer user in order to make computers
a more compelling place for users to interact with their information. Haystack
looks into the use of artificial intelligence techniques for analyzing
unstructured information and providing more accurate retrieval. We also deal
with the modeling, management, and display of user data in more natural and
useful ways.
BibDesk is a GUI BibTeX bibliography manager, making extensive
use of Mac OS X-native technologies, including integrated PDF viewing, a
citation-completion Service, etc. Capable of publishing RSS feeds of reading
lists.
Pybliographer is a tool for managing bibliographic databases.
The current stable version is Pybliographer 1. Work is in progress on Pybliographer 2.
Pybliographer 1 allows retrieving, editing, searching and citing bibliographic records using a GNOME graphical interface. Input and output to BibTeX, ISI, Medline, Ovid and Refer formats is provided, plus citation to LaTeX, LyX, HTML and plain text formats. Particularly good support for BibTeX / LaTeX and LyX. Medline queries direct from the application. Customizable BibTeX templates, Language / charset support. Pybliographer 1 is limited to databases of a few thousand records (this will be fixed in Pybliographer 2).
Scheduled for inclusion in the next major release of the open source office project OpenOffice, this will be an integrated bibliographic storage and formatting module. The OpenOffice technical committee on XML file format specification for office applications is also relevant.
RefDB focuses its attention on a relational database backend and support for markup languages, but includes a simple web client. Supports a variety of structured markup documents (including TEI, DocBook and LaTeX).
Bibliography manager for KDE. The OpenOffice people have this to say about it:
Kaspaliste is a literature and knowledge database. It handles all kinds of
books, articles, journals, web pages etc. But the database goes beyond simply
storing bibliographical information. There is the possibility to create
annotated links between pieces of information (like the content of a book
chapter) and to group the links in categories. It is based on KDE and uses the
Postgres relational database. It is a promising project but currently does not
have import or export functions other than a BibTeX export. There are no links
to other programs such as OpenOffice.
JabRef is a GUI for managing BibTeX databases.
...
JabRef works on all platforms and requires Java 1.4.2.
Import from ISI Web of Science, Medline/PubMed XML, Scifinder, OVID and INSPEC formats. Export to BibTeX, HTML and plain text. Searching, editing, sorting, duplicate detection, automatic key generation, customizable BibTeX templates, language support.
Bibliography Base for Biologists, since I'm a biologist and only know
about biology journals bibliography. Java 1.4. Has its own BibTeX-in-XML
format (joy). Import from Endnote export format, RIS, BibTeX, PubMed and CSV.
XSL-based export to BibTeX, HTML, CSV and OpenOffice via CSV or directly to
a database.
Reference database with file and MySQL backends, oriented towards DocBook. Seems to be dormant.
gBib is a user-friendly editor and browser for BibTeX databases. You
can use it also to insert citations inside a LyX document.
From the site:
...a web-based bibliography-management tool built with Zope and XML.
zNote is intended to ultimately be a replacement for tools like EndNote, ProCite, and to a certain extent, BibTeX. It uses a hierarchical XML data format which is more flexible than flat data, and it works using a set of pretty simple DOM calls to format, edit, etc.
A basic BibTeX-centric bibliographic Plone module. Designed for scientific users (this is true of many of these applications, in fact).
Plone-based personal book list and reference manager with support for journals and automatic retrieval of data from Amazon.
SIXPACK is a free BibTeX and Reference Manager... The
OpenOffice people have this to say about it (notes in square brackets are theirs):
Sixpack is a graphical and command-line bibliography database manager written
in Perl/Tk. It interacts with the supplied package 'bp', (see below) which can
import and export from a number of formats including BibTeX, Endnote, Medline,
Procite, and many others. It can download references directly off the Web, and
open articles using external viewers. It can also interface with Emacs/XEmacs
and LyX [LaTeX with a GUI interface]. It also has instructions on how to
interact with OpenOffice / StarOffice using CSV files and database import
functions [I have used this].
GNU EPrints 2
EPrints is free software which creates online archives. The default
configuration creates a research papers archive, but could be used for other
purposes.
Z39.50-capable book cataloging and information retrieval application with command-line, GUI and web front ends. Support for barcode scanning with the :CueCat scanner. PostgreSQL backend.
Book collection manager for KDE 3.x. File-based XML system: there's no database backend. There are plans for extension to other types of collections (CDs, for example), and for Z39.50 client capability. Has its own simple DTD.
Bookcase is a KDE application for keeping track of your book collection.
Ultimately, I'd like it to be similar in capability to AVCataloger or
Readerware, although it's still got a ways to go.
Made in New Zealand by Katipo Communications Ltd. and maintained by a
team of volunteers from around the globe, the Koha system is a full catalog,
OPAC, circulation and acquisitions system.
Greenstone is a suite of software for building and distributing
digital library collections.
OpenBiblio is an easy to use, open source, automated library system
written in PHP containing OPAC, circulation, cataloging, and staff administration
functionality. The purpose of this project is to provide a cost effective
library automation solution for private collections, clubs, churches, schools,
or public libraries.
Web-based library cataloging and circulation database, including administration and OPAC functionality. Apache/MySQL/PHP-based.
Web-based library cataloging database, with an emphasis on electronically-stored collections ('virtual library system'). In addition to human-created catalogs, it crawls the web to do automated catalog creation.
Slashdot-like site covering open source systems for libraries. The projects page is well worth a look if you found some of the links here useful.
develops and promotes interoperability standards that aim to
facilitate the efficient dissemination of content.
dedicated to opening access to the refereed research literature online
through author/institution self-archiving.
Journal metadata database.
What looks like a rather comprehensive list from an old EC project.
Interesting no-cost citation-indexed literature database.
From the co-author of O'Reilly's "Google Hacks". Obsessed with search
engines, databases, and various info-piles since 1998.
Text Encoding Initiative. International standard for text encoding aimed at the humanities and social sciences: "maximally expressive and minimally obsolescent".
People publishing free books on the internet (mostly by scanning old books in the public domain). Has also become a place for discussion of recent extensions of copyright.
The vendors of the most popular commercial bibliography management software packages: Endnote, ProCite, Reference Manager.
Commercial catalog search application.
Bruce D'Arcus and John J. Lee, April 2004.