Python bits

Officially this is the wwwsearch page, but since that doesn't exist other than on paper, here is some Python code I've written. Mostly these are modules for client-side web programming, but there are a few random bits of code too. All are free (beer & speech).

Subversion repository

Thanks to the people at for generously allowing use of their repository.

To check out the source from Subversion, e.g.:

svn co mechanize

svn co ClientForm

Client-side web programming tools for web scraping and web functional testing

Please use the mailing list for questions about these web client modules.


Beta release.

Stateful browser-like web scraping, after Andy Lester's Perl module WWW::Mechanize. Requires Python 2.3 or newer, pullparser, ClientForm and ClientCookie. BSD license (or ZPL 2.1, at your option).


Client-side HTML form handling. Requires Python 2.0 or better. BSD license (or ZPL 2.1, at your option).
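
To give a flavour of what client-side form handling involves, here is a minimal sketch that uses only the modern Python 3 standard library (html.parser and urllib.parse). It is not this module's interface: the class name FormFieldCollector and the sample page are invented for illustration, and it only understands plain input controls in the first form on a page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: collect <input> names and values from the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms


# Sample page, made up for this example.
PAGE = """
<form action="/search" method="get">
  <input type="text" name="q" value="">
  <input type="hidden" name="lang" value="en">
</form>
"""

collector = FormFieldCollector()
collector.feed(PAGE)
collector.fields["q"] = "python"     # fill in a control, as a user would
query = urlencode(collector.fields)  # encode the controls for submission
```

A real form library also has to cope with select lists, checkboxes, file uploads and badly formed HTML, which is where most of the work lies.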

Unmaintained Client-side web code


Now part of mechanize.

Client-side HTTP cookie handling. Optional features: urllib2 support, seekable responses, handling of HTTP-EQUIV, Refresh redirection, Referer and robots.txt. Requires Python 2.0 or better. BSD license.
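
As a rough illustration of what client-side cookie handling means, the sketch below uses the modern Python 3 standard library (http.cookiejar and urllib.request) rather than this module. No network access is involved: FakeResponse is a made-up stand-in for a server reply, and the URLs and cookie value are placeholders.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only calls info()."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()

# 1. The "server" sets a cookie in reply to the first request.
first = urllib.request.Request("http://example.com/login")
jar.extract_cookies(FakeResponse("session=abc123; Path=/"), first)

# 2. The jar attaches that cookie to a later request to the same site.
second = urllib.request.Request("http://example.com/account")
jar.add_cookie_header(second)
cookie_header = second.get_header("Cookie")
```

In normal use you would not call these two methods yourself: passing the jar to urllib.request.HTTPCookieProcessor and building an opener with it does both steps on every request.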


Alpha release. Requires Python 2.3 (probably 2.2 would work with minor tweaking, but I haven't checked). BSD-ish licenses (see the COPYING file for full details: there are several licenses, due to the inclusion of code from several libraries). The JavaScript support is incomplete and buggy, but the ClientForm work-alike part is relatively stable.

Supports both the ClientForm and HTML DOM interfaces (plus "very alpha" JavaScript support). The ability to switch back and forth between the two interfaces allows simpler code than would result from using either interface alone.


Alpha release.

Python/JavaScript bridge module, making use of Mozilla's spidermonkey JavaScript implementation. GPL.


Very early alpha release!

Client-side HTML table handling. Currently requires Python 2.2 or better. MIT license.


Now part of mechanize, but the interface is not public.

Beta release.

A simple "pull API" for HTML parsing, after Perl's HTML::TokeParser. Requires Python 2.2 or better. BSD license.
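
For anyone unfamiliar with the idea, a "pull API" lets the calling code ask for the next token when it wants one, instead of supplying callbacks. The sketch below imitates that style on top of the Python 3 standard library's html.parser. It is not this module's interface: the class PullTokens and its method names are invented here, and it simply buffers the whole document up front, which a real pull parser would avoid.

```python
from collections import deque
from html.parser import HTMLParser


class PullTokens(HTMLParser):
    """Hypothetical helper: buffer parser events so the caller can pull them."""

    def __init__(self, html):
        super().__init__()
        self._tokens = deque()
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        self._tokens.append(("starttag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self._tokens.append(("endtag", tag, None))

    def handle_data(self, data):
        self._tokens.append(("data", data, None))

    def get_token(self):
        """Return the next token, or None when the input is exhausted."""
        return self._tokens.popleft() if self._tokens else None

    def get_tag(self, name):
        """Skip ahead to the next start tag called `name`, or return None."""
        for token in iter(self.get_token, None):
            if token[0] == "starttag" and token[1] == name:
                return token
        return None

    def get_text(self):
        """Collect text up to, but not including, the next tag."""
        parts = []
        while self._tokens and self._tokens[0][0] == "data":
            parts.append(self._tokens.popleft()[1])
        return "".join(parts)


p = PullTokens('<p>See <a href="/one">first</a> and <a href="/two">second</a>.</p>')
links = []
# Pull each <a> tag in turn, then the text that follows it.
for tag in iter(lambda: p.get_tag("a"), None):
    links.append((tag[2]["href"], p.get_text()))
```

The loop at the end reads top to bottom like the page itself, which is the main attraction of the pull style for scraping.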

Python 1.5.2-compatible

and a to accompany it.

I'm not planning to maintain this any more (not that it ever saw any maintenance...).

Not well-tested, I must confess: the project for which I wanted these has been dormant for a while, and I haven't got round to running the standard library tests yet, so it's possible there are some remaining 1.5.2-incompatible bits in here.

Not much to be said about this, other than that it derives from the Python 2.1 CVS maintenance branch. If you use this urllib2, use the urllib from here, too. Python license, of course.

Other stuff


Now distributed with PIDDLE itself.

Note: Since I wrote this, I discovered sketch, which is better than XFig in many ways, and is extensible and scriptable in Python.

An XFig backend for PIDDLE (Plug-In Drawing, Does Little Else). Requires Python 2.0 or better. If you're installing into an old PIDDLE installation, all you need to do is drop it in with the rest of the backends (,, etc).

Just a little script to remove duplicate mail from Unix mailboxes (also known as mbox format, or V7 format; used by pine and mutt, amongst other mail clients). This is very useful if, like me, you periodically back up your mail from a server but like to keep old messages around on the server too. If you do this, you end up with widely overlapping backups, which makes grepping through them a pain. Running this script over your backed-up mailboxes should ensure you end up with exactly one copy of each message. Please back up your backed-up mailboxes first! Requires Python 2.0 or better. Public domain.
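
The general idea can be sketched with the mailbox module from the modern Python 3 standard library. This is not the script itself: the function name dedupe_mbox is made up, and treating two messages as duplicates when their Message-ID headers match is an assumption made for this example, not necessarily the rule the script uses.

```python
import mailbox


def dedupe_mbox(path):
    """Remove messages whose Message-ID was already seen; return how many went.

    Assumption: an identical Message-ID header means an identical message.
    """
    box = mailbox.mbox(path)
    box.lock()
    try:
        seen = set()
        removed = 0
        for key in list(box.keys()):
            msg_id = box[key].get("Message-ID")
            if msg_id is None:
                continue  # keep messages that cannot be identified
            if msg_id in seen:
                box.remove(key)
                removed += 1
            else:
                seen.add(msg_id)
        box.flush()  # rewrite the mailbox file without the duplicates
    finally:
        box.unlock()
        box.close()
    return removed
```

Calling dedupe_mbox on a mailbox path rewrites that file in place, so the advice about backing up first applies here too.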

Now for the really obscure stuff...

UK ISI Web of Science (WoS) search module

Note: This is almost certainly not working: I don't use this database any more, and it's not even up-to-date with the current ClientCookie and ClientForm interfaces.

Python interface to the ISI database, via the UK Web of Science web site. This is intended to be part of pybliographer, when I get round to integrating it properly. In the meantime, it works as a Python module. Requires Python 2.0 or better and both ClientCookie (>= 0.3.0b) and ClientForm. MIT license.

Bell-ringing method diagram plotter

Please don't ask me about bell-ringing: I know nothing about it. This plots diagrams like this. Requires reportlab, and Python 2.0 or better. There is a simple graphical interface to operate it, which requires PyQt (which itself requires Qt; follow the link from the PyQt page). The output is in PDF format. To view it, you need Acrobat Reader, or ghostscript with either GSView (Windows) or gv (Unix; I'm not sure where gv for Unix lives, but you probably already have these installed if you're on Linux). MIT license.


John J. Lee, October 2006.