Python bits

Officially this is the wwwsearch page, but since that doesn't exist other than on paper, here is some Python code I've written. Mostly these are modules for client-side web programming, but there are a few random bits of code too. All are free (beer & speech).


Subversion repository

Thanks to the people at codespeak.net for generously allowing use of their repository.

To check out the source from Subversion, for example:

svn co http://codespeak.net/svn/wwwsearch/mechanize/trunk/ mechanize

svn co http://codespeak.net/svn/wwwsearch/ClientForm/trunk/ ClientForm

Client-side web programming tools for web scraping and web functional testing

Please use the mailing list for questions about these web client modules.

mechanize

Beta release.

Stateful browser-like web scraping, after Andy Lester's Perl module WWW::Mechanize. Requires Python 2.3 or newer, pullparser, ClientForm and ClientCookie. BSD license (or ZPL 2.1, at your option).

ClientForm

Client-side HTML form handling. Requires Python 2.0 or better. BSD license (or ZPL 2.1, at your option).


Unmaintained client-side web code

ClientCookie

Now part of mechanize.

Client-side HTTP cookie handling. Optional features: urllib2 support, seekable responses, handling of HTTP-EQUIV, Refresh redirection, Referer and robots.txt. Requires Python 2.0 or better. BSD license.

DOMForm

Alpha release. Requires Python 2.3 (probably 2.2 would work with minor tweaking, but I haven't checked). BSD-ish licenses (see the COPYING file for full details: there are several licenses, due to the inclusion of code from several libraries). The JavaScript support is incomplete and buggy, but the ClientForm work-alike part is relatively stable.

Supports both the ClientForm and HTML DOM interfaces (plus "very alpha" JavaScript support). The ability to switch back and forth between the two interfaces allows simpler code than would result from using either interface alone.

python-spidermonkey

Alpha release.

Python/JavaScript bridge module, making use of Mozilla's spidermonkey JavaScript implementation. GPL.

ClientTable

Very early alpha release!

Client-side HTML table handling. Currently requires Python 2.2 or better. MIT license.

pullparser

Now part of mechanize, but the interface is not public.

Beta release.

A simple "pull API" for HTML parsing, after Perl's HTML::TokeParser. Requires Python 2.2 or better. BSD license.
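To illustrate what a "pull API" means here, the following is a minimal sketch built on the standard library's html.parser. It is not pullparser's actual interface: the class name TokenStream and the methods get_token, get_tag and get_text are hypothetical, chosen only to echo the HTML::TokeParser style. The point is that the caller asks for the next token it cares about, rather than supplying callbacks.

```python
from collections import deque
from html.parser import HTMLParser


class TokenStream(HTMLParser):
    """Hypothetical pull-style wrapper: tokens are queued, the caller pulls them."""

    def __init__(self, html):
        super().__init__(convert_charrefs=True)
        self._tokens = deque()
        # Simplification: parse everything up front, then hand tokens out on demand.
        self.feed(html)
        self.close()

    # The push-style callbacks just record tokens as (type, value, attrs).
    def handle_starttag(self, tag, attrs):
        self._tokens.append(("starttag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self._tokens.append(("endtag", tag, None))

    def handle_data(self, data):
        self._tokens.append(("data", data, None))

    def get_token(self):
        """Return the next token, or None when the stream is exhausted."""
        return self._tokens.popleft() if self._tokens else None

    def get_tag(self, *names):
        """Skip forward to the next start or end tag (optionally one of names)."""
        while True:
            tok = self.get_token()
            if tok is None:
                return None
            if tok[0] in ("starttag", "endtag") and (not names or tok[1] in names):
                return tok

    def get_text(self):
        """Return the text up to (not including) the next tag."""
        parts = []
        while self._tokens and self._tokens[0][0] == "data":
            parts.append(self._tokens.popleft()[1])
        return "".join(parts)


stream = TokenStream("<html><title>Hello</title><a href='/x'>link</a></html>")
stream.get_tag("title")
title = stream.get_text()
link = stream.get_tag("a")
link_text = stream.get_text()
```

Buffering the whole document first keeps the sketch short; a real pull parser would more likely read and tokenize its input incrementally.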

Python 1.5.2-compatible urllib2.py

and a urllib.py to accompany it.

I'm not planning to maintain this any more (not that it ever saw any maintenance...).

Not well-tested, I must confess: the project for which I wanted these has been dormant for a while, and I haven't got round to running the standard library tests yet, so it's possible there are some remaining 1.5.2-incompatible bits in here.

Not much to be said about this, other than that it derives from the Python 2.1 CVS maintenance branch. If you use this urllib2, use the urllib from here, too. Python license, of course.


Other stuff

piddleFIG

Now distributed with PIDDLE itself.

Note: Since I wrote this, I discovered sketch, which is better than XFig in many ways, and is extensible and scriptable in Python.

An XFig backend for PIDDLE (Plug-In Drawing, Does Little Else). Requires Python 2.0 or better. If you're installing into an old PIDDLE installation, all you need to do is drop it in with the rest of the backends (piddlePS.py, piddlePDF.py, etc).

mailtidy.py

Just a little script to remove duplicate mail from Unix mailboxes (also known as mbox format, or V7 format; used by pine and mutt, amongst other mail clients). This is very useful if, like me, you periodically back up your mail from a server but like to keep old messages around on the server too. If you do this, you end up with widely overlapping backups, which makes grepping through them a pain. Running this script over your backed-up mailboxes should ensure you end up with exactly one copy of each message. Please back up your backed-up mailboxes first! Requires Python 2.0 or better. Public domain.
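The general idea of de-duplicating an mbox file can be sketched with the standard library's mailbox module. This is not the mailtidy.py code itself, and it targets modern Python 3 rather than Python 2.0. The function name dedupe_mbox is hypothetical, and treating the Message-ID header as the identity of a message (with a hash of the whole message as a fallback) is an assumption of this sketch. It writes to a new file rather than modifying the original.

```python
import hashlib
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each message.

    Returns (kept, dropped). Hypothetical helper, not part of mailtidy.py.
    """
    seen = set()
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist
    try:
        for msg in src:
            # Assumption: two messages with the same Message-ID are duplicates.
            msg_id = msg.get("Message-ID")
            if msg_id is not None:
                key = str(msg_id).strip()
            else:
                # No Message-ID: fall back to a hash of the entire message.
                key = hashlib.sha256(msg.as_bytes()).hexdigest()
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped
```

A call such as dedupe_mbox("backup.mbox", "backup-unique.mbox") (both file names are placeholders) leaves the input untouched, which fits the advice above about backing up first.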

Now for the really obscure stuff...

UK ISI Web of Science (WoS) search module

Note: This is almost certainly not working: I don't use this database any more, and it's not even up-to-date with the current ClientCookie and ClientForm interfaces.

Python interface to the ISI database, via the UK Web of Science web site. This is intended to be part of pybliographer, when I get round to integrating it properly. In the meantime, it works as a Python module. Requires Python 2.0 or better and both ClientCookie (>= 0.3.0b) and ClientForm. MIT license.

Bell-ringing method diagram plotter

Please don't ask me about bell-ringing: I know nothing about it. This plots diagrams like this. Requires reportlab, and Python 2.0 or better. There is a simple graphical interface to operate it, which requires PyQt (which itself requires Qt -- follow the link from the PyQt page). The output is in PDF format -- to view it, you need Acrobat Reader, or ghostscript together with either GSView (Windows) or gv (Unix -- not sure where gv for Unix lives, but you probably already have these installed if you're on Linux). MIT license.


John J. Lee, October 2006.