Officially this is the wwwsearch page, but since that doesn't exist other than on paper, here is some Python code I've written. Mostly these are modules for client-side web programming, but there are a few random bits of code too. All are free (as in beer and as in speech).
Thanks to the people at codespeak.net for generously allowing use of their repository.
To check out the source from the Subversion (SVN) repository, e.g.:
svn co http://codespeak.net/svn/wwwsearch/mechanize/trunk/ mechanize
svn co http://codespeak.net/svn/wwwsearch/ClientForm/trunk/ ClientForm
Please use the mailing list for questions about these web client modules.
Stateful browser-like web scraping, after Andy Lester's Perl module WWW::Mechanize. Requires Python 2.3 or newer, ClientCookie. BSD license (or ZPL 2.1, at your option).
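The idea behind stateful browsing (cookies that persist from one request to the next, plus a history you can step back through) can be sketched with Python 3's standard library alone. The Browser class below is a hypothetical illustration, not mechanize's own interface; it assumes only urllib.request and http.cookiejar.

```python
import http.cookiejar
import urllib.request


class Browser:
    """Toy stateful browser: one shared cookie jar plus a history stack.

    Hypothetical illustration only; this is not mechanize's Browser class.
    """

    def __init__(self):
        self.cookiejar = http.cookiejar.CookieJar()
        # Every request goes through this opener, so cookies set by one
        # response are sent back automatically on later requests.
        self._opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookiejar))
        self._history = []  # URLs of pages visited before the current one
        self.url = None     # URL of the current page

    def _fetch(self, url, data=None):
        # data=None means GET; a bytes body means POST (urllib's convention).
        with self._opener.open(url, data) as response:
            body = response.read().decode("utf-8", "replace")
            return body, response.geturl()  # geturl() reflects redirects

    def open(self, url, data=None):
        """Fetch url and make it the current page."""
        body, final_url = self._fetch(url, data)
        if self.url is not None:
            self._history.append(self.url)
        self.url = final_url
        return body

    def back(self):
        """Re-fetch the previous page, like a browser's Back button."""
        if not self._history:
            raise IndexError("no previous page")
        body, final_url = self._fetch(self._history.pop())
        self.url = final_url
        return body
```

Going back re-fetches the previous URL rather than replaying a cached response; that is a simplifying assumption of this sketch.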
Client-side HTML form handling. Requires Python 2.0 or better. BSD license (or ZPL 2.1, at your option).
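Client-side form handling boils down to parsing a form's controls, changing some values, and encoding the result for submission. Here is a minimal sketch with Python 3's html.parser, assuming a GET form containing only text, hidden and password inputs. FormParser and the example.com URL are hypothetical, and this is not ClientForm's API.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect the action URL and the text-like fields of the first <form>.

    Hypothetical helper: only <input> of types text, hidden and password
    are handled; checkboxes, selects and textareas are ignored.
    """

    FIELD_TYPES = {"text", "hidden", "password"}

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}  # name -> current value, in document order
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            kind = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if name and kind in self.FIELD_TYPES:
                # A bare attribute (e.g. <input value>) parses as None.
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


page_url = "http://example.com/index.html"  # hypothetical page the form came from
html = """<form action="/search" method="get">
  <input type="text" name="q" value="python">
  <input type="hidden" name="lang" value="en">
  <input type="submit" value="Go">
</form>"""

parser = FormParser()
parser.feed(html)
parser.fields["q"] = "mechanize"  # "fill in" the text box
submit_url = urljoin(page_url, parser.action) + "?" + urlencode(parser.fields)
print(submit_url)  # http://example.com/search?q=mechanize&lang=en
```

The hidden field keeps its default value, which is the point of parsing the form rather than building the query by hand.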
Now part of mechanize.
Client-side HTTP cookie handling. Optional features: support, seekable responses, handling of robots.txt. Requires Python 2.0 or better. BSD license.
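The standard library's cookie handling (cookielib in Python 2.4 and later, http.cookiejar in Python 3) grew out of ClientCookie. The sketch below uses http.cookiejar to show the basic cycle offline: cookies are extracted from one response and added to later requests to the same host. FakeResponse is a hypothetical stand-in for a real HTTP response, and the example.com URLs are illustrative.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Offline stand-in for an HTTP response (hypothetical helper).

    CookieJar.extract_cookies() only needs response.info() to return
    the response headers as an email.message.Message.
    """

    def __init__(self, headers):
        self._headers = email.message.Message()
        for name, value in headers:
            self._headers[name] = value  # repeated names are appended, not replaced

    def info(self):
        return self._headers


jar = CookieJar()

# 1. The response to a login request sets a session cookie.
login = urllib.request.Request("http://example.com/login")
jar.extract_cookies(FakeResponse([("Set-Cookie", "session=abc123; Path=/")]), login)

# 2. A later request to the same host gets the cookie added automatically.
account = urllib.request.Request("http://example.com/account")
jar.add_cookie_header(account)
print(account.get_header("Cookie"))  # session=abc123

# 3. A request to a different host does not.
other = urllib.request.Request("http://example.org/")
jar.add_cookie_header(other)
print(other.get_header("Cookie"))  # None
```

In real use, urllib.request.HTTPCookieProcessor makes these two calls for you on every request and response.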
Very early alpha release!
Client-side HTML table handling. Currently requires Python 2.2 or better. MIT license.
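Client-side table handling amounts to turning table markup into rows of cell text. Here is a rough sketch with Python 3's html.parser. TableExtractor is a hypothetical name, not ClientTable's interface, and it ignores nested tables, rowspan and colspan.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every table as a list of rows, each a list of cell strings.

    Hypothetical sketch: nested tables, rowspan and colspan are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # row currently being filled
        self._cell = None  # text fragments of the current cell

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_cell()
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # HTML allows </td> and </th> to be omitted
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_cell()
            self._row = None


html = """<table>
  <tr><th>Module</th><th>License</th></tr>
  <tr><td>ClientForm<td>BSD</tr>
</table>"""
extractor = TableExtractor()
extractor.feed(html)
print(extractor.tables)  # [[['Module', 'License'], ['ClientForm', 'BSD']]]
```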
Now part of mechanize, but the interface is not public.
A simple "pull API" for HTML parsing, after Perl's HTML::TokeParser. Requires Python 2.2 or better. BSD license.
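A pull API inverts the usual callback style: instead of the parser calling you, you ask it for the next token. The sketch below wraps Python 3's callback-based html.parser to get that effect. PullTokenizer is hypothetical; its method names are borrowed from HTML::TokeParser (get_token, get_tag, get_text), and it is not necessarily pullparser's interface.

```python
from collections import deque
from html.parser import HTMLParser


class PullTokenizer(HTMLParser):
    """Buffer parse events so the caller can pull tokens one at a time.

    Hypothetical sketch: method names follow HTML::TokeParser, but the
    whole document is parsed up front rather than read incrementally.
    """

    def __init__(self, html):
        super().__init__(convert_charrefs=True)
        self._tokens = deque()
        self.feed(html)
        self.close()  # flush any buffered text

    # Push side: HTMLParser callbacks append (kind, name_or_text, attrs).
    def handle_starttag(self, tag, attrs):
        self._tokens.append(("starttag", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self._tokens.append(("endtag", tag, None))

    def handle_data(self, data):
        self._tokens.append(("data", data, None))

    # Pull side: the caller decides when to consume each token.
    def get_token(self):
        """Return the next token, or None at the end of the document."""
        return self._tokens.popleft() if self._tokens else None

    def get_tag(self, *names):
        """Skip to the next start or end tag, optionally one of names."""
        while (token := self.get_token()) is not None:
            kind, name, _ = token
            if kind != "data" and (not names or name in names):
                return token
        return None

    def get_text(self):
        """Return the text up to, but not including, the next tag."""
        parts = []
        while self._tokens and self._tokens[0][0] == "data":
            parts.append(self._tokens.popleft()[1])
        return "".join(parts)


parser = PullTokenizer('<p>See <a href="/x">the docs</a> and <a href="/y">more</a>.</p>')
links = []
while (tag := parser.get_tag("a")) is not None:
    if tag[0] == "starttag":
        links.append((tag[2].get("href"), parser.get_text()))
print(links)  # [('/x', 'the docs'), ('/y', 'more')]
```

For simplicity this parses the whole document at construction; a real pull parser would read input incrementally as tokens are requested.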
I'm not planning to maintain this any more (not that it ever saw any maintenance...).
Not well-tested, I must confess: the project for which I wanted these has been dormant for a while, and I haven't yet got round to running the standard library tests, so there may still be some 1.5.2-incompatible bits in here.
Not much to be said about this, other than that it derives from the Python 2.1 CVS maintenance branch. If you use this urllib2, use the urllib from here, too. Python license, of course.
Now distributed with PIDDLE itself.
Note: Since I wrote this, I discovered sketch, which is better than XFig in many ways, and is extensible and scriptable in Python.
An XFig backend for PIDDLE (Plug-In Drawing, Does Little Else). Requires Python 2.0 or better. If you're installing into an old PIDDLE installation, all you need to do is drop it in with the rest of the backends (piddlePS.py, piddlePDF.py, etc.).
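A drawing backend translates generic drawing calls into one particular output format. As a rough sketch of that idea for XFig, assuming the XFig 3.2 text format and input coordinates in points (1/72 inch), the hypothetical FigCanvas below turns draw_line calls into Fig polyline objects. The class and method names are illustrative only and are not PIDDLE's canvas API.

```python
# XFig 3.2 header: orientation, justification, units, paper size,
# magnification, single/multiple page, transparent colour (-2 = none),
# then resolution (1200 Fig units per inch) and origin (2 = top left).
FIG_HEADER = "\n".join([
    "#FIG 3.2",
    "Landscape",
    "Center",
    "Inches",
    "Letter",
    "100.00",
    "Single",
    "-2",
    "1200 2",
]) + "\n"


class FigCanvas:
    """Hypothetical backend: turns line-drawing calls into XFig 3.2 text."""

    UNITS_PER_POINT = 1200 / 72  # assumes callers use points (1/72 inch)

    def __init__(self):
        self._objects = []

    def _fig(self, value):
        return round(value * self.UNITS_PER_POINT)

    def draw_line(self, x1, y1, x2, y2, width=1):
        # Polyline object: code 2, sub_type 1, solid line, given thickness,
        # black pen, white fill colour, depth 50, no area fill, no arrows,
        # and 2 points.
        header = "2 1 0 %d 0 7 50 -1 -1 0.000 0 0 -1 0 0 2" % width
        points = "\t %d %d %d %d" % (self._fig(x1), self._fig(y1),
                                     self._fig(x2), self._fig(y2))
        self._objects.append(header + "\n" + points)

    def save(self):
        """Return the complete .fig file contents as a string."""
        return FIG_HEADER + "".join(obj + "\n" for obj in self._objects)


canvas = FigCanvas()
canvas.draw_line(0, 0, 72, 72)  # a one-inch diagonal
fig_text = canvas.save()
print(fig_text)
```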
Just a little script to remove duplicate mail from unix mailboxes (also known as mbox format, or V7 format; used by pine and mutt, amongst other mail clients). This is very useful if, like me, you periodically back up your mail from a server but like to keep old messages around on the server too. If you do this, you end up with widely overlapping backups, which makes grepping through them a pain. Running this script over your backed-up mailboxes should leave you with exactly one copy of each message. Please back up your backed-up mailboxes first! Requires Python 2.0 or better. Public domain.
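The matching rule the script uses isn't described here. Below is a minimal sketch with Python 3's mailbox module, assuming that two messages are duplicates when they share a Message-ID header (or, lacking one, have identical text). dedupe_mbox is a hypothetical name. It writes to a separate output mailbox rather than editing in place, in keeping with the advice to back up first.

```python
import hashlib
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each message.

    Assumption: messages are keyed on their Message-ID header; messages
    without one are keyed on a SHA-1 hash of their full text instead.
    Returns the number of messages written.
    """
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)  # created if it doesn't exist yet
    seen = set()
    kept = 0
    dst.lock()  # only the output is locked; the input is just read
    try:
        for msg in src:
            key = (msg["Message-ID"] or "").strip()
            if not key:
                key = "sha1:" + hashlib.sha1(msg.as_bytes()).hexdigest()
            if key in seen:
                continue  # a later copy of a message we already kept
            seen.add(key)
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return kept
```

Keying on Message-ID is cheap and survives differences in the mbox "From " separator line; the hash fallback only catches byte-identical copies.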
Now for the really obscure stuff...
Note: This is almost certainly not working: I don't use this database any more, and it's not even up to date with the current ClientCookie and ClientForm interfaces.
Python interface to the ISI database, via the UK Web of Science web site. This is intended to be part of pybliographer, when I get round to integrating it properly. In the meantime, it works as a Python module. Requires Python 2.0 or better and both ClientCookie (>= 0.3.0b) and ClientForm. MIT license.
Please don't ask me about bell-ringing; I know nothing about it. This plots diagrams like this. Requires reportlab and Python 2.0 or better. There is a simple graphical interface for operating it, which requires PyQt (which itself requires Qt; follow the link from the PyQt page). The output is in PDF format. To view it, you need Acrobat Reader, or ghostscript together with GSView (Windows) or gv (unix). I'm not sure where gv for unix lives, but if you're on linux you probably already have ghostscript and gv installed. MIT license.
John J. Lee, October 2006.