This module is unmaintained. Maybe someday...
that code automatically executed at appropriate times, and have the results
reflected both in an HTML DOM tree and in a higher-level browser-like object
model (only the ClientForm part of this browser interface is implemented so
far). Of course, automatic execution of much code depends on the use of either
the browser-like interface or equivalent DOM methods: otherwise, the code can't
It's easy to switch between the ClientForm API and the DOM, thus making it hard to get stuck in a position where further progress requires disproportionate coding effort:
from urllib2 import urlopen from DOMForm import ParseResponse response = urlopen("http://www.example.com/") window = ParseResponse(response) window.document # HTML DOM Level 2 HTMLDocument interface forms = window._htmlforms # list of objects supporting ClientForm.HTMLForm i/face form = forms assert form.name == "some_form" domform = form.node # level 2 HTML DOM HTMLFormElement interface control = form.find_control("some_control") # ClientForm.Control i/face domcontrol = control.node # corresponding level 2 HTML DOM HTMLElement i/face doc.some_form._htmlform # back to the ClientForm.HTMLForm interface again doc.some_form.some_control._control # ClientForm.Control interface again response = urlopen(form.click()) # domform.submit() also works
Note that the level 2 HTML DOM interface is currently based on an old version of the specification, with some imperfect changes to provide some support for XHTML.
The HTML DOM should allow you to get at anything you need to know. Still,
since the DOM does some normalisation and is only created after the original
HTML has been fed through HTMLTidy, you may sometimes need or want access to
the original HTML. ClientCookie's
SeekableProcessor is one way of
from ClientCookie import build_opener, SeekableProcessor opener = build_opener(SeekableProcessor) response = opener.open("http://www.example.com/") window = ParseResponse(response) html = response.read() response.seek(0) # carry on using response object as if it hadn't been .read()
Or you can store the html somewhere, then use ParseFile instead of ParseResponse.
from xml.dom.ext import XHtmlPrint XHtmlPrint(doc, fileobj)
XHtmlPrettyPrint makes nicer output. Both functions will print
any DOM node, not just an
There's some more documentation in the docstrings.
Thanks to Andrew Clover for advice and code on DOM 'liveness', all the PyXML
contributors, and Gisle Aas, for the
HTML::Form Perl code from
which ClientForm was originally derived.
and the DOM implementation. The ClientForm work-alike stuff is
relatively stable (but see the entities and
select_default bugs listed below).
decorate_DOM(window)after this happens, to regenerate the
HTMLFormand all its Controls, and rebind them to the DOM. I probably won't fix this (I'm guessing it won't cause problems).
Windowclass is still just stubs. This will be fixed, gradually. ATM, you can likely quite easily derive your own
Windowclass with stubs that suit your application, and pass it to one of the
Parse*functions through the
innerHTMLisn't implemented. Thanks to my hacks (for live-ness, IE compatibility, bug fixes, changes to match newest DOM standard etc.), it's probably quite buggy, too.
RADIOcontrols. This should be fixed soon.
onclick- executed. You just have to fire your own events:
from DOMForm import fireHTMLEvent, fireMouseEvent # Say we've got a DOM node, domnode, representing a button, and we want to # simulate clicking it. fireHTMLEvent(domnode, "focus") fireMouseEvent(domnode, "click") fireHTMLEvent(domnode, "blur") # Of course, this is missing events like mouseover, which would be fired # by a browser, but we probably don't even need the focus or blur either.
For installation instructions, see the INSTALL file included in the distribution.
Development release. This is the first alpha release: there are many known bugs, and interfaces will change.
Good question. I wanted something smaller, not dependant on any browser, and also liked the idea of an easy-to-understand implementation of the browser object model in pure Python.
2.3 (earlier versions may work, but are untested).
_htmlformsbegin with an underscore?
Because attributes that start with an underscore ("_") are not
I prefer questions and comments to be sent to the mailing list rather than direct to me.
John J. Lee, May 2006.