ClientForm is a Python module for handling HTML forms on the client
side, useful for parsing HTML forms, filling them in and returning the
completed forms to the server. It developed from a port of Gisle Aas'
HTML::Form, from the libwww-perl library, but the
interface is not the same.
    from urllib2 import urlopen
    from ClientForm import ParseResponse

    forms = ParseResponse(urlopen("http://www.example.com/form.html"))
    form = forms[0]  # ParseResponse returns a list of forms
    print form
    form["author"] = "Gisle Aas"

    # form.click() returns a urllib2.Request object
    # (see HTMLForm.click.__doc__ if you don't have urllib2)
    response = urlopen(form.click("Thanks"))
A more complicated example:
    import sys
    import ClientForm
    import urllib2

    request = urllib2.Request("http://www.example.com/form.html")
    response = urllib2.urlopen(request)
    forms = ClientForm.ParseResponse(response)
    response.close()
    form = forms[0]
    print form  # very useful!

    # Indexing allows setting and retrieval of control values
    original_text = form["comments"]  # a string, NOT a Control instance
    form["comments"] = "Blah."

    # Controls that represent lists (checkbox, select and radio lists) are
    # ListControls. Their values are sequences of list item names.
    # They come in two flavours: single- and multiple-selection:
    print form.possible_items("cheeses")
    form["favorite_cheese"] = ["brie"]  # single
    form["cheeses"] = ["parmesan", "leicester", "cheddar"]  # multi
    #  is the "parmesan" item of the "cheeses" control selected?
    print "parmesan" in form["cheeses"]
    #  does cheeses control have a "caerphilly" item?
    print "caerphilly" in form.possible_items("cheeses")

    # Sometimes one wants to set or clear individual items in a list:
    #  select the item named "gorgonzola" in the first control named "cheeses"
    form.set(True, "gorgonzola", "cheeses")
    # You can be more specific: supply at least one of name, type, kind, id
    # and nr (most other methods on HTMLForm take the same form of arguments):
    #  deselect "edam" in third CHECKBOX control
    form.set(False, "edam", type="checkbox", nr=2)

    # You can explicitly say that you're referring to a ListControl:
    #  set whole value (rather than just one item of) "cheeses" ListControl
    form.set_value(["gouda"], name="cheeses", kind="list")
    #  last example is almost equivalent to following (but insists that the
    #  control be a ListControl -- so it will skip any non-list controls that
    #  come before the control we want)
    form["cheeses"] = ["gouda"]
    # The kind argument can also take values "multilist", "singlelist", "text",
    # "clickable" and "file":
    #  find first control that will accept text, and scribble in it
    form.set_value("rhubarb rhubarb", kind="text")
    form.set_value([""], kind="singlelist")

    # Often, a single checkbox (a CHECKBOX control with a single item) is
    # present. In that case, the name of the single item isn't of much
    # interest, so it's useful to be able to check and uncheck the box
    # without using the item name:
    form.set_single(True, "smelly")  # check
    form.set_single(False, "smelly")  # uncheck

    # Add files to FILE controls with .add_file(). Only call this multiple
    # times if the server is expecting multiple files.
    #  add a file, default value for MIME type, no filename sent to server
    form.add_file(open("data.dat"))
    #  add a second file, explicitly giving MIME type, and telling the server
    #  what the filename is
    form.add_file(open("data.txt"), "text/plain", "data.txt")

    # Many methods have a by_label argument, allowing specification of list
    # items by label instead of by name. At the moment, only SelectControl
    # supports this argument (this will be fixed). Sometimes labels are
    # easier to maintain than names, sometimes the other way around.
    form.set_value(["Mozzarella", "Caerphilly"], "cheeses", by_label=True)

    # It's also possible to get at the individual controls inside the form.
    # This is useful for calling several methods in a row on a single control,
    # and for the less common operations. The methods are quite similar to
    # those on HTMLForm:
    control = form.find_control("cheeses", type="select")
    print control.value, control.name, control.type
    print control.possible_items()
    control.value = ["mascarpone", "curd"]
    control.set(True, "limburger")

    # All Controls may be disabled (equivalent of greyed-out in browser)
    control = form.find_control("comments")
    print control.disabled
    # ...or readonly
    print control.readonly
    # readonly and disabled attributes can be assigned to
    control.disabled = False
    # convenience method, used here to make all controls writable (unless
    # they're disabled):
    form.set_all_readonly(False)

    # ListControl items may also be disabled (setting a disabled item is not
    # allowed, but clearing one is allowed). Item methods need a ListControl,
    # so fetch the "cheeses" control again first:
    control = form.find_control("cheeses", type="select")
    print control.get_item_disabled("emmenthal")
    control.set_item_disabled(True, "emmenthal")
    #  enable all items in control
    control.set_all_items_disabled(False)

    # HTMLForm.controls is a list of all controls in the form
    for control in form.controls:
        if control.value == "inquisition":
            sys.exit()

    request2 = form.click()  # urllib2.Request object
    response2 = urllib2.urlopen(request2)

    print response2.geturl()
    print response2.info()  # headers
    print response2.read()  # body
    response2.close()
All of the standard control types are supported, including
TYPE=BUTTON and the various other button types, and
FILE (for file upload). Both standard form encodings
(application/x-www-form-urlencoded and multipart/form-data) are supported.
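As a rough illustration of what application/x-www-form-urlencoded data looks like on the wire, here is a minimal sketch. It uses the standard library's urlencode rather than ClientForm's own encoder, so it runs without the module installed. The field names and values are borrowed from the examples above, and the import fallback is an assumption to cover both old and new standard library layouts.

```python
# Sketch only: shows what application/x-www-form-urlencoded data looks like.
# It uses the standard library, not ClientForm's own encoder.
try:
    from urllib.parse import urlencode  # newer Pythons
except ImportError:
    from urllib import urlencode  # older Pythons

# Name/value pairs as a form might submit them (illustrative values).
pairs = [("author", "Gisle Aas"), ("comments", "Blah.")]

# Spaces become "+", and the pairs are joined with "&".
encoded = urlencode(pairs)
print(encoded)
```

multipart/form-data, by contrast, sends each value as a separate MIME part, which is what file upload needs.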
The module is designed for testing and automation of web interfaces, not for implementing interactive user agents.
Security note: Remember that any passwords you store in
HTMLForm instances will be saved to disk in the clear if you
pickle them (directly or indirectly). The simplest solution to this is to
avoid pickling HTMLForm objects. You could also pickle before
filling in any password, or just clear the password before pickling.
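To see why pickling matters here, the sketch below pickles a plain dictionary standing in for the values of a filled-in form. The dictionary, its keys and the sample password are all assumptions made for illustration; it is not an HTMLForm, but the same point should apply to any object that holds the password as a string.

```python
import pickle

# Stand-in for the values held by a filled-in form; NOT a real HTMLForm.
form_values = {"username": "joe", "password": "hunter2"}

# Pickling stores the password as readable bytes.
leaky = pickle.dumps(form_values)
print(b"hunter2" in leaky)

# Clear the password on a copy first, then pickle the copy.
safe_values = dict(form_values)
safe_values["password"] = ""
safe = pickle.dumps(safe_values)
print(b"hunter2" in safe)
```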
Python 1.5.2 or above is required. To run the tests, you need the
unittest module (from PyUnit).
unittest is a standard library module with Python 2.1 and above.
For full documentation, see the docstrings in ClientForm.py.
Note: this page describes the 0.1.x interface. See here for the old 0.0.x interface.
For installation instructions, see the INSTALL file included in the distribution.
Stable release. There have been many interface changes since 0.0.x, so I don't recommend upgrading old code from 0.0.x unless you want the new features.
The new features include FILE control support for file upload, handling
of disabled list items, and a redesigned interface.
Doesn't the standard cgi module do this?
No: the cgi module does the server end of the job. It
doesn't know how to parse or fill in a form or how to send it back to the
server.
1.5.2 or above.
Use .click_request_data() instead of .click().
Which urllib2 do I need?
You don't. It's convenient, though. If you have Python 2.0, you need to
upgrade to the version from Python 2.1 (available from www.python.org). Alternatively, use the
1.5.2-compatible version. If you have Python 1.5.2, use this
urllib. Otherwise, you're OK.
The BSD license (included in distribution).
Yes, since 0.1.12.
print form is usually all you need.
HTMLForm.possible_items can be useful. Note that it's
possible to use item labels instead of item names, which can be useful:
use the by_label arguments to the various methods, and the
.set_value_by_label() methods. Only
SelectControl currently supports item labels (which
come from OPTION element contents). I might not bother to
fix this, since it seems it's probably only useful for
SELECT controls.
What do the '*' characters mean in the string representations of list controls?
A * next to an item means that item is selected.
Parentheses (foo) around an item mean that item is disabled.
Why doesn't a control turn up in the data returned by .click*() when that control has a non-empty value?
Either the control is disabled, or it is not successful for some other reason. 'Successful' (see HTML 4 specification) means that the control will cause data to get sent to the server.
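The idea of a 'successful' control can be sketched in a few lines. This is a simplified reading of the HTML 4 rules, not ClientForm's implementation: the controls are plain dictionaries invented for the example, and is_successful is a hypothetical helper, not part of the module.

```python
# Simplified sketch of the HTML 4 "successful control" rules.
# Controls are plain dicts here; these are NOT ClientForm Control objects.

def is_successful(control, clicked_name=None):
    """Return True if the control's data would be sent to the server."""
    if control.get("disabled"):
        return False  # disabled controls are never successful
    if not control.get("name"):
        return False  # a control needs a name to be submitted
    kind = control["type"]
    if kind in ("checkbox", "radio"):
        return bool(control.get("checked"))  # only checked items are sent
    if kind == "reset":
        return False  # reset buttons send nothing
    if kind == "submit":
        return control["name"] == clicked_name  # only the clicked button
    return True

controls = [
    {"type": "text", "name": "comments", "value": "Blah."},
    {"type": "text", "name": "secret", "value": "x", "disabled": True},
    {"type": "checkbox", "name": "smelly", "value": "on", "checked": False},
    {"type": "submit", "name": "Thanks", "value": "Thanks"},
    {"type": "reset", "name": "clear", "value": "Reset"},
]

sent = [c["name"] for c in controls if is_successful(c, clicked_name="Thanks")]
print(sent)
```

Here only the text control and the clicked submit button would contribute data; the disabled control, the unchecked checkbox and the reset button would not.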
Because by default, it follows browser behaviour when setting the
initially-selected items in list controls that have no items explicitly
selected in the HTML. Use the
select_default argument to
ParseResponse if you want to follow the RFC 1866 rules
instead. Note that browser behaviour violates the HTML 4.01 specification
in the case of
Why does .click()ing on a button not work for me?
Clicking on a RESET button doesn't do anything, by design - this is a library for web automation, not an interactive browser. Even in an interactive browser, clicking on
RESET sends nothing to the server, so there is little point in having
.click() do anything special here.
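Clicking on a BUTTON TYPE=BUTTON doesn't do anything either, also by design. This time, the reason is that that
BUTTON is only in the HTML standard so that one can attach callbacks to its events. The callbacks are functions in
scripts embedded in the page, which ClientForm does not run, so it can do nothing useful with a click on a BUTTON whose type is BUTTON.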
See the General FAQs page for what to do about this.
    import bisect

    def closest_int_value(form, ctrl_name, value):
        # bisect assumes the item names, read as ints, are in ascending order
        values = map(int, form.possible_items(ctrl_name))
        # pick the largest item that is <= value
        return str(values[bisect.bisect(values, value) - 1])

    form["distance"] = [closest_int_value(form, "distance", 23)]
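The bisect lookup used by closest_int_value can be tried without a form. The sketch below works on a plain sorted list; the item names are made up, and the clamp to the smallest entry is a choice made for this sketch (without it, a target below every item gives index -1, which wraps around to the largest item).

```python
import bisect

def closest_value_at_or_below(values, target):
    """Return the largest entry of sorted `values` that is <= target.

    Falls back to the smallest entry when target is below all of them.
    """
    index = bisect.bisect(values, target) - 1
    return values[max(index, 0)]

# Hypothetical item names of a "distance" list control, as sorted integers.
distances = sorted(int(item) for item in ["10", "20", "50", "100"])

print(closest_value_at_or_below(distances, 23))
```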
John J. Lee, January 2005.