ClientForm

ClientForm is a Python module for handling HTML forms on the client side, useful for parsing HTML forms, filling them in and returning the completed forms to the server. It developed from a port of Gisle Aas' Perl module HTML::Form, from the libwww-perl library, but the interface is not the same.

Simple example:

 from urllib2 import urlopen
 from ClientForm import ParseResponse

 forms = ParseResponse(urlopen("http://www.example.com/form.html"))
 form = forms[0]
 print form
 form["author"] = "Gisle Aas"

 # form.click() returns a urllib2.Request object
 # (see HTMLForm.click.__doc__ if you don't have urllib2)
 response = urlopen(form.click("Thanks"))

A more complicated example:

 import sys  # needed for sys.exit() further down
 import ClientForm
 import urllib2
 request = urllib2.Request("http://www.example.com/form.html")
 response = urllib2.urlopen(request)
 forms = ClientForm.ParseResponse(response)
 response.close()
 form = forms[0]
 print form  # very useful!

 # Indexing allows setting and retrieval of control values
 original_text = form["comments"]  # a string, NOT a Control instance
 form["comments"] = "Blah."

 # Controls that represent lists (checkbox, select and radio lists) are
 # ListControls.  Their values are sequences of list item names.
 # They come in two flavours: single- and multiple-selection:
 print form.possible_items("cheeses")
 form["favorite_cheese"] = ["brie"]  # single
 form["cheeses"] = ["parmesan", "leicester", "cheddar"]  # multi
 #  is the "parmesan" item of the "cheeses" control selected?
 print "parmesan" in form["cheeses"]
 #  does cheeses control have a "caerphilly" item?
 print "caerphilly" in form.possible_items("cheeses")

 # Sometimes one wants to set or clear individual items in a list:
 #  select the item named "gorgonzola" in the first control named "cheeses"
 form.set(True, "gorgonzola", "cheeses")
 # You can be more specific: supply at least one of name, type, kind, id
 # and nr (most other methods on HTMLForm take the same form of arguments):
 #  deselect "edam" in third CHECKBOX control
 form.set(False, "edam", type="checkbox", nr=2)

 # You can explicitly say that you're referring to a ListControl:
 #  set whole value (rather than just one item of) "cheeses" ListControl
 form.set_value(["gouda"], name="cheeses", kind="list")
 #  the set_value call above is almost equivalent to the assignment below,
 #  except that it insists that the control be a ListControl, so it will
 #  skip any non-list controls that come before the control we want
 form["cheeses"] = ["gouda"]
 # The kind argument can also take values "multilist", "singlelist", "text",
 # "clickable" and "file":
 #  find first control that will accept text, and scribble in it
 form.set_value("rhubarb rhubarb", kind="text")
 form.set_value([""], kind="singlelist")

 # Often, a single checkbox (a CHECKBOX control with a single item) is
 # present.  In that case, the name of the single item isn't of much
 # interest, so it's useful to be able to check and uncheck the box
 # without using the item name:
 form.set_single(True, "smelly")  # check
 form.set_single(False, "smelly")  # uncheck

 # Add files to FILE controls with .add_file().  Only call this multiple
 # times if the server is expecting multiple files.
 #  add a file, default value for MIME type, no filename sent to server
 form.add_file(open("data.dat", "rb"))  # binary mode: bytes sent unchanged
 #  add a second file, explicitly giving MIME type, and telling the server
 #   what the filename is
 form.add_file(open("data.txt"), "text/plain", "data.txt")

 # Many methods have a by_label argument, allowing specification of list
 # items by label instead of by name.  At the moment, only SelectControl
 # supports this argument (this will be fixed).  Sometimes labels are
 # easier to maintain than names, sometimes the other way around.
 form.set_value(["Mozzarella", "Caerphilly"], "cheeses", by_label=True)

 # It's also possible to get at the individual controls inside the form.
 # This is useful for calling several methods in a row on a single control,
 # and for the less common operations.  The methods are quite similar to
 # those on HTMLForm:
 control = form.find_control("cheeses", type="select")
 print control.value, control.name, control.type
 print control.possible_items()
 control.value = ["mascarpone", "curd"]
 control.set(True, "limburger")

 # All Controls may be disabled (equivalent of greyed-out in browser)
 control = form.find_control("comments")
 print control.disabled
 # ...or readonly
 print control.readonly
 # readonly and disabled attributes can be assigned to
 control.disabled = False
 # convenience method, used here to make all controls writable (unless
 # they're disabled):
 form.set_all_readonly(False)
 # ListControl items may also be disabled (setting a disabled item is not
 # allowed, but clearing one is allowed):
 print control.get_item_disabled("emmenthal")
 control.set_item_disabled(True, "emmenthal")
 #  enable all items in control
 control.set_all_items_disabled(False)

 # HTMLForm.controls is a list of all controls in the form
 for control in form.controls:
     if control.value == "inquisition": sys.exit()

 request2 = form.click()  # urllib2.Request object
 response2 = urllib2.urlopen(request2)

 print response2.geturl()
 print response2.info()  # headers
 print response2.read()  # body
 response2.close()

All of the standard control types are supported: TEXT, PASSWORD, HIDDEN, TEXTAREA, ISINDEX, RESET, BUTTON (INPUT TYPE=BUTTON and the various BUTTON types), SUBMIT, IMAGE, RADIO, CHECKBOX, SELECT/OPTION and FILE (for file upload). Both standard form encodings (application/x-www-form-urlencoded and multipart/form-data) are supported.
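
To give a feel for the first of those encodings, here is a minimal sketch of what an application/x-www-form-urlencoded request body looks like. It uses only the standard library's urlencode function, not ClientForm itself, and the control names and values are made up for illustration. It shows the wire format only, not how ClientForm builds its requests.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# Hypothetical (name, value) pairs, one per submitted control value.
# A multiple-selection list contributes one pair per selected item.
pairs = [
    ("author", "Gisle Aas"),
    ("cheeses", "brie"),
    ("cheeses", "cheddar"),
]

# Spaces become "+", and pairs are joined with "&".
body = urlencode(pairs)
print(body)
```

The multipart/form-data encoding carries the same name/value pairs, but as separate MIME parts, which is why it is the one used for file upload.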

The module is designed for testing and automation of web interfaces, not for implementing interactive user agents.

Security note: Remember that any passwords you store in HTMLForm instances will be saved to disk in the clear if you pickle them (directly or indirectly). The simplest solution to this is to avoid pickling HTMLForm objects. You could also pickle before filling in any password, or just set the password to "" before pickling.
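
The last of those suggestions might look something like the following sketch. The helper pickle_without_password is hypothetical (it is not part of ClientForm), and a plain dict stands in for the HTMLForm so that the example runs without the module installed. The helper assumes only that the object supports getting and setting values by control name, as HTMLForm does.

```python
import pickle

def pickle_without_password(form, name="password"):
    """Return a pickle of form with the value of control `name` blanked.

    Hypothetical helper: assumes only form[name] get/set by control name.
    The original value is put back afterwards, even if pickling fails.
    """
    saved = form[name]
    form[name] = ""
    try:
        return pickle.dumps(form)
    finally:
        form[name] = saved

# Stand-in for a filled-in HTMLForm (an assumption made for this sketch).
fake_form = {"user": "joe", "password": "hunter2"}

leaky = pickle.dumps(fake_form)               # password stored in the clear
safe = pickle_without_password(fake_form)     # password blanked first

print(b"hunter2" in leaky)
print(b"hunter2" in safe)
```

Note that this only protects the one named control. Any other sensitive values in the form would need the same treatment.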

Python 1.5.2 or above is required. To run the tests, you need the unittest module (from PyUnit), which is part of the standard library in Python 2.1 and above.

For full documentation, see the docstrings in ClientForm.py.

Note: this page describes the 0.1.x interface. The old 0.0.x interface is documented separately.

Download

For installation instructions, see the INSTALL file included in the distribution.

Stable release. There have been many interface changes since 0.0.x, so I don't recommend upgrading old code from 0.0.x unless you want the new features.

0.1.x includes FILE control support for file upload, handling of disabled list items, and a redesigned interface.


Old release.

FAQs

John J. Lee, January 2005.