Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.
mechanize.Browser is a subclass of mechanize.UserAgent, which is, in turn, a subclass of ClientCookie.OpenerDirector (like urllib2.OpenerDirector), so any URL can be opened, not just http: URLs. mechanize.UserAgent offers easy dynamic configuration of user-agent features like protocol, cookie, and redirection handling, without having to make a new OpenerDirector each time, e.g. by calling build_opener() (it's not stable yet, though).
HTML form handling via the ClientForm interface.
Browser history (.back() and .reload() methods).
The Referer HTTP header is added properly (optional).
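To illustrate the first point, here is a minimal sketch of what building a new OpenerDirector for each configuration looks like. It uses urllib.request from the Python 3 standard library (the successor to urllib2) rather than mechanize or ClientCookie, and make_opener and has_cookie_support are hypothetical helper names introduced only for this illustration. A data: URL is opened to show that an opener is not limited to http:.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_opener(handle_cookies):
    """Build a fresh OpenerDirector (hypothetical helper, not part of mechanize)."""
    handlers = []
    if handle_cookies:
        # Cookie handling is opt-in: build_opener() does not add it by default.
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    return urllib.request.build_opener(*handlers)


def has_cookie_support(opener):
    """Report whether an opener carries a cookie-processing handler."""
    return any(
        isinstance(handler, urllib.request.HTTPCookieProcessor)
        for handler in opener.handlers
    )


# Changing one feature means building a whole new opener each time.
with_cookies = make_opener(handle_cookies=True)
without_cookies = make_opener(handle_cookies=False)

# Openers handle more than http: a data: URL needs no network access.
with with_cookies.open("data:text/plain,hello") as response:
    body = response.read()
```

The point of a UserAgent-style class is to avoid this rebuild step: the same object is reconfigured in place instead of being replaced by a new opener.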
An example:
    import re
    from mechanize import Browser

    b = Browser()
    b.open("http://www.example.com/")
    # follow second link with element text matching regular expression
    response = b.follow_link(text_regex=re.compile(r"cheese\s*shop"), nr=1)
    assert b.viewing_html()
    print b.title()
    print response.geturl()
    print response.info()  # headers
    print response.read()  # body
    response.close()

    b.select_form(name="order")
    # Browser passes through unknown attributes (including methods)
    # to the selected HTMLForm (from ClientForm).
    b["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
    response2 = b.submit()  # submit current form
    response3 = b.back()  # back to cheese shop
    # the history mechanism uses cached requests and responses
    assert response3 is response
    # we can still use the response, even though we closed it:
    response3.seek(0)
    response3.read()
    response4 = b.reload()
    assert response4 is not response3

    for form in b.forms():
        print form
    # .links() optionally accepts the keyword args of .follow_/.find_link()
    for link in b.links(url_regex=re.compile(r"python\.org")):
        print link
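The back/reload behaviour in the example can be pictured with a toy model. This is only a sketch of one way a history mechanism with cached responses might work, not mechanize's actual implementation. ToyBrowser and its fetch argument are hypothetical, and an in-memory dictionary stands in for the network. It is written for Python 3.

```python
import io


class ToyBrowser:
    """Hypothetical stand-in for a browser with history; not mechanize.Browser."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable mapping a URL to the page body (bytes)
        self._history = []     # (url, response) pairs for previously viewed pages
        self._current = None   # (url, response) for the page being viewed

    def open(self, url):
        # Remember the page we are leaving so that back() can return to it.
        if self._current is not None:
            self._history.append(self._current)
        response = io.BytesIO(self._fetch(url))
        self._current = (url, response)
        return response

    def back(self):
        # Return the cached response object rather than fetching again.
        self._current = self._history.pop()
        _, response = self._current
        response.seek(0)
        return response

    def reload(self):
        # Fetch the current URL again, producing a new response object.
        url, _ = self._current
        response = io.BytesIO(self._fetch(url))
        self._current = (url, response)
        return response


pages = {"/shop": b"cheese shop", "/order": b"order form"}
browser = ToyBrowser(pages.__getitem__)

first = browser.open("/shop")
first.read()
browser.open("/order")
again = browser.back()       # the same object that open("/shop") returned
body_again = again.read()    # readable again because back() rewound it
fresh = browser.reload()     # a different object with the same content
```

As in the example above, going back hands out the very same response object, while reloading produces a new one.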
Full documentation is in the docstrings.
Thanks to Ian Bicking, for persuading me that a UserAgent class would be useful.
mechanize.UserAgent.
HTMLParser seems to have more parsing problems than sgmllib/htmllib. Fix it.
All documentation (including this web page) is included in the distribution.
This is an alpha release: interfaces may change, and there will be bugs.
Development release.
For installation instructions, see the INSTALL file included in the distribution.
Requires Python 2.2 or above.
ClientCookie 0.4.17 or newer (note the required version!), ClientForm 0.1.x, and pullparser 0.0.4b or newer.
The BSD license (included in distribution).
John J. Lee, January 2004.