ClientCookie is a Python module for handling HTTP cookies on the
client side, useful for accessing web sites that require cookies to be
set and then returned later. It also provides some other (optional)
useful stuff: HTTP-EQUIV handling, zero-time Refresh handling, and
lazily-seekable responses. It has developed from a port of Gisle Aas'
Perl module HTTP::Cookies, from the libwww-perl library.

    import ClientCookie
    response = ClientCookie.urlopen("http://foo.bar.com/")

This function behaves identically to urllib2.urlopen, except that it
deals with cookies automatically. That's probably all you need to
know.
Here is a more complicated example, involving Request objects (useful
if you want to pass Requests around, add headers to them, etc.):

    import ClientCookie
    import urllib2
    request = urllib2.Request("http://www.acme.com/")
    # note we're using the urlopen from ClientCookie, not urllib2
    response = ClientCookie.urlopen(request)
    # let's say this next request requires a cookie that was set in response
    request2 = urllib2.Request("http://www.acme.com/flying_machines.html")
    response2 = ClientCookie.urlopen(request2)

In these examples, the workings are hidden inside the
ClientCookie.urlopen function, which is an extension of
urllib2.urlopen. Redirects, proxies and cookies are handled
automatically by this function. Other, lower-level, cookie-aware
extensions of urllib2 callables are also provided: build_opener,
install_opener, HTTPHandler and HTTPSHandler (if your Python
installation has HTTPS support). A bugfixed HTTPRedirectHandler is
also included (the bug, related to redirection, will be fixed in 2.3
final and 2.2.3).
An example at a slightly lower level shows what the module is doing more clearly:

    import ClientCookie
    import urllib2
    request = urllib2.Request("http://www.acme.com/")
    response = urllib2.urlopen(request)
    c = ClientCookie.Cookies()
    c.extract_cookies(response, request)
    # let's say this next request requires a cookie that was set in response
    request2 = urllib2.Request("http://www.acme.com/flying_machines.html")
    c.add_cookie_header(request2)
    response2 = urllib2.urlopen(request2)

    print response2.geturl()
    print response2.info()  # headers
    for line in response2.readlines():  # body
        print line

The Cookies class does all the work. There are essentially two
operations: extract_cookies extracts HTTP cookies from Set-Cookie (the
original Netscape cookie standard) and Set-Cookie2 (RFC 2965) headers
in a response if and only if they should be set given the request, and
add_cookie_header adds Cookie headers if and only if they are
appropriate for a particular HTTP request. Incoming cookies are
checked for acceptability based on the host name, etc. Cookies are
only set on outgoing requests if they match the request's host name,
path, etc. Cookies may also be saved to and loaded from a file. The
subclass NetscapeCookies differs from Cookies only in storing cookies
using a different, Netscape-compatible, file format. This
Netscape-compatible ('cookies.txt') format loses some information when
you save cookies to a file.
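
The two operations described above can be tried without a network
connection. The sketch below is an illustration only: it uses
http.cookiejar.CookieJar and urllib.request from the Python 3 standard
library as stand-ins for the Cookies class (an assumption; this is not
the ClientCookie API itself), and FakeResponse is a hypothetical helper
that supplies just the info() method that extract_cookies reads.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is provided."""

    def __init__(self, header_pairs):
        self._headers = email.message.Message()
        for name, value in header_pairs:
            self._headers[name] = value

    def info(self):
        return self._headers


jar = CookieJar()

# Pretend the server answered the first request with a Set-Cookie header.
request = urllib.request.Request("http://www.acme.com/")
response = FakeResponse([("Set-Cookie", "session=abc123; Path=/")])
jar.extract_cookies(response, request)

# A later request to the same host gets a Cookie header added.
request2 = urllib.request.Request("http://www.acme.com/flying_machines.html")
jar.add_cookie_header(request2)
same_host_cookie = request2.get_header("Cookie")

# A request to a different host does not match, so no header is added.
request3 = urllib.request.Request("http://www.example.org/")
jar.add_cookie_header(request3)
other_host_cookie = request3.get_header("Cookie")

print(same_host_cookie)
print(other_host_cookie)
```

The cookie is returned only to the host that set it, which is the
host-name matching described above.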
Note that if you're using ClientCookie.urlopen (or
ClientCookie.HTTPHandler or ClientCookie.HTTPSHandler), you don't need
to call extract_cookies or add_cookie_header yourself.
Python 1.5.2 or above is required, and urllib2 is recommended (for
Python 1.5.2, use this urllib2 and this urllib). Note that the version
of urllib2 from Python 2.0 is too old: if you have Python 2.0, get the
version from Python 2.1 (available from the source distribution or
CVS at http://www.python.org/), or use the 1.5.2-compatible version.
Note that you don't need to replace the original urllib2 / urllib: you
can just make sure the newer copies are in sys.path ahead of the
copies from 2.0's standard library.
For full documentation, see the docstrings in ClientCookie.py.
This is a beta release: it will be some time before I consider it stable, but it's quite usable as it is.
Development release. This is an alpha release: interfaces will change, and it's quite likely to be broken.
For installation instructions, see the INSTALL file included in the distribution.
Cookie, do this?
No: Cookie.py does the server end of the job. It doesn't know when to accept cookies from a server or when to pass them back.
1.5.2 or above.
No. You probably want it, though.
You don't, but if you want to use the extended urllib2 callables from
ClientCookie, and you have Python 2.0, you need to upgrade to the
version from Python 2.1 (or use the 1.5.2-compatible version). If you
have Python 1.5.2, use this urllib2 and urllib. Otherwise, you're OK.
The MIT license (included in distribution).
There is more than one protocol, in fact (see the module docstring for a brief explanation of the history):

    from ClientCookie import Cookies
    print Cookies.extract_cookies.__doc__
    print Cookies.add_cookie_header.__doc__

Not to my knowledge. Patches welcome.
NetscapeCookies?
The former refers to a protocol. The latter is a class in this
package. The reason that the class is named that way has nothing to do
with the protocol; rather, it has to do with the fact that
NetscapeCookies is able to save cookies in a format readable by the
Netscape Navigator browser (cookies.txt).
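
As a rough illustration of saving to and loading from a cookies.txt
file, the sketch below uses http.cookiejar.MozillaCookieJar from the
Python 3 standard library as a stand-in for NetscapeCookies (an
assumption; the ClientCookie class may differ in detail). FakeResponse
and cookie_names are hypothetical helpers, and the cookie values are
made up. Note that in http.cookiejar the ignore_discard flag is passed
to save() and load() rather than to a constructor.

```python
import email.message
import os
import tempfile
import urllib.request
from http.cookiejar import MozillaCookieJar


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is provided."""

    def __init__(self, header_pairs):
        self._headers = email.message.Message()
        for name, value in header_pairs:
            self._headers[name] = value

    def info(self):
        return self._headers


def cookie_names(jar):
    """Hypothetical helper: sorted names of the cookies held in a jar."""
    return sorted(cookie.name for cookie in jar)


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

jar = MozillaCookieJar(path)
request = urllib.request.Request("http://www.acme.com/")
response = FakeResponse([
    ("Set-Cookie", "persistent=1; Path=/; Max-Age=3600"),  # has a lifetime
    ("Set-Cookie", "session_only=1; Path=/"),              # session cookie
])
jar.extract_cookies(response, request)

# By default, session cookies (marked "discard") are not written out.
jar.save()
reloaded = MozillaCookieJar(path)
reloaded.load()
saved_by_default = cookie_names(reloaded)

# With ignore_discard=True, session cookies are saved and loaded as well.
jar.save(ignore_discard=True)
reloaded_all = MozillaCookieJar(path)
reloaded_all.load(ignore_discard=True)
saved_with_ignore_discard = cookie_names(reloaded_all)

print(saved_by_default)
print(saved_with_ignore_discard)
```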
Good question. If you know the answer, please let me know.
MSIECookies does allow loading of cookies from MSIE, but the format is
not well enough understood to save to the same format.
import ClientCookie; print ClientCookie.__doc__
The rest of the docstrings in ClientCookie/_ClientCookie.py are worth reading, too, if you want to do something unusual.
Just call read or readline methods on your response object as many
times as you need. The seek method still works, because the response
objects cache read data.
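
To show why caching read data makes seek possible on a stream that can
otherwise only be read forwards, here is a minimal sketch. CachingReader
is a hypothetical class written for this illustration; it is not
ClientCookie's actual implementation, and it supports only read and
absolute seek.

```python
import io


class CachingReader:
    """Hypothetical wrapper that remembers everything read from `raw`."""

    def __init__(self, raw):
        self._raw = raw             # underlying forward-only stream
        self._cache = io.BytesIO()  # every byte read so far

    def _fill(self, wanted_end):
        """Pull data from the raw stream until the cache holds wanted_end
        bytes (None means read everything that is left)."""
        pos = self._cache.tell()
        cached = self._cache.seek(0, io.SEEK_END)
        if wanted_end is None:
            self._cache.write(self._raw.read())
        elif wanted_end > cached:
            self._cache.write(self._raw.read(wanted_end - cached))
        self._cache.seek(pos)

    def read(self, size=-1):
        if size is None or size < 0:
            self._fill(None)
            return self._cache.read()
        self._fill(self._cache.tell() + size)
        return self._cache.read(size)

    def seek(self, offset):
        """Absolute seek; data before `offset` is fetched if not yet read."""
        self._fill(offset)
        self._cache.seek(offset)


raw = io.BytesIO(b"hello world")  # stands in for a network response body
reader = CachingReader(raw)
first = reader.read(5)
reader.seek(0)                    # rewind into the cached data
everything = reader.read()
reader.seek(6)
tail = reader.read(5)
```

Data is fetched from the underlying stream only when a read or seek
needs it, which is the "lazy" part of a lazily-seekable response.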
/* FALL THRU */
META HTTP-EQUIV and Refresh handling?

    import ClientCookie
    cookies = ClientCookie.Cookies()
    hh = ClientCookie.HTTPHandler(cookies,
                                  handle_http_equiv=1,
                                  handle_refresh=1,
                                  seekable_responses=0)
    opener = ClientCookie.build_opener(hh)
    ClientCookie.install_opener(opener)
    r = ClientCookie.urlopen("http://www.rhubarb.com/")

ClientCookie.urlopen uses a global Cookies instance.
How do I avoid global cookies?

    import ClientCookie
    cookies = ClientCookie.Cookies()
    opener = ClientCookie.build_opener(ClientCookie.HTTPHandler(cookies))
    opener.open("http://www.acme.com/")

ClientCookie.urlopen?
See ClientCookie.__doc__ for full details. Either (simplest):

    import ClientCookie
    from ClientCookie import Cookies
    cookies = Cookies(netscape_only=1, blocked_domains=["doubleclick.net"])
    # Build an OpenerDirector that uses an HTTPHandler that uses the cookies
    # instance we've just made. build_opener will add other handlers (such
    # as FTPHandler) automatically, so we only have to pass an HTTPHandler.
    opener = ClientCookie.build_opener(ClientCookie.HTTPHandler(cookies))
    r = opener.open("http://www.adverts-r-us.co.uk/")

after which you can call methods on the cookies object as required.
Alternatively (not recommended) make a Cookies instance, then use it
by calling add_cookie_header and extract_cookies on it.
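
To make the effect of a blocked-domains list concrete, the sketch below
drives a cookie jar by hand, without a network connection. It uses
http.cookiejar.CookieJar and DefaultCookiePolicy from the Python 3
standard library as stand-ins for the Cookies constructor arguments
shown above (an assumption; ClientCookie's own matching rules may
differ). FakeResponse and receive_cookie are hypothetical helpers.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is provided."""

    def __init__(self, header_pairs):
        self._headers = email.message.Message()
        for name, value in header_pairs:
            self._headers[name] = value

    def info(self):
        return self._headers


# In http.cookiejar, a leading dot makes the entry match subdomains too.
policy = DefaultCookiePolicy(blocked_domains=[".doubleclick.net"])
jar = CookieJar(policy)


def receive_cookie(url, set_cookie):
    """Hypothetical helper: pretend `url` answered with this Set-Cookie."""
    request = urllib.request.Request(url)
    response = FakeResponse([("Set-Cookie", set_cookie)])
    jar.extract_cookies(response, request)


receive_cookie("http://ads.doubleclick.net/", "tracker=1; Path=/")
receive_cookie("http://www.acme.com/", "session=abc123; Path=/")

# Only the cookie from the domain that is not blocked was accepted.
accepted_domains = sorted(cookie.domain for cookie in jar)
print(accepted_domains)
```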
For more hints on debugging, print ClientCookie.__doc__
HIDDEN HTML form controls, for example). Maybe you messed up your
request, and the server is sending you some standard failure page
(even if the page doesn't appear to indicate any failure).
urlopen and the extract_cookies / add_cookie_header methods?
Don't do that. You probably just want to use urlopen, unless you don't
have urllib2, in which case you should use the 'manual' mode
(extract_cookies / add_cookie_header).
ignore_discard to a true value in the Cookies constructor:

    import ClientCookie
    cookies = ClientCookie.Cookies(ignore_discard=1)
    opener = ClientCookie.build_opener(ClientCookie.HTTPHandler(cookies))
    ClientCookie.install_opener(opener)
    r = ClientCookie.urlopen("http://foobar.com/")


    import ClientCookie
    cookies = ClientCookie.Cookies(netscape_only=1)
    opener = ClientCookie.build_opener(ClientCookie.HTTPHandler(cookies))
    ClientCookie.install_opener(opener)
    r = ClientCookie.urlopen("http://foobar.com/")

Referer [sic] and / or User-Agent HTTP headers to be set to
appropriate values before it'll send you any cookies.

    import ClientCookie, urllib2
    req = urllib2.Request("http://foobar.com/")
    req.add_header("Referer", "http://wwwsearch.sourceforge.net/ClientCookie/")
    r = ClientCookie.urlopen(req)

See ClientCookie.__doc__ for how to change User-Agent, which is a
special case, because urllib2.OpenerDirector automatically adds it.
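
For comparison, here is how the same two headers can be set with
urllib.request from the Python 3 standard library. This is a sketch of
the standard library's behaviour, not of ClientCookie's own mechanism
(which is described in its docstring), and the "ExampleClient/0.1"
agent string is made up for the example. No request is actually sent.

```python
import urllib.request

# Referer is an ordinary header: add it to the Request object.
request = urllib.request.Request("http://foobar.com/")
request.add_header("Referer", "http://wwwsearch.sourceforge.net/ClientCookie/")
referer = request.get_header("Referer")

# User-Agent is the special case: the opener adds a default one itself,
# kept in its addheaders list, so it is changed on the opener instead.
opener = urllib.request.build_opener()
default_agent = dict(opener.addheaders).get("User-agent")
opener.addheaders = [("User-agent", "ExampleClient/0.1")]  # hypothetical value

print(referer)
print(default_agent)
# opener.open(request) would now send both headers (not run here).
```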
John J. Lee, May 2003.