This is Info file pylibi, produced by Makeinfo-1.55 from the input file
lib.texi.

This file describes the built-in types, exceptions and functions and
the standard modules that come with the Python system.  It assumes
basic knowledge about the Python language.  For an informal
introduction to the language, see the Python Tutorial.  The Python
Reference Manual gives a more formal definition of the language.
(These manuals are not yet available in INFO or Texinfo format.)

Copyright 1991-1995 by Stichting Mathematisch Centrum, Amsterdam, The
Netherlands.

All Rights Reserved

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the names of Stichting Mathematisch
Centrum or CWI or Corporation for National Research Initiatives or CNRI
not be used in advertising or publicity pertaining to distribution of
the software without specific, written prior permission.

While CWI is the initial source for this software, a modified version
is made available by the Corporation for National Research Initiatives
(CNRI) at the Internet address ftp://ftp.python.org.

STICHTING MATHEMATISCH CENTRUM AND CNRI DISCLAIM ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH
CENTRUM OR CNRI BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
File: pylibi,  Node: Installing your CGI script on a Unix system,  Next: Testing your CGI script,  Prev: Caring about security,  Up: cgi

Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local
system administrator to find the directory where CGI scripts should be
installed; usually this is in a directory `cgi-bin' in the server tree.

Make sure that your script is readable and executable by "others"; the
Unix file mode should be 755 (use `chmod 755 filename').  Make sure
that the first line of the script contains `#!' starting in column 1
followed by the pathname of the Python interpreter, for instance:

     #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are
readable or writable, respectively, by "others" - their mode should be
644 for readable and 666 for writable.  This is because, for security
reasons, the HTTP server executes your script as user "nobody", without
any special privileges.  It can only read (write, execute) files that
everybody can read (write, execute).  The current directory at
execution time is also different (it is usually the server's cgi-bin
directory) and the set of environment variables is also different from
what you get at login.  In particular, don't count on the shell's
search path for executables (`$PATH') or the Python module search path
(`$PYTHONPATH') to be set to anything interesting.

If you need to load modules from a directory which is not on Python's
default module search path, you can change the path in your script,
before importing other modules, e.g.:

     import sys
     sys.path.insert(0, "/usr/home/joe/lib/python")
     sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)
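The permission requirement for the script itself can be checked from
Python before installation.  The following is only a minimal sketch,
assuming a Unix system and a current Python interpreter; the helper
name `readable_and_executable_by_others' is hypothetical and is not
part of any standard module.

```python
import os
import stat

def readable_and_executable_by_others(path):
    """Return True if the "others" read and execute bits are both set.

    Hypothetical helper: it only inspects the mode bits of PATH and
    does not check the directories leading to it.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = stat.S_IROTH | stat.S_IXOTH
    return (mode & wanted) == wanted
```

A file with mode 755 passes this check; a file with mode 700 does not.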
Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).

File: pylibi,  Node: Testing your CGI script,  Next: Debugging CGI scripts,  Prev: Installing your CGI script on a Unix system,  Up: cgi

Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from
the command line, and a script that works perfectly from the command
line may fail mysteriously when run from the server.  There's one
reason why you should still test your script from the command line: if
it contains a syntax error, the Python interpreter won't execute it at
all, and the HTTP server will most likely send a cryptic error to the
client.

Assuming your script has no syntax errors, yet it does not work, you
have no choice but to read the next section:

File: pylibi,  Node: Debugging CGI scripts,  Next: Common problems and solutions,  Prev: Testing your CGI script,  Up: cgi

Debugging CGI scripts
---------------------

First of all, check for trivial installation errors - reading the
section above on installing your CGI script carefully can save you a
lot of time.  If you wonder whether you have understood the
installation procedure correctly, try installing a copy of this module
file (`cgi.py') as a CGI script.  When invoked as a script, the file
will dump its environment and the contents of the form in HTML form.
Give it the right mode etc., and send it a request.  If it's installed
in the standard `cgi-bin' directory, it should be possible to send it a
request by entering a URL into your browser of the form:

     http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script -
perhaps you need to install it in a different directory.  If it gives
another error (e.g. 500), there's an installation problem that you
should fix before trying to go any further.
If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with
value "At Home" and "name" with value "Joe Blow"), the `cgi.py' script
has been installed correctly.  If you follow the same procedure for
your own script, you should now be able to debug it.

The next step could be to call the `cgi' module's `test()' function
from your script: replace its main code with the single statement

     cgi.test()

This should produce the same results as those obtained from installing
the `cgi.py' file itself.

When an ordinary Python script raises an unhandled exception (e.g.
because of a typo in a module name, a file that can't be opened, etc.),
the Python interpreter prints a nice traceback and exits.  While the
Python interpreter will still do this when your CGI script raises an
exception, most likely the traceback will end up in one of the HTTP
server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute *some*
code, it is easy to catch exceptions and cause a traceback to be
printed.  The `test()' function below in this module is an example.
Here are the rules:

  1. Import the traceback module (before entering the try-except!)
  2. Make sure you finish printing the headers and the blank line early
  3. Assign `sys.stderr' to `sys.stdout'
  4. Wrap all remaining code in a try-except statement
  5. In the except clause, call `traceback.print_exc()'

For example:

     import sys
     import traceback
     print "Content-type: text/html"
     print
     sys.stderr = sys.stdout
     try:
         ...your code here...
     except:
         print "\n\n<PRE>"
         traceback.print_exc()
Notes: The assignment to `sys.stderr' is needed because the traceback
prints to `sys.stderr'.  The `print "\n\n<PRE>"' statement is necessary
to disable the word wrapping in HTML.
If you suspect that there may be a problem in importing the traceback
module, you can use an even more robust approach (which only uses
built-in modules):
     	import sys
     	sys.stderr = sys.stdout
     	print "Content-type: text/plain"
     	print
     	...your code here...
This relies on the Python interpreter to print the traceback.  The
content type of the output is set to plain text, which disables all
HTML processing.  If your script works, the raw HTML will be displayed
by your client.  If it raises an exception, most likely after the first
two lines have been printed, a traceback will be displayed.  Because no
HTML interpretation is going on, the traceback will be readable.
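The traceback rules given in this section can also be packaged as a
function.  What follows is a sketch only, written for a current Python
interpreter; the name `run_cgi' and its `out' parameter are
hypothetical.  Instead of assigning `sys.stderr', this variant passes
the output stream to `traceback.print_exc()' explicitly, which has the
same visible effect.

```python
import sys
import traceback

def run_cgi(main, out=None):
    """Write the headers, call main(), and report any exception to OUT.

    Hypothetical wrapper: MAIN is the function holding the script's
    real work; OUT defaults to sys.stdout.
    """
    if out is None:
        out = sys.stdout
    # Finish the headers and the blank line before anything can fail.
    out.write("Content-type: text/html\n\n")
    try:
        main()
    except Exception:
        # <PRE> disables word wrapping in HTML, so the traceback
        # stays readable in the client.
        out.write("\n\n<PRE>\n")
        traceback.print_exc(file=out)
```

Because the headers are written before `main()' runs, the client
receives a well-formed reply even when the script fails.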
File: pylibi,  Node: Common problems and solutions,  Prev: Debugging CGI scripts,  Up: cgi
Common problems and solutions
-----------------------------
   * Most HTTP servers buffer the output from CGI scripts until the
     script is completed.  This means that it is not possible to
     display a progress report on the client's display while the script
     is running.
   * Check the installation instructions above.
   * Check the HTTP server's log files.  (`tail -f logfile' in a
     separate window may be useful!)
   * Always check a script for syntax errors first, by doing something
     like `python script.py'.
   * When using any of the debugging techniques, don't forget to add
     `import sys' to the top of the script.
   * When invoking external programs, make sure they can be found.
     Usually, this means using absolute path names - `$PATH' is usually
     not set to a very useful value in a CGI script.
   * When reading or writing external files, make sure they can be read
     or written by every user on the system.
   * Don't try to give a CGI script a set-uid mode.  This doesn't work
     on most systems, and is a security liability as well.
File: pylibi,  Node: urllib,  Next: httplib,  Prev: cgi,  Up: Internet and WWW
Standard Module `urllib'
========================
This module provides a high-level interface for fetching data across
the World-Wide Web.  In particular, the `urlopen' function is similar
to the built-in function `open', but accepts URLs (Universal Resource
Locators) instead of filenames.  Some restrictions apply -- it can only
open URLs for reading, and no seek operations are available.
It defines the following public functions:
 - function of module urllib: urlopen (URL)
     Open a network object denoted by a URL for reading.  If the URL
     does not have a scheme identifier, or if it has `file:' as its
     scheme identifier, this opens a local file; otherwise it opens a
     socket to a server somewhere on the network.  If the connection
     cannot be made, or if the server returns an error code, the
     `IOError' exception is raised.  If all went well, a file-like
     object is returned.  This supports the following methods:
     `read()', `readline()', `readlines()', `fileno()', `close()' and
     `info()'.  Except for the last one, these methods have the same
     interface as for file objects -- see the section on File Objects
     earlier in this manual.  (It's not a built-in file object,
     however, so it can't be used at those few places where a true
     built-in file object is required.)
     The `info()' method returns an instance of the class
     `rfc822.Message' containing the headers received from the server,
     if the protocol uses such headers (currently the only supported
     protocol that uses this is HTTP).  See the description of the
     `rfc822' module.
 - function of module urllib: urlretrieve (URL)
     Copy a network object denoted by a URL to a local file, if
     necessary.  If the URL points to a local file, or a valid cached
     copy of the object exists, the object is not copied.  Return a
     tuple (FILENAME, HEADERS) where FILENAME is the local file name
     under which the object can be found, and HEADERS is either `None'
     (for a local object) or whatever the `info()' method of the object
     returned by `urlopen()' returned (for a remote object, possibly
     cached).  Exceptions are the same as for `urlopen()'.
 - function of module urllib: urlcleanup ()
     Clear the cache that may have been built up by previous calls to
     `urlretrieve()'.
 - function of module urllib: quote (STRING[, ADDSAFE])
     Replace special characters in STRING using the `%xx' escape.
     Letters, digits, and the characters "`_,.-'" are never quoted.
     The optional ADDSAFE parameter specifies additional characters
     that should not be quoted -- its default value is `'/''.
     Example: `quote('/~connolly/')' yields `'/%7econnolly/''.
 - function of module urllib: unquote (STRING)
     Replace `%xx' escapes by their single-character equivalent.
     Example: `unquote('/%7Econnolly/')' yields `'/~connolly/''.
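The quoting rules stated for `quote()' and `unquote()' can be
illustrated by a small stand-alone sketch.  It is not the module's own
code: the names `quote_sketch' and `unquote_sketch' are hypothetical,
only single-byte characters are handled, and the lower-case hex digits
are chosen to match the `quote()' example given here.

```python
import string

# Characters that are never quoted, as listed in the text.
ALWAYS_SAFE = string.ascii_letters + string.digits + "_,.-"

def quote_sketch(s, addsafe="/"):
    """Escape every character outside the safe set as %xx."""
    safe = ALWAYS_SAFE + addsafe
    result = []
    for c in s:
        if c in safe:
            result.append(c)
        else:
            result.append("%%%02x" % ord(c))
    return "".join(result)

def unquote_sketch(s):
    """Replace each %xx escape by the character it encodes."""
    parts = s.split("%")
    result = [parts[0]]
    for part in parts[1:]:
        digits = part[:2]
        if len(digits) == 2 and all(d in string.hexdigits for d in digits):
            result.append(chr(int(digits, 16)) + part[2:])
        else:
            # Not a valid escape: keep the text unchanged.
            result.append("%" + part)
    return "".join(result)
```

With these definitions, `quote_sketch('/~connolly/')' gives
`'/%7econnolly/'' and `unquote_sketch('/%7Econnolly/')' gives
`'/~connolly/''.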
Restrictions:
   * Currently, only the following protocols are supported: HTTP
     (versions 0.9 and 1.0), Gopher (but not Gopher-+), FTP, and local
     files.
   * The caching feature of `urlretrieve()' has been disabled until I
     find the time to hack proper processing of Expiration time headers.
   * There should be a function to query whether a particular URL is in
     the cache.
   * For backward compatibility, if a URL appears to point to a local
     file but the file can't be opened, the URL is re-interpreted using
     the FTP protocol.  This can sometimes cause confusing error
     messages.
   * The `urlopen()' and `urlretrieve()' functions can cause
     arbitrarily long delays while waiting for a network connection to
     be set up.  This means that it is difficult to build an interactive
     web client using these functions without using threads.
   * The data returned by `urlopen()' or `urlretrieve()' is the raw
     data returned by the server.  This may be binary data (e.g. an
     image), plain text or (for example) HTML.  The HTTP protocol
     provides type information in the reply header, which can be
     inspected by looking at the `Content-type' header.  For the Gopher
     protocol, type information is encoded in the URL; there is
     currently no easy way to extract it.  If the returned data is
     HTML, you can use the module `htmllib' to parse it.
   * Although the `urllib' module contains (undocumented) routines to
     parse and unparse URL strings, the recommended interface for URL
     manipulation is in module `urlparse'.
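The file-like interface of `urlopen()' can be tried without a network
connection by opening a local file through a `file:' URL.  This is a
hedged sketch: the helper name `read_local_url' is hypothetical, the
fallback import assumes that current interpreters provide the same
functions under `urllib.request', and the `Content-type' reported for
a local file is an assumption about those newer interpreters (the text
above only promises headers for HTTP).

```python
import os

try:
    from urllib import urlopen, pathname2url          # older interpreters
except ImportError:
    from urllib.request import urlopen, pathname2url  # current interpreters

def read_local_url(path):
    """Open PATH through urlopen() and return (data, content_type)."""
    url = "file:" + pathname2url(os.path.abspath(path))
    f = urlopen(url)
    try:
        data = f.read()                        # raw data, not interpreted
        content_type = f.info().get("Content-type")
    finally:
        f.close()
    return data, content_type
```

The data comes back exactly as stored; deciding whether it is an
image, plain text or HTML is left to the caller.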
File: pylibi,  Node: httplib,  Next: ftplib,  Prev: urllib,  Up: Internet and WWW
Standard Module `httplib'
=========================
This module defines a class which implements the client side of the
HTTP protocol.  It is normally not used directly -- the module `urllib'
uses it to handle URLs that use HTTP.
The module defines one class, `HTTP'.  An `HTTP' instance represents
one transaction with an HTTP server.  It should be instantiated passing
it a host and optional port number.  If no port number is passed, the
port is extracted from the host string if it has the form `host:port',
else the default HTTP port (80) is used.  If no host is passed, no
connection is made, and the `connect' method should be used to connect
to a server.  For example, the following calls all create instances
that connect to the server at the same host and port:
     >>> h1 = httplib.HTTP('www.cwi.nl')
     >>> h2 = httplib.HTTP('www.cwi.nl:80')
     >>> h3 = httplib.HTTP('www.cwi.nl', 80)
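The rule for choosing the port can be written out as a short sketch.
The function name `split_host_port' is hypothetical and only mirrors
the description above; it is not a function of `httplib'.

```python
HTTP_PORT = 80   # the default HTTP port named in the text

def split_host_port(host, port=None):
    """Return (host, port) following the rule described above."""
    if port is None:
        i = host.find(":")
        if i >= 0:
            # The host string has the form `host:port'.
            host, port = host[:i], int(host[i + 1:])
        else:
            port = HTTP_PORT
    return host, port
```

All three forms used in the example calls resolve to the pair
`('www.cwi.nl', 80)'.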
Once an `HTTP' instance has been connected to an HTTP server, it should
be used as follows:
  1. Make exactly one call to the `putrequest()' method.
  2. Make zero or more calls to the `putheader()' method.
  3. Call the `endheaders()' method (this can be omitted if step 4
     makes no calls).
  4. Optional calls to the `send()' method.
  5. Call the `getreply()' method.
  6. Call the `getfile()' method and read the data off the file
     object that it returns.
* Menu:
* HTTP Objects::
* HTTP Example::
File: pylibi,  Node: HTTP Objects,  Next: HTTP Example,  Prev: httplib,  Up: httplib
HTTP Objects
------------
`HTTP' instances have the following methods:
 - Method on HTTP: set_debuglevel (LEVEL)
     Set the debugging level (the amount of debugging output printed).
     The default debug level is `0', meaning no debugging output is
     printed.
 - Method on HTTP: connect (HOST[, PORT])
     Connect to the server given by HOST and PORT.  See the intro for
     the default port.  This should be called directly only if the
     instance was instantiated without passing a host.
 - Method on HTTP: send (DATA)
     Send data to the server.  This should be used directly only after
     the `endheaders()' method has been called and before `getreply()'
     has been called.
 - Method on HTTP: putrequest (REQUEST, SELECTOR)
     This should be the first call after the connection to the server
     has been made.  It sends a line to the server consisting of the
     REQUEST string, the SELECTOR string, and the HTTP version
     (`HTTP/1.0').
 - Method on HTTP: putheader (HEADER, ARGUMENT[, ...])
     Send an RFC-822 style header to the server.  It sends a line to the
     server consisting of the header, a colon and a space, and the first
     argument.  If more arguments are given, continuation lines are
     sent, each consisting of a tab and an argument.
 - Method on HTTP: endheaders ()
     Send a blank line to the server, signalling the end of the headers.
 - Method on HTTP: getreply ()
     Complete the request by shutting down the sending end of the
     socket, read the reply from the server, and return a triple
     (REPLYCODE, MESSAGE, HEADERS).  Here REPLYCODE is the integer
     reply code from the request (e.g. `200' if the request was handled
     properly); MESSAGE is the message string corresponding to the
     reply code; and HEADERS is an instance of the class
     `rfc822.Message' containing the headers received from the server.
     See the description of the `rfc822' module.
 - Method on HTTP: getfile ()
     Return a file object from which the data returned by the server
     can be read, using the `read()', `readline()' or `readlines()'
     methods.
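To make the descriptions of `putrequest()', `putheader()' and
`endheaders()' concrete, the sketch below collects the text that such
calls would send instead of writing it to a socket.  The class
`RequestSketch' is hypothetical, and the use of CRLF as the line
terminator is an assumption taken from the HTTP protocol rather than
from the method descriptions.

```python
class RequestSketch:
    """Collect the lines an HTTP instance would send to the server."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)

    def putrequest(self, request, selector):
        # Request string, selector string, and the HTTP version.
        self.send("%s %s HTTP/1.0\r\n" % (request, selector))

    def putheader(self, header, *args):
        # Header, colon and space, first argument; any further
        # arguments become continuation lines starting with a tab.
        self.send("%s: %s\r\n" % (header, "\r\n\t".join(args)))

    def endheaders(self):
        # A blank line signals the end of the headers.
        self.send("\r\n")

    def text(self):
        return "".join(self.sent)
```

Calling `putrequest('GET', '/index.html')', then
`putheader('Accept', 'text/html', 'text/plain')' and `endheaders()'
yields a request line, one header with one continuation line, and the
terminating blank line.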
File: pylibi,  Node: HTTP Example,  Prev: HTTP Objects,  Up: httplib
Example
-------
Here is an example session:
     >>> import httplib
     >>> h = httplib.HTTP('www.cwi.nl')
     >>> h.putrequest('GET', '/index.html')
     >>> h.putheader('Accept', 'text/html')
     >>> h.putheader('Accept', 'text/plain')
     >>> h.endheaders()
     >>> errcode, errmsg, headers = h.getreply()
     >>> print errcode # Should be 200
     >>> f = h.getfile()
     >>> data = f.read() # Get the raw HTML
     >>> f.close()
     >>>
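The triple returned by `getreply()' corresponds to the first part of
the server's reply.  The sketch below shows how such a reply could be
taken apart by hand; the function `parse_reply' is hypothetical, and
it returns the headers as a plain dictionary rather than as an
`rfc822.Message' instance.

```python
def parse_reply(raw):
    """Split a raw reply into (replycode, message, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: version, three-digit code, then the message text.
    parts = lines[0].split(None, 2)
    code = int(parts[1])
    message = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, message, headers, body
```

For a reply starting with `HTTP/1.0 200 OK', the first two items are
`200' and `'OK''.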
File: pylibi,  Node: ftplib,  Next: gopherlib,  Prev: httplib,  Up: Internet and WWW
Standard Module `ftplib'
========================
This module defines the class `FTP' and a few related items.  The `FTP'
class implements the client side of the FTP protocol.  You can use this
to write Python programs that perform a variety of automated FTP jobs,
such as mirroring other ftp servers.  It is also used by the module
`urllib' to handle URLs that use FTP.  For more information on FTP
(File Transfer Protocol), see Internet RFC 959.
Here's a sample session using the `ftplib' module:
     >>> from ftplib import FTP
     >>> ftp = FTP('ftp.cwi.nl')   # connect to host, default port
     >>> ftp.login()               # user anonymous, passwd user@hostname
     >>> ftp.retrlines('LIST')     # list directory contents
     total 24418
     drwxrwsr-x   5 ftp-usr  pdmaint     1536 Mar 20 09:48 .
     dr-xr-srwt 105 ftp-usr  pdmaint     1536 Mar 21 14:32 ..
     -rw-r--r--   1 ftp-usr  pdmaint     5305 Mar 20 09:48 INDEX
      .
      .
      .
     >>> ftp.quit()
The module defines the following items:
 - function of module ftplib: FTP ([HOST[, USER[, PASSWD[, ACCT]]]])
     Return a new instance of the `FTP' class.  When HOST is given, the
     method call `connect(HOST)' is made.  When USER is given,
     additionally the method call `login(USER, PASSWD, ACCT)' is made
     (where PASSWD and ACCT default to the empty string when not given).
 - data of module ftplib: all_errors
     The set of all exceptions (as a tuple) that methods of `FTP'
     instances may raise as a result of problems with the FTP connection
     (as opposed to programming errors made by the caller).  This set
     includes the four exceptions listed below as well as
     `socket.error' and `IOError'.
 - exception of module ftplib: error_reply
     Exception raised when an unexpected reply is received from the
     server.
 - exception of module ftplib: error_temp
     Exception raised when an error code in the range 400-499 is
     received.
 - exception of module ftplib: error_perm
     Exception raised when an error code in the range 500-599 is
     received.
 - exception of module ftplib: error_proto
     Exception raised when a reply is received from the server that does
     not begin with a digit in the range 1-5.
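The mapping from reply codes to exceptions, and the use of
`all_errors' in an except clause, can be sketched as follows.  The
functions `check_reply' and `describe' are hypothetical and only
restate the ranges given above; no connection is made.

```python
import ftplib

def check_reply(line):
    """Return LINE for a reply starting with 1-3; otherwise raise."""
    first = line[:1]
    if first in ("1", "2", "3"):
        return line
    if first == "4":
        raise ftplib.error_temp(line)    # codes 400-499
    if first == "5":
        raise ftplib.error_perm(line)    # codes 500-599
    raise ftplib.error_proto(line)       # no leading digit 1-5

def describe(line):
    """Catch any connection-related failure with the all_errors tuple."""
    try:
        return "ok: " + check_reply(line)
    except ftplib.all_errors as exc:
        return "failed: %s (%s)" % (exc, exc.__class__.__name__)
```

Catching `ftplib.all_errors' keeps programming errors made by the
caller (a `TypeError', say) visible while handling every failure that
stems from the connection.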
* Menu:
* FTP Objects::
File: pylibi,  Node: FTP Objects,  Prev: ftplib,  Up: ftplib
FTP Objects
-----------
FTP instances have the following methods:
 - Method on FTP object: set_debuglevel (LEVEL)
     Set the instance's debugging level.  This controls the amount of
     debugging output printed.  The default, 0, produces no debugging
     output.  A value of 1 produces a moderate amount of debugging
     output, generally a single line per request.  A value of 2 or
     higher produces the maximum amount of debugging output, logging
     each line sent and received on the control connection.
 - Method on FTP object: connect (HOST[, PORT])
     Connect to the given host and port.  The default port number is
     21, as specified by the FTP protocol specification.  It is rarely
     necessary to specify a different port number.  This function should
     be called only once for each instance; it should not be called at
     all if a host was given when the instance was created.  All other
     methods can only be used after a connection has been made.
 - Method on FTP object: getwelcome ()
     Return the welcome message sent by the server in reply to the
     initial connection.  (This message sometimes contains disclaimers
     or help information that may be relevant to the user.)
 - Method on FTP object: login ([USER[, PASSWD[, ACCT]]])
     Log in as the given USER.  The PASSWD and ACCT parameters are
     optional and default to the empty string.  If no USER is
     specified, it defaults to `anonymous'.  If USER is `anonymous',
     the default PASSWD is `REALUSER@HOST' where REALUSER is the real
     user name (gleaned from the `LOGNAME' or `USER' environment
     variable) and HOST is the hostname as returned by
     `socket.gethostname()'.  This function should be called only once
     for each instance, after a connection has been established; it
     should not be called at all if a host and user were given when the
     instance was created.  Most FTP commands are only allowed after the
     client has logged in.
 - Method on FTP object: abort ()
     Abort a file transfer that is in progress.  Using this does not
     always work, but it's worth a try.
 - Method on FTP object: sendcmd (COMMAND)
     Send a simple command string to the server and return the response
     string.
 - Method on FTP object: voidcmd (COMMAND)
     Send a simple command string to the server and handle the response.
     Return nothing if a response code in the range 200-299 is received.
     Raise an exception otherwise.
 - Method on FTP object: retrbinary (COMMAND, CALLBACK, MAXBLOCKSIZE)
     Retrieve a file in binary transfer mode.  COMMAND should be an
     appropriate `RETR' command, i.e. `"RETR FILENAME"'.  The CALLBACK
     function is called for each block of data received, with a single
     string argument giving the data block.  The MAXBLOCKSIZE argument
     specifies the maximum block size (which may not be the actual size
     of the data blocks passed to CALLBACK).
 - Method on FTP object: retrlines (COMMAND[, CALLBACK])
     Retrieve a file or directory listing in ASCII transfer mode.
     COMMAND should be an appropriate `RETR' command (see
     `retrbinary()') or a `LIST' command (usually just the string
     `"LIST"').  The CALLBACK function is called for each line, with
     the trailing CRLF stripped.  The default CALLBACK prints the line
     to `sys.stdout'.
 - Method on FTP object: storbinary (COMMAND, FILE, BLOCKSIZE)
     Store a file in binary transfer mode.  COMMAND should be an
     appropriate `STOR' command, i.e. `"STOR FILENAME"'.  FILE is an
     open file object which is read until EOF using its `read()' method
     in blocks of size BLOCKSIZE to provide the data to be stored.
 - Method on FTP object: storlines (COMMAND, FILE)
     Store a file in ASCII transfer mode.  COMMAND should be an
     appropriate `STOR' command (see `storbinary()').  Lines are read
     until EOF from the open file object FILE using its `readline()'
     method to provide the data to be stored.
 - Method on FTP object: nlst (ARGUMENT[, ...])
     Return a list of files as returned by the `NLST' command.  The
     optional ARGUMENT is a directory to list (default is the current
     server directory).  Multiple arguments can be used to pass
     non-standard options to the `NLST' command.
 - Method on FTP object: dir (ARGUMENT[, ...])
     Return a directory listing as returned by the `LIST' command, as a
     list of lines.  The optional ARGUMENT is a directory to list
     (default is the current server directory).  Multiple arguments can
     be used to pass non-standard options to the `LIST' command.  If the
     last argument is a function, it is used as a CALLBACK function as
     for `retrlines()'.
 - Method on FTP object: rename (FROMNAME, TONAME)
     Rename file FROMNAME on the server to TONAME.
 - Method on FTP object: cwd (PATHNAME)
     Set the current directory on the server.
 - Method on FTP object: mkd (PATHNAME)
     Create a new directory on the server.
 - Method on FTP object: pwd ()
     Return the pathname of the current directory on the server.
 - Method on FTP object: quit ()
     Send a `QUIT' command to the server and close the connection.
     This is the "polite" way to close a connection, but it may raise an
     exception if the server responds with an error to the `QUIT'
     command.
 - Method on FTP object: close ()
     Close the connection unilaterally.  This should not be applied to
     an already closed connection (e.g. after a successful call to
     `quit()').
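The CALLBACK arguments of `retrbinary()' and `retrlines()' are often
just the `append' method of a list.  The sketch below assumes an
already connected and logged-in `FTP' instance passed in as `ftp'; the
helper names `fetch_binary' and `fetch_listing' are hypothetical.

```python
def fetch_binary(ftp, filename, maxblocksize=8192):
    """Return the contents of FILENAME on the server as one string."""
    blocks = []
    # The callback is called once for each block of data received.
    ftp.retrbinary("RETR " + filename, blocks.append, maxblocksize)
    return b"".join(blocks)

def fetch_listing(ftp):
    """Return the directory listing as a list of lines."""
    lines = []
    # Each line arrives with the trailing CRLF already stripped.
    ftp.retrlines("LIST", lines.append)
    return lines
```

Collecting the blocks in a list and joining them once at the end
avoids repeated string concatenation for large files.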
File: pylibi,  Node: gopherlib,  Next: nntplib,  Prev: ftplib,  Up: Internet and WWW
Standard Module `gopherlib'
===========================
This module provides a minimal implementation of the client side of the
Gopher protocol.  It is used by the module `urllib' to handle URLs that
use the Gopher protocol.
The module defines the following functions:
 - function of module gopherlib: send_selector (SELECTOR, HOST[, PORT])
     Send a SELECTOR string to the gopher server at HOST and PORT
     (default 70).  Return an open file object from which the returned
     document can be read.
 - function of module gopherlib: send_query (SELECTOR, QUERY, HOST[,
          PORT])
     Send a SELECTOR string and a QUERY string to a gopher server at
     HOST and PORT (default 70).  Return an open file object from which
     the returned document can be read.
Note that the data returned by the Gopher server can be of any type,
depending on the first character of the selector string.  If the data
is text (first character of the selector is `0'), lines are terminated
by CRLF, and the data is terminated by a line consisting of a single
`.', and a leading `.' should be stripped from lines that begin with
`..'.  Directory listings (first character of the selector is `1') are
transferred using the same protocol.
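The termination rules for text data can be applied with a few lines of
code.  This is a sketch under the assumption that the whole reply has
already been read into one string; the function name
`decode_gopher_text' is hypothetical and not part of `gopherlib'.

```python
def decode_gopher_text(raw):
    """Return the lines of a Gopher text reply, without terminators."""
    lines = []
    for line in raw.split("\r\n"):
        if line == ".":
            break                  # a single `.' ends the data
        if line.startswith(".."):
            line = line[1:]        # strip the extra leading `.'
        lines.append(line)
    return lines
```

The same function can be used for directory listings, since they are
transferred using the same protocol.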
File: pylibi,  Node: nntplib,  Next: urlparse,  Prev: gopherlib,  Up: Internet and WWW
Standard Module `nntplib'
=========================
This module defines the class `NNTP' which implements the client side
of the NNTP protocol.  It can be used to implement a news reader or
poster, or automated news processors.  For more information on NNTP
(Network News Transfer Protocol), see Internet RFC 977.
Here are two small examples of how it can be used.  To list some
statistics about a newsgroup and print the subjects of the last 10
articles:
     >>> from nntplib import NNTP
     >>> s = NNTP('news.cwi.nl')
     >>> resp, count, first, last, name = s.group('comp.lang.python')
     >>> print 'Group', name, 'has', count, 'articles, range', first, 'to', last
     Group comp.lang.python has 59 articles, range 3742 to 3803
     >>> resp, subs = s.xhdr('subject', first + '-' + last)
     >>> for id, sub in subs[-10:]: print id, sub
     ...
     3792 Re: Removing elements from a list while iterating...
     3793 Re: Who likes Info files?
     3794 Emacs and doc strings
     3795 a few questions about the Mac implementation
     3796 Re: executable python scripts
     3797 Re: executable python scripts
     3798 Re: a few questions about the Mac implementation
     3799 Re: PROPOSAL: A Generic Python Object Interface for Python C Modules
     3802 Re: executable python scripts
     3803 Re: POSIX wait and SIGCHLD
     >>> s.quit()
     '205 news.cwi.nl closing connection.  Goodbye.'
     >>>
To post an article from a file (this assumes that the article has valid
headers):
     >>> s = NNTP('news.cwi.nl')
     >>> f = open('/tmp/article')
     >>> s.post(f)
     '240 Article posted successfully.'
     >>> s.quit()
     '205 news.cwi.nl closing connection.  Goodbye.'
     >>>
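The values unpacked from `s.group()' in the first example come from
the server's reply line.  The sketch below shows how such a line could
be split by hand; the reply layout (code 211 followed by count, first,
last and group name) is taken from RFC 977, and the function names
`parse_group_reply' and `article_range' are hypothetical.

```python
def parse_group_reply(resp):
    """Split a GROUP reply into (resp, count, first, last, name)."""
    words = resp.split()
    if len(words) < 5 or words[0] != "211":
        raise ValueError("unexpected GROUP reply: %r" % (resp,))
    count, first, last, name = words[1:5]
    # The numbers are deliberately left as strings.
    return resp, count, first, last, name

def article_range(first, last):
    """Build the range string passed to xhdr() in the example above."""
    return first + "-" + last
```

Because the numbers stay strings, they can be concatenated directly
into a range such as `'3742-3803''.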
The module itself defines the following items:
 - function of module nntplib: NNTP (HOST[, PORT])
     Return a new instance of the `NNTP' class, representing a
     connection to the NNTP server running on host HOST, listening at
     port PORT.  The default PORT is 119.
 - exception of module nntplib: error_reply
     Exception raised when an unexpected reply is received from the
     server.
 - exception of module nntplib: error_temp
     Exception raised when an error code in the range 400-499 is
     received.
 - exception of module nntplib: error_perm
     Exception raised when an error code in the range 500-599 is
     received.
 - exception of module nntplib: error_proto
     Exception raised when a reply is received from the server that does
     not begin with a digit in the range 1-5.
* Menu:
* NNTP Objects::
File: pylibi,  Node: NNTP Objects,  Prev: nntplib,  Up: nntplib
NNTP Objects
------------
NNTP instances have the following methods.  The RESPONSE that is
returned as the first item in the return tuple of almost all methods is
the server's response: a string beginning with a three-digit code.  If
the server's response indicates an error, the method raises one of the
above exceptions.
 - Method on NNTP object: getwelcome ()
     Return the welcome message sent by the server in reply to the
     initial connection.  (This message sometimes contains disclaimers
     or help information that may be relevant to the user.)
 - Method on NNTP object: set_debuglevel (LEVEL)
     Set the instance's debugging level.  This controls the amount of
     debugging output printed.  The default, 0, produces no debugging
     output.  A value of 1 produces a moderate amount of debugging
     output, generally a single line per request or response.  A value
     of 2 or higher produces the maximum amount of debugging output,
     logging each line sent and received on the connection (including
     message text).
 - Method on NNTP object: newgroups (DATE, TIME)
     Send a `NEWGROUPS' command.  The DATE argument should be a string
     of the form `"YYMMDD"' indicating the date, and TIME should be a
     string of the form `"HHMMSS"' indicating the time.  Return a pair
     `(RESPONSE, GROUPS)' where GROUPS is a list of group names that
     are new since the given date and time.
 - Method on NNTP object: newnews (GROUP, DATE, TIME)
     Send a `NEWNEWS' command.  Here, GROUP is a group name or `"*"',
     and DATE and TIME have the same meaning as for `newgroups()'.
     Return a pair `(RESPONSE, ARTICLES)' where ARTICLES is a list of
     article ids.
 - Method on NNTP object: list ()
     Send a `LIST' command.  Return a pair `(RESPONSE, LIST)' where
     LIST is a list of tuples.  Each tuple has the form `(GROUP, LAST,
     FIRST, FLAG)', where GROUP is a group name, LAST and FIRST are the
     last and first article numbers (as strings), and FLAG is `'y'' if
     posting is allowed, `'n'' if not, and `'m'' if the newsgroup is
     moderated.  (Note the ordering: LAST, FIRST.)
 - Method on NNTP object: group (NAME)
     Send a `GROUP' command, where NAME is the group name.  Return a
     tuple `(RESPONSE, COUNT, FIRST, LAST, NAME)' where COUNT is the
     (estimated) number of articles in the group, FIRST is the first
     article number in the group, LAST is the last article number in
     the group, and NAME is the group name.  The numbers are returned
     as strings.
 - Method on NNTP object: help ()
     Send a `HELP' command.  Return a pair `(RESPONSE, LIST)' where
     LIST is a list of help strings.
 - Method on NNTP object: stat (ID)
     Send a `STAT' command, where ID is the message id (enclosed in `<'
     and `>') or an article number (as a string).  Return a triple
     `(RESPONSE, NUMBER, ID)' where NUMBER is the article number (as
     a string) and ID is the article id  (enclosed in `<' and `>').
 - Method on NNTP object: next ()
     Send a `NEXT' command.  Return as for `stat()'.
 - Method on NNTP object: last ()
     Send a `LAST' command.  Return as for `stat()'.
 - Method on NNTP object: head (ID)
     Send a `HEAD' command, where ID has the same meaning as for
     `stat()'.  Return a pair `(RESPONSE, LIST)' where LIST is a list
     of the article's headers (an uninterpreted list of lines, without
     trailing newlines).
 - Method on NNTP object: body (ID)
     Send a `BODY' command, where ID has the same meaning as for
     `stat()'.  Return a pair `(RESPONSE, LIST)' where LIST is a list
     of the article's body text (an uninterpreted list of lines,
     without trailing newlines).
 - Method on NNTP object: article (ID)
     Send an `ARTICLE' command, where ID has the same meaning as for
     `stat()'.  Return a pair `(RESPONSE, LIST)' where LIST is a list
     of the article's header and body text (an uninterpreted list of
     lines, without trailing newlines).
 - Method on NNTP object: slave ()
     Send a `SLAVE' command.  Return the server's RESPONSE.
 - Method on NNTP object: xhdr (HEADER, STRING)
     Send an `XHDR' command.  This command is not defined in the RFC
     but is a common extension.  The HEADER argument is a header
     keyword, e.g. `"subject"'.  The STRING argument should have the
     form `"FIRST-LAST"' where FIRST and LAST are the first and last
     article numbers to search.  Return a pair `(RESPONSE, LIST)',
     where LIST is a list of pairs `(ID, TEXT)', where ID is an article
     id (as a string) and TEXT is the text of the requested header for
     that article.
 - Method on NNTP object: post (FILE)
     Post an article using the `POST' command.  The FILE argument is an
     open file object which is read until EOF using its `readline()'
     method.  It should be a well-formed news article, including the
     required headers.  The `post()' method automatically escapes lines
     beginning with `.'.
 - Method on NNTP object: ihave (ID, FILE)
     Send an `IHAVE' command.  If the response is not an error, treat
     FILE exactly as for the `post()' method.
 - Method on NNTP object: quit ()
     Send a `QUIT' command and close the connection.  Once this method
     has been called, no other methods of the NNTP object should be
     called.
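As a rough illustration of the argument and return formats described
above, the following sketch builds the DATE and TIME strings expected by
`newgroups()' and `newnews()', and unpacks tuples shaped like the items
returned by `list()'.  It does not contact a server; the helper names
`nntp_date_time' and `summarize_groups' and the sample data are
hypothetical.

```python
import time

def nntp_date_time(t):
    """Format a time tuple as the ("YYMMDD", "HHMMSS") pair described above."""
    return time.strftime("%y%m%d", t), time.strftime("%H%M%S", t)

def summarize_groups(groups):
    """Given tuples shaped like the items returned by list(), return
    (GROUP, size of the article-number range, FLAG) triples."""
    result = []
    for group, last, first, flag in groups:   # note the order: LAST, FIRST
        # The numbers arrive as strings, so convert before doing arithmetic.
        span = max(int(last) - int(first) + 1, 0)
        result.append((group, span, flag))
    return result

# Hypothetical sample data, shaped like the output of list().
sample = [("comp.lang.python", "0000003761", "0000003211", "y"),
          ("comp.lang.misc", "0000000010", "0000000011", "m")]

date, tm = nntp_date_time((1995, 10, 16, 9, 5, 30, 0, 289, 0))
summary = summarize_groups(sample)
```

The size of the number range is only an upper bound on the number of
articles actually present, much as the COUNT returned by `group()' is
an estimate.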
File: pylibi,  Node: urlparse,  Next: sgmllib,  Prev: nntplib,  Up: Internet and WWW
Standard Module `urlparse'
==========================
This module defines a standard interface to break URL strings up into
components (addressing scheme, network location, path etc.), to combine
the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL".
The module has been designed to match the current Internet draft on
Relative Uniform Resource Locators (and discovered a bug in an earlier
draft!).
It defines the following functions:
 - function of module urlparse: urlparse (URLSTRING[, DEFAULT_SCHEME[,
          ALLOW_FRAGMENTS]])
     Parse a URL into 6 components, returning a 6-tuple: (addressing
     scheme, network location, path, parameters, query, fragment
     identifier).  This corresponds to the general structure of a URL:
     `SCHEME://NETLOC/PATH;PARAMETERS?QUERY#FRAGMENT'.  Each tuple item
     is a string, possibly empty.  The components are not broken up into
     smaller parts (e.g. the network location is a single string), and
     % escapes are not expanded.  The delimiters as shown above are not
     part of the tuple items, except for a leading slash in the PATH
     component, which is retained if present.
     Example:
          urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
     yields the tuple
          ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
     If the DEFAULT_SCHEME argument is specified, it gives the default
     addressing scheme, to be used only if the URL string does not
     specify one.  The default value for this argument is the empty
     string.
     If the ALLOW_FRAGMENTS argument is zero, fragment identifiers are
     not allowed, even if the URL's addressing scheme normally does
     support them.  The default value for this argument is `1'.
 - function of module urlparse: urlunparse (TUPLE)
     Construct a URL string from a tuple as returned by `urlparse'.
     This may result in a slightly different, but equivalent URL, if the
     URL that was parsed originally had redundant delimiters, e.g. a ?
     with an empty query (the draft states that these are equivalent).
 - function of module urlparse: urljoin (BASE, URL[, ALLOW_FRAGMENTS])
     Construct a full ("absolute") URL by combining a "base URL" (BASE)
     with a "relative URL" (URL).  Informally, this uses components of
     the base URL, in particular the addressing scheme, the network
     location and (part of) the path, to provide missing components in
     the relative URL.
     Example:
          urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
     yields the string
          'http://www.cwi.nl/%7Eguido/FAQ.html'
     The ALLOW_FRAGMENTS argument has the same meaning as for
     `urlparse'.
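The three functions can be tried together as in the following sketch.
The first and last URLs are the ones used in the examples above; the
URL with parameters, query and fragment is a made-up illustration.  The
fallback import is an assumption for readers on later Python versions,
where these functions live in `urllib.parse'.

```python
try:
    from urlparse import urlparse, urlunparse, urljoin       # module described here
except ImportError:
    from urllib.parse import urlparse, urlunparse, urljoin   # later Python versions

url = 'http://www.cwi.nl:80/%7Eguido/Python.html'

# Split the URL into its six components.
scheme, netloc, path, params, query, fragment = urlparse(url)

# Putting the components back together gives an equivalent URL.
rebuilt = urlunparse((scheme, netloc, path, params, query, fragment))

# A made-up URL that exercises the PARAMETERS, QUERY and FRAGMENT slots.
full = tuple(urlparse('http://www.cwi.nl/search;type=a?q=python#top'))

# Resolve a relative URL against a base URL.
absolute = urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
```

Here `rebuilt' equals the original string because that URL has no
redundant delimiters.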
File: pylibi,  Node: sgmllib,  Next: htmllib,  Prev: urlparse,  Up: Internet and WWW
Standard Module `sgmllib'
=========================
This module defines a class `SGMLParser' which serves as the basis for
parsing text files formatted in SGML (Standard Generalized Mark-up
Language).  In fact, it does not provide a full SGML parser -- it only
parses SGML insofar as it is used by HTML, and the module only exists
as a base for the `htmllib' module.
In particular, the parser is hardcoded to recognize the following
constructs:
   * Opening and closing tags of the form "`<TAG ATTR="VALUE" ...>'" and
     "`</TAG>'", respectively.
   * Numeric character references of the form "`&#NAME;'".
   * Entity references of the form "`&NAME;'".
   * SGML comments of the form "`<!--TEXT-->'".  Note that spaces,
     tabs, and newlines are allowed between the trailing "`>'" and the
     immediately preceding "`--'".
The `SGMLParser' class must be instantiated without arguments.  It has
the following interface methods:
 - Method on SGMLParser: reset ()
     Reset the instance.  Loses all unprocessed data.  This is called
     implicitly at instantiation time.
 - Method on SGMLParser: setnomoretags ()
     Stop processing tags.  Treat all following input as literal input
     (CDATA).  (This is only provided so the HTML tag `<PLAINTEXT>' can
     be implemented.)
 - Method on SGMLParser: setliteral ()
     Enter literal mode (CDATA mode).
 - Method on SGMLParser: feed (DATA)
     Feed some text to the parser.  It is processed insofar as it
     consists of complete elements; incomplete data is buffered until
     more data is fed or `close()' is called.
 - Method on SGMLParser: close ()
     Force processing of all buffered data as if it were followed by an
     end-of-file mark.  This method may be redefined by a derived class
     to define additional processing at the end of the input, but the
     redefined version should always call `SGMLParser.close()'.
 - Method on SGMLParser: handle_starttag (TAG, METHOD, ATTRIBUTES)
     This method is called to handle start tags for which either a
     `start_TAG()' or `do_TAG()' method has been defined.  The `tag'
     argument is the name of the tag converted to lower case, and the
     `method' argument is the bound method which should be used to
     support semantic interpretation of the start tag.  The ATTRIBUTES
     argument is a list of (NAME, VALUE) pairs containing the
     attributes found inside the tag's `<>' brackets.  The NAME has
     been translated to lower case and double quotes and backslashes in
     the VALUE have been interpreted.  For instance, for the tag `<A
     HREF="http://www.cwi.nl/">', TAG would be `'a'' and ATTRIBUTES
     would be `[('href', 'http://www.cwi.nl/')]'.  The base
     implementation simply calls `method' with `attributes' as the
     only argument.
 - Method on SGMLParser: handle_endtag (TAG, METHOD)
     This method is called to handle end tags for which an `end_TAG()'
     method has been defined.  The `tag' argument is the name of the
     tag converted to lower case, and the `method' argument is the
     bound method which should be used to support semantic
     interpretation of the end tag.  If no `end_TAG()' method is
     defined for the closing element, this handler is not called.  The
     base implementation simply calls `method'.
 - Method on SGMLParser: handle_data (DATA)
     This method is called to process arbitrary data.  It is intended
     to be overridden by a derived class; the base class implementation
     does nothing.
 - Method on SGMLParser: handle_charref (REF)
     This method is called to process a character reference of the form
     "`&#REF;'".  In the base implementation, REF must be a decimal
     number in the range 0-255.  It translates the character to ASCII
     and calls the method `handle_data()' with the character as
     argument.  If REF is invalid or out of range, the method
     `unknown_charref(REF)' is called to handle the error.  A subclass
     must override this method to provide support for named character
     entities.
 - Method on SGMLParser: handle_entityref (REF)
     This method is called to process a general entity reference of the
     form "`&REF;'" where REF is the name of a general entity.  It looks
     for REF in the instance (or class) variable `entitydefs' which
     should be a mapping from entity names to corresponding
     translations.  If a translation is found, it calls the method
     `handle_data()' with the translation; otherwise, it calls the
     method `unknown_entityref(REF)'.  The default `entitydefs' defines
     translations for `&amp;', `&apos;', `&gt;', `&lt;', and `&quot;'.
 - Method on SGMLParser: handle_comment (COMMENT)
     This method is called when a comment is encountered.  The
     `comment' argument is a string containing the text between the
     "`<!--'" and "`-->'" delimiters, but not the delimiters
     themselves.  For example, the comment "`<!--text-->'" will cause
     this method to be called with the argument `'text''.  The default
     method does nothing.
 - Method on SGMLParser: report_unbalanced (TAG)
     This method is called when an end tag is found which does not
     correspond to any open element.
 - Method on SGMLParser: unknown_starttag (TAG, ATTRIBUTES)
     This method is called to process an unknown start tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.
 - Method on SGMLParser: unknown_endtag (TAG)
     This method is called to process an unknown end tag.  It is
     intended to be overridden by a derived class; the base class
     implementation does nothing.
 - Method on SGMLParser: unknown_charref (REF)
     This method is called to process unresolvable numeric character
     references.  It is intended to be overridden by a derived class;
     the base class implementation does nothing.
 - Method on SGMLParser: unknown_entityref (REF)
     This method is called to process an unknown entity reference.  It
     is intended to be overridden by a derived class; the base class
     implementation does nothing.
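The lookup rules described for `handle_charref()' and
`handle_entityref()' can be pictured with the following stand-alone
sketch.  It is a simplified illustration, not the code of `sgmllib':
the function names are hypothetical, and a return value of `None'
stands in for the calls to `unknown_charref()' and
`unknown_entityref()'.

```python
# Default translations named in the text above.
entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

def resolve_charref(ref):
    """Return the character for a decimal reference in the range 0-255,
    or None if REF is invalid or out of range."""
    if not ref.isdigit():
        return None
    try:
        n = int(ref)
    except ValueError:
        return None
    if not 0 <= n <= 255:
        return None
    return chr(n)

def resolve_entityref(ref, defs=entitydefs):
    """Return the translation for entity name REF, or None if unknown."""
    return defs.get(ref)

letter = resolve_charref('65')        # from "&#65;"
too_big = resolve_charref('300')      # out of range
amp = resolve_entityref('amp')        # from "&amp;"
missing = resolve_entityref('nbsp')   # not in the default mapping
```

A derived class that wants more entities would extend the mapping
rather than change the lookup.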
Apart from overriding or extending the methods listed above, derived
classes may also define methods of the following form to define
processing of specific tags.  Tag names in the input stream are case
independent; the TAG occurring in method names must be in lower case:
 - Method on SGMLParser: start_TAG (ATTRIBUTES)
     This method is called to process an opening tag TAG.  It has
     preference over `do_TAG()'.  The ATTRIBUTES argument has the same
     meaning as described for `handle_starttag()' above.
 - Method on SGMLParser: do_TAG (ATTRIBUTES)
     This method is called to process an opening tag TAG that does not
     come with a matching closing tag.  The ATTRIBUTES argument has the
     same meaning as described for `handle_starttag()' above.
 - Method on SGMLParser: end_TAG ()
     This method is called to process a closing tag TAG.
Note that the parser maintains a stack of open elements for which no
end tag has been found yet.  Only tags processed by `start_TAG()' are
pushed on this stack.  Definition of an `end_TAG()' method is optional
for these tags.  For tags processed by `do_TAG()' or by
`unknown_starttag()', no `end_TAG()' method needs to be defined; if
defined, it will not be used.  If both `start_TAG()' and `do_TAG()'
methods exist for a tag, the `start_TAG()' method takes precedence.
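The dispatch and stack rules above can be modelled with a small toy
class.  This is a sketch of the rules as described here, not the actual
`SGMLParser' implementation: the class `TagDispatcher', its methods
`open_tag()' and `close_tag()', and the `log' attribute are
hypothetical, and the handling of elements left open inside a closed
element is an assumption.

```python
class TagDispatcher:
    """Toy model of the tag lookup rules; not the real SGMLParser."""

    def __init__(self):
        self.stack = []     # open elements that were handled by start_TAG()
        self.log = []       # record of the calls made, for inspection

    def open_tag(self, tag, attributes):
        tag = tag.lower()                        # tag names are case independent
        method = getattr(self, 'start_' + tag, None)
        if method is not None:                   # start_TAG() takes precedence
            self.stack.append(tag)
            method(attributes)
            return
        method = getattr(self, 'do_' + tag, None)
        if method is not None:                   # no closing tag expected
            method(attributes)
            return
        self.unknown_starttag(tag, attributes)

    def close_tag(self, tag):
        tag = tag.lower()
        if tag not in self.stack:
            self.report_unbalanced(tag)
            return
        # Assumption: elements still open inside TAG are silently dropped.
        while self.stack.pop() != tag:
            pass
        method = getattr(self, 'end_' + tag, None)
        if method is not None:                   # end_TAG() is optional
            method()

    def unknown_starttag(self, tag, attributes):
        self.log.append(('unknown', tag))

    def report_unbalanced(self, tag):
        self.log.append(('unbalanced', tag))


class Demo(TagDispatcher):
    def start_h1(self, attributes):
        self.log.append(('start', 'h1', attributes))

    def end_h1(self):
        self.log.append(('end', 'h1'))

    def do_p(self, attributes):
        self.log.append(('do', 'p', attributes))


d = Demo()
d.open_tag('H1', [])
d.open_tag('P', [('align', 'center')])
d.close_tag('H1')
d.close_tag('H1')          # no longer open: reported as unbalanced
d.open_tag('BLINK', [])    # no handler defined: unknown start tag
```

Only `h1' ever appears on the stack; `p' is handled by `do_p()' and is
therefore never pushed.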
File: pylibi,  Node: htmllib,  Next: formatter,  Prev: sgmllib,  Up: Internet and WWW
Standard Module `htmllib'
=========================
This module defines a class which can serve as a base for parsing text
files formatted in the HyperText Mark-up Language (HTML).  The class is
not directly concerned with I/O -- it must be provided with input in
string form via a method, and makes calls to methods of a "formatter"
object in order to produce output.  The `HTMLParser' class is designed
to be used as a base class for other classes in order to add
functionality, and allows most of its methods to be extended or
overridden.  In turn, this class is derived from and extends the
`SGMLParser' class defined in module `sgmllib'.  Two implementations of
formatter objects are provided in the `formatter' module; refer to the
documentation for that module for information on the formatter
interface.
The following is a summary of the interface defined by
`sgmllib.SGMLParser':
   * The interface to feed data to an instance is through the `feed()'
     method, which takes a string argument.  This can be called with as
     little or as much text at a time as desired; `p.feed(a);
     p.feed(b)' has the same effect as `p.feed(a+b)'.  When the data
     contains complete HTML tags, these are processed immediately;
     incomplete elements are saved in a buffer.  To force processing of
     all unprocessed data, call the `close()' method.
     For example, to parse the entire contents of a file, use:
          parser.feed(open('myfile.html').read())
          parser.close()
   * The interface to define semantics for HTML tags is very simple:
     derive a class and define methods called `start_TAG()',
     `end_TAG()', or `do_TAG()'.  The parser will call these at
     appropriate moments: `start_TAG' or `do_TAG' is called when an
     opening tag of the form `<TAG ...>' is encountered; `end_TAG' is
     called when a closing tag of the form `</TAG>' is encountered.  If
     an opening tag requires a corresponding closing tag, like `<H1>'
     ... `</H1>', the class should define the `start_TAG' method; if a
     tag requires no closing tag, like `<P>', the class should define
     the `do_TAG' method.
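The claim that `p.feed(a); p.feed(b)' has the same effect as
`p.feed(a+b)' follows from the buffering of incomplete elements.  The
toy class below sketches that idea only; it is not the parser defined
in this module, the name `BufferingFeeder' and its attributes are
hypothetical, and it recognizes nothing beyond `<...>' tags and plain
character data.

```python
class BufferingFeeder:
    """Toy model of feed()/close() buffering; not the real parser."""

    def __init__(self):
        self.buffer = ''      # incomplete input saved between calls
        self.events = []      # ('tag', text) and ('data', text) pairs

    def feed(self, data):
        self.buffer = self.buffer + data
        self._process(0)

    def close(self):
        self._process(1)      # treat the buffer as followed by end-of-file

    def _data(self, text):
        if not text:
            return
        # Merge adjacent character data so chunk boundaries do not show.
        if self.events and self.events[-1][0] == 'data':
            self.events[-1] = ('data', self.events[-1][1] + text)
        else:
            self.events.append(('data', text))

    def _process(self, at_eof):
        buf = self.buffer
        while buf:
            lt = buf.find('<')
            if lt < 0:                      # no tag start: everything is data
                self._data(buf)
                buf = ''
                break
            self._data(buf[:lt])
            gt = buf.find('>', lt)
            if gt < 0:                      # incomplete tag
                if at_eof:
                    self._data(buf[lt:])    # assumption: pass it through as data
                    buf = ''
                else:
                    buf = buf[lt:]          # keep it until more data arrives
                break
            self.events.append(('tag', buf[lt + 1:gt]))
            buf = buf[gt + 1:]
        self.buffer = buf


a, b, c = '<H1>Hel', 'lo</H', '1><P>'

p = BufferingFeeder()
p.feed(a)
p.feed(b)
p.feed(c)
p.close()

q = BufferingFeeder()
q.feed(a + b + c)
q.close()
```

Both instances end up with the same list of events, even though the
closing tag was split across two calls to `feed()' in the first case.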
The module defines a single class:
 - function of module htmllib: HTMLParser (FORMATTER)
     This is the basic HTML parser class.  It supports all entity names
     required by the HTML 2.0 specification (RFC 1866).  It also defines
     handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
In addition to tag methods, the `HTMLParser' class provides some
additional methods and instance variables for use within tag methods.
 - data of HTMLParser method: formatter
     This is the formatter instance associated with the parser.
 - data of HTMLParser method: nofill
     Boolean flag which should be true when whitespace should not be
     collapsed, or false when it should be.  In general, this should
     only be true when character data is to be treated as
     "preformatted" text, as within a `<PRE>' element.  The default
     value is false.  This affects the operation of `handle_data()' and
     `save_end()'.
 - Method on HTMLParser: anchor_bgn (HREF, NAME, TYPE)
     This method is called at the start of an anchor region.  The
     arguments correspond to the attributes of the `<A>' tag with the
     same names.  The default implementation maintains a list of
     hyperlinks (defined by the `href' argument) within the document.
     The list of hyperlinks is available as the data attribute
     `anchorlist'.
 - Method on HTMLParser: anchor_end ()
     This method is called at the end of an anchor region.  The default
     implementation adds a textual footnote marker using an index into
     the list of hyperlinks created by `anchor_bgn()'.
 - Method on HTMLParser: handle_image (SOURCE, ALT[, ISMAP[, ALIGN[,
          WIDTH[, HEIGHT]]]])
     This method is called to handle images.  The default implementation
     simply passes the `alt' value to the `handle_data()' method.
 - Method on HTMLParser: save_bgn ()
     Begins saving character data in a buffer instead of sending it to
     the formatter object.  Retrieve the stored data via `save_end()'.
     Use of the `save_bgn()' / `save_end()' pair may not be nested.
 - Method on HTMLParser: save_end ()
     Ends buffering character data and returns all data saved since the
     preceding call to `save_bgn()'.  If the `nofill' flag is false,
     whitespace is collapsed to single spaces.  A call to this method
     without a preceding call to `save_bgn()' will raise a `TypeError'
     exception.
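The interplay of `save_bgn()', `save_end()', `handle_data()' and
`nofill' can be sketched with a stand-alone toy class.  This is an
illustration of the behaviour described above, not the `HTMLParser'
source: the class name `SaveBuffer' and the `output' list (standing in
for the formatter) are hypothetical, and the explicit `raise' is this
sketch's way of producing the documented `TypeError'.

```python
class SaveBuffer:
    """Toy model of save_bgn()/save_end(); not the real HTMLParser."""

    def __init__(self):
        self.savedata = None    # None means "not saving"
        self.nofill = 0
        self.output = []        # stands in for the formatter object

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata = self.savedata + data    # divert into the buffer
        else:
            self.output.append(data)                # normal path

    def save_bgn(self):
        self.savedata = ''

    def save_end(self):
        data = self.savedata
        self.savedata = None
        if data is None:
            raise TypeError('save_end() called without save_bgn()')
        if not self.nofill:
            # Collapse runs of whitespace to single spaces; this also
            # drops leading and trailing whitespace.
            data = ' '.join(data.split())
        return data


s = SaveBuffer()
s.handle_data('before')
s.save_bgn()
s.handle_data('Hello  \n')
s.handle_data('  world')
title = s.save_end()

try:
    s.save_end()                # no matching save_bgn()
    raised = 0
except TypeError:
    raised = 1
```

Data handled while saving never reaches `output'; only the text seen
before `save_bgn()' does.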