This is Info file pylibi, produced by Makeinfo-1.55 from the input file lib.texi.

This file describes the built-in types, exceptions and functions and the standard modules that come with the Python system. It assumes basic knowledge about the Python language. For an informal introduction to the language, see the Python Tutorial. The Python Reference Manual gives a more formal definition of the language. (These manuals are not yet available in INFO or Texinfo format.)

Copyright 1991-1995 by Stichting Mathematisch Centrum, Amsterdam, The Netherlands.

All Rights Reserved

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the names of Stichting Mathematisch Centrum or CWI or Corporation for National Research Initiatives or CNRI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.

While CWI is the initial source for this software, a modified version is made available by the Corporation for National Research Initiatives (CNRI) at the Internet address ftp://ftp.python.org.

STICHTING MATHEMATISCH CENTRUM AND CNRI DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM OR CNRI BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
File: pylibi, Node: Installing your CGI script on a Unix system, Next: Testing your CGI script, Prev: Caring about security, Up: cgi

Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system administrator to find the directory where CGI scripts should be installed; usually this is in a directory `cgi-bin' in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file mode should be 755 (use `chmod 755 filename'). Make sure that the first line of the script contains `#!' starting in column 1 followed by the pathname of the Python interpreter, for instance:

     #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or writable, respectively, by "others"; their mode should be 644 for readable and 666 for writable. This is because, for security reasons, the HTTP server executes your script as user "nobody", without any special privileges. It can only read (write, execute) files that everybody can read (write, execute). The current directory at execution time is also different (it is usually the server's cgi-bin directory) and the set of environment variables is also different from what you get at login. In particular, don't count on the shell's search path for executables (`$PATH') or the Python module search path (`$PYTHONPATH') to be set to anything interesting.

If you need to load modules from a directory which is not on Python's default module search path, you can change the path in your script, before importing other modules, e.g.:

     import sys
     sys.path.insert(0, "/usr/home/joe/lib/python")
     sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)
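As a rough way to check the "others" requirement, the sketch below tests the permission bits that a numeric mode such as 755, 644 or 666 grants to "others". It is written for Python 3 using the standard `os' and `stat' modules; the helper names `others_can' and `path_ok_for_others' are hypothetical.

```python
import os
import stat

# Permission letter -> the mode bit that grants it to "others".
_OTHER_BITS = {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH}


def others_can(mode, perms):
    """Return True if "others" hold every permission listed in `perms`.

    `mode` is a numeric file mode such as 0o755; `perms` is a string
    built from the letters 'r', 'w' and 'x'.
    """
    return all(mode & _OTHER_BITS[p] for p in perms)


def path_ok_for_others(path, perms):
    """Hypothetical helper: apply others_can() to an existing file."""
    return others_can(os.stat(path).st_mode, perms)
```

For example, `others_can(0o755, "rx")' is true, while `others_can(0o644, "w")' is false. The sketch only inspects the file's own mode bits; the server also needs execute (search) permission on every directory leading to the file, which is not checked here.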
Instructions for non-Unix systems will vary; check your HTTP server's documentation (it will usually have a section on CGI scripts).

File: pylibi, Node: Testing your CGI script, Next: Debugging CGI scripts, Prev: Installing your CGI script on a Unix system, Up: cgi

Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the command line, and a script that works perfectly from the command line may fail mysteriously when run from the server. There's one reason why you should still test your script from the command line: if it contains a syntax error, the Python interpreter won't execute it at all, and the HTTP server will most likely send a cryptic error to the client.

Assuming your script has no syntax errors, yet it does not work, you have no choice but to read the next section.

File: pylibi, Node: Debugging CGI scripts, Next: Common problems and solutions, Prev: Testing your CGI script, Up: cgi

Debugging CGI scripts
---------------------

First of all, check for trivial installation errors; reading the section above on installing your CGI script carefully can save you a lot of time. If you wonder whether you have understood the installation procedure correctly, try installing a copy of this module file (`cgi.py') as a CGI script. When invoked as a script, the file will dump its environment and the contents of the form in HTML form. Give it the right mode etc., and send it a request. If it's installed in the standard `cgi-bin' directory, it should be possible to send it a request by entering a URL into your browser of the form:

     http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script; perhaps you need to install it in a different directory. If it gives another error (e.g. 500), there's an installation problem that you should fix before trying to go any further.
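The query string in that URL is form-encoded: `+' stands for a space and `&' separates the fields. As a sketch in Python 3 (using `urllib.parse' rather than the `cgi' module described here), decoding it yields exactly the two fields the test script should list. The function name `decode_test_query' is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def decode_test_query(url):
    """Return the form fields encoded in the query part of `url`.

    parse_qs() maps each field name to a *list* of values, because a
    field may legitimately appear more than once in a query string.
    """
    return parse_qs(urlsplit(url).query)


fields = decode_test_query(
    "http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home")
```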
If you get a nicely formatted listing of the environment and form content (in this example, the fields should be listed as "addr" with value "At Home" and "name" with value "Joe Blow"), the `cgi.py' script has been installed correctly. If you follow the same procedure for your own script, you should now be able to debug it.

The next step could be to call the `cgi' module's `test()' function from your script: replace its main code with the single statement

     cgi.test()

This should produce the same results as those gotten from installing the `cgi.py' file itself.

When an ordinary Python script raises an unhandled exception (e.g. because of a typo in a module name, a file that can't be opened, etc.), the Python interpreter prints a nice traceback and exits. While the Python interpreter will still do this when your CGI script raises an exception, most likely the traceback will end up in one of the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code, it is easy to catch exceptions and cause a traceback to be printed. The `test()' function in this module is an example. Here are the rules:

  1. Import the traceback module (before entering the try-except!)

  2. Make sure you finish printing the headers and the blank line early

  3. Assign `sys.stderr' to `sys.stdout'

  4. Wrap all remaining code in a try-except statement

  5. In the except clause, call `traceback.print_exc()'

For example:

     import sys
     import traceback
     print "Content-type: text/html"
     print
     sys.stderr = sys.stdout
     try:
         ...your code here...
     except:
         print "\n\n<PRE>"
         traceback.print_exc()

Notes: The assignment to `sys.stderr' is needed because the traceback prints to `sys.stderr'. The `print "\n\n<PRE>"' statement is necessary to disable the word wrapping in HTML.

If you suspect that there may be a problem in importing the traceback module, you can use an even more robust approach (which only uses built-in modules):

     import sys
     sys.stderr = sys.stdout
     print "Content-type: text/plain"
     print
     ...your code here...

This relies on the Python interpreter to print the traceback. The content type of the output is set to plain text, which disables all HTML processing. If your script works, the raw HTML will be displayed by your client. If it raises an exception, most likely after the first two lines have been printed, a traceback will be displayed. Because no HTML interpretation is going on, the traceback will be readable.

File: pylibi, Node: Common problems and solutions, Prev: Debugging CGI scripts, Up: cgi

Common problems and solutions
-----------------------------

   * Most HTTP servers buffer the output from CGI scripts until the script is completed. This means that it is not possible to display a progress report on the client's display while the script is running.

   * Check the installation instructions above.

   * Check the HTTP server's log files. (`tail -f logfile' in a separate window may be useful!)

   * Always check a script for syntax errors first, by doing something like `python script.py'.

   * When using any of the debugging techniques, don't forget to add `import sys' to the top of the script.

   * When invoking external programs, make sure they can be found. Usually, this means using absolute path names; `$PATH' is usually not set to a very useful value in a CGI script.

   * When reading or writing external files, make sure they can be read or written by every user on the system.

   * Don't try to give a CGI script a set-uid mode. This doesn't work on most systems, and is a security liability as well.
File: pylibi, Node: urllib, Next: httplib, Prev: cgi, Up: Internet and WWW

Standard Module `urllib'
========================

This module provides a high-level interface for fetching data across the World-Wide Web. In particular, the `urlopen' function is similar to the built-in function `open', but accepts URLs (Uniform Resource Locators) instead of filenames. Some restrictions apply -- it can only open URLs for reading, and no seek operations are available.

It defines the following public functions:

 - function of module urllib: urlopen (URL)
     Open a network object denoted by a URL for reading. If the URL does not have a scheme identifier, or if it has `file:' as its scheme identifier, this opens a local file; otherwise it opens a socket to a server somewhere on the network. If the connection cannot be made, or if the server returns an error code, the `IOError' exception is raised. If all went well, a file-like object is returned. This supports the following methods: `read()', `readline()', `readlines()', `fileno()', `close()' and `info()'. Except for the last one, these methods have the same interface as for file objects -- see the section on File Objects earlier in this manual. (It's not a built-in file object, however, so it can't be used at those few places where a true built-in file object is required.)

     The `info()' method returns an instance of the class `rfc822.Message' containing the headers received from the server, if the protocol uses such headers (currently the only supported protocol that uses this is HTTP). See the description of the `rfc822' module.

 - function of module urllib: urlretrieve (URL)
     Copy a network object denoted by a URL to a local file, if necessary. If the URL points to a local file, or a valid cached copy of the object exists, the object is not copied.
     Return a tuple (FILENAME, HEADERS) where FILENAME is the local file name under which the object can be found, and HEADERS is either `None' (for a local object) or whatever the `info()' method of the object returned by `urlopen()' returned (for a remote object, possibly cached). Exceptions are the same as for `urlopen()'.

 - function of module urllib: urlcleanup ()
     Clear the cache that may have been built up by previous calls to `urlretrieve()'.

 - function of module urllib: quote (STRING[, ADDSAFE])
     Replace special characters in STRING using the `%xx' escape. Letters, digits, and the characters "`_,.-'" are never quoted. The optional ADDSAFE parameter specifies additional characters that should not be quoted -- its default value is `'/''.

     Example: `quote('/~connolly/')' yields `'/%7econnolly/''.

 - function of module urllib: unquote (STRING)
     Replace `%xx' escapes by their single-character equivalent.

     Example: `unquote('/%7Econnolly/')' yields `'/~connolly/''.

Restrictions:

   * Currently, only the following protocols are supported: HTTP (versions 0.9 and 1.0), Gopher (but not Gopher-+), FTP, and local files.

   * The caching feature of `urlretrieve()' has been disabled until I find the time to hack proper processing of Expiration time headers.

   * There should be a function to query whether a particular URL is in the cache.

   * For backward compatibility, if a URL appears to point to a local file but the file can't be opened, the URL is re-interpreted using the FTP protocol. This can sometimes cause confusing error messages.

   * The `urlopen()' and `urlretrieve()' functions can cause arbitrarily long delays while waiting for a network connection to be set up. This means that it is difficult to build an interactive web client using these functions without using threads.

   * The data returned by `urlopen()' or `urlretrieve()' is the raw data returned by the server. This may be binary data (e.g. an image), plain text or (for example) HTML.
     The HTTP protocol provides type information in the reply header, which can be inspected by looking at the `Content-type' header. For the Gopher protocol, type information is encoded in the URL; there is currently no easy way to extract it. If the returned data is HTML, you can use the module `htmllib' to parse it.

   * Although the `urllib' module contains (undocumented) routines to parse and unparse URL strings, the recommended interface for URL manipulation is in module `urlparse'.

File: pylibi, Node: httplib, Next: ftplib, Prev: urllib, Up: Internet and WWW

Standard Module `httplib'
=========================

This module defines a class which implements the client side of the HTTP protocol. It is normally not used directly -- the module `urllib' uses it to handle URLs that use HTTP.

The module defines one class, `HTTP'. An `HTTP' instance represents one transaction with an HTTP server. It should be instantiated passing it a host and optional port number. If no port number is passed, the port is extracted from the host string if it has the form `host:port', else the default HTTP port (80) is used. If no host is passed, no connection is made, and the `connect' method should be used to connect to a server. For example, the following calls all create instances that connect to the server at the same host and port:

     >>> h1 = httplib.HTTP('www.cwi.nl')
     >>> h2 = httplib.HTTP('www.cwi.nl:80')
     >>> h3 = httplib.HTTP('www.cwi.nl', 80)

Once an `HTTP' instance has been connected to an HTTP server, it should be used as follows:

  1. Make exactly one call to the `putrequest()' method.

  2. Make zero or more calls to the `putheader()' method.

  3. Call the `endheaders()' method (this can be omitted if step 4 makes no calls).

  4. Make optional calls to the `send()' method.

  5. Call the `getreply()' method.

  6. Call the `getfile()' method and read the data off the file object that it returns.
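The constructor's port rule can be illustrated with a small stand-alone function. This is a hypothetical Python 3 sketch of the documented behaviour, not the code of `httplib' itself; in particular it ignores IPv6 literal addresses, which also contain colons.

```python
HTTP_DEFAULT_PORT = 80


def split_host_port(host, port=None):
    """Return (host, port) following the rules described for HTTP().

    An explicit port wins; otherwise a trailing ':NNN' in `host` is used;
    otherwise the default HTTP port (80) applies.
    """
    if port is not None:
        return host, int(port)
    name, sep, digits = host.rpartition(":")
    if sep and digits.isdigit():
        return name, int(digits)
    return host, HTTP_DEFAULT_PORT
```

All three constructor calls in the example above resolve to `("www.cwi.nl", 80)' under this rule.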
* Menu:

* HTTP Objects::
* HTTP Example::

File: pylibi, Node: HTTP Objects, Next: HTTP Example, Prev: httplib, Up: httplib

HTTP Objects
------------

`HTTP' instances have the following methods:

 - Method on HTTP: set_debuglevel (LEVEL)
     Set the debugging level (the amount of debugging output printed). The default debug level is `0', meaning no debugging output is printed.

 - Method on HTTP: connect (HOST[, PORT])
     Connect to the server given by HOST and PORT. See the intro for the default port. This should be called directly only if the instance was instantiated without passing a host.

 - Method on HTTP: send (DATA)
     Send data to the server. This should be used directly only after the `endheaders()' method has been called and before `getreply()' has been called.

 - Method on HTTP: putrequest (REQUEST, SELECTOR)
     This should be the first call after the connection to the server has been made. It sends a line to the server consisting of the REQUEST string, the SELECTOR string, and the HTTP version (`HTTP/1.0').

 - Method on HTTP: putheader (HEADER, ARGUMENT[, ...])
     Send an RFC-822 style header to the server. It sends a line to the server consisting of the header, a colon and a space, and the first argument. If more arguments are given, continuation lines are sent, each consisting of a tab and an argument.

 - Method on HTTP: endheaders ()
     Send a blank line to the server, signalling the end of the headers.

 - Method on HTTP: getreply ()
     Complete the request by shutting down the sending end of the socket, then read the reply from the server and return a triple (REPLYCODE, MESSAGE, HEADERS). Here REPLYCODE is the integer reply code from the request (e.g. `200' if the request was handled properly); MESSAGE is the message string corresponding to the reply code; and HEADERS is an instance of the class `rfc822.Message' containing the headers received from the server. See the description of the `rfc822' module.
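To make the call sequence concrete, the Python 3 sketch below builds the bytes that `putrequest()', `putheader()' and `endheaders()' are described as sending, and splits a server status line into the REPLYCODE and MESSAGE that `getreply()' returns. The helper names are hypothetical, CRLF line endings are assumed (as HTTP requires), and this only approximates what `httplib' does internally.

```python
def format_request(request, selector, headers=()):
    """Build an HTTP/1.0 request head as bytes.

    `headers` is a sequence of tuples (HEADER, ARGUMENT, ...): the first
    argument goes on the header line, and any further arguments become
    continuation lines starting with a tab, as putheader() describes.
    """
    lines = ["%s %s HTTP/1.0" % (request, selector)]   # putrequest()
    for header, first, *more in headers:                # putheader()
        lines.append("%s: %s" % (header, first))
        lines.extend("\t%s" % arg for arg in more)
    lines.append("")                                     # endheaders(): blank line
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.0 200 OK' into (200, 'OK')."""
    parts = line.rstrip("\r\n").split(None, 2)
    code = int(parts[1])
    message = parts[2] if len(parts) > 2 else ""
    return code, message
```

With the two `Accept' headers used in the example session, `format_request()' produces one header line per call followed by the blank line that ends the request head.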
 - Method on HTTP: getfile ()
     Return a file object from which the data returned by the server can be read, using the `read()', `readline()' or `readlines()' methods.

File: pylibi, Node: HTTP Example, Prev: HTTP Objects, Up: httplib

Example
-------

Here is an example session:

     >>> import httplib
     >>> h = httplib.HTTP('www.cwi.nl')
     >>> h.putrequest('GET', '/index.html')
     >>> h.putheader('Accept', 'text/html')
     >>> h.putheader('Accept', 'text/plain')
     >>> h.endheaders()
     >>> errcode, errmsg, headers = h.getreply()
     >>> print errcode # Should be 200
     >>> f = h.getfile()
     >>> data = f.read() # Get the raw HTML
     >>> f.close()
     >>>

File: pylibi, Node: ftplib, Next: gopherlib, Prev: httplib, Up: Internet and WWW

Standard Module `ftplib'
========================

This module defines the class `FTP' and a few related items. The `FTP' class implements the client side of the FTP protocol. You can use this to write Python programs that perform a variety of automated FTP jobs, such as mirroring other FTP servers. It is also used by the module `urllib' to handle URLs that use FTP. For more information on FTP (File Transfer Protocol), see Internet RFC 959.

Here's a sample session using the `ftplib' module:

     >>> from ftplib import FTP
     >>> ftp = FTP('ftp.cwi.nl')   # connect to host, default port
     >>> ftp.login()               # user anonymous, passwd user@hostname
     >>> ftp.retrlines('LIST')     # list directory contents
     total 24418
     drwxrwsr-x   5 ftp-usr  pdmaint     1536 Mar 20 09:48 .
     dr-xr-srwt 105 ftp-usr  pdmaint     1536 Mar 21 14:32 ..
     -rw-r--r--   1 ftp-usr  pdmaint     5305 Mar 20 09:48 INDEX
      .
      .
      .
     >>> ftp.quit()

The module defines the following items:

 - function of module ftplib: FTP ([HOST[, USER, PASSWD, ACCT]])
     Return a new instance of the `FTP' class. When HOST is given, the method call `connect(HOST)' is made. When USER is given, additionally the method call `login(USER, PASSWD, ACCT)' is made (where PASSWD and ACCT default to the empty string when not given).
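The sample session above needs a real FTP server. The following self-contained sketch runs the same kind of exchange (connect, anonymous login, quit) with Python 3's `ftplib' against a tiny fake control-connection server on the local machine, and also provokes one permanent (5xx) error, caught through `all_errors'. The `FakeFTPHandler' class and its canned replies are hypothetical test scaffolding, not a usable FTP server. Two differences from the module described here: Python 3's `FTP' constructor takes no port argument, so `connect()' is called explicitly, and Python 3 sends `anonymous@' as the default anonymous password.

```python
import ftplib
import socketserver
import threading


class FakeFTPHandler(socketserver.StreamRequestHandler):
    """Hypothetical toy server: answers control commands with canned replies."""

    REPLIES = {
        "USER": "331 Password required",
        "PASS": "230 Logged in",
        "QUIT": "221 Goodbye",
    }

    def handle(self):
        self.wfile.write(b"220 Fake FTP server ready\r\n")
        for raw in self.rfile:
            verb = raw.decode("ascii").split(" ", 1)[0].strip().upper()
            reply = self.REPLIES.get(verb, "502 Command not implemented")
            self.wfile.write(reply.encode("ascii") + b"\r\n")
            if verb == "QUIT":
                break


def demo_session():
    """Connect, log in anonymously, provoke one error, then quit."""
    server = socketserver.TCPServer(("127.0.0.1", 0), FakeFTPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ftp = ftplib.FTP(timeout=5)
    try:
        ftp.connect(host, port)           # a real server would use port 21
        welcome = ftp.getwelcome()
        login_reply = ftp.login()         # user 'anonymous'
        error = None
        try:
            ftp.sendcmd("SITE HELP")      # the fake server rejects this
        except ftplib.all_errors as exc:  # here: error_perm (reply 5xx)
            error = exc
        ftp.quit()
        return welcome, login_reply, error
    finally:
        ftp.close()                       # no-op if quit() already closed it
        server.shutdown()
        server.server_close()
```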
 - data of module ftplib: all_errors
     The set of all exceptions (as a tuple) that methods of `FTP' instances may raise as a result of problems with the FTP connection (as opposed to programming errors made by the caller). This set includes the four exceptions listed below as well as `socket.error' and `IOError'.

 - exception of module ftplib: error_reply
     Exception raised when an unexpected reply is received from the server.

 - exception of module ftplib: error_temp
     Exception raised when an error code in the range 400-499 is received.

 - exception of module ftplib: error_perm
     Exception raised when an error code in the range 500-599 is received.

 - exception of module ftplib: error_proto
     Exception raised when a reply is received from the server that does not begin with a digit in the range 1-5.

* Menu:

* FTP Objects::

File: pylibi, Node: FTP Objects, Prev: ftplib, Up: ftplib

FTP Objects
-----------

FTP instances have the following methods:

 - Method on FTP object: set_debuglevel (LEVEL)
     Set the instance's debugging level. This controls the amount of debugging output printed. The default, 0, produces no debugging output. A value of 1 produces a moderate amount of debugging output, generally a single line per request. A value of 2 or higher produces the maximum amount of debugging output, logging each line sent and received on the control connection.

 - Method on FTP object: connect (HOST[, PORT])
     Connect to the given host and port. The default port number is 21, as specified by the FTP protocol specification. It is rarely necessary to specify a different port number. This function should be called only once for each instance; it should not be called at all if a host was given when the instance was created. All other methods can only be used after a connection has been made.

 - Method on FTP object: getwelcome ()
     Return the welcome message sent by the server in reply to the initial connection.
     (This message sometimes contains disclaimers or help information that may be relevant to the user.)

 - Method on FTP object: login ([USER[, PASSWD[, ACCT]]])
     Log in as the given USER. The PASSWD and ACCT parameters are optional and default to the empty string. If no USER is specified, it defaults to `anonymous'. If USER is `anonymous', the default PASSWD is `REALUSER@HOST' where REALUSER is the real user name (gleaned from the `LOGNAME' or `USER' environment variable) and HOST is the hostname as returned by `socket.gethostname()'. This function should be called only once for each instance, after a connection has been established; it should not be called at all if a host and user were given when the instance was created. Most FTP commands are only allowed after the client has logged in.

 - Method on FTP object: abort ()
     Abort a file transfer that is in progress. Using this does not always work, but it's worth a try.

 - Method on FTP object: sendcmd (COMMAND)
     Send a simple command string to the server and return the response string.

 - Method on FTP object: voidcmd (COMMAND)
     Send a simple command string to the server and handle the response. Return nothing if a response code in the range 200-299 is received. Raise an exception otherwise.

 - Method on FTP object: retrbinary (COMMAND, CALLBACK, MAXBLOCKSIZE)
     Retrieve a file in binary transfer mode. COMMAND should be an appropriate `RETR' command, i.e. `"RETR FILENAME"'. The CALLBACK function is called for each block of data received, with a single string argument giving the data block. The MAXBLOCKSIZE argument specifies the maximum block size (which may not be the actual size of the data blocks passed to CALLBACK).

 - Method on FTP object: retrlines (COMMAND[, CALLBACK])
     Retrieve a file or directory listing in ASCII transfer mode. COMMAND should be an appropriate `RETR' command (see `retrbinary()') or a `LIST' command (usually just the string `"LIST"').
     The CALLBACK function is called for each line, with the trailing CRLF stripped. The default CALLBACK prints the line to `sys.stdout'.

 - Method on FTP object: storbinary (COMMAND, FILE, BLOCKSIZE)
     Store a file in binary transfer mode. COMMAND should be an appropriate `STOR' command, i.e. `"STOR FILENAME"'. FILE is an open file object which is read until EOF using its `read()' method in blocks of size BLOCKSIZE to provide the data to be stored.

 - Method on FTP object: storlines (COMMAND, FILE)
     Store a file in ASCII transfer mode. COMMAND should be an appropriate `STOR' command (see `storbinary()'). Lines are read until EOF from the open file object FILE using its `readline()' method to provide the data to be stored.

 - Method on FTP object: nlst (ARGUMENT[, ...])
     Return a list of files as returned by the `NLST' command. The optional ARGUMENT is a directory to list (default is the current server directory). Multiple arguments can be used to pass non-standard options to the `NLST' command.

 - Method on FTP object: dir (ARGUMENT[, ...])
     Return a directory listing as returned by the `LIST' command, as a list of lines. The optional ARGUMENT is a directory to list (default is the current server directory). Multiple arguments can be used to pass non-standard options to the `LIST' command. If the last argument is a function, it is used as a CALLBACK function as for `retrlines()'.

 - Method on FTP object: rename (FROMNAME, TONAME)
     Rename file FROMNAME on the server to TONAME.

 - Method on FTP object: cwd (PATHNAME)
     Set the current directory on the server.

 - Method on FTP object: mkd (PATHNAME)
     Create a new directory on the server.

 - Method on FTP object: pwd ()
     Return the pathname of the current directory on the server.

 - Method on FTP object: quit ()
     Send a `QUIT' command to the server and close the connection. This is the "polite" way to close a connection, but it may raise an exception if the server responds with an error to the `QUIT' command.
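The transfer loops behind `storbinary()' and `retrlines()' can be sketched without a server. The two hypothetical Python 3 functions below show the same logic: read a file in blocks of at most BLOCKSIZE bytes until `read()' returns an empty result, and hand each received line to a callback with its trailing CRLF (or bare LF) removed. Real `ftplib' code additionally sets up and tears down the data connection, which is omitted here.

```python
import io


def iter_blocks(fileobj, blocksize):
    """Yield successive blocks from fileobj.read(blocksize) until EOF."""
    while True:
        block = fileobj.read(blocksize)
        if not block:              # an empty result signals end of file
            return
        yield block


def feed_lines(fileobj, callback):
    """Call callback once per line of fileobj, without its line ending."""
    while True:
        line = fileobj.readline()
        if not line:
            return
        if line.endswith("\r\n"):
            line = line[:-2]
        elif line.endswith("\n"):
            line = line[:-1]
        callback(line)


# Usage: a 7-byte "file" sent in 3-byte blocks, and a short LIST reply.
blocks = list(iter_blocks(io.BytesIO(b"abcdefg"), 3))
listing = []
feed_lines(io.StringIO("total 2\r\n-rw-r--r-- INDEX\r\n"), listing.append)
```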
 - Method on FTP object: close ()
     Close the connection unilaterally. This should not be applied to an already closed connection (e.g. after a successful call to `quit()').

File: pylibi, Node: gopherlib, Next: nntplib, Prev: ftplib, Up: Internet and WWW

Standard Module `gopherlib'
===========================

This module provides a minimal implementation of the client side of the Gopher protocol. It is used by the module `urllib' to handle URLs that use the Gopher protocol.

The module defines the following functions:

 - function of module gopherlib: send_selector (SELECTOR, HOST[, PORT])
     Send a SELECTOR string to the gopher server at HOST and PORT (default 70). Return an open file object from which the returned document can be read.

 - function of module gopherlib: send_query (SELECTOR, QUERY, HOST[, PORT])
     Send a SELECTOR string and a QUERY string to a gopher server at HOST and PORT (default 70). Return an open file object from which the returned document can be read.

Note that the data returned by the Gopher server can be of any type, depending on the first character of the selector string. If the data is text (first character of the selector is `0'), lines are terminated by CRLF, the data is terminated by a line consisting of a single `.', and a leading `.' should be stripped from lines that begin with `..'. Directory listings (first character of the selector is `1') are transferred using the same protocol.

File: pylibi, Node: nntplib, Next: urlparse, Prev: gopherlib, Up: Internet and WWW

Standard Module `nntplib'
=========================

This module defines the class `NNTP' which implements the client side of the NNTP protocol. It can be used to implement a news reader or poster, or automated news processors. For more information on NNTP (Network News Transfer Protocol), see Internet RFC 977.

Here are two small examples of how it can be used.
To list some statistics about a newsgroup and print the subjects of the last 10 articles:

     >>> s = NNTP('news.cwi.nl')
     >>> resp, count, first, last, name = s.group('comp.lang.python')
     >>> print 'Group', name, 'has', count, 'articles, range', first, 'to', last
     Group comp.lang.python has 59 articles, range 3742 to 3803
     >>> resp, subs = s.xhdr('subject', first + '-' + last)
     >>> for id, sub in subs[-10:]: print id, sub
     ...
     3792 Re: Removing elements from a list while iterating...
     3793 Re: Who likes Info files?
     3794 Emacs and doc strings
     3795 a few questions about the Mac implementation
     3796 Re: executable python scripts
     3797 Re: executable python scripts
     3798 Re: a few questions about the Mac implementation
     3799 Re: PROPOSAL: A Generic Python Object Interface for Python C Modules
     3802 Re: executable python scripts
     3803 Re: POSIX wait and SIGCHLD
     >>> s.quit()
     '205 news.cwi.nl closing connection. Goodbye.'
     >>>

To post an article from a file (this assumes that the article has valid headers):

     >>> s = NNTP('news.cwi.nl')
     >>> f = open('/tmp/article')
     >>> s.post(f)
     '240 Article posted successfully.'
     >>> s.quit()
     '205 news.cwi.nl closing connection. Goodbye.'
     >>>

The module itself defines the following items:

 - function of module nntplib: NNTP (HOST[, PORT])
     Return a new instance of the `NNTP' class, representing a connection to the NNTP server running on host HOST, listening at port PORT. The default PORT is 119.

 - exception of module nntplib: error_reply
     Exception raised when an unexpected reply is received from the server.

 - exception of module nntplib: error_temp
     Exception raised when an error code in the range 400-499 is received.

 - exception of module nntplib: error_perm
     Exception raised when an error code in the range 500-599 is received.

 - exception of module nntplib: error_proto
     Exception raised when a reply is received from the server that does not begin with a digit in the range 1-5.
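Posting (as in the second example above) and the Gopher text transfers described under `gopherlib' rely on the same convention: the text ends with a line holding a single `.', so any line that itself starts with `.' gets an extra `.' in front. A minimal Python 3 sketch of both directions follows; the function names are hypothetical, and real code would also have to deal with character encodings and over-long lines.

```python
def dot_stuff(lines):
    """Encode text lines for sending after a POST or IHAVE command."""
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("."):
            line = "." + line          # escape a leading dot
        out.append(line + "\r\n")
    out.append(".\r\n")                # terminator line
    return "".join(out)


def dot_unstuff(text):
    """Decode dot-terminated text back into a list of lines."""
    lines = []
    for line in text.split("\r\n"):
        if line == ".":
            break                      # end of the text
        if line.startswith(".."):
            line = line[1:]            # drop the escaping dot
        lines.append(line)
    return lines
```

Decoding the output of `dot_stuff()' with `dot_unstuff()' gives back the original lines, including any that began with `.'.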
* Menu:

* NNTP Objects::

File: pylibi, Node: NNTP Objects, Prev: nntplib, Up: nntplib

NNTP Objects
------------

NNTP instances have the following methods. The RESPONSE that is returned as the first item in the return tuple of almost all methods is the server's response: a string beginning with a three-digit code. If the server's response indicates an error, the method raises one of the above exceptions.

 - Method on NNTP object: getwelcome ()
     Return the welcome message sent by the server in reply to the initial connection. (This message sometimes contains disclaimers or help information that may be relevant to the user.)

 - Method on NNTP object: set_debuglevel (LEVEL)
     Set the instance's debugging level. This controls the amount of debugging output printed. The default, 0, produces no debugging output. A value of 1 produces a moderate amount of debugging output, generally a single line per request or response. A value of 2 or higher produces the maximum amount of debugging output, logging each line sent and received on the connection (including message text).

 - Method on NNTP object: newgroups (DATE, TIME)
     Send a `NEWGROUPS' command. The DATE argument should be a string of the form `"YYMMDD"' indicating the date, and TIME should be a string of the form `"HHMMSS"' indicating the time. Return a pair `(RESPONSE, GROUPS)' where GROUPS is a list of group names that are new since the given date and time.

 - Method on NNTP object: newnews (GROUP, DATE, TIME)
     Send a `NEWNEWS' command. Here, GROUP is a group name or `"*"', and DATE and TIME have the same meaning as for `newgroups()'. Return a pair `(RESPONSE, ARTICLES)' where ARTICLES is a list of article ids.

 - Method on NNTP object: list ()
     Send a `LIST' command. Return a pair `(RESPONSE, LIST)' where LIST is a list of tuples.
     Each tuple has the form `(GROUP, LAST, FIRST, FLAG)', where GROUP is a group name, LAST and FIRST are the last and first article numbers (as strings), and FLAG is `'y'' if posting is allowed, `'n'' if not, and `'m'' if the newsgroup is moderated. (Note the ordering: LAST, FIRST.)

 - Method on NNTP object: group (NAME)
     Send a `GROUP' command, where NAME is the group name. Return a tuple `(RESPONSE, COUNT, FIRST, LAST, NAME)' where COUNT is the (estimated) number of articles in the group, FIRST is the first article number in the group, LAST is the last article number in the group, and NAME is the group name. The numbers are returned as strings.

 - Method on NNTP object: help ()
     Send a `HELP' command. Return a pair `(RESPONSE, LIST)' where LIST is a list of help strings.

 - Method on NNTP object: stat (ID)
     Send a `STAT' command, where ID is the message id (enclosed in `<' and `>') or an article number (as a string). Return a triple `(RESPONSE, NUMBER, ID)' where NUMBER is the article number (as a string) and ID is the article id (enclosed in `<' and `>').

 - Method on NNTP object: next ()
     Send a `NEXT' command. Return as for `stat()'.

 - Method on NNTP object: last ()
     Send a `LAST' command. Return as for `stat()'.

 - Method on NNTP object: head (ID)
     Send a `HEAD' command, where ID has the same meaning as for `stat()'. Return a pair `(RESPONSE, LIST)' where LIST is a list of the article's headers (an uninterpreted list of lines, without trailing newlines).

 - Method on NNTP object: body (ID)
     Send a `BODY' command, where ID has the same meaning as for `stat()'. Return a pair `(RESPONSE, LIST)' where LIST is the article's body text (an uninterpreted list of lines, without trailing newlines).

 - Method on NNTP object: article (ID)
     Send an `ARTICLE' command, where ID has the same meaning as for `stat()'. Return a pair `(RESPONSE, LIST)' where LIST is the article's header and body text (an uninterpreted list of lines, without trailing newlines).
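The string fields returned by `list()' and `group()' come straight from the server's reply lines. As a sketch (Python 3, hypothetical function names, no error handling), splitting a `LIST' line and a `GROUP' reply looks like this; the formats assumed are those of RFC 977, `group last first p' for a `LIST' line and `211 n f l s' for a successful `GROUP' reply.

```python
def parse_list_line(line):
    """Split a LIST line into (GROUP, LAST, FIRST, FLAG), all strings."""
    group, last, first, flag = line.split()[:4]
    return group, last, first, flag


def parse_group_reply(resp):
    """Split a 211 GROUP reply into (RESPONSE, COUNT, FIRST, LAST, NAME)."""
    count, first, last, name = resp.split()[1:5]
    return resp, count, first, last, name
```

With the reply `"211 59 3742 3803 comp.lang.python"', FIRST is `"3742"' and LAST is `"3803"', so the `xhdr()' range in the first example, `first + '-' + last', is `"3742-3803"'.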
- Method on NNTP object: slave () Send a `SLAVE' command. Return the server's RESPONSE. - Method on NNTP object: xhdr (HEADER, STRING) Send an `XHDR' command. This command is not defined in the RFC but is a common extension. The HEADER argument is a header keyword, e.g. `"subject"'. The STRING argument should have the form `"FIRST-LAST"' where FIRST and LAST are the first and last article numbers to search. Return a pair `(RESPONSE, LIST)', where LIST is a list of pairs `(NUMBER, TEXT)', where NUMBER is an article number (as a string) and TEXT is the text of the requested header for that article. - Method on NNTP object: post (FILE) Post an article using the `POST' command. The FILE argument is an open file object which is read until EOF using its `readline()' method. It should be a well-formed news article, including the required headers. The `post()' method automatically escapes lines beginning with `.'. - Method on NNTP object: ihave (ID, FILE) Send an `IHAVE' command. If the response is not an error, treat FILE exactly as for the `post()' method. - Method on NNTP object: quit () Send a `QUIT' command and close the connection. Once this method has been called, no other methods of the NNTP object should be called. File: pylibi, Node: urlparse, Next: sgmllib, Prev: nntplib, Up: Internet and WWW Standard Module `urlparse' ========================== This module defines a standard interface to break URL strings up into components (addressing scheme, network location, path, etc.), to combine the components back into a URL string, and to convert a "relative URL" to an absolute URL given a "base URL". The module has been designed to match the current Internet draft on Relative Uniform Resource Locators (and discovered a bug in an earlier draft!).
It defines the following functions: - function of module urlparse: urlparse (URLSTRING[, DEFAULT_SCHEME[, ALLOW_FRAGMENTS]]) Parse a URL into 6 components, returning a 6-tuple: (addressing scheme, network location, path, parameters, query, fragment identifier). This corresponds to the general structure of a URL: `SCHEME://NETLOC/PATH;PARAMETERS?QUERY#FRAGMENT'. Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (e.g. the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the tuple items, except for a leading slash in the PATH component, which is retained if present. Example: urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') yields the tuple ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', ''). If the DEFAULT_SCHEME argument is specified, it gives the default addressing scheme, to be used only if the URL string does not specify one. The default value for this argument is the empty string. If the ALLOW_FRAGMENTS argument is zero, fragment identifiers are not allowed, even if the URL's addressing scheme normally does support them. The default value for this argument is `1'. - function of module urlparse: urlunparse (TUPLE) Construct a URL string from a tuple as returned by `urlparse'. This may result in a slightly different but equivalent URL if the URL that was parsed originally had redundant delimiters, e.g. a `?' with an empty query (the draft states that these are equivalent). - function of module urlparse: urljoin (BASE, URL[, ALLOW_FRAGMENTS]) Construct a full ("absolute") URL by combining a "base URL" (BASE) with a "relative URL" (URL). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.
Example: urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') yields the string 'http://www.cwi.nl/%7Eguido/FAQ.html'. The ALLOW_FRAGMENTS argument has the same meaning as for `urlparse'. File: pylibi, Node: sgmllib, Next: htmllib, Prev: urlparse, Up: Internet and WWW Standard Module `sgmllib' ========================= This module defines a class `SGMLParser' which serves as the basis for parsing text files formatted in SGML (Standard Generalized Mark-up Language). In fact, it does not provide a full SGML parser -- it only parses SGML insofar as it is used by HTML, and the module only exists as a base for the `htmllib' module. In particular, the parser is hardcoded to recognize the following constructs: * Opening and closing tags of the form "`<TAG ATTR="VALUE" ...>'" and "`</TAG>'", respectively. * Numeric character references of the form "`&#NAME;'". * Entity references of the form "`&NAME;'". * SGML comments of the form "`<!--TEXT-->'". Note that spaces, tabs, and newlines are allowed between the trailing "`>'" and the immediately preceding "`--'". The `SGMLParser' class must be instantiated without arguments. It has the following interface methods: - Method on SGMLParser: reset () Reset the instance. Loses all unprocessed data. This is called implicitly at instantiation time. - Method on SGMLParser: setnomoretags () Stop processing tags. Treat all following input as literal input (CDATA). (This is only provided so the HTML tag `<PLAINTEXT>' can be implemented.) - Method on SGMLParser: setliteral () Enter literal mode (CDATA mode). - Method on SGMLParser: feed (DATA) Feed some text to the parser. It is processed insofar as it consists of complete elements; incomplete data is buffered until more data is fed or `close()' is called. - Method on SGMLParser: close () Force processing of all buffered data as if it were followed by an end-of-file mark.
This method may be redefined by a derived class to define additional processing at the end of the input, but the redefined version should always call `SGMLParser.close()'. - Method on SGMLParser: handle_starttag (TAG, METHOD, ATTRIBUTES) This method is called to handle start tags for which either a `start_TAG()' or `do_TAG()' method has been defined. The TAG argument is the name of the tag converted to lower case, and the METHOD argument is the bound method which should be used to support semantic interpretation of the start tag. The ATTRIBUTES argument is a list of (NAME, VALUE) pairs containing the attributes found inside the tag's `<>' brackets. The NAME has been translated to lower case, and double quotes and backslashes in the VALUE have been interpreted. For instance, for the tag `<A HREF="http://www.cwi.nl/">', this method would be called with TAG set to `'a'', METHOD bound to the parser's `start_a()' (or `do_a()') method, and ATTRIBUTES set to `[('href', 'http://www.cwi.nl/')]'. The base implementation simply calls METHOD with ATTRIBUTES as the only argument. - Method on SGMLParser: handle_endtag (TAG, METHOD) This method is called to handle end tags for which an `end_TAG()' method has been defined. The TAG argument is the name of the tag converted to lower case, and the METHOD argument is the bound method which should be used to support semantic interpretation of the end tag. If no `end_TAG()' method is defined for the closing element, this handler is not called. The base implementation simply calls METHOD. - Method on SGMLParser: handle_data (DATA) This method is called to process arbitrary data. It is intended to be overridden by a derived class; the base class implementation does nothing. - Method on SGMLParser: handle_charref (REF) This method is called to process a character reference of the form "`&#REF;'". In the base implementation, REF must be a decimal number in the range 0-255. It translates the character to ASCII and calls the method `handle_data()' with the character as argument.
If REF is invalid or out of range, the method `unknown_charref(REF)' is called to handle the error. A subclass must override this method to provide support for named character entities. - Method on SGMLParser: handle_entityref (REF) This method is called to process a general entity reference of the form "`&REF;'", where REF is the entity name. It looks for REF in the instance (or class) variable `entitydefs', which should be a mapping from entity names to corresponding translations. If a translation is found, it calls the method `handle_data()' with the translation; otherwise, it calls the method `unknown_entityref(REF)'. The default `entitydefs' defines translations for `&amp;', `&apos;', `&gt;', `&lt;', and `&quot;'. - Method on SGMLParser: handle_comment (COMMENT) This method is called when a comment is encountered. The COMMENT argument is a string containing the text between the "`<!--'" and "`-->'" delimiters, but not the delimiters themselves. For example, the comment "`<!--text-->'" will cause this method to be called with the argument `'text''. The default method does nothing. - Method on SGMLParser: report_unbalanced (TAG) This method is called when an end tag is found which does not correspond to any open element. - Method on SGMLParser: unknown_starttag (TAG, ATTRIBUTES) This method is called to process an unknown start tag. It is intended to be overridden by a derived class; the base class implementation does nothing. - Method on SGMLParser: unknown_endtag (TAG) This method is called to process an unknown end tag. It is intended to be overridden by a derived class; the base class implementation does nothing. - Method on SGMLParser: unknown_charref (REF) This method is called to process unresolvable numeric character references. It is intended to be overridden by a derived class; the base class implementation does nothing. - Method on SGMLParser: unknown_entityref (REF) This method is called to process an unknown entity reference.
It is intended to be overridden by a derived class; the base class implementation does nothing. Apart from overriding or extending the methods listed above, derived classes may also define methods of the following form to define processing of specific tags. Tag names in the input stream are case independent; the TAG occurring in method names must be in lower case: - Method on SGMLParser: start_TAG (ATTRIBUTES) This method is called to process an opening tag TAG. It has preference over `do_TAG()'. The ATTRIBUTES argument has the same meaning as described for `handle_starttag()' above. - Method on SGMLParser: do_TAG (ATTRIBUTES) This method is called to process an opening tag TAG that does not come with a matching closing tag. The ATTRIBUTES argument has the same meaning as described for `handle_starttag()' above. - Method on SGMLParser: end_TAG () This method is called to process a closing tag TAG. Note that the parser maintains a stack of open elements for which no end tag has been found yet. Only tags processed by `start_TAG()' are pushed onto this stack. Definition of an `end_TAG()' method is optional for these tags. For tags processed by `do_TAG()' or by `unknown_starttag()', no `end_TAG()' method should be defined; if one is defined, it will not be used. If both `start_TAG()' and `do_TAG()' methods exist for a tag, the `start_TAG()' method takes precedence. File: pylibi, Node: htmllib, Next: formatter, Prev: sgmllib, Up: Internet and WWW Standard Module `htmllib' ========================= This module defines a class which can serve as a base for parsing text files formatted in the HyperText Mark-up Language (HTML). The class is not directly concerned with I/O -- it must be provided with input in string form via a method, and makes calls to methods of a "formatter" object in order to produce output. The `HTMLParser' class is designed to be used as a base class for other classes in order to add functionality, and allows most of its methods to be extended or overridden.
In turn, this class is derived from and extends the `SGMLParser' class defined in module `sgmllib'. Two implementations of formatter objects are provided in the `formatter' module; refer to the documentation for that module for information on the formatter interface. The following is a summary of the interface defined by `sgmllib.SGMLParser': * The interface to feed data to an instance is through the `feed()' method, which takes a string argument. This can be called with as little or as much text at a time as desired; `p.feed(a); p.feed(b)' has the same effect as `p.feed(a+b)'. When the data contains complete HTML tags, these are processed immediately; incomplete elements are saved in a buffer. To force processing of all unprocessed data, call the `close()' method. For example, to parse the entire contents of a file, use: parser.feed(open('myfile.html').read()) parser.close() * The interface to define semantics for HTML tags is very simple: derive a class and define methods called `start_TAG()', `end_TAG()', or `do_TAG()'. The parser will call these at appropriate moments: `start_TAG' or `do_TAG' is called when an opening tag of the form `<TAG ...>' is encountered; `end_TAG' is called when a closing tag of the form `</TAG>' is encountered. If an opening tag requires a corresponding closing tag, like `<H1>' ... `</H1>', the class should define the `start_TAG' method; if a tag requires no closing tag, like `<P>', the class should define the `do_TAG' method.
The module defines a single class: - function of module htmllib: HTMLParser (FORMATTER) This is the basic HTML parser class. It supports all entity names required by the HTML 2.0 specification (RFC 1866). It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements. In addition to tag methods, the `HTMLParser' class provides some additional methods and instance variables for use within tag methods. - data of HTMLParser method: formatter This is the formatter instance associated with the parser. - data of HTMLParser method: nofill Boolean flag which should be true when whitespace should not be collapsed, or false when it should be. In general, this should only be true when character data is to be treated as "preformatted" text, as within a `<PRE>' element. The default value is false. This affects the operation of `handle_data()' and `save_end()'.
- Method on HTMLParser: anchor_bgn (HREF, NAME, TYPE) This method is called at the start of an anchor region. The arguments correspond to the attributes of the `<A>' tag with the same names. The default implementation maintains a list of hyperlinks (defined by the HREF argument) within the document. The list of hyperlinks is available as the data attribute `anchorlist'. - Method on HTMLParser: anchor_end () This method is called at the end of an anchor region. The default implementation adds a textual footnote marker using an index into the list of hyperlinks created by `anchor_bgn()'. - Method on HTMLParser: handle_image (SOURCE, ALT[, ISMAP[, ALIGN[, WIDTH[, HEIGHT]]]]) This method is called to handle images. The default implementation simply passes the ALT value to the `handle_data()' method. - Method on HTMLParser: save_bgn () Begins saving character data in a buffer instead of sending it to the formatter object. Retrieve the stored data via `save_end()'. Use of the `save_bgn()' / `save_end()' pair may not be nested. - Method on HTMLParser: save_end () Ends buffering character data and returns all data saved since the preceding call to `save_bgn()'. If the `nofill' flag is false, whitespace is collapsed to single spaces. A call to this method without a preceding call to `save_bgn()' will raise a `TypeError' exception.
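The `sgmllib' and `htmllib' modules were removed in Python 3.0, so the sketch below swaps in the standard `html.parser.HTMLParser' class to illustrate the `start_TAG()' / `do_TAG()' / `end_TAG()' dispatch convention described above, including the stack of open elements and the `report_unbalanced()' callback. It is a simplified sketch under stated assumptions, not the `sgmllib' implementation: `DispatchParser' and `LinkCollector' are hypothetical names, end tags that match no open element are only passed to `report_unbalanced()' (never to `unknown_endtag()'), and `LinkCollector' merely imitates the `anchorlist' attribute maintained by `anchor_bgn()'.

```python
from html.parser import HTMLParser


class DispatchParser(HTMLParser):
    """Hypothetical sketch: route tags to start_TAG/do_TAG/end_TAG methods."""

    def __init__(self):
        super().__init__()
        # Stack of open elements; only tags handled by start_TAG are pushed.
        self.open_elements = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased the tag and attribute names.
        start = getattr(self, "start_" + tag, None)
        if start is not None:  # start_TAG takes precedence over do_TAG
            self.open_elements.append(tag)
            start(attrs)
            return
        do = getattr(self, "do_" + tag, None)
        if do is not None:
            do(attrs)
        else:
            self.unknown_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            self.report_unbalanced(tag)
            return
        # Close every element opened after the matching one, innermost first,
        # calling end_TAG only where such a method is defined.
        while self.open_elements:
            top = self.open_elements.pop()
            end = getattr(self, "end_" + top, None)
            if end is not None:
                end()
            if top == tag:
                break

    def unknown_starttag(self, tag, attrs):
        pass  # intended to be overridden by a derived class

    def report_unbalanced(self, tag):
        pass  # intended to be overridden by a derived class


class LinkCollector(DispatchParser):
    """Collect HREF values, loosely imitating htmllib's anchorlist."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []
        self.unbalanced = []

    def start_a(self, attrs):
        for name, value in attrs:
            if name == "href":
                self.anchorlist.append(value)

    def end_a(self):
        pass

    def do_br(self, attrs):
        pass  # <BR> has no closing tag, so it uses do_TAG

    def report_unbalanced(self, tag):
        self.unbalanced.append(tag)


parser = LinkCollector()
parser.feed('<A HREF="http://www.cwi.nl/">CWI</A><BR></P>')
parser.close()
print(parser.anchorlist)  # ['http://www.cwi.nl/']
print(parser.unbalanced)  # ['p']
```

Because `</P>' closes no element that was opened through a `start_TAG()' method, it is reported as unbalanced, which matches the stack rule given for `end_TAG()'.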