REXX Tutorial

A REXX client for the WWW

Having built a standalone application with its own high-level protocol, we now turn to public protocols already in use in the TCP/IP world. One of these is the HyperText Transfer Protocol (HTTP). The specifications for this and other protocols are freely available on the Internet as so-called RFCs (Requests for Comments). The bibliography at the end of this tutorial lists where these documents can be found.

The following sample program connects to a WWW server and retrieves information about the document named by a URL (Uniform Resource Locator) given on the command line. As an example of the information available for the document, it prints the date on which the document was last modified.

RFC 1945, which describes HTTP/1.0, provides the information needed for this task:

HTTP/1.0 defines two request formats: the simple request and the full request. The simple request exists only for the GET command, so HEAD is always sent as a full request. The full request format of the HEAD command is defined as follows:

    HEAD documentname HTTP/1.0<CRLF>
    request header<CRLF>

For our purpose we do not need to pass any options in the request header, so we can leave it empty. We must not, however, omit the CRLF character pair that terminates the request header; otherwise the server would not accept the request as valid. A full request sent to a server returns a full response in the format:


    HTTP/1.0 statuscode reasonphrase<CRLF>
    response header<CRLF>

The HTTP specification lists several header fields that can appear in the response in any order. For now we are interested only in the Last-Modified field and ignore all others.
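
As a small illustration of the status line format above, the following sketch splits a status line into its three parts. It is not part of the sample program; the variable names and the hard-coded sample line are assumptions made for this example only.

    /* Sketch only: split a sample status line into its parts.  */
    /* StatusLine is hard-coded here, not real server output.   */
    StatusLine = "HTTP/1.0 200 OK" || "0D0A"x

    /* drop the trailing CRLF, then parse the three fields      */
    Parse Var StatusLine Line "0D0A"x
    Parse Var Line Version StatusCode ReasonPhrase

    If StatusCode = 200 Then
      Say "Request succeeded:" ReasonPhrase
    Else
      Say "Server returned status" StatusCode

A program that needs more than the modification date could use a check like this to tell a successful response from an error before it searches the header fields.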

The following shows a sample HEAD command sent to a server and the corresponding response:

    HEAD / HTTP/1.0<CRLF><CRLF>

Response from server:

    HTTP/1.0 200 OK<CRLF>
    Server: GoServe/2.45<CRLF>
    Date: Thu, 18 Jul 1996 15:40:47 GMT<CRLF>
    Content-Type: text/html<CRLF>
    Content-Length: 1081<CRLF>
    Content-Transfer-Encoding: binary<CRLF>
    Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT<CRLF>

Since we are interested only in the date on which the document was last modified, we have to search the response for the Last-Modified field. During development of this sample I discovered that most web servers write the field name exactly as shown above, but some do not. HTTP header field names are case-insensitive, so to find the date in responses from all servers we simply uppercase the whole response before searching for the field.

That is all we need to know for our program. Here is the main program:

    /* SHOWDATE.CMD - IBM REXX Sample Program               */
    /* the URL of the document is passed on the command line */
    Parse Arg URL

    /* Load REXX Socket library if not already loaded       */
    If RxFuncQuery("SockLoadFuncs") Then
     Do
       Call RxFuncAdd "SockLoadFuncs","RXSOCK","SockLoadFuncs"
       Call SockLoadFuncs
     End

    /* retrieve the header of the document specified by URL */
    Header = GetHeader(URL)

    If Length(Header) \= 0 Then
      Do
        /* header could be read, find date                  */
        DocDate = GetModificationDate(Header)
        Say "Document date is:" DocDate
      End
    Else
      Say "Document information could not be retrieved."

    Exit

The 'Connect' function, which connects to the server, is the same as the one in the remote control application, except that it now uses port number 80 if the caller does not specify a port:

    /********************************************************/
    /*                                                      */
    /* Function:  Connect                                   */
    /* Purpose:   Create a socket and connect it to server. */
    /* Arguments: Server - server name, may contain port no.*/
    /* Returns:   Socket number if successful, -1 otherwise */
    /*                                                      */
    /********************************************************/
    Connect: Procedure
      Parse Arg Server

      /* if the servername has a port address specified     */
      /* then use this one, otherwise use the default http  */
      /* port 80                                            */
      Parse Var Server Server ":" Port
      If Port = "" Then
        Port = 80

      /* resolve server name alias to dotted IP address     */
      rc = SockGetHostByName(Server, "Host.!")
      If rc = 0 Then
       Do
         Say "Unable to resolve server:" Server
         Return -1
       End

      /* create a TCP socket                                */
      Socket = SockSocket("AF_INET", "SOCK_STREAM", "0")
      If Socket < 0 Then
       Do
         Say "Unable to create socket"
         Return -1
       End

      /* connect the new socket to the specified server     */
      Host.!family = "AF_INET"
      Host.!port = Port
      rc = SockConnect(Socket, "Host.!")
      If rc < 0 Then
       Do
         Say "Unable to connect to server:" Server
         Call Close Socket
         Return -1
       End

      Return Socket
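
To illustrate how the 'Parse Var' template in 'Connect' separates the port from the server name, here is a minimal standalone sketch. The helper name 'SplitServer' and the host names are hypothetical and appear only in this example.

    /* Sketch only: show how the template splits name and port */
    Call SplitServer "www.example.com:8080"
    Call SplitServer "www.example.com"
    Exit

    /* SplitServer is a hypothetical helper for this example   */
    SplitServer: Procedure
      Parse Arg Server
      Parse Var Server Server ":" Port
      /* without a colon, Port is empty and falls back to 80   */
      If Port = "" Then
        Port = 80
      Say "Server:" Server "Port:" Port
      Return

When the string contains no colon, the pattern is not found, so 'Server' receives the whole string and 'Port' is left empty. That is why the test for an empty 'Port' is enough to apply the default.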

The 'SendCommand' function expects a single-line command from the caller. It appends two CRLF pairs to the command string: the first ends the request line and the second ends the (empty) request header, which completes the full request. After the command has been sent, the function reads the response from the server until no more characters can be read, then returns the response:

    /********************************************************/
    /*                                                      */
    /* Function:  SendCommand                               */
    /* Purpose:   Send a command via the specified socket   */
    /*            and return the full response to caller.   */
    /* Arguments: Socket - active socket number             */
    /*            Command - command string                  */
    /* Returns:   Response from server or empty string if   */
    /*            failed.                                   */
    /*                                                      */
    /********************************************************/
    SendCommand: Procedure
      Parse Arg Socket, Command

      /* append two pairs of CRLF to end the command string */
      Command = Command || "0D0A0D0A"x

      BytesSent = SockSend(Socket, Command)
      /* give up if the command could not be sent           */
      If BytesSent < 0 Then
        Return ""

      /* collect the response in blocks of up to 1024 bytes */
      Response = ""
      Do Forever
        BytesRcvd = SockRecv(Socket, "RcvData", 1024)
        If BytesRcvd <= 0 Then
          Leave
        Response = Response || RcvData
      End

      Return Response

The 'Close' function is unchanged from the previous samples:

    /********************************************************/
    /*                                                      */
    /* Procedure: Close                                     */
    /* Purpose:   Close the specified socket.               */
    /* Arguments: Socket - active socket number             */
    /* Returns:   nothing                                   */
    /*                                                      */
    /********************************************************/
    Close: Procedure
      Parse Arg Socket
      Call SockShutDown Socket, 2
      Call SockClose Socket
      Return

The 'GetHeader' function extracts the server name and document name from the URL it is given, connects to the server, retrieves the full header, closes the connection again, and returns the header to the caller:

    /********************************************************/
    /*                                                      */
    /* Function:  GetHeader                                 */
    /* Purpose:   Request the header for the specified URL  */
    /*            from the network.                         */
    /* Arguments: URL - fully specified document locator    */
    /* Returns:   Full header of specified document or      */
    /*            empty string if failed (also if no header */
    /*            exists).                                  */
    /*                                                      */
    /********************************************************/
    GetHeader: Procedure
      Parse Arg URL

      /* Isolate server name and document name, document    */
      /* name is always preceded with a slash               */
      Parse Var URL "http://" Server "/" Document
      Document = "/" || Document

      Socket = Connect(Server)
      If Socket = -1 Then
        Return ""

      Command = "HEAD" Document "HTTP/1.0"
      Header = SendCommand(Socket, Command)
      Call Close Socket
      Return Header
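
The URL template in 'GetHeader' can be tried on its own with the following sketch. The helper name 'SplitURL' and the sample URLs are hypothetical and serve only to show how the template behaves for a few typical inputs.

    /* Sketch only: show how the template splits a URL         */
    Call SplitURL "http://www.example.com/docs/index.htm"
    Call SplitURL "http://www.example.com:8080/"
    Call SplitURL "http://www.example.com"
    Exit

    /* SplitURL is a hypothetical helper for this example      */
    SplitURL: Procedure
      Parse Arg URL
      Parse Var URL "http://" Server "/" Document
      /* an empty document name becomes the root document "/"  */
      Document = "/" || Document
      Say "Server:" Server "Document:" Document
      Return

Note that the literal pattern "http://" is matched case-sensitively. A URL written with a different capitalisation, or without the prefix at all, leaves 'Server' empty, and the subsequent connection attempt fails.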

Finally, the function 'GetModificationDate' searches the full header (passed as a single string) for the last modification date. As already mentioned, it searches an uppercased copy of the header to avoid problems with some web servers; as a side effect, the returned date is uppercased as well. To find the date it looks for the keyword "LAST-MODIFIED:" followed by a linefeed character ("0A"x). The extracted date may still contain leading or trailing blanks or a carriage return character, which are removed before the result is returned to the caller. Using only the linefeed character as the delimiter ensures that the program also works with web servers that use the UNIX-style line separator alone:

    /********************************************************/
    /*                                                      */
    /* Function:  GetModificationDate                       */
    /* Purpose:   Find the last-modified date in the passed */
    /*            header and return just the date.          */
    /* Arguments: Header - full header of document          */
    /* Returns:   Date string when document was last        */
    /*            modified or empty string if date was not  */
    /*            found.                                    */
    /*                                                      */
    /********************************************************/
    GetModificationDate: Procedure
      Parse Arg Header

      /* isolate date string and strip all unwanted chars   */
      Parse Upper Var Header "LAST-MODIFIED:" ModDate "0A"x
      ModDate = Strip(ModDate)
      ModDate = Strip(ModDate,,"0D"x)

      Return ModDate
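
Because 'Parse Upper' uppercases the data before parsing, the date returned by 'GetModificationDate' is in upper case too. If the original capitalisation matters, one possible variation (a sketch, not part of the original sample) locates the keyword in an uppercased copy and then cuts the date out of the unmodified header. The function name 'GetModificationDateMixed' and the hand-made sample header are assumptions made for this example.

    /* Sketch only: the header below is hand-made test data    */
    CRLF = "0D0A"x
    Header = "HTTP/1.0 200 OK" || CRLF ||,
             "last-modified: Thu, 19 Oct 1995 16:27:52 GMT" || CRLF
    Say GetModificationDateMixed(Header)
    Exit

    /* hypothetical variant that keeps the date's capitalisation */
    GetModificationDateMixed: Procedure
      Parse Arg Header
      Key = "LAST-MODIFIED:"
      /* Translate with one argument uppercases its input      */
      Start = Pos(Key, Translate(Header))
      If Start = 0 Then
        Return ""
      /* take the text after the keyword from the original     */
      Rest = Substr(Header, Start + Length(Key))
      Parse Var Rest ModDate "0A"x
      Return Strip(Strip(ModDate), , "0D"x)

For the sample header this should print the date as the server wrote it, "Thu, 19 Oct 1995 16:27:52 GMT", even though the field name is in lower case.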


This page is at http://www2.hursley.ibm.com/rexxtut/socktut6.htm