The following sample program connects to a WWW server and retrieves information about the document specified as a URL (Uniform Resource Locator) on the command line. As an example of the information available for a document, it prints the date on which the document was last modified.
RFC 1945, which describes the HTTP/1.0 protocol, provides the information needed for this task:
HTTP/1.0 defines two request formats: the simple request and the full request. The simple request exists only for the GET command, so the HEAD command must be sent as a full request. The full request format of the HEAD command is defined as follows:
    HEAD documentname HTTP/1.0<CRLF>
    request header<CRLF>

For our purpose we do not need to pass additional options in the request header, so we can leave this field blank. However, we must not omit the closing CRLF character pair that terminates the request header field; otherwise the server would not accept the request as a valid command. A full request sent to a server returns a full response in the format:
    HTTP/1.0 statuscode reasonphrase<CRLF>
    response header<CRLF>

The response to a HEAD command consists of header fields only; no document body follows. The HTTP specification lists several information fields for the response header, and they can appear in any order. For now we are only interested in the Last-Modified field and ignore all other fields.
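The two formats above can be tried out without a network connection. The following is a minimal sketch, not part of the sample program: it builds a HEAD request string with an empty request header and splits a status line into its three parts. The document name "/" and the status line text are assumptions chosen for illustration.

```rexx
/* Sketch: build a HEAD request and split a status line */
CRLF = "0D0A"x

/* hypothetical document name; "/" requests the server's root document */
Document = "/"

/* request line, then an empty request header closed by a second CRLF */
Request = "HEAD" Document "HTTP/1.0" || CRLF || CRLF
Say "Request length:" Length(Request) "bytes"
Say "Last four bytes (hex):" C2X(Right(Request, 4))

/* hypothetical status line as a server might return it */
StatusLine = "HTTP/1.0 200 OK"
Parse Var StatusLine Version StatusCode ReasonPhrase
Say "Version:      " Version
Say "Status code:  " StatusCode
Say "Reason phrase:" ReasonPhrase
```

The request is 19 bytes long and ends in the hex sequence 0D0A0D0A, which is the pair of CRLFs a server expects before it answers.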
The following shows a sample HEAD command sent to a server and the corresponding response:
    HEAD / HTTP/1.0<CRLF><CRLF>

Response from server:
    HTTP/1.0 200 OK<CRLF>
    Server: GoServe/2.45<CRLF>
    Date: Thu, 18 Jul 1996 15:40:47 GMT<CRLF>
    Content-Type: text/html<CRLF>
    Content-Length: 1081<CRLF>
    Content-Transfer-Encoding: binary<CRLF>
    Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT<CRLF>

Since we are only interested in the date on which the document was last modified, we have to search the response for the Last-Modified keyword. During development of this sample I discovered that most web servers use exactly the spelling shown above to identify this field, but some servers do not (HTTP header field names are case-insensitive). To find the date in responses from all servers, we can simply convert the whole response to uppercase before searching for the Last-Modified field.
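The effect of uppercasing before the search can be checked in isolation. This is a small sketch using only built-in REXX functions; the lowercase variant in Line2 is a hypothetical example of a server that spells the field name differently.

```rexx
/* Sketch: case-insensitive search for the Last-Modified field */
Keyword = "LAST-MODIFIED:"
Line1 = "Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT"
Line2 = "last-modified: Thu, 19 Oct 1995 16:27:52 GMT"  /* hypothetical */

/* a search in the unchanged line fails because the case differs */
Say Pos(Keyword, Line1)             /* displays 0 */

/* Translate with a single argument converts the string to uppercase */
Say Pos(Keyword, Translate(Line1))  /* displays 1 */
Say Pos(Keyword, Translate(Line2))  /* displays 1 */
```

The same conversion is available inside the Parse instruction through the Upper option, which avoids a separate call to Translate.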
This is all we need to know for our program. The main program is implemented as follows:
    /* SHOWDATE.CMD - IBM REXX Sample Program */
    Parse Arg URL

    /* Load REXX Socket library if not already loaded */
    If RxFuncQuery("SockLoadFuncs") Then
    Do
      Call RxFuncAdd "SockLoadFuncs","RXSOCK","SockLoadFuncs"
      Call SockLoadFuncs
    End

    /* retrieve the header of the document specified by URL */
    Header = GetHeader(URL)
    If Length(Header) \= 0 Then
    Do
      /* header could be read, find date */
      DocDate = GetModificationDate(Header)
      Say "Document date is:" DocDate
    End
    Else
      Say "Document information could not be retrieved."

    Exit

The 'Connect' function that connects to the server is exactly the same as the one in the remote control application, except that it now uses port number 80 if the caller did not specify a port:
    /********************************************************/
    /*                                                      */
    /* Function:  Connect                                   */
    /* Purpose:   Create a socket and connect it to server. */
    /* Arguments: Server - server name, may contain port no.*/
    /* Returns:   Socket number if successful, -1 otherwise */
    /*                                                      */
    /********************************************************/
    Connect: Procedure
      Parse Arg Server

      /* if the servername has a port address specified    */
      /* then use this one, otherwise use the default http */
      /* port 80                                           */
      Parse Var Server Server ":" Port
      If Port = "" Then
        Port = 80

      /* resolve server name alias to dotted IP address */
      rc = SockGetHostByName(Server, "Host.!")
      If rc = 0 Then
      Do
        Say "Unable to resolve server:" Server
        Return -1
      End

      /* create a TCP socket */
      Socket = SockSocket("AF_INET", "SOCK_STREAM", "0")
      If Socket < 0 Then
      Do
        Say "Unable to create socket"
        Return -1
      End

      /* connect the new socket to the specified server */
      Host.!family = "AF_INET"
      Host.!port = Port
      rc = SockConnect(Socket, "Host.!")
      If rc < 0 Then
      Do
        Say "Unable to connect to server:" Server
        Call Close Socket
        Return -1
      End

      Return Socket

The 'SendCommand' function expects a single-line command from the caller. As required by the HTTP protocol, two CRLF pairs are appended to the command string: the first ends the request line and the second ends the (empty) request header, which completes the full request. After the command has been sent, the function reads the response from the server until no more characters can be read and then returns it:
    /********************************************************/
    /*                                                      */
    /* Function:  SendCommand                               */
    /* Purpose:   Send a command via the specified socket   */
    /*            and return the full response to caller.   */
    /* Arguments: Socket - active socket number             */
    /*            Command - command string                  */
    /* Returns:   Response from server or empty string if   */
    /*            failed.                                   */
    /*                                                      */
    /********************************************************/
    SendCommand: Procedure
      Parse Arg Socket, Command

      /* append two pairs of CRLF to end the command string */
      Command = Command || "0D0A0D0A"x
      BytesSent = SockSend(Socket, Command)

      Response = ""
      Do Forever
        BytesRcvd = SockRecv(Socket, "RcvData", 1024)
        If BytesRcvd <= 0 Then
          Leave
        Response = Response || RcvData
      End

      Return Response

The 'Close' function is already familiar from the previous samples:
    /********************************************************/
    /*                                                      */
    /* Procedure: Close                                     */
    /* Purpose:   Close the specified socket.               */
    /* Arguments: Socket - active socket number             */
    /* Returns:   nothing                                   */
    /*                                                      */
    /********************************************************/
    Close: Procedure
      Parse Arg Socket

      Call SockShutDown Socket, 2
      Call SockClose Socket

      Return

The 'GetHeader' function isolates the server name and document name from the passed URL, connects to the server, retrieves the full header information, closes the connection again, and returns the full header to the caller:
    /********************************************************/
    /*                                                      */
    /* Function:  GetHeader                                 */
    /* Purpose:   Request the header for the specified URL  */
    /*            from the network.                         */
    /* Arguments: URL - fully specified document locator    */
    /* Returns:   Full header of specified document or      */
    /*            empty string if failed (also if no header */
    /*            exists).                                  */
    /*                                                      */
    /********************************************************/
    GetHeader: Procedure
      Parse Arg URL

      /* Isolate server name and document name, document */
      /* name is always preceded with a slash            */
      Parse Var URL "http://" Server "/" Document
      Document = "/" || Document

      Socket = Connect(Server)
      If Socket = -1 Then
        Return ""

      Command = "HEAD" Document "HTTP/1.0"
      Header = SendCommand(Socket, Command)
      Call Close Socket

      Return Header

Finally, the function 'GetModificationDate' searches the full header (which is passed as a single string) for the last modification date. As already mentioned, we search the uppercased header to avoid problems with servers that spell the field name differently; as a side effect, the returned date is in uppercase as well. To find the date, the function looks for the keyword "LAST-MODIFIED:" and a trailing linefeed character ("0A"x). The extracted date may still contain leading or trailing blanks or a carriage return character, which are removed before the result is returned to the caller. Searching only for the linefeed character as a delimiter ensures that the program also works with web servers that use only the UNIX-style line separator:
    /********************************************************/
    /*                                                      */
    /* Function:  GetModificationDate                       */
    /* Purpose:   Find the last-modified date in the passed */
    /*            header and return just the date.          */
    /* Arguments: Header - full header of document          */
    /* Returns:   Date string when document was last        */
    /*            modified or empty string if date was not  */
    /*            found.                                    */
    /*                                                      */
    /********************************************************/
    GetModificationDate: Procedure
      Parse Arg Header

      /* isolate date string and strip all unwanted chars */
      Parse Upper Var Header "LAST-MODIFIED:" ModDate "0A"x
      ModDate = Strip(ModDate)
      ModDate = Strip(ModDate,,"0D"x)

      Return ModDate
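The date extraction can be exercised without a network connection. The sketch below is a hypothetical test harness, not part of the original sample: it repeats the body of 'GetModificationDate' so that it runs on its own and feeds it three hand-built headers. Header1 uses CRLF line ends, Header2 is an assumed response from a server that sends only linefeeds and a lowercase field name, and Header3 has no Last-Modified field at all.

```rexx
/* Sketch: exercise GetModificationDate with hand-built headers */
CRLF = "0D0A"x
LF   = "0A"x

/* header with CRLF line ends, taken from the sample response */
Header1 = "HTTP/1.0 200 OK" || CRLF ||,
          "Server: GoServe/2.45" || CRLF ||,
          "Last-Modified: Thu, 19 Oct 1995 16:27:52 GMT" || CRLF

/* hypothetical header: linefeed only, lowercase field name */
Header2 = "HTTP/1.0 200 OK" || LF ||,
          "last-modified: Thu, 19 Oct 1995 16:27:52 GMT" || LF

/* hypothetical header without a Last-Modified field */
Header3 = "HTTP/1.0 200 OK" || CRLF

/* brackets make stray blanks or control characters visible */
Say "[" || GetModificationDate(Header1) || "]"
Say "[" || GetModificationDate(Header2) || "]"
Say "[" || GetModificationDate(Header3) || "]"
Exit

/* copy of the function from the sample program */
GetModificationDate: Procedure
  Parse Arg Header
  Parse Upper Var Header "LAST-MODIFIED:" ModDate "0A"x
  ModDate = Strip(ModDate)
  ModDate = Strip(ModDate,,"0D"x)
  Return ModDate
```

The first two calls display [THU, 19 OCT 1995 16:27:52 GMT] and the third displays []. When the keyword is missing, the literal pattern matches the end of the string, so ModDate is assigned the empty string.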