


— 5 —
Retrieving Web-Based Information


Thus far I have discussed the HTTP protocol, using DDE and OLE automation to control a few popular Web browsers, and using HTML custom controls. This chapter builds on the material about the HTTP protocol found in Chapter 2, "HTTP: How to Speak on the Web".

This chapter briefly reviews a few elements of the HTTP protocol. The HTTP request message methods are discussed, then the dsSocket OLE control is introduced. This control, in combination with the Windows Sockets (WinSock) interface, allows programs to connect to other machines on a TCP/IP network (the Internet, for example). With the dsSocket control you can create server and client applications using any Internet protocol. The control allows you to connect to a server or to listen on a TCP/IP network and send and receive data over the network. The control creates the TCP/IP packet messages; the Visual Basic program controls what data is sent. A demo version of this control is provided on the CD-ROM accompanying this book. You can also download the latest version of the control from the Dolphin Systems Web site.


http://www.dolphinsys.com

Next, the HTTP OLE control found in Microsoft's Internet Control Pack is introduced. This control is a client-side control that uses the HTTP protocol for retrieving data from Web servers or for POSTing information to a Web resource. The latest version of the Internet Control Pack can be downloaded from the Microsoft Web site.


http://www.microsoft.com/icp/

Finally, the chapter concludes by presenting a few examples using these two OLE controls. These examples demonstrate how easy it is to retrieve information from the Web using the OLE controls. They don't include any elaborate error or timeout handling so they should only be used in a controlled environment.

HTTP's GET and HEAD Methods Reviewed


For complete coverage of HTTP messaging, refer to Chapter 2 or to the HTTP Internet-Draft document that Chapter 2 is based on. The example programs later in this chapter use both the GET and HEAD methods to retrieve information from HTTP servers.

The GET and HEAD methods are used for retrieving information from an HTTP server. The GET method retrieves the entire content of a resource on the server. The HEAD method retrieves only the header information about a resource on the server. The two methods both have their places when creating Visual Basic applications to access Web-based information.

The GET Method


Applications that utilize or retrieve Web-based resources use the GET method the majority of the time. The HTTP protocol provides the GET method as a way of retrieving data resources (such as HTML documents and images) from network servers. When a client application (also referred to as a user agent) needs to parse information from an HTML file or to display an image file, the GET request method is used.

The HTTP/1.0 protocol defines two different GET requests. One is known as the simple request. This style of request is used to retrieve only the contents of a resource. No header information about the resource is returned from the server. When an HTTP/1.0-compliant server receives a simple request message, it must not send any header information within its reply message.

The other style is known as the full request message. This type of request message can itself contain header information that refines the request. When an HTTP/1.0-compliant server receives a full request message, it should reply with a full response message containing header information regarding the resource being retrieved in addition to the contents of the resource.

The formats for GET request messages are

Simple-Request = "GET" SP <Request-URI> CRLF
Full-Request = <Request-Line>
               *(<General-Header> | <Request-Header> | <Entity-Header>)
               CRLF
               [ <Entity-Body> ]

Request-Line = "GET" SP <Request-URI> SP <HTTP-Version> CRLF
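The two GET formats above can be sketched as plain string builders. This is a minimal illustration written in Python rather than Visual Basic so that it can be run without either OLE control; the function names simple_get and full_get, the header values, and the resource paths are assumptions made for the example and do not come from the controls discussed in this chapter.

```python
CRLF = "\r\n"


def simple_get(request_uri):
    """Build a Simple-Request: method and URI only, no version, no headers."""
    return "GET " + request_uri + CRLF


def full_get(request_uri, headers=None, version="HTTP/1.0"):
    """Build a Full-Request: request line, optional header lines, blank line."""
    lines = ["GET " + request_uri + " " + version]
    for name, value in (headers or {}).items():
        lines.append(name + ": " + value)
    # Every line ends with CRLF; one extra CRLF marks the end of the headers.
    return CRLF.join(lines) + CRLF + CRLF


if __name__ == "__main__":
    # Hypothetical resource path and header, shown with repr() so the
    # CRLF characters are visible.
    print(repr(simple_get("/index.html")))
    print(repr(full_get("/index.html", {"Accept": "text/html"})))
```

The only visible difference between the two styles is the HTTP version on the request line and the blank line that ends the header section, which is how a server tells them apart.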

The HEAD Method


The HTTP/1.0 specification also defines a HEAD request message. The HEAD request instructs an HTTP server to return only the header information about the specified resource. The contents of the resource itself are not returned. This method can be used when designing user agents that catalog Web documents, for example, to obtain information about a document without having to retrieve the entire document.

To use the HEAD method, you must issue a full-request message. The format for the message is

Full-Request = <Request-Line>
               *(<General-Header> | <Request-Header> | <Entity-Header>)
               CRLF
               [ <Entity-Body> ]

Request-Line = "HEAD" SP <Request-URI> SP <HTTP-Version> CRLF

When the server responds to a HEAD message, it does not return an <Entity-Body> element (which, for a GET request, would contain the contents of the resource). Only the response message header information will be returned.
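As a rough sketch of what a cataloging user agent might do with a HEAD exchange, the following Python (used here instead of Visual Basic so it runs without the controls) builds the request and splits a header-only response into a status line and a dictionary of header fields. The helper names and the sample response text are assumptions for illustration only; a real server's headers will differ.

```python
CRLF = "\r\n"


def head_request(request_uri, version="HTTP/1.0"):
    """Build a HEAD Full-Request with no header fields."""
    return "HEAD " + request_uri + " " + version + CRLF + CRLF


def parse_head_response(raw):
    """Split a header-only response into (status_line, headers)."""
    header_block = raw.split(CRLF + CRLF, 1)[0]
    lines = header_block.split(CRLF)
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


if __name__ == "__main__":
    # Made-up response text, for illustration only.
    sample = ("HTTP/1.0 200 OK" + CRLF +
              "Content-Type: text/html" + CRLF +
              "Content-Length: 1024" + CRLF + CRLF)
    print(repr(head_request("/index.html")))
    print(parse_head_response(sample))
```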

Introduction to dsSocket OCX


There are many TCP/IP controls available to the Visual Basic programmer. They range from controls that are not protocol-specific to controls that work only with specific protocols (such as FTP, NNTP, and SMTP).

The dsSocket custom control, produced by Dolphin Systems, is a Windows socket control (TCP/IP communications take place utilizing connections known as sockets). It is not protocol-specific. Instead, it allows you to create any type of application to communicate over a TCP/IP network. Applications developed using the control can, of course, use any of the Internet protocols, but they are not limited to a specific protocol.

Each connection you wish to establish using dsSocket must have its own instance of the dsSocket control on the Visual Basic form. For example, if the application will run as both a client and a server application and you expect to be able to listen and talk at the same time, you must have two dsSocket controls on your form. As a further example, some Web browsers use multiple connections to construct Web pages. If a Web page contains many pictures, for instance, the browser can use one connection for each picture it has to retrieve. This way, it can retrieve and display these images concurrently—it doesn't have to wait for each image to be read in a sequential manner. Coding this situation would again require one dsSocket instance for each concurrent connection to be made.

This chapter discusses using the dsSocket control to retrieve data from HTTP servers. The programs will, obviously, use the HTTP protocol discussed in Chapter 2 to retrieve this data. Other possible uses of the dsSocket control include using the SMTP protocol to send Internet mail from within Win/CGI applications (which are introduced in Chapter 6, "The Win/CGI Interface").

This section discusses some of the properties, methods, and events of the dsSocket control. The help file included with the control provides a complete reference to the entire set of properties, methods, and events. This discussion is limited to those that will be used in this book.

Properties


The control's properties provide the parameters to be used during the communication session. The control can be set up to act as either a server or client, depending on the property settings. The control also uses an Action property that, incredibly enough, causes the connections and communication to happen.

The custom property page allows you to set up most of the important properties in one place. The property page is shown in Figure 5.1.

Figure 5.1. The custom property page for the dsSocket control.

RemoteHost and RemoteDotAddr

These properties contain the host name (if available) and the IP address, respectively, of the remote computer. If the RemoteHost property is set, either a name resolution service (which translates a host name into an IP address) must be available or a matching entry must exist in the Windows hosts file. This is necessary because connections are actually made using the standard IP dot address format.

By using either the FwdLookup method or setting the Action property to 6 (SOCK_ACTION_FWDLOOKUP), the value in RemoteHost is translated using forward lookup to an IP address and stored in the RemoteDotAddr property. Likewise, using the RevLookup method or setting Action to 7 (SOCK_ACTION_REVLOOKUP) converts the RemoteDotAddr property to a host name using reverse lookup and stores it in RemoteHost. If the name services are not available, these actions simply do not fill the corresponding properties.

Whenever multiple connections to the same server are required, a great deal of performance can be gained by using the RemoteDotAddr property. Internet hosts are typically named using host names (such as www.myserver.com) instead of IP addresses. To resolve the host name into an IP address, use the forward lookup method described previously. This translates the host name into an IP address to be stored in the RemoteDotAddr property. This only has to be done once for each assignment to the RemoteHost property.
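The idea of resolving a host name once and reusing the dot address for later connections can be sketched as a small cache. This is a Python illustration, not dsSocket code: the HostCache class is hypothetical, and socket.gethostbyname is Python's standard forward lookup, standing in for the control's FwdLookup method.

```python
import socket


class HostCache:
    """Remember the dot address for each host name after the first lookup."""

    def __init__(self, resolver=socket.gethostbyname):
        # The resolver is injectable so the cache can be exercised offline.
        self._resolver = resolver
        self._cache = {}

    def dot_addr(self, host):
        """Return the IP address for host, resolving it only once."""
        if host not in self._cache:
            self._cache[host] = self._resolver(host)
        return self._cache[host]


if __name__ == "__main__":
    cache = HostCache()
    # Both calls return the same address; only the first performs a lookup.
    print(cache.dot_addr("localhost"))
    print(cache.dot_addr("localhost"))
```

A browser-style client that opens one connection per image would consult the cache before each connection instead of repeating the name lookup.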

RemotePort

This integer property specifies the port to be connected to on the remote computer. There are a number of well-known port assignments. These define standard port numbers to be used by specific services. For example, the well-known port for HTTP servers is 80.

The value specified must correspond to a port number on which the computer specified by RemoteHost/RemoteDotAddr is actively listening. If this property is set to zero, the ServiceName property is used to determine the port number on which to connect.

LocalDotAddr and LocalName

These properties set and retrieve the IP address and host name, respectively, of the machine on which the control is operating. The default values for these properties are determined by the TCP/IP settings for the machine. These settings are found in the Network Properties page (either in Control Panel or the Network Neighborhood folders). The properties are available only at run-time. Setting these properties to an empty string causes the property to be set to the default values.

LocalPort

When creating a server application, you should set the LocalPort property to the port number on which the application will listen. If you are implementing a standard protocol server (such as FTP or SMTP), you should use the well-known port number. Likewise, if you are creating both the client and server applications and they are not using a standard protocol, you should avoid using any of the well-known port numbers. This will prevent applications that are using a standard protocol from connecting to your server accidentally.

DataSize

This property defines the maximum number of bytes that will be transferred to the Receive event. The default value is 2048, the minimum value is 100, and the maximum value is 32767.

LineMode and EOLChar

The LineMode property determines how data is received by the control. When LineMode is set to True, the control's Receive event is fired whenever the character specified by the EOLChar property is received. If LineMode is True and the character specified in EOLChar is not found in the incoming data, the Receive event is fired after the number of bytes specified in the DataSize property are received.

If LineMode is False, data is transferred to the control as it is received from the network.
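The interaction of LineMode, EOLChar, and DataSize can be modeled with a short function that cuts a buffer into the pieces each Receive event would see. This is a simplified Python model, not the control's actual implementation. It assumes the end-of-line character is delivered as part of the line, that EOLChar is a single character, and that trailing data without an end-of-line character is flushed at the end of the buffer (as if the connection had closed); none of these details is confirmed by the text above.

```python
def receive_events(data, line_mode, eol_char="\n", data_size=2048):
    """Return the pieces of data that successive Receive events would carry."""
    events = []
    pos = 0
    while pos < len(data):
        # No event ever carries more than data_size characters.
        limit = min(pos + data_size, len(data))
        end = limit
        if line_mode:
            eol = data.find(eol_char, pos, limit)
            if eol != -1:
                # Assumption: the end-of-line character stays with its line.
                end = eol + 1
        events.append(data[pos:end])
        pos = end
    return events


if __name__ == "__main__":
    reply = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
    for piece in receive_events(reply, line_mode=True):
        print(repr(piece))
```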

Action

Once the connection properties have been set up using the other properties, the Action property is used to change the state of the communication socket. Table 5.1 lists the available values for the property.

Table 5.1. Action Property Values.

Constant                 Value   Action

SOCK_ACTION_CLOSE        1       Closes an open connection
SOCK_ACTION_CONNECT      2       Establishes (opens) a connection
SOCK_ACTION_LISTEN       3       Listen for incoming connection
SOCK_ACTION_FWDLOOKUP    6       Convert host name to IP address
SOCK_ACTION_REVLOOKUP    7       Convert IP address to host name


The three values we'll use most often are SOCK_ACTION_CONNECT, SOCK_ACTION_LISTEN, and SOCK_ACTION_CLOSE.

The SOCK_ACTION_CONNECT value is used when creating a client application. When the Action property is assigned this value, the control connects to the server specified in the RemoteHost or RemoteDotAddr properties. The port (or socket) used is specified by the RemotePort property. Assigning this value is identical to using the Connect method.

The SOCK_ACTION_LISTEN value is used when creating a server application. When the Action property is assigned this value, the control begins listening on the port specified by the LocalPort property for any incoming connections. Assigning this value is identical to using the Listen method.

The SOCK_ACTION_CLOSE value is used to close an open connection. Assigning this value is identical to using the Close method.

Send

This string property sets the data to be sent to the remote machine. The control must be connected to the remote machine or a run-time error occurs. The data is sent immediately unless the network is not ready. If the network is not ready, a run-time error occurs. In this case, the SendReady event is fired when the network is available, at which time the program should set the Send property again.

State

This property specifies the current status of the control. The property is set after setting the Action property. If an error occurs (firing the Exception event), the value of this property does not change to reflect the error condition. The possible values for the property are listed in Table 5.2.

Table 5.2. State Property Values.

Constant                 Value   State

SOCK_STATE_CLOSED        1       There is no open connection
SOCK_STATE_CONNECTED     2       There is an open connection
SOCK_STATE_LISTENING     3       The control is listening for a connection
SOCK_STATE_CONNECTING    4       Waiting for a connection to be completed
SOCK_STATE_ERROR         5       There is an error
SOCK_STATE_CLOSING       6       The connection is closing
SOCK_STATE_UNKNOWN       7       The status is unknown
SOCK_STATE_BUSY          8       The network is busy



Methods


This section describes the methods available to the dsSocket control. All of these methods duplicate functions that can also be activated by setting the Action property.

Connect

Using the Connect method starts an attempt to connect to the machine specified by the RemoteHost/RemoteDotAddr properties on the port specified by the RemotePort property. The connection is not completed until the Connect event is fired. This method is synonymous with setting the Action property to SOCK_ACTION_CONNECT.

Close

The Close method closes an open connection. Using this method is identical to setting the Action property to SOCK_ACTION_CLOSE.

Listen

This method opens a port to listen for incoming connections. The Accept event fires when an incoming connection request is received. If the control is actively listening, it cannot be used for sending data until the Listen is canceled. This method is identical to setting the Action property to SOCK_ACTION_LISTEN.

FwdLookup and RevLookup

These methods are used for doing forward and reverse name resolution. These concepts have previously been discussed with the Action, RemoteDotAddr, and RemoteHost properties. Using these methods is synonymous with setting the Action property to SOCK_ACTION_FWDLOOKUP and SOCK_ACTION_REVLOOKUP, respectively.

Events


The dsSocket events are fired whenever the state of the control changes. If there is no code for a specific event, the change of state is simply ignored. This section discusses the events in the order in which they take place when issuing an HTTP GET request (which is demonstrated by the example programs at the end of the chapter).

Connect

This event is fired after a connection is established with the remote machine. The connection sequence begins either by setting the Action property to SOCK_ACTION_CONNECT or by invoking the Connect method. Once the connection is established, the control can then exchange data with the server machine.

SendReady

This event indicates that the network is ready to receive data. The event is fired after a connection is established and also after the network changes from a not-ready state to a ready state.

It is possible that, when attempting to send data, the network is not in a position to transfer data. If this occurs, the Exception event will fire with the error code SOCK_ERR_WOULDBLOCK (error number 21035). The program must then wait for the SendReady event to fire, at which time the program can re-send the data.
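The wait-and-resend pattern described here can be sketched independently of the control. The Python below is only a model of the logic: the WouldBlock exception stands in for the SOCK_ERR_WOULDBLOCK condition, and PendingSender, send, and on_send_ready are hypothetical names, not part of dsSocket.

```python
class WouldBlock(Exception):
    """Stand-in for the 'network not ready' (would block) condition."""


class PendingSender:
    """Hold data that could not be sent and retry it when sending is ready."""

    def __init__(self, send_func):
        self._send = send_func   # callable that transmits or raises WouldBlock
        self.pending = None      # data waiting for the network to become ready

    def send(self, data):
        try:
            self._send(data)
        except WouldBlock:
            # Keep the data until the ready notification arrives.
            self.pending = data

    def on_send_ready(self):
        """Call this from the handler that signals the network is ready."""
        if self.pending is not None:
            data, self.pending = self.pending, None
            self.send(data)
```

In a Visual Basic program the same bookkeeping would live in a form-level variable that the SendReady event procedure checks before setting the Send property again.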

Receive

The Receive event fires whenever the control receives data from the machine to which it is connected. If the LineMode property is set to True, the event will not fire until the character specified in EOLChar is received. This feature is useful when communicating with line-mode protocols such as HTTP, SMTP and POP. All messages in these protocols have carriage return and line-feed characters as the last characters of each line.

The data received is passed to the event using the event's ReceiveData parameter. The length of the data passed never exceeds the value of the DataSize property.

The usual processing for the Receive event is to append the received data to some data container (such as a string variable or a file) until a certain condition is met or the connection closes. The HTTP examples presented at the end of the chapter write the received data to a file and also append it to a textbox on the form. This continues until the HTTP server closes the connection, signalling the end of the resource being retrieved.
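The accumulate-until-close processing can be sketched as a small collector. This Python model is not the chapter's example program: ResponseCollector and its method names are hypothetical, and the split into header and body at the first blank line assumes the server sent a full response. For a simple response with no header section, everything is treated as body.

```python
class ResponseCollector:
    """Append received chunks until the connection closes, then split them."""

    def __init__(self):
        self._chunks = []
        self.closed = False
        self.error_code = None

    def on_receive(self, receive_data):
        # Called once per Receive-style event with the bytes that arrived.
        self._chunks.append(receive_data)

    def on_close(self, error_code=0):
        # The server closing the connection marks the end of the resource.
        self.closed = True
        self.error_code = error_code

    def split(self):
        """Return (header_text, body_bytes) for everything received so far."""
        raw = b"".join(self._chunks)
        head, separator, body = raw.partition(b"\r\n\r\n")
        if not separator:
            return "", raw   # no header section found
        return head.decode("latin-1"), body
```

Because the chunks are joined before splitting, it does not matter if the blank line that ends the headers arrives divided across two Receive events.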

Close

This event occurs whenever an open connection closes. The event provides two parameters, ErrorCode and ErrorDesc, that specify whether the socket closed with errors. If there were no errors, the value of ErrorCode is zero and the value of ErrorDesc is "Socket closed." If an error did occur while the socket was closing, the ErrorCode parameter indicates the error number and the ErrorDesc provides a string description of the error.

Exception

The Exception event fires whenever an asynchronous error occurs. The event provides the ErrorCode and ErrorDesc parameters as described for the Close event.

If an active connection is aborted, the Exception event is fired instead of the Close event to signify that the connection closed prematurely.

Errors occurring while setting control properties are still trapped using the standard Err and Error variables.

Listen

The Listen event occurs whenever the Action property has been set to SOCK_ACTION_LISTEN or the Listen method was invoked and the control is now set to actively listen on the network. The event signifies that the control is now able to receive incoming connections from client machines.

Accept

This event fires whenever a control that is actively listening receives a connection request. This event passes a SocketID parameter which should be assigned to the Socket property of an available dsSocket control. This topic is covered in the control's manual and won't be used in this book.

Microsoft's Internet Control Pack HTTP Client Control


Microsoft has recently introduced a set of controls known as the ActiveX Internet Control Pack (ICP). These controls are each designed to handle a specific Internet protocol. Because the controls hide the implementation details of the protocol from the programmer, a program can be written to access a specific type of server with just a few lines of code. This section discusses the HTTP client control.

Unlike accessing HTTP servers using the dsSocket control, the ICP's HTTP control handles creating and sending the request messages. The Visual Basic program merely specifies the URL to be retrieved, invokes the GetDoc method, then uses the DocOutput event to receive the incoming document. In addition to this, if the program is merely going to save the resource to a local file, the program simply sets the Filename property before invoking GetDoc. Then the HTTP control automatically writes the received data to the file specified without firing the DocOutput event. No further data processing is required of the program.

The HTTP control provides no mechanism for parsing or displaying the retrieved data. It is up to the application to determine what to do with the data that is received. Microsoft has provided the HTML control to act as a complete HTML browser.

The Internet Control Pack was still in beta testing at the time of this writing. The material covered here may not be accurate for later releases of these controls. As an example, the documentation shipped with the ICP listed a method called PerformRequest. This method, however, was not actually implemented in the control. Most of the discussion here is of a general enough nature to be fairly accurate.

Properties


The HTTP control's properties specify the server to connect to, the document to be retrieved, and the means to be used in retrieving that document.

RemoteHost and RemotePort

The RemoteHost property specifies either the host name or the IP address of the machine containing the document to be retrieved. The value can be set to either addressing style (IP dot address or host name).

The RemotePort property specifies the port number to connect to on the remote machine. For the HTTP control, the default port number is 80.

Document

This string property sets or returns the name of the document to be retrieved. By combining the Document property with the RemoteHost property, you can construct the URL for the document.

URL

This property can be used instead of the RemoteHost and Document to specify the document to be retrieved. For the HTTP control, the protocol identifier can be omitted from the URL. If this is the case, it will default to http:.

Method

This property specifies the HTTP request method to be used for the transfer. These methods are discussed in detail in Chapter 2. The values available are listed in Table 5.3.

Table 5.3. Method Property Values.

Constant    Value   Method

prcGet      1       Get (default)
prcHead     2       Head
prcPost     3       Post
prcPut      4       Put



NotificationMode

This property is similar to the LineMode property found in the dsSocket control. Instead of searching for an end-of-line character, however, setting the NotificationMode property to zero (the default) delays notification of data received until the entire response message is received. If the property is set to one, an event is fired for each received piece of data. For most operations, it is best to leave the property at its default value and parse the retrieved data after it is completely received.

DocOutput

The DocOutput property provides a reference to the control's DocOutput object. The DocOutput object contains information about the document being received. A reference to the DocOutput object is also passed as a parameter to the DocOutput event.

The properties of this object include BytesTotal and BytesTransferred, which provide the total size of the file and the number of bytes that have currently been transferred, respectively.

The object also contains a DocHeader collection. The collection is referred to by the Headers property of the DocOutput object. This collection contains the HTTP header fields returned by the server. The items in the collection have only two properties: Name and Value. The collection is used by referring to the collection's Item collection. The Headers collection has a Count property which provides a count of the number of headers that exist.

For example, a header field may be accessed with

Debug.Print HTTP1.DocOutput.Headers.Item("content-type").Value

which prints the value of the Content-Type header field to the Visual Basic Debug window. Or, to print all the headers to the debug window, use

' Print every response header as "Name: Value"
With HTTP1.DocOutput.Headers
    For x% = 1 To .Count
        Debug.Print .Item(x%).Name & ": " & .Item(x%).Value
    Next
End With

Another useful property of the DocOutput object is FileName. If this property is assigned a valid filename and the HTTP control's GetDoc method is invoked, the document retrieved is stored in the specified file. No further processing is required by the program; the HTTP control handles the entire process automatically. This property is illustrated in one of the examples at the end of this chapter.

The State property indicates the status of the current transfer. The property is read only and is always set to one of the values listed in Table 5.4.

Table 5.4. State Property Values.

Constant        Value   State

icDocNone       0       No transfer is in progress
icDocBegin      1       Transfer is being initiated
icDocHeaders    2       Headers are transferred (or requested)
icDocData       3       Data is available (or requested)
icDocError      4       An error has occurred
icDocEnd        5       Transfer is complete



Methods


The GetDoc method discussed next is invoked most often. This method provides the means for retrieving HTTP server-based documents. A few other methods are available but don't warrant discussion here. The second method discussed (GetData) is a method of the DocOutput object contained within the HTTP control.

GetDoc

Invoking this method initiates a request message to retrieve a document. The document to be retrieved can be specified using the method's optional parameters. If the parameters are not specified, the values of the URL, Headers, and DocOutput's OutputFile properties are used. The syntax for this method is

object.GetDoc [URL,] [Headers,] [OutputFile]

The URL parameter specifies the complete URL for the document to be retrieved. The URL includes the server's host name or IP address, the port number, and the document name. The Headers parameter contains a reference to the Headers collection of the HTTP control's DocInput object. It is of type DocHeaders. The Headers parameter is used to specify request message header fields for the request being issued. If the OutputFile parameter is provided, it specifies a local file into which the document being retrieved is placed. Similar to the OutputFile property of the DocOutput object, if this parameter is specified the DocOutput event is not fired when data is received. Instead, the data is placed into the specified file.

When data is received by the HTTP control in response to a GetDoc, the DocOutput event is fired. The program should then check the State property of the DocOutput object to determine the control's status. If the status indicates that data is available, the GetData method of the DocOutput object is invoked to actually retrieve the data.

GetData

The DocOutput object provides a GetData method which is used to retrieve the data from the receive buffer. This method can only be invoked from within the DocOutput event. When data is available (State = icDocData) this method is used to fetch the data into a variable. The syntax is

object.GetData data, [type]

The data parameter is where the retrieved data is stored. The variable can be of any data type. The optional type parameter is a Long which specifies the type of data to be retrieved. The values use the intrinsic constants for specifying Visual Basic data types (vbInteger and vbString, for example).

Events


When using the HTTP control, there are a few events that are used most often. Of course, depending on the application you're developing, you may need to use all of the events the control provides. The examples at the end of this chapter are not meant to be commercial-quality applications. They use most of the control's events only to provide feedback to the user. In a "real" application the code should include more robust error and timeout handling.

The events discussed in this section are DocOutput, Error, StateChanged, and TimeOut.

DocOutput

The HTTP control's DocOutput event is analogous to the dsSocket control's Receive event. The DocOutput event is fired whenever data has been received by the HTTP control.

The event passes one parameter, also named DocOutput, to the Visual Basic program. This DocOutput parameter is simply a reference to the control's DocOutput property. The State property of this DocOutput object should be examined to determine the status of the data stream. The possible values for the State property are listed in Table 5.4.

There is a property of the HTTP control named NotificationMode which controls how often this event is fired. If NotificationMode is set to zero (the default value), the DocOutput event does not fire until an entire portion of the HTTP response message is received by the control. For example, the event fires once after all the header fields have been received. In this case, the DocOutput.State property is set to icDocHeaders and the Headers collection of DocOutput is available. The event fires again when all the entity body portion of the response message has been received. The State in this case will be icDocData and the GetData method of DocOutput should be invoked to retrieve the data.

If NotificationMode is set to one, however, the DocOutput event fires continuously as data is received by the control.
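The state-driven handling of the DocOutput event can be modeled as a simple dispatcher. The Python below is a sketch of the control flow only. The constant values mirror Table 5.4, but DocOutputHandler and its on_doc_output signature are hypothetical: the real event passes a DocOutput object, and the data is fetched with that object's GetData method rather than handed in as an argument.

```python
# State values as listed in Table 5.4.
IC_DOC_NONE, IC_DOC_BEGIN, IC_DOC_HEADERS, IC_DOC_DATA, IC_DOC_ERROR, IC_DOC_END = range(6)


class DocOutputHandler:
    """Collect headers and data according to the reported transfer state."""

    def __init__(self):
        self.headers = {}
        self.body = []
        self.done = False
        self.failed = False

    def on_doc_output(self, state, headers=None, data=None):
        if state == IC_DOC_HEADERS:
            # All header fields have arrived and can be examined.
            self.headers = dict(headers or {})
        elif state == IC_DOC_DATA:
            # Fires once, or once per piece, depending on NotificationMode.
            self.body.append(data)
        elif state == IC_DOC_ERROR:
            self.failed = True
        elif state == IC_DOC_END:
            self.done = True
```

Appending to a list of pieces means the same handler works whether the data arrives in one event or in many.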

Error

This event occurs whenever there is an error in background data processing. The event provides quite a few parameters that describe the error and determine how the appropriate error message will be displayed.

StateChanged

This event fires whenever there is a change-of-state in the control. The event provides a parameter named State which provides the current state. The possible values of this parameter are explained in Table 5.5.

This event is useful for updating status feedback controls such as status bars. It can also be useful when developing tightly controlled state-machine driven applications.

Table 5.5. State Parameter Values.

Constant            Value   State

prcConnecting       1       Connect requested, waiting for acknowledgement from the server
prcResolvingHost    2       Resolving host name to IP address
prcHostResolved     3       Resolved the host name
prcConnected        4       Connection established
prcDisconnecting    5       Close/disconnect has been initiated
prcDisconnected     6       Not connected



Off and Surfing: Some Basic Examples


The examples presented in this chapter are fairly simple. However, they do illustrate the key concepts that are necessary when creating a user agent that accesses Web-based information. These concepts include parsing the provided URL to extract the remote host name, the remote port number, and the resource name, as well as properly formatting the HTTP request message to allow the Web server to properly process the request.
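The URL-parsing step mentioned above can be sketched in a few lines. This is a Python illustration using the standard urllib.parse module, not the code from the chapter's Visual Basic examples. The function name parse_url is an assumption, as is the choice to default to http and port 80 when they are omitted.

```python
from urllib.parse import urlsplit


def parse_url(url, default_port=80):
    """Split a URL into (remote host, remote port, resource name)."""
    if "://" not in url:
        # Assume HTTP when the protocol identifier is omitted.
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or default_port
    # An empty path means the server's root resource.
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    return host, port, resource


if __name__ == "__main__":
    print(parse_url("http://localhost/index.html"))
    print(parse_url("www.myserver.com:8080/docs/a.htm"))
```

The three values map directly onto what a socket-level client needs: the host and port to connect to, and the Request-URI to place on the request line.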

Both examples allow the user to enter a URL and retrieve the resource it represents. The user can choose the GET, HEAD, or POST request method and whether or not to save the resource to a local file. The first example utilizes the dsSocket control and is the more complex of the two. This should not be surprising if you have read the previous two sections of this chapter. The second example utilizes the Microsoft HTTP Client control to accomplish the same feats. The code required, however, is much smaller than with the dsSocket control.

These examples don't include a lot of error handling or timeout processing. You should only use them in a tightly controlled environment until you feel comfortable with them. For example, as I write I am running the WebSight HTTP server and Visual Basic on a laptop. All my URLs reference localhost, which is defined in the laptop's HOSTS file as IP address 127.0.0.1. This is the standard "loopback" IP address that refers to the local machine.

Example Using the dsSocket Control


The example using the dsSocket control (see Figure 5.2) is based on the dsWeb sample project that is shipped with the dsSocket control. I have added quite a bit to the code, however, to demonstrate the available HTTP/1.0 protocol features, as well as the ability to save the resource locally.

Figure 5.2. The dsSocket example application's form.

Designing the Form

The form layout is shown in Figure 5.2. Beyond the standard Visual Basic controls, the project requires the dsSocket control and the Microsoft Common Dialog control. The form definition is given in Listing 5.1. Place the controls on your form and assign the properties as specified in Listing 5.1.

Listing 5.1. Form Definition

Begin VB.Form frmMain
   BorderStyle     =   3  'Fixed Dialog
   Caption         =   "World Wide Web Resource Viewer/Grabber"
   ClientHeight    =   6045
   ClientLeft      =   1155
   ClientTop       =   420
   ClientWidth     =   5175
   ForeColor       =   &H80000008&
   Height          =   6450
   KeyPreview      =   -1  'True
   Left            =   1095
   LinkTopic       =   "Form1"
   MaxButton       =   0   'False
   MinButton       =   0   'False
   ScaleHeight     =   6045
   ScaleWidth      =   5175
   ShowInTaskbar   =   0   'False
   Top             =   75
   Width           =   5295
   Begin VB.TextBox Text1
      Height          =   4035
      Left            =   120
      MultiLine       =   -1  'True
      ScrollBars      =   3  'Both
      TabIndex        =   5
      Top             =   1500
      Width           =   4875
   End
   Begin VB.CommandButton btnGetFile
      Appearance      =   0  'Flat
      BackColor       =   &H80000005&
      Caption         =   "Save Resource Locally"
      Height          =   300
      Index           =   1
      Left            =   3000
      TabIndex        =   9
      Top             =   540
      Width           =   1995
   End
   Begin VB.ComboBox cboMethod
      Appearance      =   0  'Flat
      Height          =   315
      Left            =   3420
      Style           =   2  'Dropdown List
      TabIndex        =   7
      Top             =   1020
      Width           =   1575
   End
   Begin VB.CheckBox chkProtocol
      Caption         =   "Use HTTP/1.0 Format"
      Height          =   255
      Left            =   120
      TabIndex        =   6
      Top             =   1080
      Value           =   1  'Checked
      Width           =   2355
   End
   Begin VB.TextBox txtStatus
      Height          =   285
      Left            =   120
      TabIndex        =   3
      Top             =   5640
      Width           =   4935
   End
   Begin VB.CommandButton btnGetFile
      Appearance      =   0  'Flat
      BackColor       =   &H80000005&
      Caption         =   "View Resource"
      Height          =   300
      Index           =   0
      Left            =   1140
      TabIndex        =   2
      Top             =   540
      Width           =   1545
   End
   Begin VB.TextBox txtURL
      Height          =   285
      Left            =   1125
      TabIndex        =   0
      Text            =   "http://localhost/"
      Top             =   120
      Width           =   3870
   End
   Begin MSComDlg.CommonDialog dlgFileSave
      Left            =   5220
      Top             =   600
      _Version        =   65536
      _ExtentX        =   847
      _ExtentY        =   847
      _StockProps     =   0
      CancelError     =   -1  'True
      DialogTitle     =   "Save Resource As..."
   End
   Begin VB.Label Label2
      Alignment       =   1  'Right Justify
      Caption         =   "Method: "
      Height          =   255
      Left            =   2640
      TabIndex        =   8
      Top             =   1080
      Width           =   735
   End
   Begin dsSocketLib.dsSocket DSSocket1
      Height          =   420
      Left            =   90
      TabIndex        =   4
      Top             =   360
      Width           =   420
      _version        =   65542
      _extentx        =   741
      _extenty        =   741
      _stockprops     =   64
      localport       =   0
      remotehost      =   ""
      remoteport      =   0
      servicename     =   ""
      remotedotaddr   =   ""
      linger          =   -1  'True
      timeout         =   10
      linemode        =   0   'False
      eolchar         =   10
      bindconnect     =   0   'False
      sockettype      =   0
   End
   Begin VB.Label Label1
      Alignment       =   1  'Right Justify
      Appearance      =   0  'Flat
      BackColor       =   &H00C0C0C0&
      Caption         =   "URL :"
      ForeColor       =   &H80000008&
      Height          =   195
      Left            =   240
      TabIndex        =   1
      Top             =   135
      Width           =   825
   End
End

The combo box cboMethod allows the user to select which of the HTTP/1.0 request methods to use. The GET and HEAD methods require no user input apart from the URL. The list items are added to the combo box in the Form_Load event. Then the cboMethod.ListIndex property is set to the "GET" item.

The POST method requires that the user enter data in the large textbox. The data entered when performing a POST should be form encoded, meaning that it looks like the data sent when a Web browser performs a POST request. This format contains the field name, an equal sign, and the field value. If more than one field is present, the fields are separated by ampersand characters. The line

Name1=Craig&Name2=Susan&Fetch=Water

contains three fields, Name1, Name2, and Fetch; each with their respective values. When the line is sent to the HTTP server, it is terminated with a NULL character (Chr$(0)).

The check box chkProtocol allows the user to select between using the HTTP/1.0 protocol and the HTTP/0.9 protocol. The versions of the HTTP protocol prior to HTTP/1.0 provide only the GET request method. Therefore, when the check box is not checked, the dropdown listbox is disabled and the program sends only a simple request message (a GET line with no protocol version or header fields).

The textbox txtURL is where the user enters the address of the resource to be retrieved. The URL should be in the form

http://www.myhost.com/resource.htm

or (to retrieve the server's default document)

http://www.myhost.com/

to be a valid URL. There are two functions that parse the entered URL into a host name (www.myhost.com) and a filename (resource.htm). These routines are discussed in the next section.

The textbox txtStatus is used to provide status information to the user. This includes informing the user that a connection has been established, that the program is waiting to receive data from the server, or that an Exception event has occurred.

The GetHostFromURL() and GetFileFromURL() Functions

These two routines are taken from the dsWeb sample that ships with the dsSocket control. They are used to parse the host and filenames from a URL. The routines depend on the URL being valid. If the URL is invalid, the routines return an empty string.

The GetHostFromURL() function (Listing 5.2) retrieves the host name from the URL. The host name is the portion of the URL between the "//" and the next "/" character. If no "/" follows the host name, the remainder of the string is returned as the host name. If the "//" is not present, GetHostFromURL() considers the URL to be invalid and returns an empty string.

Listing 5.2. GetHostFromURL() Function

Private Function GetHostFromURL(szURL As String) As String
    '   parse out the hostname from a valid URL
    '   the URL should be of the format: http://www.microsoft.com/index.html
    '   the returned hostname would then be: www.microsoft.com
    Dim szHost      As String
    Dim lPos%
    szHost = szURL
    '   invalid URL
    If InStr(szHost, "//") = 0 Then
        GetHostFromURL = ""
        Exit Function
    End If
    szHost = Mid(szHost, InStr(szHost, "//") + 2)
    lPos% = InStr(szHost, "/")
    If lPos% = 0 Then
        GetHostFromURL = szHost
        Exit Function
    Else
        GetHostFromURL = Left(szHost, lPos% - 1)
        Exit Function
    End If
End Function

The GetFileFromURL() function parses the resource's filename from the supplied URL. If the name is not provided, an empty string is returned. The function is shown in Listing 5.3. The routine first validates the URL. The URL must contain the "//" characters. A temporary string (szFile) is used to hold the filename. The portion of the szURL parameter up to and including the "//" is removed from the string. Finally, the function searches the string for the first occurrence of a single slash ("/"). If one is found, the portion of the string following the slash is returned. If a slash is not found, an empty string is returned. This situation is possible for URLs attempting to retrieve the default resource of a Web server.

Listing 5.3. GetFileFromURL() Function

Private Function GetFileFromURL(szURL As String) As String
    '   parse out the filename from a valid URL
    '   the URL should be of the format: http://www.microsoft.com/index.html
    '   the returned filename would then be: index.html
    Dim szFile      As String
    szFile = szURL
    '   invalid URL
    If InStr(szFile, "//") = 0 Then
        GetFileFromURL = ""
        Exit Function
    End If
    szFile = Mid(szFile, InStr(szFile, "//") + 2)
    If InStr(szFile, "/") = 0 Then
        GetFileFromURL = ""
        Exit Function
    Else
        GetFileFromURL = Mid(szFile, InStr(szFile, "/") + 1)
        Exit Function
    End If
End Function
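
Neither routine handles a URL that names an explicit port, such as http://localhost:8080/index.html. For that URL, GetHostFromURL() returns "localhost:8080". The following is a minimal sketch of how the port could be separated out, in the same style as the two listings. The names GetPortFromURL() and StripPortFromHost() are hypothetical and are not part of the dsWeb sample.

```vb
'   Hypothetical helpers, not part of the dsWeb sample.
'   Returns the port named in the URL, or 80 if none is given.
Private Function GetPortFromURL(szURL As String) As Long
    Dim szHost As String
    Dim iPos As Integer
    GetPortFromURL = 80                     ' default HTTP port
    iPos = InStr(szURL, "//")
    If iPos = 0 Then Exit Function          ' invalid URL: keep the default
    szHost = Mid$(szURL, iPos + 2)
    iPos = InStr(szHost, "/")
    If iPos > 0 Then szHost = Left$(szHost, iPos - 1)
    '   szHost is now "host" or "host:port"
    iPos = InStr(szHost, ":")
    If iPos > 0 Then
        If Val(Mid$(szHost, iPos + 1)) > 0 Then
            GetPortFromURL = Val(Mid$(szHost, iPos + 1))
        End If
    End If
End Function

'   Removes a trailing ":port" from a host name, if one is present.
Private Function StripPortFromHost(szHost As String) As String
    Dim iPos As Integer
    iPos = InStr(szHost, ":")
    If iPos > 0 Then
        StripPortFromHost = Left$(szHost, iPos - 1)
    Else
        StripPortFromHost = szHost
    End If
End Function
```

With these in place, the RemoteHost property could be set from StripPortFromHost(GetHostFromURL(txtURL)) and the RemotePort property from GetPortFromURL(txtURL) instead of the fixed value 80.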

The Form's Declarations Section

The declarations section for the form defines some constants and the form-level variables the program uses. These are shown in Listing 5.4 but the variables won't be discussed until the sections in which they are used are covered.

Listing 5.4. Declarations

Option Explicit
'   Declare the constants used to set the Action property
'   and check the State of the socket
Const SOCK_STATE_CLOSED = 1
Const SOCK_STATE_CONNECTED = 2
Const SOCK_STATE_LISTENING = 3
Const SOCK_STATE_CONNECTING = 4
Const SOCK_STATE_ERROR = 5
Const SOCK_STATE_CLOSING = 6
Const SOCK_STATE_UNKNOWN = 7
Const SOCK_STATE_BUSY = 8
Const SOCK_ACTION_CLOSE = 1
Const SOCK_ACTION_CONNECT = 2
Const SOCK_ACTION_LISTEN = 3
Dim lBytesRcvd  As Long
Dim bGettingContent As Integer
Dim iFileHandle As Integer
Dim fSaveToFile As Integer

Miscellaneous Routines

This section presents the application's miscellaneous routines. They are all event procedures and don't warrant much discussion. The code is given in Listing 5.5.

Listing 5.5. Various Event Procedures

Private Sub cboMethod_Click()
    'don't allow local save for HEAD method
    If cboMethod.ListIndex = 1 Then
        btnGetFile(1).Enabled = False
    Else
        btnGetFile(1).Enabled = True
    End If
End Sub
Private Sub chkProtocol_Click()
    'the request method can only be specified if
    'we're using HTTP/1.0
    If chkProtocol.Value = 1 Then
        cboMethod.Enabled = True
    Else
        cboMethod.Enabled = False
    End If
End Sub
Private Sub txtURL_KeyPress(KeyAscii As Integer)
    If KeyAscii = vbKeyReturn Then
        Call btnGetFile_Click(0)
        KeyAscii = 0
    End If
End Sub
Private Sub Form_KeyPress(KeyAscii As Integer)
    If KeyAscii = vbKeyEscape Then
        If dsSocket1.State = SOCK_STATE_CONNECTED Then
            dsSocket1.Close
            txtStatus = "Connection cancelled..."
        End If
        KeyAscii = 0
    End If
End Sub
Private Sub Form_Load()
    'add the methods to the combo box
    cboMethod.AddItem "GET"
    cboMethod.AddItem "HEAD"
    cboMethod.AddItem "POST"
    cboMethod.ListIndex = 0
End Sub

The cboMethod_Click procedure disables the Save Resource Locally command button if the selected method is HEAD. This is necessary because the contents of the resource are not returned in response to a HEAD request—the server returns only the response message header fields.

The chkProtocol_Click procedure disables the cboMethod combobox if the checkbox is not checked. This is necessary because the versions of the HTTP protocol prior to HTTP/1.0 only support the GET method.

The txtURL_KeyPress event is used to trigger the View Resource command button if the Enter key is pressed. Although this could also have been accomplished by setting the command button's Default property to True, I chose this method to prevent firing the command button by pressing the Enter key while in the Text1 textbox. Recall that for a POST request text must be entered into Text1.

The Form_KeyPress event is used to abort an open connection if the Escape key is pressed. If the HTTP server is not responding, it is necessary to close the connection. Otherwise, further attempts to retrieve resources would result in an error because the dsSocket control would already have an open connection.

Finally, the Form_Load event simply adds the available methods to the cboMethod combobox and sets the ListIndex property of the combobox to the first item (GET).

The Command Buttons in Action

The two command buttons on the form are the principal means the user has to retrieve the resource specified in the URL textbox. The user can also press the Enter key while the URL textbox has focus. This action activates the View Resource button.

The command buttons are part of a control array. The Save Resource Locally button (Index = 1) utilizes the same code as the View Resource button (Index = 0) and merely adds the code that handles the opening of the output file into which the resource will be saved.

The code is given in Listing 5.6. It's pretty straightforward so I'll just highlight the basics of what's going on here.

Listing 5.6. The btnGetFile Click Event

Private Sub btnGetFile_Click(index As Integer)
    Dim szHost As String, tmp$
    On Error Resume Next
    If index = 1 Then
        fSaveToFile = True
        iFileHandle = FreeFile
        'the file name to use is after the last "/"
        tmp$ = GetFileFromURL(txtURL)
        While InStr(tmp$, "/")
            tmp$ = Mid$(tmp$, InStr(tmp$, "/") + 1)
        Wend
        dlgFileSave.filename = tmp$
        dlgFileSave.Flags = cdlOFNExplorer + cdlOFNLongNames + _
                cdlOFNNoReadOnlyReturn + cdlOFNOverwritePrompt
        dlgFileSave.Action = 2
        If Err Then
            fSaveToFile = False
            If Err <> cdlCancel Then
                MsgBox "An error occurred opening the file: " & Error$
            End If
            Err = 0             'don't let this error reach the Connect check below
        Else
            Kill dlgFileSave.filename
            Err = 0             'clear any errors caused during Kill
            Open dlgFileSave.filename For Binary Access Write As #iFileHandle
            If Err Then
                fSaveToFile = False
                'report the error before clearing Err; Error$ reads Err
                MsgBox "An error occurred opening the file: " & Error$
                Err = 0
            End If
        End If
        bGettingContent = False
    Else
        fSaveToFile = False
    End If
    lBytesRcvd = 0
    '   set to line mode for incoming data
    dsSocket1.LineMode = True
    dsSocket1.EOLChar = 10
    szHost = GetHostFromURL(txtURL)
    If (szHost = "") Then
        MsgBox "Invalid URL supplied."
        Exit Sub
    End If
    dsSocket1.RemoteHost = szHost
    dsSocket1.RemotePort = 80       ' use the default port
    txtStatus = "Connecting..."
    '   connect the socket, Connect or SendReady event will
    '   signify it's ok to send data
    dsSocket1.Connect
    If (Err > 0) Then
        MsgBox "Error connecting to host" & Chr(13) & Format(Err) & ":" & Error
    End If
End Sub

If the Index parameter is 1, the user has clicked the Save Resource Locally button. In this case, the code sets the fSaveToFile flag to True. This flag is used in other places to determine if data should be written to the file specified by iFileHandle. After a value is obtained for iFileHandle, the filename of the resource is parsed from the URL. Because the value returned by GetFileFromURL() may contain some server-based directory information, the routine uses only the portion of the filename after the last slash character.

The program next prompts the user for a local file in which to save the resource. This is done using the Visual Basic Common Dialog control (dlgFileSave). The Save As dialog will be used. The parsed filename the code just obtained is used as the initial filename for the common dialog. The Flags property is set to use the Explorer style dialog (cdlOFNExplorer), to allow long filenames (cdlOFNLongNames), to not allow read-only files (cdlOFNNoReadOnlyReturn), and to verify an overwrite attempt if the user selected an existing file (cdlOFNOverwritePrompt).

The Action property is then set to 2 to open the Save As dialog. The dlgFileSave control's CancelError property is set to True to produce a run-time error if the user presses the Cancel button on the dialog. If there is no error set after setting the Action property, the Filename property contains the full path to the file the user specified. If Err has a value greater than zero but not equal to cdlCancel (the value assigned when the Cancel button is pressed), a message box is displayed. If an error occurs, the fSaveToFile flag is set back to False.

If the user has chosen a valid filename, the program next attempts to open that file for output. The first step is to delete any existing file with the Kill statement. This matters because opening an existing file for binary access does not truncate it, so bytes left over from a longer old file would otherwise remain after the new data. If the file does not exist, Kill raises an error, but that is harmless here because the Open statement creates the file anyway. So the code resets Err to 0. Next, the code opens the file for binary write access. Binary is used because the user may not be retrieving a character-based resource. The URL may specify a GIF image or a sound file, for example. The use of binary access allows the program to store any type of data within the file.

Finally, the flag bGettingContent is reset to False. This flag is used in the dsSocket's Receive event.

If the Index parameter was not 1, the fSaveToFile flag is set to False.

The program uses a variable called lBytesRcvd to keep track of the number of bytes that have been received. This variable is reset to zero.

The next two lines of code set up the receive mode of the socket control. Most HTTP data is transmitted in complete lines that end with line feed characters (Chr$(10)). The LineMode property is set to True. This causes the Receive event to be fired only after the character specified by the EOLChar property is received. The EOLChar property is set to 10.

The next step is to get the host name from the URL. This is done using GetHostFromURL(). If an invalid URL was specified, a message box is opened and the procedure is aborted. The host name is assigned to the RemoteHost property, and the RemotePort property is set to the default HTTP port (80).

Finally, the Connect method of the dsSocket control is invoked. If the user specified a host name that cannot be resolved to an IP address, a run-time error occurs. The last section of code displays such an error.

Note that the Connect method returns immediately. If a timeout occurs, it will be handled in the Exception event. The only errors returned by the Connect method are those that prevent a connection from taking place. These include invalid IP addresses as well as the server refusing the connection.

When the program successfully connects to the specified server, the Connect and SendReady events fire. The SendReady event is where the program constructs and transmits the HTTP request message.

Miscellaneous dsSocket Events

Several of the dsSocket events are used to present status information to the user. They represent changes of protocol state during the retrieval process. All of them update the txtStatus textbox with a description of the socket's current state. Listing 5.7 contains the code for these events.

Listing 5.7. The Miscellaneous dsSocket Events

Private Sub dsSocket1_Connect()
    txtStatus = "Connected..."
End Sub
Private Sub dsSocket1_Close(ErrorCode As Integer, ErrorDesc As String)
    txtStatus = "Connection closed..." & lBytesRcvd & " bytes received."
    If fSaveToFile Then
        bGettingContent = False
        Close #iFileHandle
    End If
End Sub
Private Sub dsSocket1_Exception(ErrorCode As Integer, ErrorDesc As String)
    txtStatus = "dsSocket1 Error : " & ErrorCode & ":" & ErrorDesc
End Sub

The first event encountered is the Connect event. This event is fired when the connection to the server has been completed. The event merely changes the text in the status box to "Connected..."

The Close event is fired when the HTTP server closes the connection. This signifies the end of the server's response message. The code updates the status box to inform the user that the retrieval has been completed. It also provides the total number of bytes received from the server. If the resource was being saved to a local file, the file is closed.

Any communication errors occurring after the dsSocket control has connected to the server cause the Exception event to be fired. The code in this event merely displays the error information in the status box. A more complete application would attempt to make sense of the error condition and take appropriate action.
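
As one small step toward a more complete application, a response timeout could be added so that the user does not have to press the Escape key when a server stops responding. The following is only a sketch. It assumes a standard Visual Basic Timer control, here given the hypothetical name tmrTimeout, has been added to the form with its Interval property set to 30000 (30 seconds). It uses only the dsSocket State property and Close method that already appear in Listing 5.5.

```vb
'   Assumes a Timer control named tmrTimeout (hypothetical) on the form,
'   with Interval = 30000. Enable it (tmrTimeout.Enabled = True) at the
'   end of the SendReady event and disable it in the Close event.
Private Sub tmrTimeout_Timer()
    tmrTimeout.Enabled = False              ' behave as a one-shot timer
    If dsSocket1.State = SOCK_STATE_CONNECTED Then
        dsSocket1.Close                     ' same call Form_KeyPress uses
        txtStatus = "Timed out waiting for the server..."
    End If
End Sub
```

Restarting the timer in the Receive event would turn this into an inactivity timeout rather than a limit on the whole transfer.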

The dsSocket SendReady Event

After the control has connected to the server and the network is prepared to send data, the SendReady event is fired. This signifies that the program can now proceed to send the HTTP request message. The code for the event is given in Listing 5.8.

Listing 5.8. The dsSocket SendReady Event

Private Sub dsSocket1_SendReady()
    Dim szFile      As String
    On Error Resume Next
    szFile = GetFileFromURL(txtURL)
    '   send the URL request
    If chkProtocol.Value Then
        dsSocket1.Send = cboMethod.Text & " /" & szFile & " HTTP/1.0" & vbCrLf
        'use the Accept header to make sure the server will send us everything:
        dsSocket1.Send = "Accept: */*" & vbCrLf
        If cboMethod.Text = "POST" Then
            dsSocket1.Send = "Content-Type: application/x-www-form-urlencoded" _
                              & vbCrLf
            dsSocket1.Send = "Content-Length: " & _
                        Trim$(Str$(Len(Text1.Text) + 2)) & vbCrLf & vbCrLf
            dsSocket1.Send = Text1.Text & Chr$(0) & vbCrLf
        Else
            dsSocket1.Send = vbCrLf
        End If
    Else
        bGettingContent = True
        dsSocket1.Send = "GET /" & szFile & vbCrLf
    End If
    Text1.Text = ""
    txtStatus = "Waiting for reply..."
End Sub

Once the housekeeping tasks of declaring variables and setting the error-handling method are out of the way, the real work of this application begins.

The first step is to parse the resource's filename from the URL the user entered. This is accomplished using GetFileFromURL().

The code then checks the state of the chkProtocol checkbox. If the checkbox is checked, the user has indicated that the request message should conform to the HTTP/1.0 protocol (the request message format is reviewed earlier in this chapter). Otherwise, a simple request message will be transmitted.

Outgoing characters are assigned to the dsSocket's Send property. For the HTTP/1.0 message, the cboMethod combobox's current setting is used as the request method. The code then appends a slash (/) and the resource's filename. This is followed by the string "HTTP/1.0", which indicates an HTTP/1.0 request is being made. All lines in HTTP messages end with the carriage return/line feed combination (vbCrLf).

Next, the request is modified by adding a request header field to the message. The Accept header field informs the server which content types are acceptable to the client. In this application, the server should return the resource regardless of its content type. Therefore, the code sets the value of the Accept header to */*. The vbCrLf is added to indicate the end of this header field.

If the method specified in cboMethod is POST, there are a few more pieces to transmit. First, for a POST request the client must specify the Content-Type and Content-Length entity body header fields. The code assumes Content-Type is application/x-www-form-urlencoded (the standard method of encoding HTML form data). The value of the Content-Length header field is the length of the text in the Text1 text box plus two (the code allows one for the NULL character and one for the line feed; strictly, the trailing vbCrLf is two characters, a carriage return and a line feed). After these entity body header fields are sent, an extra vbCrLf is transmitted. This separates the request headers from the entity body. Finally, the contents of the Text1 textbox are sent, followed by a NULL character (Chr$(0)) and vbCrLf.

If the user left the chkProtocol checkbox un-checked, the program constructs a simple request message. This consists of the GET string followed by the filename of the resource, and ending up with vbCrLf.

Once the message is completed, the Text1 textbox is cleared and the status box is updated to "Waiting for reply...". The program now waits for the Receive event, which fires after the first complete line of the response message is received from the server.
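
To see the shape of the message as a whole, it can help to assemble it into a single string that can be displayed with Debug.Print before anything is sent. The function below is a sketch for that purpose; the name BuildRequest() is hypothetical. Note one deliberate difference from Listing 5.8: this sketch does not append the NULL character or a trailing vbCrLf to the POST data, and it sets Content-Length to exactly the number of characters in the body.

```vb
'   Hypothetical helper: returns a complete HTTP/1.0 request message.
'   Unlike Listing 5.8, no NULL terminator is added to the POST body,
'   so Content-Length is simply the length of szBody.
Private Function BuildRequest(szMethod As String, szFile As String, _
                              szBody As String) As String
    Dim szMsg As String
    '   request line, for example: GET /index.html HTTP/1.0
    szMsg = szMethod & " /" & szFile & " HTTP/1.0" & vbCrLf
    szMsg = szMsg & "Accept: */*" & vbCrLf
    If szMethod = "POST" Then
        szMsg = szMsg & "Content-Type: application/x-www-form-urlencoded" & vbCrLf
        szMsg = szMsg & "Content-Length: " & Trim$(Str$(Len(szBody))) & vbCrLf
        '   the empty line separates the headers from the entity body
        szMsg = szMsg & vbCrLf & szBody
    Else
        '   GET and HEAD: the empty line ends the request
        szMsg = szMsg & vbCrLf
    End If
    BuildRequest = szMsg
End Function
```

For example, Debug.Print BuildRequest("HEAD", "index.html", "") shows the request line, the Accept header, and the empty line that ends the message.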

The dsSocket Receive Event

The Receive event is where the incoming data is processed, presented to the user, and, possibly, saved to disk. It's also the final piece of code for the application. The code in the Receive event is very basic. As Listing 5.9 shows, the only parameter for the event is a string called ReceiveData. This string contains the data just received from the server. Remember that the control is working in line mode so it only fires the event after a complete line is received.

The first step is to update the status box to reflect the fact that data is being received. Next, the ReceiveData is appended to the contents of Text1 for display purposes.

The HTTP/1.0 response message contains header information that precedes the actual body of the resource. When the program is saving the retrieved resource locally, only the body of the resource is stored to the local file, not the header information. If the HTTP header information were stored in an image file, the file would be invalid and unusable. The bGettingContent flag is used to signify that all the header information has been received and the program is now receiving the <Entity-Body> portion of the HTTP response message.

The code checks the state of bGettingContent and fSaveToFile. If both are true, the ReceiveData is written to the file specified by iFileHandle. The line of code that starts with "If ReceiveData = vbCrLf" is where the program decides that the incoming data is now the entity body portion of the response message. Recall that the response message format requires that the header portion and the entity body be separated with an empty line. This is received at the client as vbCrLf by itself. Once this string is received, the remainder of the response message is assumed to be the entity body. The bGettingContent is therefore set to True to indicate this state.

Finally, the value of lBytesRcvd is updated to include the latest data.

Listing 5.9. The dsSocket Receive Event

Private Sub dsSocket1_Receive(ReceiveData As String)
    txtStatus = "Receiving data..."
    '   display the incoming html data
    Text1.Text = Text1.Text & ReceiveData
    If bGettingContent And fSaveToFile Then
        Put #iFileHandle, , ReceiveData
    End If
    If ReceiveData = vbCrLf And bGettingContent = False Then
        bGettingContent = True
    End If
    '   add the byte count to the total
    lBytesRcvd = lBytesRcvd + Len(ReceiveData)
End Sub
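
The test for the end of the header section compares ReceiveData with vbCrLf exactly. Some servers end their header lines with a bare line feed, in which case the empty line would arrive as Chr$(10) alone and the test would never succeed. A slightly more tolerant check might look like the following sketch; the name IsBlankLine() is hypothetical, and the Integer return type simply follows the style of the flags declared in Listing 5.4.

```vb
'   Hypothetical helper: True (-1) if the line marks the end of the
'   response headers, whether it ends in CR/LF or in a bare LF.
Private Function IsBlankLine(szLine As String) As Integer
    IsBlankLine = (szLine = vbCrLf) Or (szLine = Chr$(10))
End Function
```

In the Receive event, the condition would then read If IsBlankLine(ReceiveData) And bGettingContent = False Then.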

Testing the dsSocket Application

Now that the form has been set up and all of the code entered, it's time to test the application. Select Run | Start from the Visual Basic menu. The application loads and the form is displayed.

The first step is to enter a valid URL in the URL textbox. The URL should point to an HTML document for now. Leave the check box checked and the method dropdown set for GET. Click the View Resource button. The status box is updated as the retrieval proceeds. Once the document retrieval starts, the textbox begins filling with the HTML text. After the document has been fully received, the server closes the connection. This is signaled by changing the text in the status box. Figure 5.3 shows an example of how the screen will look.

Figure 5.3. A screen from the dsSocket example.

Note the text at the beginning of the textbox up to the first blank line. These lines are the response message header fields that were returned by the server.
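
The first of these lines is the status line, which for an HTTP/1.0 response has the form HTTP/1.0 200 OK. A program that needs to react to the result (for example, to avoid saving the body of a 404 response to disk) could pull the numeric status code out of that line. The following is a minimal sketch; the function name GetStatusCode() is hypothetical.

```vb
'   Hypothetical helper: returns the three-digit status code from an
'   HTTP/1.0 status line such as "HTTP/1.0 200 OK", or 0 if the line
'   is not a status line (for example, an HTTP/0.9 simple response).
Private Function GetStatusCode(szLine As String) As Integer
    Dim iPos As Integer
    GetStatusCode = 0
    If Left$(szLine, 5) <> "HTTP/" Then Exit Function
    iPos = InStr(szLine, " ")               ' space after the version
    If iPos = 0 Then Exit Function
    GetStatusCode = Val(Mid$(szLine, iPos + 1, 3))
End Function
```

The function would be called on the first line delivered to the Receive event, before bGettingContent becomes True.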

After the document has been received, click Save Resource Locally. When the Save As dialog box appears, enter a valid filename to use when saving the resource. Click the OK button. The document appears in the textbox as before. This time, however, the program also stores the HTML text in the file you specified. To verify this, open the Windows Explorer and locate the file. View the file by whatever means you wish. If you retrieved an HTML document and have a Web browser such as Netscape Navigator or the Internet Explorer installed, you can double-click the filename to load it.

Next, change the method to HEAD. The Save Resource Locally command button should be disabled. Click View Resource. This time only the header lines are returned by the server. These should match the header lines received in our previous retrievals. Figure 5.4 illustrates this.

Figure 5.4. An example of using the HEAD method.

The POST method is a little more complicated to use. You must enter into the textbox the field names and values that the resource is expecting to receive when you perform a POST. The best way to find these out is to view the source of an HTML form that also POSTs to the resource. For example, the sample Win/CGI application presented in Chapter 6, "The Win/CGI Interface," uses three fields: Name1, Name2, and Fetch. When entering data into the textbox, the fields are separated from their values using the equal sign. The field/value pairs are separated using an ampersand character. Spaces appearing within field values should be replaced with a plus sign. The text entered would then resemble Name1=Craig&Name2=Susan&Fetch=Water. If you know of a resource that accepts POST requests, and you know which fields the resource is expecting, you can enter the string into the textbox and retrieve the resource. Typically, the resource informs you (through the document which is received by the dsSocket control) if any fields are missing or invalid.
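
Encoding the values by hand is error-prone, so a small routine can do it. The sketch below replaces spaces with plus signs, as described above, and additionally writes any other character that is not a letter or digit as a percent sign followed by its two-digit hexadecimal code, which is how Web browsers protect characters such as & and = inside a value. The name EncodeFormValue() is hypothetical, and the sketch assumes single-byte (ANSI) text.

```vb
'   Hypothetical helper: form-encodes one field value.
'   Assumes single-byte (ANSI) characters.
Private Function EncodeFormValue(szValue As String) As String
    Dim i As Integer
    Dim ch As String
    Dim szOut As String
    For i = 1 To Len(szValue)
        ch = Mid$(szValue, i, 1)
        If ch = " " Then
            szOut = szOut & "+"             ' spaces become plus signs
        ElseIf ch Like "[A-Za-z0-9]" Then
            szOut = szOut & ch              ' letters and digits pass through
        Else
            '   everything else becomes %XX (two hex digits)
            szOut = szOut & "%" & Right$("0" & Hex$(Asc(ch)), 2)
        End If
    Next i
    EncodeFormValue = szOut
End Function
```

A POST string could then be built as "Name1=" & EncodeFormValue("Craig Smith") & "&Fetch=" & EncodeFormValue("Water & ice"), which yields Name1=Craig+Smith&Fetch=Water+%26+ice.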

Example Using the Internet Control Pack HTTP Client Control


If you waded through the example in the previous section, you're probably wishing there weren't another example. However, you'll be pleasantly surprised at how little code is necessary when using the HTTP Client control. Unlike the dsSocket control, the HTTP Client control is specifically designed to retrieve resources from an HTTP server. The control takes care of making the connection to the server, creating the request message, and receiving the data returned. It also separates the header information from the content information. Because your application doesn't need to generate the HTTP request message, the code contained in the dsSocket's SendReady event of the previous example is unnecessary.

To create this example, start with the form from the previous example. Copy the form files and the project file to a different directory. Open the new copy of the project file in Visual Basic. The first step is to delete the dsSocket control from the form. You can leave the code if you desire. This allows you to compare the code required for the two controls.

Next you need to add the HTTP Client control to the project. Open the Custom Controls dialog using the Tools | Custom Controls menu. Scroll through the list until you locate Microsoft HTTP Client Control. Select it using the checkbox. You can also remove the dsSocket entry as long as the control has been deleted from the form. Click OK.



If you don't find this entry in the list, the Internet Control Pack has not been properly installed. You must install the Internet Control Pack to use this example. A copy of the control pack is included on the CD-ROM accompanying this book.

The code used in this example also requires us to provide a reference to the Microsoft Internet Support Objects. To set this up, open the References dialog using the Tools | References menu. Scroll through the list until you locate Microsoft Internet Support Objects. Select this item and click OK.

Add the HTTP Client control to the form. Change its name from the default to HTTP1 (just to have a shorter name).

The HTTP Client control does not provide a way to use the message format of previous versions of the HTTP protocol. Therefore, you can delete the chkProtocol checkbox from the form. It does provide a Method property that allows you to specify the HTTP request method to be used when retrieving a resource. So leave the cboMethod dropdown listbox on the form.



Note: Despite many attempts, up to the time of this writing I have been unable to make a POST request work using the HTTP Client control. Even when the Method property is set to prcPost, the control always sends GET in the request message. I'm assuming this is a bug in the control that will be fixed in a future release. I have left the POST entry in the cboMethod dropdown listbox in anticipation of this.

In this example you'll use a listbox to display the header fields. Add a standard listbox to the form and name it lstHeaders.

The remainder of this section provides the code for any new routines (such as the HTTP control's event procedures) and also for any routines that have changed (such as Form_KeyPress). The rest of the routines will be used unmodified.

The Form_KeyPress Procedure

The Form_KeyPress event is where the Esc key press is captured. Pressing this key causes the document retrieval to be aborted. Listing 5.10 provides the new code for this event procedure.

The code has not changed much from the previous example. Now the HTTP1.State property is compared against the constant prcConnected to determine if the control is currently connected. If it is connected, the Cancel method of the HTTP control is invoked to cancel the transfer.

Note that the HTTP control also provides a built-in timeout mechanism. This can be used to automatically cancel the transfer if the server has not responded in a timely fashion. This feature is discussed in the following section covering the btnGetFile_Click event.

Listing 5.10. Form_KeyPress for the HTTP control example.

Private Sub Form_KeyPress(KeyAscii As Integer)
    If KeyAscii = vbKeyEscape Then
        If HTTP1.State = prcConnected Then
            HTTP1.Cancel
            txtStatus = "Connection cancelled..."
        End If
        KeyAscii = 0
    End If
End Sub

The btnGetFile_Click Event

This event procedure is where the bulk of the work in this program is performed. The code is very similar to the dsSocket example's code but does not require as many lines. Listing 5.11 provides the code.

If the user clicks the Save Resource Locally command button (Index = 1), the code must obtain a filename to use when storing the resource to disk. The HTTP control can automatically send the retrieved document to a file if the DocOutput object's Filename property is set to a valid filename on the local machine. The program does not need to open a file and write the received data into it, as was necessary when using the dsSocket control.

The code in Listing 5.11 obtains the filename from the user in the same manner as in the previous example. However, instead of opening the file for output, the code simply assigns HTTP1.DocOutput.Filename to the filename provided.
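The filename suggestion can also be isolated in a small helper. The following is only a sketch: SuggestFileName and its sDefault parameter are hypothetical names that do not appear in the example project. It uses the same "everything after the last slash" approach as Listing 5.11 and adds a fallback for URLs that end in a slash.

```vb
' Hypothetical helper, not part of the example project.
' Returns the text after the last "/" in sUrl, or sDefault
' when nothing follows the last "/".
Function SuggestFileName(ByVal sUrl As String, _
                         ByVal sDefault As String) As String
    Dim p As Integer
    ' repeatedly discard everything up to and including a "/"
    p = InStr(sUrl, "/")
    Do While p > 0
        sUrl = Mid$(sUrl, p + 1)
        p = InStr(sUrl, "/")
    Loop
    ' a URL ending in "/" leaves an empty string
    If Len(sUrl) = 0 Then sUrl = sDefault
    SuggestFileName = sUrl
End Function
```

For example, SuggestFileName("http://www.microsoft.com/icp/", "index.htm") returns "index.htm". Note that a URL with no path at all, such as http://www.microsoft.com, would return the host name, so the helper is best applied to URLs that include a path.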

If the user clicks the View Resource command button (Index = 0), the HTTP1.DocOutput.Filename property is set to an empty string.

The HTTP control provides a URL property which accepts the URL of the resource to be retrieved. You don't need to parse the URL into host name, port number, and filename. The HTTP control handles this for you. The code simply assigns the contents of the txtURL textbox to the HTTP control's URL property.
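To appreciate what the URL property saves you, here is a rough sketch of the kind of parsing an application would otherwise perform. SplitHttpUrl is a hypothetical routine written for illustration; it is not the control's actual implementation, and it assumes a plain http URL with the default port of 80.

```vb
' Hypothetical helper: splits an http URL into host name,
' port number, and path. Results are returned through the
' sHost, nPort, and sPath parameters.
Sub SplitHttpUrl(ByVal sUrl As String, sHost As String, _
                 nPort As Long, sPath As String)
    Dim p As Integer
    ' strip the scheme if it is present
    If LCase$(Left$(sUrl, 7)) = "http://" Then sUrl = Mid$(sUrl, 8)
    ' everything from the first "/" onward is the path
    p = InStr(sUrl, "/")
    If p > 0 Then
        sPath = Mid$(sUrl, p)
        sUrl = Left$(sUrl, p - 1)
    Else
        sPath = "/"
    End If
    ' an optional ":port" may follow the host name
    p = InStr(sUrl, ":")
    If p > 0 Then
        nPort = CLng(Val(Mid$(sUrl, p + 1)))
        sHost = Left$(sUrl, p - 1)
    Else
        nPort = 80
        sHost = sUrl
    End If
End Sub
```

Calling SplitHttpUrl with "http://www.microsoft.com/icp/" yields the host www.microsoft.com, port 80, and path /icp/. With the HTTP control, none of this code is needed.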

The next two lines of code set up and enable the timeout feature of the control. The Timeout and EnableTimer arrays provide two predefined entries (prcConnectTimeout and prcReceiveTimeout) as well as user-defined entries (any value equal to or greater than prcUserTimeout). If an enabled timer runs for the period given by the corresponding Timeout entry, the Timeout event fires. This event has a parameter named Event, which specifies which of the timeouts has occurred. The control disables the prcConnectTimeout timer after a connection is made and the prcReceiveTimeout timer when it receives data. User-defined timers must be disabled in code to prevent the Timeout event from firing. In this example, only the prcReceiveTimeout timer is used.
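As an illustration of the other timer entries, the following sketch enables the connect timer and one user-defined timer. It is not part of the example project: the routine names StartExtraTimers and StopUserTimer and both timeout values are assumptions, and the values follow the same units used in Listing 5.11. The code must live in a form that contains the HTTP1 control.

```vb
' Sketch only; assumes a form containing the HTTP1 control.
Private Sub StartExtraTimers()
    ' predefined timer: fires if no connection is made in time
    HTTP1.Timeout(prcConnectTimeout) = 30 * 1000    'assumed value
    HTTP1.EnableTimer(prcConnectTimeout) = True
    ' user-defined timer: any index >= prcUserTimeout
    HTTP1.Timeout(prcUserTimeout) = 300 * 1000      'assumed value
    HTTP1.EnableTimer(prcUserTimeout) = True
End Sub

Private Sub StopUserTimer()
    ' user-defined timers are not disabled by the control,
    ' so turn this one off once it is no longer needed
    HTTP1.EnableTimer(prcUserTimeout) = False
End Sub
```

If you use a user-defined timer this way, a natural place to call StopUserTimer is wherever your code decides the transfer has finished or been cancelled.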

Next, the Method property is set to match the method chosen in the cboMethod dropdown listbox.

Finally, the GetDoc method is invoked. This starts the document transfer process. Note that the code tests for the use of the POST method but doesn't do anything differently. This is because, as mentioned in the Note above, I was unable to make the POST method work with the version of the HTTP control I had at the time of this writing.

Listing 5.11. btnGetFile_Click for the HTTP control example.

Private Sub btnGetFile_Click(index As Integer)
    Dim tmp$
    On Error Resume Next
    If index = 1 Then
        'the file name to use is after the last "/"
        tmp$ = GetFileFromURL(txtURL)
        While InStr(tmp$, "/")
            tmp$ = Mid$(tmp$, InStr(tmp$, "/") + 1)
        Wend
        dlgFileSave.filename = tmp$
        dlgFileSave.Flags = cdlOFNExplorer + cdlOFNLongNames + _
                cdlOFNNoReadOnlyReturn + cdlOFNOverwritePrompt
        dlgFileSave.Action = 2
        If Err Then
            If Err <> cdlCancel Then
                MsgBox "An error occurred opening the file: " & Error$
            End If
            'no filename was chosen, so don't start the transfer
            Exit Sub
        Else
            HTTP1.DocOutput.filename = dlgFileSave.filename
        End If
    Else
        HTTP1.DocOutput.filename = ""
    End If
    'assign the URL
    HTTP1.URL = txtURL
    'set up the timer to catch timeouts
    HTTP1.Timeout(prcReceiveTimeout) = 120 * 1000
    HTTP1.EnableTimer(prcReceiveTimeout) = True
    'set up the request method
    HTTP1.Method = cboMethod.ListIndex + 1
    If cboMethod.Text = "POST" Then
        'some code must go here to output the data
        '
        HTTP1.GetDoc
    Else
        HTTP1.GetDoc
    End If
    txtStatus = "Connecting..."
End Sub

The HTTP1_DocOutput Event

The HTTP1_DocOutput event is fired when the control has received data. The code is presented in Listing 5.12. The DocOutput parameter passed to the event is a reference to the HTTP control's DocOutput object. It can be used in place of HTTP1.DocOutput.

Note that a property named NotificationMode determines when the DocOutput event is fired. The default setting is 0, which indicates that the event should be fired only after all data has been received by the control. Setting this property to 1 causes the event to be fired continuously while data is received from the server. This is similar to the operation of the dsSocket Receive event.

The State property of the DocOutput object specifies the current state of the document transfer. The event code uses the Select Case construct to decide which code to execute.

When State = icDocHeaders, the header fields have been received. The DocOutput.Headers collection contains all the header fields received. The collection provides a Count property as well as an Item array. Each element of the Item array has two properties: Name, which is the name of the header field, and Value, which is the value of the field. The code in the DocOutput event iterates through the Item array and adds each header field received to the lstHeaders listbox.

The icDocBegin and icDocEnd states mark the beginning and the end of the document transfer. These are used to update the status box text with an appropriate message. The BytesTransferred property of the DocOutput object provides a count of the number of bytes received. This property is valid throughout the document retrieval process, but I've used it only to provide the final byte count when the document transfer has been completed.

The icDocData state occurs when document content data has been received. Like the previous example, the code simply appends the data received to the Text1 textbox.

Finally, the icDocError state indicates that an error occurred during the transfer.

Listing 5.12. The HTTP1_DocOutput event.

Private Sub HTTP1_DocOutput(ByVal DocOutput As DocOutput)
    Dim i%, vtData
    Select Case DocOutput.State
    Case icDocHeaders
        For i% = 1 To DocOutput.Headers.Count
            lstHeaders.AddItem DocOutput.Headers.Item(i%).Name _
               & ": " & DocOutput.Headers.Item(i%).Value
        Next
    Case icDocBegin
        Text1.Text = ""
        lstHeaders.Clear
    Case icDocEnd
        txtStatus = "Done... " & Str$(DocOutput.BytesTransferred) & _
              " bytes received"
    Case icDocData
        DocOutput.GetData vtData
        Text1.Text = Text1.Text & vtData
    Case icDocError
        MsgBox "Reply Code: " & HTTP1.ReplyCode
    End Select
End Sub
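Because Listing 5.12 stores each header in lstHeaders as a single "Name: Value" string, you may later want to split an entry back into its two parts (for example, to read the value of one particular field). The following is a minimal sketch; SplitHeaderLine is a hypothetical helper that is not part of the example project.

```vb
' Hypothetical helper: splits a "Name: Value" listbox entry.
' Results are returned through sName and sValue.
Sub SplitHeaderLine(ByVal sLine As String, sName As String, _
                    sValue As String)
    Dim p As Integer
    ' the first ": " separates the field name from its value
    p = InStr(sLine, ": ")
    If p > 0 Then
        sName = Left$(sLine, p - 1)
        sValue = Mid$(sLine, p + 2)
    Else
        ' no separator found; treat the whole line as the name
        sName = sLine
        sValue = ""
    End If
End Sub
```

Passing "Content-Type: text/html" sets sName to "Content-Type" and sValue to "text/html". Only the first ": " is used as the separator, so values that contain colons themselves, such as dates, are left intact.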

The Error, StateChanged, and Timeout Events

These three events are used to provide feedback to the user about the status of the transfer. The code for all three is contained in Listing 5.13.

Note that the prcDisconnected state is not handled. The StateChanged event fires with this state after the DocOutput's icDocEnd state occurs. If text were written to the status box at that point, it would overwrite the byte count written during the DocOutput event.

Listing 5.13. The Error, StateChanged, and Timeout events.

Private Sub HTTP1_Error(Number As Integer, Description As String, _
     Scode As Long, Source As String, HelpFile As String, HelpContext As Long, _
     CancelDisplay As Boolean)
    If HTTP1.ReplyCode Then
        MsgBox "Server Reply Code: " & HTTP1.ReplyCode _
               & Chr$(13) & HTTP1.ReplyString
    Else
        MsgBox "Error: " & Description
    End If
End Sub

Private Sub HTTP1_StateChanged(ByVal State As Integer)
    Select Case State
    Case prcConnecting
        txtStatus = "Connecting..."
    Case prcResolvingHost
        txtStatus = "Resolving host name..."
    Case prcHostResolved
        txtStatus = "Host name resolved..."
    Case prcConnected
        txtStatus = "Connected..."
    Case prcDisconnecting
        txtStatus = "Disconnecting..."
    Case prcDisconnected
    End Select
End Sub

Private Sub HTTP1_Timeout(ByVal event As Integer, Continue As Boolean)
    Select Case event
    Case prcConnectTimeout
    Case prcReceiveTimeout
        MsgBox "Server did not respond. Try again later."
    End Select
End Sub

Testing the HTTP Control Application

This application (with the exception of POST requests as noted above) should operate just like the dsSocket control's example. See the section titled "Testing the dsSocket Application."

Summary


This chapter covered the basics of retrieving information from the Web using custom controls. The dsSocket control is a generic TCP/IP control that can be used whenever you want to have complete control over the HTTP messages that are sent to the server. The Microsoft HTTP Client control is a specialized control that sacrifices ultimate control for ease of use. Both of these controls have a place in your programming toolbox as you develop Web-based applications.

The information presented in this chapter is pretty basic. The first few sections were designed to serve as a reference to the dsSocket and HTTP Client controls. The last section presented basic but working examples that retrieve information from Web servers. Later chapters will build on the knowledge you gained in this chapter by creating useful Web-based information gathering tools (such as the QuoteWatcher application in Chapter 13, "QuoteWatcher: An Interactive Web Agent") and spiders.
