


— 15 —
LinkChecker: A Spider that Checks for Broken Links


If I had a dime for every "404 Not Found," "This page has moved to http://www.new-and-better.com," or plain "This URL does not have a DNS entry" message, I'd be sipping rum on a beach in St. Thomas instead of writing this book.

One of the problems people have with the Web is the so-called invalid link. This crops up in search engines where the links are out of date, in user home pages that list a thousand "cool" sites (and the maintainer never checks them more than once), or other sources of external links.

When you add a link to a page maintained by someone else, that person has no idea that you've added it. So when the other site moves, its author or administrator can't be expected to notify everyone who links to it of the page's new location.

And because it's not uncommon for a site to change servers, invalid links are very common, and for the foreseeable future, they're here to stay.

This chapter presents a Windows client tool that checks the links on a specified page. Given a URL, the program verifies that every URL hyperlinked from that page actually exists on the Web.

So, first let's look at the different types of invalid links that exist.

Bad Links


So far, I've been using "invalid links" as a generic term for all links that don't take us where we want to go, because I didn't want to spell out the following list every time I typed it. These are the types of invalid links and what can cause them:

  1. The "Unable to locate server" error: This most often happens when a host name is mistyped in the browser's URL field. Figure 15.1 shows what happens when the browser tries to resolve the host name through DNS and no entry is found.

Figure 15.1. Unable to Locate Server error in Netscape.

  2. The "File Not Found" error: This error occurs when the host name is OK, but the filename or directory contained in the URL doesn't exist. The server replies with a small page that looks like Figure 15.2.

Figure 15.2. File Not Found error.

  3. The "This link has moved" error: This error is perhaps the most difficult to diagnose. The requested host and file both exist, but the page merely serves as a pointer to a new location. There is no standard HTML for this situation; usually the page contains just a brief explanation and a hypertext link to the new page, as in Figure 15.3.

Figure 15.3. This link has moved.
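Because the link checker in this chapter sends HEAD requests, it never sees a page body, so it can catch the first two cases but not the third. The following Python sketch (Python rather than Visual Basic so that it runs without any custom controls) shows one way the three cases could be told apart if the body of a GET response were available. The function name, its parameters, and the "moved" wording it searches for are all assumptions; since there is no standard HTML for a moved page, any such pattern is only a heuristic.

```python
import re
from typing import Optional

# Assumed wording for "moved" pages; real pages vary, so this is a heuristic.
MOVED_HINT = re.compile(r"has moved|moved to|new location", re.IGNORECASE)

def classify_link(host_found: bool, status: Optional[int], body: str = "") -> str:
    """Map one link-check result onto the kinds of bad link listed above."""
    if not host_found:
        return "unable to locate server"   # DNS lookup failed
    if status == 404:
        return "file not found"            # host is fine, path is not
    if status is not None and status >= 400:
        return "other HTTP error"
    if MOVED_HINT.search(body):
        return "possibly moved"            # needs the page body, not just headers
    return "ok"
```

A HEAD-only checker can supply host_found and status but not body, which is exactly why the third case is the hardest to detect automatically.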

With these conditions in mind, you can write a simple Visual Basic application that takes a root URL and checks the links on the page. You can also make sure that the referenced images are present.

Designing the Link Checker Application


The usual flow of the application would look like this: The user enters a URL to validate. The app loads that URL into a Sax Webster control (first described in Chapter 4, "Using Web Browser Custom Controls") and parses all of its anchors using the control's GetLinkCount and GetLinkURL methods as described in Chapter 14, "WebSearcher: A Simple Search Tool." Then the Microsoft Internet Control Pack's HTTP client control is used to retrieve the HTTP header information for the anchors (see Chapter 2, "HTTP: How To Speak on the Web," for information on HTTP messages, and Chapter 5, "Retrieving Information From the Web," for information on the HTTP client control).
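Outside of Visual Basic, the job that GetLinkCount and GetLinkURL do for the Webster control can be sketched with Python's standard html.parser module. This is an illustrative stand-in, not Webster's implementation: AnchorCollector and get_links are hypothetical names, and the sketch returns href values exactly as written in the page, so relative links come back unresolved.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def get_links(html_text):
    """Return all anchor hrefs found in html_text (roughly GetLinkCount/GetLinkURL)."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Anchors without an href (such as named targets like <a name="top">) and non-anchor elements such as <img> are skipped.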

If you don't have access to the Sax Webster control, a short section at the end of this chapter describes how to accomplish the link checking with the Microsoft Internet Control Pack's HTML control (also introduced in Chapter 4). The Microsoft control requires a good deal more code, so that section describes only how to modify the Webster-based code to work with it.

Each response received by the HTTP client control is then checked for the error conditions listed earlier; if any is present, the URL is marked as not valid. If a valid HTTP header response is received for the link, the link is marked as valid. Also, if the Web server associated with the link provides the Last-Modified HTTP response header field, the field's value is displayed in the grid.



The Last-Modified HTTP response message header field specifies the date and time the resource represented by the requested URL was last updated. However, not all HTTP servers provide this field when returning information to HTTP client applications.
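Last-Modified values use the HTTP date format (for example, Sun, 06 Nov 1994 08:49:37 GMT). The application simply displays the raw string, but if you wanted to sort or compare dates, the value would have to be parsed first. Here is a minimal Python sketch, assuming the value arrives as a plain string; parse_last_modified is a hypothetical helper built on the standard email.utils.parsedate_to_datetime function.

```python
from email.utils import parsedate_to_datetime

def parse_last_modified(value):
    """Return an aware datetime for a Last-Modified value, or None if missing or unparseable."""
    if not value:
        return None                # header absent: many servers don't send it
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on garbage input, newer ones ValueError.
        return None
```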

You'll also want the user to be able to specify whether to check only links local to that site or all links referenced within the page. You add this functionality via a frame and two option buttons.

The form has a tab control (see Figures 15.4 and 15.5) that lets the user select either Link View or Web View: the grid is placed on the Link View tab, and the Webster control on the Web View tab. There is also a checkbox on the Web View tab that enables and disables the loading of embedded images by the Webster control. Turning image loading off makes the page being verified load faster.

Designing the User Interface


The final result of this section is shown in Figure 15.4, which shows the Link View tab, and Figure 15.5, which shows the Web View tab. Most of the controls use the default properties, but a few have their properties modified to meet the needs of this application.

Figure 15.4. Design time view of the Link View tab.

Figure 15.5. Design time view of the Web View tab.

http://www.microsoft.com/icp

Start a new project in Visual Basic. View the currently available custom controls by selecting the Tools | Custom Controls menu item. This project requires the following controls to be included in the list: the Microsoft Grid control, the StatusBar control, the SSTab control, the Sax Webster control, and the Microsoft Internet Control Pack's HTTP client control.

All controls except the Webster control and the HTTP client control ship with the Visual Basic Professional Edition. After you add the controls, you must also add a reference to the Microsoft Internet Support Objects. Use the Tools | References menu and select this item in the list. If it is not in the list box, click the Browse button and locate the file NMOCOD.DLL. If you can't find this file on your system, you probably need to reinstall the Microsoft Internet Control Pack.

Once the proper controls have been added to the project, you can begin to populate the form. To create the controls directly on the form, follow these steps:

  1. Add a label to the top left corner and give it a Caption of Web Link Verifier. Set its FontSize to 13.5 and its BackStyle to 0 (transparent).

  2. Add the label for the URL text box near the top center of the form. Set its Caption to URL to Verify and its BackStyle to 0.

  3. Add a text box below this label. Set its Name to txtURL.

  4. Add the HTTP client control. It's not visible at run time, so it can be placed anywhere; in Figures 15.4 and 15.5 it's in the top left corner. Set its Method property to 2 (HEAD method). The code uses the default control name, HTTP1. If this is not the default name provided, change the control's Name property to HTTP1.

  5. Add a StatusBar control. Set Align to 2 (align bottom) and Style to 1 (single pane simple text). The control's name should default to StatusBar1. If it doesn't, change it so that it matches the code.

  6. Add an SSTab control above the status bar control. Size it as shown in Figure 15.4. Set its Tabs and TabsPerRow properties to 2. Select Custom in the VB Properties window and click the button with the ellipsis to open the tab control's custom properties page (shown in Figure 15.6). On the General tab, enter Link View as the TabCaption for tab 0. Click the button with the ">" symbol to change the current tab to 1. Enter Web View as the caption and click the OK button. Again, the default name should be SSTab1. If it's not, change the control's Name property to SSTab1.

Figure 15.6. The SSTab control's custom properties page.

Now that the form's shell has been created, it's time to add controls to the tabs. Bring the tab back to the Link View by clicking on its tab caption. Refer to Figure 15.4 for control placement. Follow these steps to add the controls for this tab:

  1. Add the Verify command button. Set its Caption to &Verify. Set its name to cmdVerify. Because you don't want the user to be able to start a verification without a URL entered in txtURL, set the button's Enabled property to False.

  2. Add the Reset button. Set its Caption to &Reset and its Name to cmdMain.

  3. Add the Exit button by copying and pasting the Reset button. When asked by Visual Basic if you wish to create a control array, answer Yes. Set the new button's Caption to &Exit.

  4. Add the Microsoft grid control. Set its Cols property to 3. Leave the rest of its properties set at their default values. The Name property should be Grid1.

  5. Add the frame that appears above the command buttons. Set its Caption to Links To Verify. The default name, Frame1, should be used.

  6. Add an option button to the frame. Set its Name to optLocal and its Caption to &Local Only.

  7. Copy and paste the option button to the frame, below the first. Answer Yes when asked to create a control array. Set the new option button's Caption to &All Links.

You're almost there. Click the Web View tab caption to move to the other tab. To add the controls, refer to Figure 15.5 and follow these steps:

  1. Add a CheckBox control. Assign chkImages to its Name property. Set Value to 1 and its Caption to Load &Images.

  2. Add a Webster control and size it to fill most of the tab. Change the PagesToCache property to 1 and the HomePage property to an empty string. The Name should default to Webster1 but if not, change the Name property to Webster1.

Now that the controls are in place, it's time to start entering some code. The next section covers all the code necessary to make the Link Verifier work.

Coding the Application


The task of this application is to retrieve the user-specified URL, gather all of the anchors out of that page, add those anchors to the grid, and then for each URL in the grid, attempt to retrieve the HTTP header information. Finally, the app marks the URL verified or not verified accordingly.

A lot of the code here is also used in Chapter 14, "WebSearcher: A Simple Search Tool." The link checker, as you will see, is a customized version of the Web search tool. There, instead of looking for the invalid link conditions, you look for a user-specified keyword.

The Declarations Section


The Declarations section contains the following code:

Option Explicit
Dim Conn_Done As Integer
Dim Grid_Pos as Integer

The Conn_Done variable is a flag set by the HTTP control's event handlers to signal the end of an HTTP request. The Grid_Pos variable holds the index of the last URL row in the grid, which is also the number of URLs currently listed.

The AddAnchor Subroutine


The AddAnchor subroutine adds URLs to the grid control. The routine goes through the grid row by row to make sure that the URL to be added isn't already listed. If it isn't, the routine adds it to the grid and increments Grid_Pos. The code is shown in Listing 15.1.

Listing 15.1. AddAnchor subroutine.

Sub AddAnchor(sNewAnchor As String)
Dim X As Integer
For X = 1 To Grid_Pos
    Grid1.Row = X
    Grid1.Col = 0
    If Grid1.Text = sNewAnchor Then
        Exit Sub
    End If
Next X
Grid1.AddItem sNewAnchor
Grid_Pos = Grid_Pos + 1
End Sub
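AddAnchor scans every grid row for each new URL, which is perfectly adequate for the links on a single page. If you port the idea to a larger crawl, a set makes the duplicate check constant-time. The following Python sketch uses a hypothetical LinkGrid class as a stand-in for the grid (it is not part of the application). Like AddAnchor, it compares URLs as exact strings, so http://host and http://host/ count as different entries.

```python
class LinkGrid:
    """Ordered list of URLs plus a set for fast duplicate checks."""

    def __init__(self):
        self.urls = []          # display order, like the grid rows
        self._seen = set()      # membership test in O(1)

    def add_anchor(self, url):
        """Add url if it isn't already listed; return True if it was added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self.urls.append(url)
        return True
```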

The GetHostFromURL Function


This routine was first introduced in Chapter 5. It is taken from the dsWeb sample that ships with the Dolphin Systems dsSocket control discussed in that chapter.

The GetHostFromURL() (Listing 15.2) retrieves the host name from the URL. The host name is the portion of the URL that occurs between the "//" and the first "/" characters. If the "//" is not present, GetHostFromURL() considers the URL to be invalid and returns an empty string.

Listing 15.2. GetHostFromURL() Function

Private Function GetHostFromURL(szURL As String) As String
    '   parse out the hostname from a valid URL
    '   the URL should be of the format: http://www.microsoft.com/index.html
    '   the returned hostname would then be: www.microsoft.com
    Dim szHost      As String
    Dim lPos%
    szHost = szURL
    '   invalid URL
    If InStr(szHost, "//") = 0 Then
        GetHostFromURL = ""
        Exit Function
    End If
    szHost = Mid(szHost, InStr(szHost, "//") + 2)
    lPos% = InStr(szHost, "/")
    If lPos% = 0 Then
        GetHostFromURL = szHost
        Exit Function
    Else
        GetHostFromURL = Left(szHost, lPos% - 1)
        Exit Function
    End If
End Function
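For comparison, here is a direct Python port of GetHostFromURL() alongside the standard library's urlsplit(); get_host_from_url and parsed_host are hypothetical names. One difference worth knowing: the Visual Basic routine returns everything between "//" and the next "/", so an explicit port number stays attached to the host name, whereas urlsplit().hostname strips the port and lowercases the name.

```python
from urllib.parse import urlsplit

def get_host_from_url(url: str) -> str:
    """Python port of GetHostFromURL(): the text between '//' and the next '/'."""
    start = url.find("//")
    if start == -1:
        return ""                      # no '//': treated as an invalid URL
    rest = url[start + 2:]
    slash = rest.find("/")
    return rest if slash == -1 else rest[:slash]

def parsed_host(url: str) -> str:
    """Standard-library alternative: host name only, port removed, lowercased."""
    return urlsplit(url).hostname or ""
```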

The Form_Load Event


The Form_Load event is where you take care of all the startup activity for the application. The code is provided in Listing 15.3.

The grid format is set up, including column widths and captions. Then the Local Only option button is selected as the default. Finally, the Reset button's Click event procedure is called directly to put the form into its initial state.

Listing 15.3. Form_Load event code.

Private Sub Form_Load()
       
    'Set up grid headers
    Grid1.Row = 0
    Grid1.Col = 0
    Grid1.Text = "URL"
    Grid1.ColWidth(0) = 5000
    Grid1.Col = 1
    Grid1.Text = "Verified?"
    Grid1.ColWidth(1) = 900
    Grid1.Col = 2
    Grid1.Text = "Updated"
    Grid1.ColWidth(2) = 1200
    
    optLocal(0).Value = True
    
    Call cmdMain_Click(0)
    
End Sub

User Interface Support Subroutines


Two routines provide some simple user interface functionality. The first, GridClear, clears the grid in preparation for a new verification. The second, ToggleControls, enables and disables some of the controls on the form based on the pState% flag passed as a parameter. The code for both routines is in Listing 15.4.

Listing 15.4. GridClear and ToggleControls Subroutines.

Public Sub GridClear()
    Dim X
    
    For X = 1 To Grid_Pos - 1
        Grid1.RemoveItem 1
    Next
    
    Grid1.Row = 1
    For X = 0 To 2
        Grid1.Col = X
        Grid1.Text = ""
    Next    
    Grid_Pos = 1
End Sub
Public Sub ToggleControls(pState%)
    Frame1.Enabled = pState%
    cmdVerify.Enabled = pState%
    Me.MousePointer = IIf(pState%, vbDefault, vbHourglass)
    If pState% = False Then SSTab1.Tab = 0
    SSTab1.Enabled = pState%
    
End Sub

The code behind the text box and the Load Images checkbox is equally straightforward. The code for these two controls is provided in Listing 15.5.

When the text in the txtURL text box changes, the application sets the Enabled property of the cmdVerify command button to True if the text box contains any non-blank characters, and to False otherwise.

The chkImages checkbox merely changes the value of the Webster control's LoadImages property based on whether or not the check box is checked.

Listing 15.5. Code for txtURL and chkImages.

Private Sub txtUrl_Change()
    cmdVerify.Enabled = (Len(Trim$(txtUrl)) > 0)
    
End Sub
Private Sub chkImages_Click()
    Webster1.LoadImages = chkImages.Value
    
End Sub

The cmdMain Command Buttons


For ease of explanation, I put the Exit and Reset buttons in one control array, and left the Verify button on its own. The code for the Click event of the Reset/Exit button control array is found in Listing 15.6.

There's nothing special about this code. The Reset button lets the user clear the results: it clears the URL text box and the results grid (GridClear also resets the Grid_Pos form-level variable), and it cancels the Webster control's page load, if one is in progress.

The Exit button simply unloads the form, causing the application to end.

Listing 15.6. The cmdMain_Click event code.

Private Sub cmdMain_Click(Index As Integer)
    Select Case Index
        Case 0      'Reset
            txtUrl.Text = ""
            Webster1.Cancel            
            GridClear            
            StatusBar1.SimpleText = "Ready..."
            
        Case 1      'Exit
            Unload Me
            
    End Select
    
End Sub

The cmdVerify Code


When the user enters a URL into txtURL and clicks on the Verify button, it's time for the real action to begin. The cmdVerify_Click event is where the action gets kicked off, as you'll see.

The code for the event is given in Listing 15.7. The first few lines clear the grid and disable some of the buttons and the tab control. Then GetHostFromURL() extracts the host name from the entered URL. Next, the status bar text is updated to show which page is being loaded.

The Webster control's LoadPage method is used to load the URL to be verified. The Visual Basic Choose() function is used as the switch for a DoEvents loop. This function was discussed in detail in Chapter 14, but basically the value returned is the value from the list provided that corresponds to the integer value of the first parameter. In this case, the value returned will be based on the current value of the Webster control's LoadStatus property each time through the loop. The loop continues until either the URL is completely loaded or an error occurs.
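To make the Choose() table in the loop condition concrete, here is a small Python model of it. Python has no Choose(), so the choose() helper below is a hypothetical stand-in that follows the documented Visual Basic behavior: the index is 1-based, and an index outside the list yields Null (None here). The mapping from LoadStatus values to "keep waiting" is copied from Listing 15.7 rather than from the Webster documentation.

```python
def choose(index, *choices):
    """Mimic VB's Choose(): 1-based lookup, None (VB's Null) when out of range."""
    if 1 <= index <= len(choices):
        return choices[index - 1]
    return None

def still_loading(load_status):
    """Same table as Choose(Webster1.LoadStatus + 1, 0, 1, 1, 1, 1, 0, 0)."""
    return choose(load_status + 1, 0, 1, 1, 1, 1, 0, 0)
```

With this table, statuses 1 through 4 keep the DoEvents loop spinning, while 0, 5, and 6 end it.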

After the loop finishes, the URL is entered into the grid. If an error occurred while loading the URL (LoadStatus >= 5), the Verified? column in the grid is set to No and the routine exits.

If the URL was loaded successfully, the Verified? column in the grid is set to Yes and the routine proceeds to extract all the links from the page. This is accomplished using the Webster control's GetLinkCount and GetLinkURL methods, which let you iterate through all the links found on the loaded Web page. The code checks that each link is an HTTP link (as opposed to, for example, a mailto: or news: link, which isn't accessed using the HTTP protocol and therefore can't be verified by this application). If it is an HTTP link and the user has selected the Local Only option button (optLocal(0).Value = True), the code further checks that the link points to a URL on the same host as the URL being verified. If it does, the link is added to the grid using AddAnchor. If All Links is selected (optLocal(0).Value = False), every HTTP link is added to the grid.
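The filtering rules just described can be sketched in Python as follows; the function links_to_check and its parameters are hypothetical. One deliberate difference from the listing: Listing 15.7 tests for a local link with InStr(), which also accepts a URL such as http://www.example.com.other.net/ when the page's host is www.example.com. Comparing the parsed host names avoids that false match.

```python
from urllib.parse import urlsplit

def links_to_check(links, page_url, local_only=True):
    """Keep HTTP(S) links, optionally only those on the page's own host, without duplicates."""
    page_host = urlsplit(page_url).hostname or ""
    selected = []
    for link in links:
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https"):
            continue                       # mailto:, news:, ... can't be HEAD-checked
        if local_only and (parts.hostname or "") != page_host:
            continue                       # exact host match, not a substring test
        if link not in selected:           # same role as AddAnchor's duplicate check
            selected.append(link)
    return selected
```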

After all the links on the page being verified are added to the grid, the HTTP client control is used to retrieve the header information for each of the links in the grid. The original URL is also checked again, this time to retrieve its HTTP header fields, because the Webster control doesn't provide properties for most of them (it does provide properties for the Content-Type and Content-Length headers).

The code loops through each item in the grid, using the variable Grid_Pos as the number of rows in the grid. The Conn_Done variable is used as a flag to indicate that the current header information request has completed. In cmdVerify_Click the flag is set to 0 before each request; it is set to 1 by the HTTP1_DocOutput or HTTP1_Error event, both discussed in the next section. For each row, the URL is extracted from the grid, the status bar is updated, the URL is assigned to the HTTP control's URL property, and finally the HTTP control's GetDoc method is invoked to retrieve the header information (recall that the HTTP control's Method property is set to 2, the HEAD method, at design time). Another DoEvents loop waits until Conn_Done is set before continuing to the next URL in the grid.

After all the URLs have been processed, the command buttons and tab are enabled once again, allowing the user to enter another URL to verify or to use the Webster control to view the URL entered in txtURL.

Listing 15.7. The cmdVerify_Click event code.

Private Sub cmdVerify_Click()
       
    Dim lHostName$, i%, URL$, X
    
    GridClear
    ToggleControls False
    
    lHostName$ = GetHostFromURL(txtUrl.Text)
    
    StatusBar1.SimpleText = "Loading " & txtUrl.Text
    Webster1.LoadPage txtUrl.Text, False
    
    'wait till the page is loaded
    While Choose(Webster1.LoadStatus + 1, 0, 1, 1, 1, 1, 0, 0)
        DoEvents
    Wend
            
    Grid1.Row = 1
    Grid1.Col = 0
    Grid1.Text = txtUrl.Text
    'if an error occurred loading the page,
    ' add it to the grid and exit
    If (Webster1.LoadStatus >= 5) Then
        Grid1.Col = 1
        Grid1.Text = "No"
        ToggleControls True
        Exit Sub
    End If
    
    'add this link to the grid as verified:
    Grid1.Col = 1
    Grid1.Text = "Yes"
    
    'now get all of the links on the page:
    For i% = 0 To Webster1.GetLinkCount("") - 1
        URL$ = Webster1.GetLinkURL("", i%)
        'is it an HTTP link?
        If UCase$(Left$(URL$, 4)) = "HTTP" Then
            'are we verifying only local links?
            If optLocal(0).Value = True Then
                If InStr(UCase$(URL$), "HTTP://" & UCase$(lHostName$)) Then
                    AddAnchor URL$
                End If
            Else
                AddAnchor URL$
            End If
        End If
    Next
    'now check all the links in the grid
    For X = 1 To Grid_Pos
        
        Conn_Done = 0
        Grid1.Row = X
        Grid1.Col = 0
           
        StatusBar1.SimpleText = "Loading " & Grid1.Text
        
        HTTP1.URL = Grid1.Text
        HTTP1.GetDoc
        
        While Conn_Done = 0
            DoEvents
        Wend
        
    Next X
    ToggleControls True
End Sub
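The per-link check that Listing 15.7 delegates to the HTTP client control can be approximated with Python's http.client module. This is a sketch under stated assumptions, not a translation of the control: head_check is a hypothetical name. It treats any status below 400 (including 3xx redirects) as verified, and it reports DNS failures, refused connections, and timeouts as unverified, much as the HTTP1_Error event does.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

def head_check(url: str, timeout: float = 10.0) -> Tuple[bool, Optional[int], Optional[str]]:
    """Send one HEAD request; return (verified, status, Last-Modified value)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return (False, None, None)         # not something we can HEAD-check
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # Assumption: anything below 400 (redirects included) counts as reachable.
        return (resp.status < 400, resp.status, resp.getheader("Last-Modified"))
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout or malformed reply.
        return (False, None, None)
    finally:
        conn.close()
```

Like the Visual Basic version, this checks links one at a time; a real tool would probably add concurrency, but the sequential loop keeps the logic easy to follow.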

The HTTP Client Control's Event Code


The responses to the HEAD request messages generated in cmdVerify (described in the previous section) are handled by the HTTP control's DocOutput and Error events. The code for these events is given in Listing 15.8.

The DocOutput event (described in detail in Chapter 5, "Retrieving Information From the Web"), is fired whenever the HTTP control receives data from the HTTP server it's connected to. This data can be in the form of HTTP header fields such as Content-Type or Server (these are discussed in Chapter 2, "HTTP: How To Speak On The Web") or content data (such as the HTML markup code or an image file). The event is also fired at the start and end of a received message and in the event of an error. The event provides a parameter named DocOutput which is an object containing all the information about the received message. The object's State property indicates the reason that the DocOutput event was fired and is used in a Select Case construct to determine what course of action to take.

The available states are:

    icDocHeaders    HTTP header fields have been received
    icDocBegin      Retrieval started
    icDocEnd        Retrieval ended
    icDocData       Content data is being received
    icDocError      An error has occurred


Because you're not interested in knowing when the retrieval starts or what the content data looks like (there shouldn't be any content data returned because the request message used the HEAD method), these two states have no code associated with them in Listing 15.8.

The icDocHeaders state is entered whenever all the HTTP response message header fields have been received. The DocOutput object provides a collection aptly named Headers, which contains all the header fields received. The Headers collection has a Count property and an Items collection. There is one entry in the Items collection for each header field received. Each item has a Name and a Value property. You're going to be displaying only the Last-Modified header, so the code loops through all the available header fields (there won't be more than a few). If the Last-Modified header is found, its value is placed in the Updated column of the grid.
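HTTP header field names are case-insensitive, so a server may legitimately send last-modified or LAST-MODIFIED. Here is a Python sketch of a case-insensitive lookup over (name, value) pairs. The find_header function is a hypothetical helper standing in for the loop over the Headers collection.

```python
def find_header(headers, wanted):
    """Return the value of the first header whose name matches wanted, ignoring case."""
    wanted = wanted.lower()
    for name, value in headers:
        if name.lower() == wanted:
            return value
    return None                 # header not sent by this server
```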

The icDocEnd state is entered when the connection with the Web server terminates. If the Conn_Done flag was not already set by an error condition, the code marks the current URL as verified and sets Conn_Done to signal the end of the verification process for the current URL. Note that the icDocEnd state is entered even after an error, such as a URL not being found, has occurred; that is why the code checks Conn_Done first.

The icDocError state is entered whenever an HTTP server returns an error code. The application places the HTTP control's ReplyCode property (the error code received from the HTTP server) into the status bar, marks the current URL as not verified, and marks the end of the verification for this URL by setting Conn_Done to 1.

The Error event is fired whenever an error occurs that causes the HTTP request/response messages to be invalid. This event is handled the same way as the icDocError state discussed in the previous paragraph.
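The interplay between icDocError, icDocEnd, and the Conn_Done flag is easiest to see by replaying a sequence of states. In this Python sketch, replay_doc_output and the plain state strings ("begin", "headers", "data", "end", "error") are hypothetical stand-ins for the DocOutput event and its icDoc constants. The logic mirrors Listing 15.8, including the early exit in the end state when an error has already been recorded.

```python
def replay_doc_output(events):
    """Replay (state, payload) pairs; return the grid's (Verified?, Updated) values."""
    conn_done = False
    verified = None            # Verified? column
    updated = None             # Updated column
    for state, payload in events:
        if state == "headers":
            for name, value in payload:
                if name.lower() == "last-modified":
                    updated = value
        elif state == "end":
            if conn_done:      # an error already decided this URL
                continue
            verified = "Yes"
            conn_done = True
        elif state == "error":
            verified = "No"
            conn_done = True
        # "begin" and "data" are ignored, as in the listing
    return verified, updated
```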

Listing 15.8. The HTTP control's event code.

Private Sub HTTP1_DocOutput(ByVal DocOutput As DocOutput)
    Dim i%
    Select Case DocOutput.State
    Case icDocHeaders
        With DocOutput.Headers
            For i% = 1 To .Count
                'HTTP header names are case-insensitive
                If LCase$(.Item(i%).Name) = "last-modified" Then
                    Grid1.Col = 2
                    Grid1.Text = .Item(i%).Value
                End If
            Next
        End With
        
    Case icDocBegin
        
    Case icDocEnd
        'if the done flag is already set, exit:
        If Conn_Done Then Exit Sub
        
        StatusBar1.SimpleText = "Done... "
        Grid1.Col = 1
        Grid1.Text = "Yes"
        Conn_Done = 1
    
    Case icDocData
    
    Case icDocError
        'if the URL doesn't exist, we'll get an error...
        StatusBar1.SimpleText = "Reply Code: " & HTTP1.ReplyCode
        Grid1.Col = 1
        Grid1.Text = "No"
        Conn_Done = 1
        
    End Select
End Sub
Private Sub HTTP1_Error(Number As Integer, Description As String, Scode As Long, Source As String, HelpFile As String, HelpContext As Long, CancelDisplay As Boolean)
    Conn_Done = 1
    Grid1.Col = 1
    Grid1.Text = "No"
    
End Sub

Testing The Application


Now that all the code is entered, it's time to test the application. Either connect to the Internet or start a local Web server, and then run the application. Enter a URL in the txtURL text box and click the Verify button. The status bar should indicate the page being loaded, and then the grid fills with all the local links on the page you specified. Finally, each of those links is checked, and the status bar is updated as it goes.

Figure 15.7 shows the application after it was run against a local Web server using the default Web page (note that no file name is specified in the URL text box, only the server name). I selected All Links in the Links To Verify frame in order to show the two external links as not verified (the machine was not connected to the Internet at the time the verification was performed).

Figure 15.7. Verifying a local server.

Figure 15.8 shows the application after it was run against a server that provides the Last-Modified HTTP header field. I resized the columns at run time in order to display all three columns onscreen.

Figure 15.8. Verifying on a server that provides Last-Modified.

Using the Microsoft HTML Control


If you don't wish to use the Sax Webster control, or if you're looking for a programming challenge to wind up this book, rewrite the application using the Microsoft Internet Control Pack's HTML client control.



If you have the Webster control and are modifying the project created above, you must remove the Webster control from the new project. For some reason, if both controls are in the project, the Microsoft HTML control is unable to connect to an HTTP server.

The Microsoft control lacks the GetLinkCount() and GetLinkURL() methods provided by the Webster control but makes up for this by providing an event named DoNewElement. If the control's ElemNotification property is set to True, this event is fired for each new HTML element parsed as the page to be verified is loaded. You can check the event's ElemType parameter to determine if the element is a link anchor (in which case ElemType will be A) and if it is, use a modified AddAnchor procedure to add the link to the grid control. AddAnchor must be modified because the HTML control does not resolve relative URLs to the absolute URLs that are necessary for the HTTP control. I'll leave these modifications to you as a code challenge.
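Python's standard library already handles the relative-to-absolute resolution that the modified AddAnchor needs, which makes it a useful reference point for the code challenge. In this sketch, resolve_anchor is a hypothetical name: urljoin() resolves the href against the URL of the page being verified, and urldefrag() drops any #fragment, since fragments are never sent to the server.

```python
from urllib.parse import urljoin, urldefrag

def resolve_anchor(page_url, href):
    """Turn an href from page_url into an absolute URL without a #fragment."""
    absolute = urljoin(page_url, href.strip())
    without_fragment, _fragment = urldefrag(absolute)
    return without_fragment
```

Dropping the fragment also means that links such as page.html#top and page.html#bottom collapse to one URL, so the duplicate check in AddAnchor catches them.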

Sample code for the DoNewElement event is provided in Listing 15.9. Chapter 4 and the Internet Control Pack's help file describe this event and its parameters in more detail.

Listing 15.9. Sample DoNewElement event code.

Private Sub HTML1_DoNewElement(ByVal ElemType As String, _
        ByVal EndTag As Boolean, ByVal Attrs As HTMLAttrs, _
        ByVal Text As String, EnableDefault As Boolean)
    Dim i%
    'is this a link anchor?
    If UCase$(ElemType) = "A" Then
        'yes, find the HREF element:
        For i% = 1 To Attrs.Count
            If UCase$(Attrs.Item(i%).Name) = "HREF" Then 
                AddAnchor Attrs.Item(i%).Value
            End If
        Next
    End If
                
End Sub


The UCase$() functions are used in the code above because the HTML control does not modify the case of the HTML tags as they are read from the HTML file. If an element's tag was placed in the file as lower case, the ElemType parameter is lower case as well.

You will also have to modify the code for cmdVerify_Click to use the HTML control to load the initial page. You should use the Conn_Done flag to signal the end of the page load by placing Conn_Done = 1 in the HTML control's EndRetrieval event and Conn_Done = -1 in the control's Error and Timeout events. Sample code for cmdVerify_Click is provided in Listing 15.10.

Listing 15.10. Sample cmdVerify_Click event code for the HTML control.

Private Sub cmdVerify_Click()
       
    Dim lHostName$, i%, URL$, X
    
    GridClear
    ToggleControls False
    
    lHostName$ = GetHostFromURL(txtUrl.Text)
    
    StatusBar1.SimpleText = "Loading " & txtUrl.Text
    Conn_Done = 0
    HTML1.ElemNotification = True
    HTML1.RequestDoc txtURL.Text
    
    'wait till the page is loaded
    While Conn_Done = 0
        DoEvents
    Wend
            
    Grid1.Row = 1
    Grid1.Col = 0
    Grid1.Text = txtUrl.Text
    'if an error occurred loading the page,
    ' add it to the grid and exit
    If (Conn_Done = -1) Then
        Grid1.Col = 1
        Grid1.Text = "No"
        ToggleControls True
        Exit Sub
    End If
    
    'add this link to the grid as verified:
    Grid1.Col = 1
    Grid1.Text = "Yes"
    
    'now check all the links in the grid
    For X = 1 To Grid_Pos
            
        Conn_Done = 0
        Grid1.Row = X
        Grid1.Col = 0
           
        StatusBar1.SimpleText = "Loading " & Grid1.Text
        
        HTTP1.URL = Grid1.Text
        HTTP1.GetDoc
        
        While Conn_Done = 0
            DoEvents
        Wend
        
    Next X
ToggleControls True
End Sub

The last major change you'll have to make is to correct the code for the chkImages_Click event. The HTML control uses a property named DeferRetrieval to indicate whether embedded documents should be loaded by the control. Change the line of code for this event to read

    HTML1.DeferRetrieval = (chkImages.Value = 0)

You'll also have to change other code that references Webster1 to reference HTML1 (or whatever name you give the HTML control). The HTML control also supports the Cancel method, so in the cmdMain_Click event code you simply replace Webster1.Cancel with HTML1.Cancel.

The code for the HTTP control's events can be left intact as long as relative link URLs are resolved in AddAnchor as described above.

Summary


As the last chapter in the book, this chapter draws on information from several of the previous chapters. If you haven't already done so, I hope this chapter prompts you to read some of the earlier ones. Probably the most important chapter for understanding this one is Chapter 2, "HTTP: How To Speak On The Web," which discusses the HTTP protocol and HTTP header fields in detail.

The book concludes with several appendixes that discuss how to create HTML files (Appendix A, "HTML Reference"), Microsoft's new VB Script programming language (Appendix B, "Visual Basic Script Reference"), and programming Win/CGI applications for the Microsoft Internet Information Server (Appendix C, "Win/CGI on the Microsoft Internet Information Server"). The final appendix, Appendix D, "Bibliography and Cool Web Sites," provides a good list of resources both on and off the 'Net to assist you in your Web programming endeavors.
