Audit of Links on the Exploit Interactive Web Site

In this follow-up article we give an updated audit of links on the Exploit Interactive web site.

Background

On 22 December 1999 an analysis of the Exploit Interactive web site was carried out using Microsoft's SiteServer Analyzer package. A summary of the findings was published in Exploit Interactive issue 4 [1]. How have things changed since then?

Update Of Audit

On 28 March 2000 Microsoft's SiteServer Analyzer package was again used to analyse the Exploit Interactive web site. We were pleased to find no significant broken links: the only ones reported were a handful of links which had been deliberately broken in order to demonstrate 404 error messages.

Although no significant broken links were found, the analysis did report a number of unusual error codes. On further investigation these were found to be the result of server redirects. The conclusion from the SiteServer analysis appears to be that the Exploit Interactive web site contains no significant broken links.

Xenu Analysis

In order to confirm the positive report from the SiteServer analysis a second link checking tool was used. We made use of the Xenu package [2].

Xenu reported on several links which were not available. These were not available for a variety of reasons including (a) access was forbidden, (b) resource was not found, (c) the service was temporarily overloaded or (d) the remote service was unavailable.
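
The categories above can be sketched as a small classification function. This is an illustration only: the article does not record the status codes involved, so the mapping from HTTP status codes to categories (and the function name `classify_status`) is an assumption, as is the use of `None` to stand for a remote service that gave no response at all.

```python
from http import HTTPStatus


def classify_status(code):
    """Map an HTTP status code (or None for no response) to a report category.

    The category names and the mapping are assumptions made for illustration.
    """
    if code is None:
        return "unreachable"      # (d) remote service unavailable
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"         # not a broken link, but worth reviewing
    if code == HTTPStatus.FORBIDDEN:
        return "forbidden"        # (a) access was forbidden
    if code == HTTPStatus.NOT_FOUND:
        return "not found"        # (b) resource was not found
    if code == HTTPStatus.SERVICE_UNAVAILABLE:
        return "overloaded"       # (c) service temporarily overloaded
    return "other error"
```

Grouping results in this way makes it easier to separate links which are probably permanently broken (not found) from those which may simply need to be rechecked later (overloaded or unreachable).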

Xenu detected many more links than SiteServer. This was partly due to the limited configuration options provided in Xenu: it is not possible to suppress link-checking in named areas. Such suppression is needed for the Exploit Interactive web site, as it contains a number of reports of analyses of remote web sites, and these reports contain information on broken links.
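
Suppressing link-checking in named areas amounts to filtering URLs by path before they are checked. The following is a minimal sketch of that idea and does not describe how either package works internally. The `/reports/` prefix and the function name `should_check` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical area of the site holding reports on remote web sites.
EXCLUDED_PREFIXES = ("/reports/",)


def should_check(url, excluded_prefixes=EXCLUDED_PREFIXES):
    """Return False for URLs whose path lies in an excluded area."""
    path = urlsplit(url).path
    return not any(path.startswith(prefix) for prefix in excluded_prefixes)
```

A link checker would call such a function on each page before following its links, so that broken links quoted in the reports are not counted against the site itself.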

Xenu also detected invalid internal links in the "Print Entire Issue" section. This page provides the entire content of an issue in a single page. It is intended for users who wish to print the entire contents of an issue. The page is created by a server-side include file which pulls in all article fragments, excluding navigational elements. A side-effect of this is that internal links (e.g. to references) will not work, as there will be duplicate link names.
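
The duplicate link names described above can be detected mechanically. The sketch below uses Python's standard `html.parser` module to count link targets in a page. It assumes that targets are given either as `<a name="...">` or as an `id` attribute, and the names `AnchorCollector` and `duplicate_anchors` are our own.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Count how often each link target (a name= or any id=) occurs."""

    def __init__(self):
        super().__init__()
        self.names = Counter()

    def handle_starttag(self, tag, attrs):
        targets = set()
        for key, value in attrs:
            if value and (key == "id" or (tag == "a" and key == "name")):
                targets.add(value)
        # A tag carrying the same value in both name= and id= counts once.
        self.names.update(targets)


def duplicate_anchors(html):
    """Return the sorted list of target names defined more than once."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return sorted(name for name, count in collector.names.items() if count > 1)
```

Running such a check over a combined page would list names such as a reference anchor which is defined once in every article fragment.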

Xenu found about a dozen legitimate broken links which were not detected by SiteServer. A number were due to invalid HTML which, unfortunately, had not been spotted when the issue was released. These errors were fixed. There were also about half a dozen links which were no longer available. These were typically links to a conference or news web site, and the resource appears to have been deleted.

Policy On Correcting Broken Links

Once we have found a broken link in a published article, we have to decide what to do. If we make a change we could be accused of retrospectively altering a published document. This may be a dangerous precedent to set. On the other hand the broken link will cause problems to users who try to follow it, and will make future link checking more difficult as the numbers of broken links grow.

We have chosen a compromise. Since the house style for hypertext links is to provide them in the references for an article, and to display the URL as the hypertext anchor, we can remove the hypertext link, leaving the URL as plain text. This preserves the full meaning of the original article, but avoids users wasting their time in following links which are known to be broken. As a reinforcement an icon is provided which indicates that a link is broken, as illustrated in Figure 1.

Figure 1: How Broken Links Are Treated
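
The treatment of broken links could be automated along the following lines. This is a sketch only, not the procedure actually used on the site. It assumes simple anchors of the form `<a href="...">...</a>` with a single double-quoted attribute, and it uses a text marker in place of the icon. The function name `unlink_broken` and the marker text are hypothetical.

```python
import re

# Matches simple anchors only: <a href="URL">text</a>
ANCHOR = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def unlink_broken(html, broken_urls, marker=" [broken link]"):
    """Replace anchors pointing at known-broken URLs with their plain text."""

    def replace(match):
        url, text = match.group(1), match.group(2)
        if url in broken_urls:
            return text + marker   # keep the visible URL, drop the link
        return match.group(0)      # leave working links untouched

    return ANCHOR.sub(replace, html)
```

Because the house style displays the URL as the anchor text, removing the link in this way loses no information from the original article.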

Discussion

In addition to the two packages mentioned above, the server log files were also examined briefly. It was noticed that there were a number of 404 error messages present. Further examination revealed that most of the errors were due to missing resources which were called from style sheet files. This is believed to be the result of malformed URLs. For example if somebody creates a link to this article in the form <a href="http://www.exploit-lib.org//issue5/exploit-audit/"> (i.e. with a spurious slash [/] in the URL) the page will appear to be displayed correctly, but links to images in style sheet files will fail.
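
An examination of this kind can be sketched as a short script which counts 404 responses by requested resource and referring page. The article does not state the format of the server log files. The example assumes the widely used Combined Log Format, and the name `count_404s` is our own.

```python
import re
from collections import Counter

# Assumed Combined Log Format:
#   host ident user [date] "METHOD path protocol" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def count_404s(lines):
    """Count 404 responses, keyed by (requested path, referrer)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[(match.group("path"), match.group("referrer"))] += 1
    return counts
```

Grouping by referrer makes malformed URLs easy to spot, since failed requests whose referrer contains a doubled slash will cluster together.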

Report Of Findings

Although the SiteServer analysis appears to miss some broken links, it does provide a useful summary of the extent of a web site. The summary of the findings is given in Table 1.

Table 1: Summary of Audit
                         Jan 2000   Apr 2000
No. of pages                  906      1,135
No. of internal links       6,842      6,861
No. of external links       3,117      3,172
No. of broken links             0          0
No. of images                 234        378

Note: The number of links excludes links contained in the report of the web site.

It will be noted that the number of pages appears to have gone down. This is due to the removal of a number of incorrect links to resources which meant that in some cases resources were duplicated (e.g. http://www.exploit-lib.org/issue1/amazon/ and http://www.exploit-lib.org/issue1//amazon/ were treated as two separate resources).
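
Duplicates of this kind can be avoided by normalising URLs before they are counted. The following minimal sketch collapses repeated slashes in the path component only, leaving the `//` after the scheme intact. The function name `collapse_slashes` is our own, and the approach assumes the server treats doubled slashes as equivalent to single ones, as appears to be the case here.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of slashes in the path of a URL into a single slash."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


urls = [
    "http://www.exploit-lib.org/issue1/amazon/",
    "http://www.exploit-lib.org/issue1//amazon/",
]
unique = {collapse_slashes(u) for u in urls}
```

After normalisation the two example URLs are treated as a single resource.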

Conclusions

The comparison of the two link checking packages appears to show that the Xenu link checker is more comprehensive, although SiteServer provides better control over excluding resources. However, neither package provides information on broken links within style sheets.

References

  1. Audit of Links on the Exploit Interactive Web Site, Brian Kelly, Exploit Interactive issue 4, January 2000
    URL: <http://www.exploit-lib.org/issue4/exploit-audit/>
  2. Xenu
    URL: <http://www.xenu.com/>

For citation purposes:
Brian Kelly, "Links To Telematics For Library Web Sites", Exploit Interactive, issue 5, April 2000
URL: <http://www.exploit-lib.org/issue5/exploit-audit/>

