University web managers in British Universities no doubt expected a leisurely return to work after the Easter break. But on reading a message entitled "Webtechs porn warning" sent to the website-info-mgt (and web-support) Mailbase mailing lists, they found their priorities had changed.
The message, sent on Wednesday 7th April, announced that "The registration for the domain www.webtechs.com appears to have lapsed and been taken over by an "adult content" site. The problem is that there are pages around bearing their "validated HTML" gif linked to the former validation service http://www.webtechs.com/html-val-svc/ which is now a source of cyberporn." To make matters worse, "If you use Analog to process your server stats, the "validated HTML" gif and link may be compiled in and be appearing automatically at the end of the reports."
This incident generated a great deal of activity within the next few days. Web managers quickly learnt the syntax of the AltaVista search engine in order to discover sites which contained links to Webtechs. Using the search term link:www.webtechs.com resulted in 25,533 hits. This was clearly too many for even the most determined of web managers to check, but it gave a useful indication of the size of the problem.
Using the search term link:www.webtechs.com host:ac.uk enabled the search to be restricted to UK academic web sites. As shown in Figure 1 over 5,000 pages contained a link to Webtechs.
Figure 1: Using AltaVista to Discover Web Sites Containing Links to Webtechs
Further analysis indicated the following numbers of affected pages in a variety of communities (Table 1).
|Community||No. of affected pages|
|All web pages||25,533|
|UK Academic community (.ac.uk domain)||5,026|
|Other UK web sites (.uk excluding .ac.uk domain)||230|
|US Academic community (.edu domain)||7,610|
|Non profit making organisations (.org domain)||1,065|
|Government organisations (.gov domain)||185|
|Network organisations (.net domain)||2,035|
|Military organisations (.mil domain)||30|
A list of the numbers of affected pages in several EU countries is given in Table 2.
|Community||No. of affected pages|
|UK web sites (.uk domain)||5,256|
|Irish web sites (.ie domain)||34|
|French web sites (.fr domain)||196|
|Belgian web sites (.be domain)||66|
|German web sites (.de domain)||575|
|Italian web sites (.it domain)||173|
|Spanish web sites (.es domain)||158|
|Portuguese web sites (.pt domain)||11|
|Swedish web sites (.se domain)||224|
|Finnish web sites (.fi domain)||230|
|Norwegian web sites (.no domain)||54|
Please note that the figures given in Tables 1 and 2 may not be completely accurate due to limitations in AltaVista's searching capabilities (e.g. the domain www.it.kth.se is included in Italy's total due to the presence of .it in the domain name), the date on which the AltaVista robot last trawled, etc.
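The miscounting described above can be illustrated with a small sketch. It is not a description of how AltaVista actually matched host names, which is not documented here; it simply contrasts a substring test (which would count www.it.kth.se as Italian) with a test on the final label of the host name. The function names and the www.example.* hosts are illustrative assumptions.

```python
def naive_match(host: str, tld: str) -> bool:
    """Substring test: true if '.tld' appears anywhere in the host name."""
    return f".{tld}" in host.lower()


def suffix_match(host: str, tld: str) -> bool:
    """Suffix test: true only if the last label of the host name is the TLD."""
    return host.lower().rstrip(".").rsplit(".", 1)[-1] == tld


# www.it.kth.se is the example from the text; the other hosts are made up.
hosts = ["www.it.kth.se", "www.example.it", "www.example.ac.uk"]

italian_naive = [h for h in hosts if naive_match(h, "it")]
italian_suffix = [h for h in hosts if suffix_match(h, "it")]

print("substring test:", italian_naive)
print("suffix test:   ", italian_suffix)
```

The substring test picks up the Swedish host as well as the Italian one, whereas the suffix test picks up only the Italian one, which is why the per-country totals should be read as approximate.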
The Role of Analog
Analysis of these hits indicated that large numbers of them contained "Web Server Statistics for ..." in their title. These pages contained statistical summaries of web site traffic generated by the Analog software, as illustrated in Figure 2.
Figure 2: Output From Analog Contains Icon Pointing to Webtechs
The Analog software was developed at the University of Cambridge. It is a widely used and freely available package for analysing web server log files. Stephen Turner, the author of Analog, has documented a solution to the problem. He explained that the problem occurred in old versions of Analog and that upgrading the software would fix it.
Stephen also described the background to the incident: "According to Webtechs, they have not intentionally given up control of their domain, but Internic (or Network Solutions, or whatever we have to call them now) lost their registration even though it was properly paid up, and "Virtual Domain Buyers" took it over. Webtechs are trying to get it back. So maybe it will all become right again soon."
Implications For Projects
The figures given in Tables 1 and 2 probably exaggerate the numbers of pages which contain inadvertent links to pornography. A great many of the pages have, no doubt, been created automatically by the Analog software. These pages are unlikely to be visited frequently, and visitors are unlikely to click on an icon to be found at the bottom of such pages.
However, the Webtechs incident has some worrying implications, especially for projects which have their own domain name. As described by Kelly and Peacock, a number of EU Telematics for Libraries projects have their own domain name, such as MALVINE (at http://www.malvine.org/). Within the EU's Telematics Application Programme the DESIRE project (at http://www.desire.org/) also has its own domain name.
Unless selling a popular domain name to a porn company is used as an exit strategy (!) there is a clear need for projects to be aware of the dangers of reuse of their domain name once the project has completed and no further funding is available to pay for the domain name. Similarly funding bodies, such as the European Union, should be alert to these dangers - especially as once a project has finished, there will normally be nobody left to deal with any such incidents.
Implications For Web Managers
What steps should managers of web services take to ensure that they are not inadvertently pointing to porn sites? Clearly search engines such as AltaVista can be used to check if a site contains such pointers - although, of course, this is not an infallible solution, since it depends on what the search engine has indexed, and a search across the web server's file store may be a better solution.
If links to the Webtechs web site are found, what should the web manager do? It could be argued that, unless the page is directly managed by the web manager, it would not be proper - and perhaps even illegal - to tamper with someone else's web pages. On the other hand removing such links may be desirable - if not for legal reasons then to save embarrassment. Can you imagine giving a training course and clicking on the icon to demonstrate an HTML validation service, and then going to a porn site?
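A search across the file store might look something like the following minimal sketch. It assumes the web pages are held as .html or .htm files beneath a single directory; the function name find_mentions and its parameters are illustrative and are not taken from any tool mentioned in this article. It reports every line that mentions the host name, not only hyperlinks, so the results still need to be checked by hand.

```python
from pathlib import Path


def find_mentions(root, host="www.webtechs.com", suffixes=(".html", ".htm")):
    """Return (path, line number) pairs for lines that mention `host`.

    `root` is the top of the web file store (an assumption: one directory
    tree). The match is a case-insensitive substring test.
    """
    needle = host.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # latin-1 can decode any byte sequence, so odd encodings do not
        # stop the scan.
        text = path.read_text(encoding="latin-1")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path, lineno))
    return hits
```

For example, `find_mentions("/var/www")` (a hypothetical document root) would list the files and line numbers to review. Unlike a search engine query, this also finds pages which have never been indexed.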
Implications For Mirroring Services
The Webtechs HTML validation service was very popular in the mid 1990s, when HTML authoring first became popular. Shortly afterwards the HENSA service (described elsewhere in Exploit Interactive) provided a UK mirror of the service. In November 1998, following discussion on the web-support Mailbase list, Dave Beckett announced that the Webtechs HTML validation service had been replaced by the W3C HTML validation service. The popularity of the service had made HENSA aware that the Webtechs service was very dated, and Webtechs' failure to respond to requests for an update prompted HENSA to replace their service with W3C's validation service. This was, perhaps, fortunate for HENSA, as an automated process for mirroring the validation service could have resulted in HENSA hosting a porn site!
Although HENSA are probably sufficiently experienced in mirroring services not to be caught in this way, it does illustrate some potential problems for sites mirroring web services.
No doubt the Irish Catholic women's ordination campaign and the First Baptist Church of Sausalito would be embarrassed by the links they are now hosting. Sadly, porn companies now appear to be actively purchasing expired domain names. Tony Grimes, the Internet Marketing Executive for Macmillan Publishers, has had similar experiences:
"I work on the Macmillan Reference website which used to own the groveartmusic.com url. Unfortunately, our registration [of] this url lapsed when we stopped using it last year. Since then it has been purchased by a Dutch company for pornographic purposes. Obviously this means that there are many websites containing this link who believe that they are actually linking to Grove Dictionaries of Music."
A number of solutions to the problem of bona fide web sites transforming into pornographic sites have been discussed. The Computer Science department at the University of Kent at Canterbury were in a position to quickly regenerate their list of publications since the pages were generated from a database. Jon Knight, Loughborough University, has suggested that information gateways could store an MD5 checksum of catalogued resources and provide warnings if the page changes. Dan Brickley, University of Bristol, has proposed use of "a smarter link checker which periodically consults a PICS metadata label bureau and asks it for a description of each site, e.g. using a pornography-filtering ratings vocabulary like RSACi."
System administrators may also provide solutions. David Hastings, a Systems Administrator at the Oxford University Computing Services (OUCS), regards the Webtechs incident as annoying rather than serious. OUCS have blocked access to the Webtechs site's numeric address in their cache configuration file, and sent out a warning message to webmasters in Oxford University. However, as David pointed out, users outside Oxford University will not be affected by their cache filter, and will still be able to follow links to the porn site from a page at Oxford University. As this "won't do anything to enhance the reputation of the University!" they are in the process of removing links to webtechs.com from Oxford web servers. David made two additional suggestions:
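The checksum idea can be sketched as follows. This is one possible reading of the suggestion, not an implementation used by any gateway: the catalogue is assumed to be a simple mapping from URL to the MD5 checksum recorded when the resource was catalogued, and the names md5_of, fetch and changed_resources are illustrative.

```python
import hashlib
import urllib.request


def md5_of(content: bytes) -> str:
    """Hex MD5 digest of a page body (used to detect change, not for security)."""
    return hashlib.md5(content).hexdigest()


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Retrieve the current body of a catalogued resource."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def changed_resources(catalogue: dict, fetcher=fetch) -> list:
    """Return the URLs whose current checksum differs from the stored one.

    `catalogue` maps URL -> checksum stored at cataloguing time
    (an assumed structure). `fetcher` can be swapped for testing.
    """
    changed = []
    for url, stored_checksum in catalogue.items():
        if md5_of(fetcher(url)) != stored_checksum:
            changed.append(url)
    return changed
```

A changed checksum only shows that the page is no longer byte-for-byte identical to the catalogued version. Pages which are regenerated or updated legitimately would also be flagged, so the output is best treated as a list of resources for a cataloguer to re-inspect.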
- There ought to be a mechanism within the UK academic community for notifying webmasters of dubious sites.
- IP addresses shouldn't be recycled so quickly. Phone numbers are held for 2 years or so before they are re-used, so why not something similar for IP addresses?
In the longer term we may see more elegant protocol solutions. For example, external link databases could be used for managing hypertext links (such as the proposed XLink protocol), as could digitally signed web sites.
Although there may be solutions provided in the future, web managers are still left to deal with the problems today. This article concludes with thoughts from Rebecca Linford, the University Web Administrator at the University of Dundee: