Analysis of NFP Web Sites

Brian Kelly reviews NFP web sites using a variety of automated tools, and makes some recommendations on a number of aspects of web site architecture.

National Focal Points

National Focal Points (NFPs) have been established in all European Commission Member States and in other European countries to promote the Telematics For Libraries programme and to assist proposers requiring sector-specific advice and information.

The different countries have taken a variety of approaches to the services provided by NFPs. The aim of this article is to review the different technical approaches taken by NFPs in the provision of their web sites. The issues which emerge may be of use in developments under the Fifth Framework.

This Web Technologies column reports on a number of mainly automated analyses of the web sites used by National Focal Points. Note that no attempt has been made to analyse the content of the web sites. The analyses were carried out on 4-5 October 1999.

NFP Web Sites

The analysis of NFP web sites makes use of the central list of National Focal Points maintained by the Commission [1].

Of the eighteen countries listed, eleven provide an NFP web site. The web site addresses are given below.

Table 1: URLs for NFP Web Sites
Country | NFP Web Site
Austria | http://www.bmwv.gv.at/4fte/3nfp.htm
Belgium | http://www.belspo.be/euro/nfp.htm
Finland | http://renki.helsinki.fi/eu/
France | http://dges.mesr.fr/bib/info/europe/PlaquetteCfppa.htm
Germany | http://www.dbi-berlin.de/bib_wes/dbi_euro/eurohome.htm
Ireland | http://ireland.iol.ie/~libcounc/
Norway | http://info.rbt.no/eu/
Spain | http://www.bne.es/punto.htm
Sweden | http://www.kb.se/bibsam/eubibpro/euhemsid.htm
Switzerland | http://www.snl.ch/f/fuehr/z_pointf.htm
United Kingdom | http://www.mailbase.ac.uk/lists/lis-european-programmes

The UK web site provides access to an archive of postings to a mailing list. As it is different in character to the other NFP web sites it is not analysed further in this article.

URL Naming

As can be seen from Table 1 a variety of URL naming schemes are in use. No fewer than six of the web sites include a file name (all of which end in .htm). One web site uses the tilde (~) convention. Two sites (Finland and Norway) use short URLs with a clearly defined directory used to manage the contents.
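The URL features discussed above can also be checked mechanically. The following is a minimal Python sketch, not the method used for this survey: the function name `describe_url` and the rule "a final path segment containing a dot is a file name" are assumptions made for illustration. The URLs are those of the ten analysed sites in Table 1.

```python
from urllib.parse import urlsplit

# Entry points of the ten analysed NFP web sites (from Table 1).
NFP_URLS = {
    "Austria": "http://www.bmwv.gv.at/4fte/3nfp.htm",
    "Belgium": "http://www.belspo.be/euro/nfp.htm",
    "Finland": "http://renki.helsinki.fi/eu/",
    "France": "http://dges.mesr.fr/bib/info/europe/PlaquetteCfppa.htm",
    "Germany": "http://www.dbi-berlin.de/bib_wes/dbi_euro/eurohome.htm",
    "Ireland": "http://ireland.iol.ie/~libcounc/",
    "Norway": "http://info.rbt.no/eu/",
    "Spain": "http://www.bne.es/punto.htm",
    "Sweden": "http://www.kb.se/bibsam/eubibpro/euhemsid.htm",
    "Switzerland": "http://www.snl.ch/f/fuehr/z_pointf.htm",
}


def describe_url(url):
    """Report simple naming features of a URL (heuristic, for illustration)."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1]
    return {
        # Assumption: a dot in the final segment means an explicit file name.
        "has_file_name": "." in last_segment,
        # The tilde convention marks a per-user directory, e.g. /~libcounc/.
        "uses_tilde": "/~" in path,
        # A trailing slash means the entry point is a directory default page.
        "ends_with_directory": path.endswith("/"),
    }


if __name__ == "__main__":
    for country, url in NFP_URLS.items():
        print(country, describe_url(url))
```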

A review of URLs for Telematics for Library web sites [2] provided advice on URL naming conventions, which are repeated below:

These simple guidelines should make NFP web sites more accessible. Of course, these guidelines may conflict with local policies for hosting web sites, so they should be regarded as guidelines and not rules.

Server Technology

UKOLN's doc-info [3] and http-info [4] web-based document analysis services were used to analyse the entry point of the NFP web sites. A summary of the findings is given in the following table.

Table 2: Analysis of the Entry Point to NFP Web Sites
NFP Web Site | Server | Profile | Size (bytes) | Metadata | No. of links | Other comments
Austria | Apache 1.2.4 | 23 images | 13,421 | None | 26 |
Belgium | Microsoft-IIS/3.0 | 3 framesets | Not known | None | Not known | Frames
Finland | Apache/1.3.6 | 3 images | 22,634 | None | 29 |
France | Netscape-Enterprise/3.0 | 0 images | 11,646 | None | 0 | Single text page
Germany | mod_perl/1.18 Apache/1.3.4 PHP/3.0.7 (Unix) (SuSE/Linux) | 19 images | 25,459 | None | 26 |
Ireland | Apache/1.3.0 (Unix) | 3 images | 40,274 | None | 71 |
Norway | Microsoft-IIS/4.0 | 2 images | 15,991 | None | 15 |
Spain | Apache/1.1.0 | 17 images (1 background) | 31,172 | 1 (Author) | 15 |
Sweden | Microsoft-IIS/3.0 | 12 images | 12,721 | None | 12 |
Switzerland | Apache/1.3.4 | 22 images | 23,162 | None | 45 |

Seven of the web sites are (probably) hosted on a Unix server and three on a Windows NT server.
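The platform estimate above is inferred from the HTTP Server header. The sketch below shows one way this could be done in Python; it is not how the http-info service works. The function names and the rule "Microsoft-IIS means Windows NT, anything else is probably Unix" are assumptions for illustration, and the header values are copied from Table 2.

```python
from urllib.request import Request, urlopen


def fetch_server_header(url, timeout=10):
    """Send a HEAD request and return the Server header, or None if absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Server")


def guess_platform(server_header):
    """Crude heuristic: IIS implies Windows NT; other servers are assumed Unix."""
    if "Microsoft-IIS" in server_header:
        return "Windows NT"
    return "Unix (probably)"


# Server headers as reported in Table 2.
TABLE_2_SERVERS = {
    "Austria": "Apache 1.2.4",
    "Belgium": "Microsoft-IIS/3.0",
    "Finland": "Apache/1.3.6",
    "France": "Netscape-Enterprise/3.0",
    "Germany": "mod_perl/1.18 Apache/1.3.4 PHP/3.0.7 (Unix) (SuSE/Linux)",
    "Ireland": "Apache/1.3.0 (Unix)",
    "Norway": "Microsoft-IIS/4.0",
    "Spain": "Apache/1.1.0",
    "Sweden": "Microsoft-IIS/3.0",
    "Switzerland": "Apache/1.3.4",
}


def platform_counts(servers):
    """Count how many sites fall under each guessed platform."""
    counts = {}
    for header in servers.values():
        platform = guess_platform(header)
        counts[platform] = counts.get(platform, 0) + 1
    return counts


if __name__ == "__main__":
    print(platform_counts(TABLE_2_SERVERS))
```

The Server header is set by the site administrator and can be altered or suppressed, which is why the result is only a probable indication of the platform.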

Only one of the NFP web site entry points (Spain) contained any metadata.

Analysis Of NFP Web Sites

The Microsoft SiteServer software [5] was used to analyse each NFP web site. A summary of the findings is given in the following table.

Table 3: Analysis of NFP Web Sites
NFP Web Site | No. of pages | No. of images | No. of local links | No. of offsite links
Austria | 1,418 | 121 | 7,118 | 3,253
Belgium | 34 | 7 | 73 | 53
Finland | 23 | 5 | 3 | 81
France | 1 | 1 | 0 | 0
Germany | 14 | 10 | 28 | 18
Ireland | 121 | 6 | 138 | 118
Norway | 39 | 3 | 14 | 90
Spain | 1,734 | 1,060 | 23,679 | 3,093
Sweden | 88 | 24 | 79 | 477
Switzerland | 68 | 21 | 790 | 182

Several NFP web sites are not held in a directory of their own, so the analysis could not separate the NFP pages from other content on the same server. The figures for those web sites in Table 3 are therefore likely to be overestimates.

Failure to use a directory structure to group the resources related to the NFP not only makes auditing difficult, it also makes automated harvesting of the resources difficult.
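The effect of directory structure on auditing and harvesting can be illustrated with a short Python sketch. It assumes a harvester that limits itself to URLs sharing the directory of the entry point; the function names `scope_prefix` and `in_scope` are hypothetical and are not taken from any of the tools used in this survey.

```python
from urllib.parse import urlsplit


def scope_prefix(entry_point):
    """Return the entry point URL up to and including its final '/'."""
    parts = urlsplit(entry_point)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def in_scope(url, entry_point):
    """True if url lies in the same directory tree as the entry point."""
    return url.startswith(scope_prefix(entry_point))


if __name__ == "__main__":
    # A site held in its own directory gives a narrow, meaningful scope.
    print(scope_prefix("http://renki.helsinki.fi/eu/"))
    # An entry point at the server root puts the whole server in scope.
    print(scope_prefix("http://www.bne.es/punto.htm"))
```

When the entry point sits at the root of the server, as for Spain, the only available scope is the entire server, which is consistent with the large page counts reported for some sites in Table 3.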

Links To NFP Web Sites

The linkpopularity.com web site [6] was used to obtain details of the number of links to NFP web sites. The results are given in Table 4.

Table 4: Analysis of Links to NFP Web Site
NFP Web Site | AltaVista | Infoseek | HotBot
Austria | 5 | 4 | 0
Belgium | 5 | 6 | 1
Finland | 1 | 12 | 1
France | 0 | 0 | 0
Germany | 15 | 10 | 4
Ireland | 24 | 22 | 7
Norway | 4 | 12 | 1
Spain | 3 | 5 | 1
Sweden | 5 | 9 | 2
Switzerland | 0 | 0 | 1

It should be noted that the information on the number of links is taken from the databases hosted by the AltaVista, Infoseek and HotBot search engines. It cannot be guaranteed that the information held on the databases is complete. In addition the linkpopularity.com web site states that the AltaVista results have been erratic recently.

Searching NFP Web Sites

None of the NFP web sites appeared to provide a search facility. Since the sites appear to be small, browsing may be adequate for exploring the web sites. However as the web sites grow, a search facility will become of increasing importance.

As described in an analysis of search engines on UK University web sites published recently in Ariadne [7], small organisations with limited technical expertise may find it useful to provide access to a search facility hosted remotely, such as a global search engine.

An example of this approach is shown below. This example uses the HotBot search engine. The interface is configured to search across all NFP web sites by default, although searches of individual NFP web sites can also be chosen.

The following points should be noted:

Not all pages will be indexed
The search is limited to the pages which have been indexed by the HotBot indexer. HotBot may not have indexed all pages on the server, and the pages may have changed since the pages were last indexed.
All resources on server may be searched
The search is carried out across all resources on the server, and not just on the NFP pages. For example, all pages on the server ireland.iol.ie will be searched and not just those under the directory ireland.iol.ie/~libcounc/.

Although web sites will ideally have their own search engine which can be configured to support local requirements (e.g. index new resources when they become available, index a range of file formats, omit certain resources such as draft documents from the index, etc.), the use of a remote index may be worth considering, especially if the remote search service allows searches to be restricted to areas of the web site.

Support for the Robot Exclusion Protocol

The Robot Exclusion Protocol [8] enables a web site administrator to specify directories which robots should not access. Although it does not provide a security mechanism this protocol can be used to avoid search engines indexing draft documents and personal files. It can also be used to stop search engines from wasting server capacity by attempting to index files such as images, CGI scripts, etc.

The Robot Exclusion Protocol makes use of a file with the name robots.txt which is located at the root of the web server. A typical file (taken from the Irish NFP web site) is shown below, with annotations explaining the role of the statements.

User-agent: * # Following conditions apply to all robots
Disallow: /cgi-bin/ # Robots not allowed to index resources (typically scripts) in cgi-bin directory
Disallow: /tmp/ # Robots not allowed to index resources in /tmp (temporary files)
Disallow: /resource/home/
Disallow: /iol/
Disallow: /thisweek/ # Robots not allowed to index resources in /thisweek (typically news items)
Figure 1: A Typical robots.txt File
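The effect of the file in Figure 1 can be checked with the `urllib.robotparser` module in the Python standard library. This is a minimal sketch: the robot name `ExampleHarvester` and the test URLs under `/cgi-bin/` and `/thisweek/` are hypothetical, and the annotations have been omitted from the rules for brevity.

```python
from urllib.robotparser import RobotFileParser

# The rules from Figure 1, without the explanatory annotations.
FIGURE_1 = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /resource/home/
Disallow: /iol/
Disallow: /thisweek/
"""


def build_parser(robots_txt):
    """Parse the text of a robots.txt file held in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(FIGURE_1)
    robot = "ExampleHarvester"  # hypothetical robot name
    for url in (
        "http://ireland.iol.ie/~libcounc/",
        "http://ireland.iol.ie/cgi-bin/search",
        "http://ireland.iol.ie/thisweek/index.html",
    ):
        print(url, parser.can_fetch(robot, url))
```

A well-behaved robot would be permitted to fetch the NFP pages under /~libcounc/ but not resources under the listed directories. The protocol is advisory only: it relies on the robot choosing to honour it.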

An analysis of the robots.txt files on the servers which host NFP web sites is given in Table 5. UKOLN's web-based /robots.txt checker [9] was used to analyse these files.

Table 5: Analysis of robots.txt Files on NFP Web Site
NFP Web Site | Status of robots.txt file
Austria | None
Belgium | None
Finland | Disallows access to cgi-bin, dc5, _private, dc4b, dc4htm, gablocal, images and pics
France | None
Germany | Disallows access to entire web site
Ireland | Disallows access to /cgi-bin/, /tmp/, /resource/home/, /iol/ and /thisweek/
Norway | None
Spain | None
Sweden | None
Switzerland | Disallows access to /cgi-bin/, /usage/, /interne/, /imgs/ and /cache-usage

404 Error Pages

The 404 error page can be an important navigational feature for a web site, especially for web sites which have long URLs which may be difficult to type correctly. As described in an Ariadne article which analysed 404 error pages provided on UK University web sites [10], there is a range of features which can be provided on a well-designed 404 page.

A brief summary of the 404 error pages for NFP web sites is given in Table 6.

Table 6: Analysis of 404 Error Messages
NFP Web Site | Error message
Austria | Brief text message
Belgium | Very brief text message
Finland |
France | Brief text message
Germany | Brief text message
Ireland | Contains site map and advertising
Norway | Brief text message
Spain | Brief text message
Sweden | Brief text message
Switzerland | Brief text message

As can be seen from Table 6, only the Irish NFP web site provides a tailored 404 error page; the remainder use the default server message.
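A rough automated check for a tailored 404 page might look like the Python sketch below. It rests on a loudly stated assumption: that a default server message contains no hyperlinks, whereas a tailored page offers links such as a site map. The names `LinkCounter` and `summarise_error_page` are hypothetical, and the survey reported here did not use this code.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Count <a href="..."> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def summarise_error_page(status, body):
    """Summarise the response received for a URL known not to exist."""
    counter = LinkCounter()
    counter.feed(body)
    counter.close()
    return {
        "is_404": status == 404,
        "links": counter.links,
        # Assumption: any navigation link suggests a tailored error page.
        "tailored": status == 404 and counter.links > 0,
    }


if __name__ == "__main__":
    default_page = "<h1>Not Found</h1><p>The requested URL was not found.</p>"
    tailored_page = '<h1>Not Found</h1><a href="/sitemap.html">Site map</a>'
    print(summarise_error_page(404, default_page))
    print(summarise_error_page(404, tailored_page))
```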

Accessibility Of NFP Web Sites

The Bobby tool [11] was used to analyse the accessibility of the main entry point for NFP web sites. The results are summarised in Table 7.

Table 7: Accessibility of NFP Entry Points
NFP Web Site | Comments
Austria | Priority 1: ALT attribute missing for images (15 instances). Possible incorrect HTML (8 instances).
Belgium | Priority 1: Frames require a title. Note: individual frame sets not analysed.
Finland | Priority 1: No problems found. Possible incorrect HTML (2 instances).
France | Priority 1: No problems found.
Germany | Could not access page
Ireland | Priority 1: ALT attribute missing for images (2 instances).
Norway | Priority 1: No problems found.
Spain | Priority 1: ALT attribute missing for images (15 instances).
Sweden | Priority 1: ALT attribute missing for images (1 instance).
Switzerland | Priority 1: ALT attribute missing for images (22 instances). Possible incorrect HTML (4 instances).
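The most common problem in Table 7, images without an ALT attribute, is simple to detect. The following Python sketch uses the standard library HTML parser to list such images. It illustrates this one check only and is not a substitute for Bobby, which tests many other guidelines; the names `MissingAltFinder` and `images_missing_alt` are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the SRC of every IMG element that has no ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <IMG> matches.
        if tag == "img":
            fields = dict(attrs)
            if "alt" not in fields:
                self.missing.append(fields.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the sources of images in html that lack an ALT attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    sample = '<IMG SRC="logo.gif"><IMG SRC="map.gif" ALT="Map of Europe">'
    print(images_missing_alt(sample))
```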

Conclusions

The analysis of the NFP web sites has shown that a variety of approaches have been taken. With the ever-increasing importance of dissemination of the activities funded by the European Commission the role of the National Focal Points (and their successor under the Fifth Framework) will become even more important. As the web becomes more sophisticated it is important that web sites are designed to facilitate automated processes, and not just for viewing by humans.

Based on this survey a number of recommendations can be made.

Short URLs
Short URLs are desirable as they are easier to remember, cite and type than long URLs.
File extensions may be confusing
There may be ambiguities in the use of file extensions (such as .htm and .html). In addition the use of server default file names (such as intro.html or index.html) can reduce the length of the entry point (e.g. http://renki.helsinki.fi/eu/ instead of http://renki.helsinki.fi/eu/index.html).
Store the content of NFP web sites in their own directories
The content of NFP web sites should be stored in their own directories, and the NFP web site entry point should be located within this directory.
Use metadata
Metadata such as <META NAME="keywords" CONTENT="..."> or <META NAME="description" CONTENT="..."> can improve the ranking of the web site in search engines.
Beware of frames
Many search engines cannot access framed web sites. If frames are used there should be an alternative mechanism for search engines (and users with browsers which don't support frames).
Analyse link popularity
Use services such as linkpopularity.com to analyse the number of links to your web site. Consider submitting details of your web site to search engines.
Use the Robot Exclusion Protocol
Use the Robot Exclusion Protocol to stop search engines from indexing files which should not be widely disseminated.
Investigate options for implementing a search facility
Investigate the options for implementing a search facility, including use of a remote service, if appropriate.
Consider enhancements to the site 404 page
Think about enhancements to the web site 404 error page.
Accessibility
Think about the needs of people with disabilities who are accessing your web site. Consider use of an accessibility checker such as Bobby.
Consistency across NFP web sites
Think about the advantages in providing a consistent interface, structure and methodology across NFP web sites.
Develop good working links with the web server administrator
The web server administrator may be responsible for implementing a number of the recommendations given here (for example updating the robots.txt and the 404 error message files and providing and configuring a search facility).
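As an illustration of the "Use metadata" recommendation, the Python sketch below extracts the META elements from a page so that their presence can be audited, in the spirit of the Metadata column of Table 2. It is a minimal example rather than the implementation of the doc-info service; the names `MetaCollector` and `extract_metadata` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name = fields.get("name")
            content = fields.get("content")
            # Ignore META elements without both NAME and CONTENT
            # (for example, those using HTTP-EQUIV).
            if name and content is not None:
                self.metadata[name.lower()] = content


def extract_metadata(html):
    """Return a dictionary of the named metadata found in html."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.metadata


if __name__ == "__main__":
    sample = (
        '<HEAD><TITLE>Example NFP</TITLE>'
        '<META NAME="keywords" CONTENT="libraries, telematics">'
        '<META NAME="description" CONTENT="An example National Focal Point page">'
        '</HEAD>'
    )
    print(extract_metadata(sample))
```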

It is hoped that these recommendations may prove useful to new Commission-funded and other project web sites which are about to be set up.

Summary of the Tools Used

The following tools were used to carry out the analyses:

doc-info, http-info and /robots.txt checker
Freely-available web-based tools for analysing web resources and HTTP headers. Developed by UKOLN as part of the WebWatch project.
Microsoft SiteServer
Web server software which includes an analysis component.
linkpopularity
A web-based service for analysing the number of links to a web site.
Bobby
A web-based service for analysing the accessibility of a web site.

References

  1. National Focal Points, European Commission DGXIII - Telematics for Libraries
    URL: <http://www.echo.lu/libraries/en/nfp-list.html>
  2. Web Technologies: URLs for Telematics for Libraries Project Pages, Exploit Interactive issue 1
    URL: <http://www.exploit-lib.org/issue1/urls/>
  3. doc-info, UKOLN
    URL: <http://www.ukoln.ac.uk/web-focus/webwatch/services/doc-info/>
  4. http-info, UKOLN
    URL: <http://www.ukoln.ac.uk/web-focus/webwatch/services/http-info/>
  5. SiteServer, Microsoft
    URL: <http://www.microsoft.com/siteserver/site/>
  6. linkpopularity.com, linkpopularity.com
    URL: <http://www.linkpopularity.com/>
  7. WebWatch: UK University Search Engines, Ariadne issue 21
    URL: <http://www.ariadne.ac.uk/issue21/webwatch/>
  8. Standard for Robot Exclusion, Koster, M
    URL: <http://info.webcrawler.com/mak/projects/robots/norobots.html>
  9. /robots.txt checker, UKOLN
    URL: <http://www.ukoln.ac.uk/web-focus/webwatch/services/robots-txt/>
  10. Web Watch: 404s - What's Missing?, Ariadne issue 20
    URL: <http://www.ariadne.ac.uk/issue20/404/>
  11. Bobby, CAST
    URL: <http://www.cast.org/bobby/>

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
BA2 7AY

URL: <http://www.ukoln.ac.uk>
Email: b.kelly@ukoln.ac.uk

Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.

For citation purposes:
Brian Kelly, "Analysis of NFP Web Sites", Exploit Interactive, issue 3, October 1999
URL: <http://www.exploit-lib.org/issue3/nfp-websites/>

