

When new projects and services are launched they are often expected to provide regular performance indicators which seek to give an objective description of the service. Projects and services may also be expected to reach agreement with the funding body on the minimum levels of service to be provided (often referred to as Service Level Agreements or SLAs). In this article Brian Kelly looks at performance indicators and SLAs which are relevant to web sites.
Why have performance indicators? Aren't they very time-consuming to produce? Aren't they of interest only to "bureaucrats and bean-counters"?
The interest in performance indicators reflects a need to ensure that funding (in many cases public funding) is being spent wisely. It is true, though, that performance indicators can be very time-consuming for service providers to produce. Many service providers would also point out that performance indicators can be misleading, and that knowledge of the service may be required in order to interpret them validly.
It should be borne in mind that service providers themselves will also benefit from monitoring service indicators. Indications of growth in the service will help to identify when extra capacity (such as CPU power, disk space, etc.) is required. The information on growth can also be used to promote a service. Knowledge of failures (e.g. system unavailability) is needed in order to fix the problem. Records of persistent failures will be needed in order to identify and correct more serious procedural or systematic problems.
An example of a body which has responsibility for monitoring the performance of national services against their Service Level Definitions (SLDs) is the Monitoring and Advisory Unit (MAU) [1]. One of the roles of the MAU is "Monitoring the performance of each aspect of each service against the Service Level Definitions (SLDs) contained in its MoU." Bodies such as the MAU will be interested in gaining a greater understanding of appropriate performance indicators for web sites, and the resource implications of collecting such information.
In this article we describe performance indicators which are relevant to projects and services which provide a web site as a significant deliverable. The article reviews a number of quantifiable performance indicators, discusses the validity of the indicators and describes the resource implications in collecting and analysing the data.
The first performance indicator to consider is web server statistics. Most readers will probably be familiar with web server statistics: they are normally included in annual reports (and always seem to show a healthy growth) and often can be found on web sites themselves.
Web statistics are produced by the web server software. The raw data will be produced by default - no additional configuration will be needed to produce the server's default set of usage data.
The server log file records information on each request (normally referred to as a "hit") for a resource on the web server. The information recorded includes the name of the resource, the IP address (or domain name) of the user making the request, the name of the browser (more correctly referred to as the "user agent") issuing the request, the size of the resource, date and time information and whether the request was successful or not (and an error code if it was not). In addition many servers will be configured to store additional information, such as the "referer" (sic) field: the URL of the page the user was viewing before clicking on a link to get to the resource.
An example of a server log file is shown in Figure 1.
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 1999-12-25 00:00:21
#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs(User-Agent) cs(Cookie) cs(Referer)
1999-12-25 00:00:21 194.237.174.119 - GET /issue1/jobs/Default.asp - 200 20407 AltaVista-Intranet/V2.3A+(www.altavista.co.uk+jan.gelin@av.com) - -
1999-12-25 00:03:39 194.237.174.119 - GET /statistics/ExpIntHits1.asp - 200 10519 AltaVista-Intranet/V2.3A+(www.altavista.co.uk+jan.gelin@av.com) - -
1999-12-25 00:26:54 209.67.247.158 - GET /robots.txt - 200 303 FAST-WebCrawler/2.0.9+(crawler@fast.no;+http://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html) - -
1999-12-25 00:32:47 194.237.174.119 - GET /issue2/default.asp - 200 5332 AltaVista-Intranet/V2.3A+(www.altavista.co.uk+jan.gelin@av.com) - -
1999-12-25 01:49:54 206.186.25.7 - GET /resources/images/main/bg.gif - 200 300 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:54 206.186.25.7 - GET /issue1/webtechs/Default.asp - 200 24659 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) - http://www.statslab.cam.ac.uk/%7Esret1/analog/webtechs.html
1999-12-25 01:49:54 206.186.25.7 - GET /resources/images/main/global_home_h.gif - 200 487 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:54 206.186.25.7 - GET /resources/images/main/global_search_disabled.gif - 200 534 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
1999-12-25 01:49:56 206.186.25.7 - GET /resources/images/main/local_home01.gif - 200 663 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) ASPSESSIONIDGQQGQGAD=IIHCBIFDIECKPAPGICDEOJII;+SITESERVER=ID=22e0a17296b8c2ed1f77460cde75c27f http://www.exploit-lib.org/issue1/webtechs/
Figure 1: Sample Web Server Log File
Figure 1 shows the first few records of the server log file for the Exploit Interactive web magazine for Christmas Day, 1999. The first four lines are comments. The first line of data shows that at 21 seconds past midnight on Christmas Day a computer with the IP address 194.237.174.119 issued a GET request (the normal method for requesting a resource) for the resource http://www.exploit-lib.org/issue1/jobs/Default.asp (note that the domain name is not included in the entry in the log file, since it is the same for all entries). The resource was 20,407 bytes in size and was transferred successfully (a 200 status code). The resource was requested by the AltaVista-Intranet user agent (a robot which indexes web sites).
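The first four records are from web robots: three requests from the AltaVista robot and one, for /robots.txt, from the FAST web crawler. However at 01:49 there is a request for the http://www.exploit-lib.org/issue1/webtechs/Default.asp resource. This request is issued by a browser which identifies itself as Mozilla/2.0 (the code name for Netscape), although the "compatible; MSIE 3.02" part of the user agent string indicates that the browser is in fact Microsoft Internet Explorer 3.02. The user was following a link from http://www.statslab.cam.ac.uk/%7Esret1/analog/webtechs.html. It will be noticed that this request is accompanied by requests for a number of images (.gif files).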
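The way in which records of the kind shown in Figure 1 are read can be illustrated with a short Python sketch. It assumes a log file in which a `#Fields:` comment names the columns and the values in each record are separated by spaces; the function name `parse_extended_log` and the `SAMPLE` text are introduced here purely for illustration and are not part of any log analysis package.

```python
from typing import Dict, Iterable, Iterator, List


def parse_extended_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request, keyed by the names in the #Fields comment."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines: only #Fields affects how records are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated records
        yield dict(zip(fields, values))


# Two records copied from Figure 1 (assumed to be one record per line).
SAMPLE = """\
#Software: Microsoft Internet Information Server 4.0
#Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs(User-Agent) cs(Cookie) cs(Referer)
1999-12-25 00:00:21 194.237.174.119 - GET /issue1/jobs/Default.asp - 200 20407 AltaVista-Intranet/V2.3A+(www.altavista.co.uk+jan.gelin@av.com) - -
1999-12-25 01:49:54 206.186.25.7 - GET /issue1/webtechs/Default.asp - 200 24659 Mozilla/2.0+(compatible;+MSIE+3.02;+AK;+Windows+NT) - http://www.statslab.cam.ac.uk/%7Esret1/analog/webtechs.html
"""

if __name__ == "__main__":
    for record in parse_extended_log(SAMPLE.splitlines()):
        print(record["time"], record["c-ip"], record["cs-uri-stem"], record["sc-status"])
```

Reading the column names from the `#Fields:` line, rather than fixing them in the code, means the same sketch should cope with servers which have been configured to record a different set of fields.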
It is not too difficult to see how this raw data can be used to provide graphical displays showing growth in the numbers of hits, profiles of the browsers used to access the web site, etc. An example of a simple display of changes in the number of hits during 1999 for the University of Cambridge Statistical Laboratory web site is shown in Figure 2 (complete data is available at [2]).
month: reqs: pages: Gbytes:
--------: -------: -------: -------:
Jan 1999: 465,499: 203,206: 6.773:
Feb 1999: 412,622: 192,120: 5.762:
Mar 1999: 512,918: 234,660: 6.816:
Apr 1999: 520,227: 239,278: 6.902:
May 1999: 517,235: 237,149: 8.139:
Jun 1999: 518,836: 242,153: 8.610:
Jul 1999: 517,605: 237,728: 7.977:
Aug 1999: 444,627: 189,801: 6.439:
Sep 1999: 514,747: 232,570: 8.529:
Oct 1999: 563,913: 260,173: 10.444:
Nov 1999: 715,738: 329,487: 12.071:
Dec 1999: 620,671: 272,260: 13.703:
Figure 2: Web Server Statistics for the Statistical Laboratory, University of Cambridge
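As a small worked example, the figures in Figure 2 can be used to calculate two simple indicators: the number of requests per page, and the percentage growth between two months. The sketch below copies the "reqs" and "pages" columns from Figure 2; the function names are my own and the calculation is only one of many possible ways of summarising the data.

```python
from typing import List, Tuple

# (month, requests, pages) copied from Figure 2.
MONTHLY: List[Tuple[str, int, int]] = [
    ("Jan 1999", 465_499, 203_206),
    ("Feb 1999", 412_622, 192_120),
    ("Mar 1999", 512_918, 234_660),
    ("Apr 1999", 520_227, 239_278),
    ("May 1999", 517_235, 237_149),
    ("Jun 1999", 518_836, 242_153),
    ("Jul 1999", 517_605, 237_728),
    ("Aug 1999", 444_627, 189_801),
    ("Sep 1999", 514_747, 232_570),
    ("Oct 1999", 563_913, 260_173),
    ("Nov 1999", 715_738, 329_487),
    ("Dec 1999", 620_671, 272_260),
]


def requests_per_page(requests: int, pages: int) -> float:
    """Average number of requests (hits) recorded for each page delivered."""
    return requests / pages


def growth_percent(first: int, last: int) -> float:
    """Percentage change from the first value to the last."""
    return (last - first) / first * 100


if __name__ == "__main__":
    for month, requests, pages in MONTHLY:
        print(f"{month}: {requests_per_page(requests, pages):.2f} requests per page")
    growth = growth_percent(MONTHLY[0][1], MONTHLY[-1][1])
    print(f"Growth in requests, Jan to Dec 1999: {growth:.1f}%")
```

In January 1999 there were roughly 2.3 requests for every page, which gives an early hint that the number of hits and the number of pages viewed are not the same thing.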
The well-known saying about "lies, damned lies and statistics" can be updated to refer to "lies, damned lies and web statistics". In the interpretation of web statistics there are a number of caveats to be aware of.
If the number of hits shows a steady growth over an extended period, is this a clear indication of a growth in the popularity of the service? The answer, quite simply, is no. If the number of hits on your web site grows by, say, 50% over a year, the number of visitors could actually be decreasing. On the other hand, the number of visitors could be growing at a far greater rate.
The number of hits received by a web site is influenced by several factors:
As a consequence of the points mentioned above, usage summaries will overestimate the number of visitors who make use of the information or services provided by a web site. However it should be pointed out that there are other factors which will result in an underestimation of the number of visitors:
A number of web statistics analysis packages provide information on the number of user sessions (visitors). Is this information more meaningful than "hits"? We must first define the term.
A user session can be defined as a series of requests from a unique IP address within a specified period of time (often 30 minutes). So a growth in the number of user sessions will not be affected by changes in the architecture of the web site (e.g. more images being added). So are user sessions a more relevant indicator? The answer is yes, but user sessions can still be misleading. User sessions will still be distorted by robots, one-off visitors and caching. They will also be affected by multi-user machines: if a PC is used by several people, it will be regarded as the same user. More worryingly, an institutional cache or firewall will be treated as a single user. So if you are pleased to notice a growth in the average time spent by users at your web site, this could be the result of many one-off visitors who are behind the same firewall or cache and who are accessing your web site independently of each other.
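The definition of a user session given above can be sketched in a few lines of Python. The sketch makes one assumption which the definition leaves open: the "specified period of time" is treated as a period of inactivity, so that a new session begins when an IP address is seen for the first time, or is seen again more than 30 minutes after its previous request. The function name `count_sessions` is illustrative only.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def count_sessions(
    requests: Iterable[Tuple[str, datetime]],
    timeout: timedelta = timedelta(minutes=30),
) -> int:
    """Count user sessions in a sequence of (ip_address, time) requests."""
    last_seen: Dict[str, datetime] = {}
    sessions = 0
    # Process requests in time order, whatever order they arrive in.
    for ip, when in sorted(requests, key=lambda request: request[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            sessions += 1  # first request, or a gap longer than the timeout
        last_seen[ip] = when
    return sessions


if __name__ == "__main__":
    day = datetime(1999, 12, 25)
    sample = [
        ("194.237.174.119", day.replace(hour=0, minute=0, second=21)),
        ("194.237.174.119", day.replace(hour=0, minute=3, second=39)),
        ("206.186.25.7", day.replace(hour=1, minute=49, second=54)),
        ("206.186.25.7", day.replace(hour=1, minute=49, second=56)),
    ]
    print(count_sessions(sample))
```

Note that the sketch inherits all of the limitations described above: several people behind one cache or firewall share an IP address and so will be counted as a single session.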
Jeff Goldberg from the University of Cranfield Computer Centre has written a document entitled "On Interpreting Web Statistics" which argues that web usage statistics are (worse than) meaningless [6]. Although this document is now quite old, we have seen that web statistics can be misleading. Does this mean that we should forget about web statistics as a performance indicator?
I would argue that the answer to this is no. As Susan Haigh and Janette Megarity point out in their report on "Measuring Web Site Usage: Log File Analysis", "log file analysis is perhaps best viewed as an art disguised as a science" [7]. Nevertheless, web statistics do provide valuable information. However it may be necessary to carry out data mining in order to detect patterns which may be hidden from simple analyses.
A wide range of web statistical analysis packages is available, including free packages such as Analog [8], Analog's companion package Report Magic [9], Webalizer [10] and aWebVisit [11]. Licensed products include WebTrends [12] and Accrue's HitList [13]. A more complete listing is available at Internet Product Watch [14] and Yahoo! [15].
An alternative approach to using web statistical analysis packages is to make use of externally-hosted statistical analysis services, such as NedStat [16], SiteMeter [17], SuperStats [18] and Stats4All [19]. As described in a recent Ariadne article [20], there appears to be a growing market for a range of externally-hosted services. They have the advantage of being easy to set up. In many cases they are free and are funded by advertising, although there may be a licensed version which is free of advertising.
A review of the NedStat and SiteMeter services is given elsewhere in this issue of Exploit Interactive [21].
What are the resource implications in providing meaningful summaries of web usage statistics?
Initially the web server must be configured appropriately. For example, what information should be recorded? Should IP addresses be resolved (so that domain names such as bath.ac.uk will be stored in the log files) or, in order to maximise the performance of the web server, should the resolution take place when the statistics are being analysed?
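The second option, resolving IP addresses when the statistics are analysed rather than when the request is served, might be sketched as follows. The sketch uses the standard Python `socket.gethostbyaddr` call and looks up each distinct address only once; the function names and the injectable `lookup` parameter are my own additions, the latter so that the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable


def reverse_lookup(ip: str) -> str:
    """Return the domain name for an IP address, or the address if none is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers failed lookups: many addresses have no reverse DNS entry.
        return ip


def resolve_unique(
    ips: Iterable[str],
    lookup: Callable[[str], str] = reverse_lookup,
) -> Dict[str, str]:
    """Map each distinct IP address to a name, performing one lookup per address."""
    names: Dict[str, str] = {}
    for ip in ips:
        if ip not in names:
            names[ip] = lookup(ip)
    return names


if __name__ == "__main__":
    # Addresses taken from the client IP column of Figure 1.
    log_ips = ["194.237.174.119", "194.237.174.119", "206.186.25.7"]
    for ip, name in resolve_unique(log_ips).items():
        print(ip, "->", name)
```

Since a busy log file will contain many requests from the same address, performing a single lookup per distinct address is likely to reduce the time taken by the analysis considerably.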
It may be necessary to develop automated processes for managing server log files. Server log files can be very large. The log files for Exploit Interactive, which are kept on a daily basis, vary in size from 15 Kb to 3,844 Kb. The large size of log files will have implications for disk storage and the processing power of the computer which will carry out the analysis.
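One simple form of automated log file management is to compress older daily log files in order to save disk space. The sketch below is an illustration only: it assumes that the daily files end in `.log` and have names which sort in date order (the name `ex991225.log` in the usage note is a hypothetical example), and the function name `compress_old_logs` is my own.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_old_logs(log_dir: Path, keep_latest: int = 7) -> List[Path]:
    """Gzip all but the most recent `keep_latest` daily log files in a directory."""
    # Assumption: file names sort chronologically (e.g. they embed the date).
    logs = sorted(log_dir.glob("*.log"))
    old_logs = logs[:-keep_latest] if keep_latest > 0 else logs
    compressed: List[Path] = []
    for log in old_logs:
        target = log.with_name(log.name + ".gz")
        with log.open("rb") as source, gzip.open(target, "wb") as archive:
            shutil.copyfileobj(source, archive)
        log.unlink()  # remove the original once the compressed copy is written
        compressed.append(target)
    return compressed
```

For example, calling `compress_old_logs(Path("logs"), keep_latest=1)` on a directory containing `ex991223.log`, `ex991224.log` and `ex991225.log` would leave only the last file uncompressed. A script of this kind would typically be run once a day by the operating system's scheduler.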
It is desirable that automated processes are implemented for analysing server log files. This will normally require use of a server system (often a Unix or NT server). Recent versions of log analysis software will enable automated analyses to be initiated from within the application. However this may not be possible in entry-level packages.
Although several analysis packages are freely available, the requirements for data mining or automation may necessitate the purchase of an expensive package, or significant software development / system configuration effort if a solution based on free tools is required. It should also be pointed out that a powerful computer may be needed to analyse log files.
As well as indicators of access to a web site, it is also desirable to provide indicators of access problems. This may include information on broken links and server availability statistics.
We've all come across 404 error messages which indicate a broken link. So we know how irritating they are. It is desirable that web services should minimise the numbers of broken links. This should include internal links and links to external resources.
Link checkers can be used to detect broken links. Many authoring packages will provide link-checking capabilities. In addition there are a number of dedicated link checking applications available, some of which are listed at Yahoo! [22].
It should be noted that simple link-checking software will typically only report on simple hypertext links (the <A HREF=".."> element) and inline images (the <IMG SRC=".."> element). In the current generation of web sites there are likely to be several other types of links, including links to style sheet files, links to images within style sheets, links in JavaScript, links in HTML FORMs, etc. If link-checking software which can report on such links is not available it will be necessary to analyse the server log files in order to detect user requests for unavailable resources.
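To illustrate the point, the following Python sketch collects links from several HTML elements in addition to A and IMG. It uses the standard `html.parser` module; the list of element and attribute pairs is my own selection and is not exhaustive, and the sketch does not attempt to find links inside style sheets or JavaScript code, which would require separate processing.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# (element, attribute) pairs treated as links: an illustrative, incomplete list.
LINK_ATTRIBUTES = {
    ("a", "href"),
    ("img", "src"),
    ("link", "href"),    # e.g. links to style sheet files
    ("script", "src"),   # external JavaScript files
    ("form", "action"),  # the script which processes a form
    ("frame", "src"),
}


class LinkCollector(HTMLParser):
    """Collect (element, URL) pairs for every recognised link attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser supplies element and attribute names in lower case.
        for name, value in attrs:
            if value and (tag, name) in LINK_ATTRIBUTES:
                self.links.append((tag, value))


def extract_links(html: str) -> List[Tuple[str, str]]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = (
        '<LINK REL="stylesheet" HREF="/resources/style.css">'
        '<A HREF="/issue1/webtechs/">Web Technologies</A>'
        '<IMG SRC="/resources/images/main/bg.gif">'
        '<FORM ACTION="/cgi-bin/search"></FORM>'
    )
    for element, url in extract_links(page):
        print(element, url)
```

Each URL found in this way would then need to be requested in order to confirm that the link is not broken.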
Information on server availability is important. The server may be managed by the service's technical staff or by a central IT department. Procedures for detecting when the server is unavailable and systematically recording the down time may be available. If this is not the case, third-party services are available which can monitor services and provide automated notification in a variety of ways, such as by email or pager.
An example of a third party which provides this type of service is WatchDog [23].
Figure 3: The WatchDog Interface
It should be noted that the professional versions of web statistical software such as WebTrends [12] will also provide monitoring functions.
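For services which wish to record availability themselves, the basic idea can be sketched in Python. The sketch is not a description of how WatchDog or WebTrends work: it simply assumes that the home page is requested at regular intervals, that any response with a status code below 400 counts as "available", and that availability is reported as the percentage of successful checks. The function names are illustrative.

```python
import urllib.error
import urllib.request
from datetime import datetime
from typing import List, Tuple


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL can be fetched within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Includes HTTP errors, refused connections and timeouts.
        return False


def availability_percent(checks: List[Tuple[datetime, bool]]) -> float:
    """Percentage of recorded checks in which the server responded."""
    if not checks:
        raise ValueError("no checks have been recorded")
    successful = sum(1 for _, ok in checks if ok)
    return 100.0 * successful / len(checks)


if __name__ == "__main__":
    # A scheduler would normally run the check, say, every five minutes
    # and append each result to a file; one check is shown here.
    checks = [(datetime.now(), is_available("http://www.exploit-lib.org/"))]
    print(f"Availability: {availability_percent(checks):.1f}%")
```

One limitation should be borne in mind: a check made from the same network as the server will not detect problems with the external network connection, which is one reason why a third-party monitoring service may be preferred.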
As described in a recent article in Ariadne [24] and elsewhere in this issue of Exploit Interactive [25] it is possible to obtain details on links to your web site using services such as LinkPopularity.com [26]. As the article mentions, the quality of the information provided may be questionable. However, as with web usage statistics, as long as the reservations are borne in mind, information on trends in the number of links to a web site may provide a valuable insight into how useful the community finds the web site.
Figure 4: LinkPopularity
If your web site acts as a portal, providing links to external web resources, it may be desirable to provide information on the number of users who follow links from your web site. This is not normally possible if you link directly to the external resource using the <A HREF="http://www.acme.com/"> element, since the request goes to the external server and leaves no record in your own server log. However it is possible to make use of a simple redirection through a CGI script. For example if you use a link such as:
<A HREF="/cgi-bin/redirect-link?http://www.acme.com/">
then, when a user follows the link, the /cgi-bin/redirect-link script can record details of the request before redirecting the user to the destination.
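A redirection script of this kind might look something like the following Python sketch. It is an illustration rather than the script used on any particular site: the log file location and the list of permitted hosts are hypothetical, and the check against a list of known hosts is my own addition, intended to stop the script being used to redirect users to arbitrary sites.

```python
#!/usr/bin/env python3
import os
import sys
from datetime import datetime, timezone
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit

ALLOWED_HOSTS = {"www.acme.com"}          # hypothetical list of linked sites
LOG_PATH = "/var/log/redirect-link.log"   # hypothetical log file location


def build_response(
    query_string: str, remote_addr: str, now: datetime
) -> Tuple[str, Optional[str]]:
    """Return the CGI response and the log line (None if the link is rejected)."""
    target = unquote(query_string)
    parts = urlsplit(target)
    if (
        "\r" in target
        or "\n" in target
        or parts.scheme not in ("http", "https")
        or parts.hostname not in ALLOWED_HOSTS
    ):
        body = "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\nUnknown link\n"
        return body, None
    log_line = f"{now.isoformat()} {remote_addr} {target}\n"
    return f"Status: 302 Found\r\nLocation: {target}\r\n\r\n", log_line


if __name__ == "__main__":
    # The web server passes the text after "?" in the QUERY_STRING variable.
    response, log_line = build_response(
        os.environ.get("QUERY_STRING", ""),
        os.environ.get("REMOTE_ADDR", "-"),
        datetime.now(timezone.utc),
    )
    if log_line:
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(log_line)
    sys.stdout.write(response)
```

The log file produced in this way can then be summarised in the same manner as the server log file, giving a count of the number of times each external link has been followed.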
Depending on the role of the web service, it may be desirable for its content to be available through use of search engines such as AltaVista. As described in a recent article in Exploit Interactive [27] there are a number of approaches to using the web to promote your web site. The approaches discussed in the section on recording the number of links to your web site can also be used to record the number of resources held in a search engine.
The following checklist is provided as a summary of the potential performance indicators covered in this article.
| Area | Summary | Caveats | Recommendation |
| --- | --- | --- | --- |
| Web Statistics | Trends in accesses, users, user profiles, errors, referrals, etc. | Results distorted by caches, robots, web site architecture, etc. | Should be provided, accompanied by warnings about limitations. |
| Error Logs | Detecting errors on the web site, such as a variety of broken links. | Will only detect broken links which users have attempted to access. | Will be provided as part of an analysis of web statistics. |
| Server Availability | Time the server is available. | May be costly / time-consuming to automate. | Should be provided for large-scale services. |
| Links To Web Site | Information on links to a web site. | Will only report on links which are included in search engine database. | Value not yet clear. |
| Links From Web Site | Information on links followed by users from your web site. | Requires use of intermediary script. | May be useful for portal sites. |
| Search Engine Coverage | Information on number of resources on your web site indexed by a global search engine. | Numbers provided may vary due to "document fluctuation". | May be useful for some sites. |
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
England
BA2 7AY
URL: <http://www.ukoln.ac.uk>
Email: <b.kelly@ukoln.ac.uk>
Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.
For citation purposes:
Brian Kelly, "Performance Indicators for Web Sites",
Exploit Interactive, issue 5, April 2000
URL: <http://www.exploit-lib.org/issue5/indicators/>
A UKOLN Service. Copyright © 1999
Last Updated: 7 April 2000