
Using Metadata To Improve Local Searching

Brian Kelly describes how Dublin Core metadata is used to provide enhanced searching features for Exploit Interactive.

The Need For Metadata

Providing a search facility for a web site is very easy. Many tools are available, a number of them free of charge. However, as web sites grow, web administrators often become aware that end users find the search facilities of limited use. It becomes difficult, for example, to search for a document written by a particular person, to search within a particular category of the web site or to combine a variety of search criteria.

Metadata, which can be defined as structured data about data, can help to overcome these limitations. Dublin Core metadata [1], in particular, provides an agreed standard for metadata for resource discovery. Using simple Dublin Core metadata it is possible to search by an author's name. Using more advanced Dublin Core metadata it is possible to provide more sophisticated types of search queries.

The New Search Facility in Exploit Interactive

An enhanced search facility has been released for Exploit Interactive [2]. Full-text searching and searching by author name or description have been available since issue 1. As can be seen in Figure 1, it is now possible to restrict a search to a particular issue.

Figure 1: Searching for "rdf" In Issue 1

Figure 1 illustrates a search for an article containing "rdf" in issue 1. As Exploit Interactive makes use of the issue number in the URL, this type of search could have been provided by a simple search tool which provided filtering capabilities based on URLs. However this approach is very limited as it is dependent on the URL naming conventions. In fact the query does not make use of the URL name; Dublin Core metadata is used to describe the issue number.

Figure 2 illustrates this point. It shows a search for "rdf" in "Feature Articles" (as opposed to Regular Columns, News and Events, or Editorial). Since there is no encoding of a "Feature Article" in the URL, this type of query requires an alternative approach. The approach employed is to use Dublin Core metadata to describe article types.

Figure 2: Searching for "rdf" In Feature Articles

This approach can be extended. For example it is possible to search for articles about projects which have been funded by a range of funding programmes, such as Telematics For Libraries, DIGICULT, etc.
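
The idea of combining a full-text term with a metadata field can be sketched in a few lines. The following is a minimal illustration in Python, not the code used on the Exploit Interactive site: the ARTICLES list, the article identifiers, the sample text, the value "text.article.news" and the search function are all assumptions made for the example. Only the field names DC.Type and DC.Subject are taken from the article.

```python
# Hypothetical in-memory index: each entry pairs an article's text with the
# Dublin Core values that would be read from its <meta> tags.
ARTICLES = [
    {"id": "a", "text": "Describing resources with RDF",
     "dc": {"DC.Type": "text.article.feature", "DC.Subject": "tfl"}},
    {"id": "b", "text": "RDF news round-up",
     "dc": {"DC.Type": "text.article.news", "DC.Subject": "tfl"}},
    {"id": "c", "text": "Library portals",
     "dc": {"DC.Type": "text.article.feature", "DC.Subject": "digicult"}},
]


def search(term, required):
    """Return ids of articles whose text contains `term` (case-insensitive)
    and whose metadata equals every field/value pair in `required`."""
    hits = []
    for article in ARTICLES:
        if term.lower() not in article["text"].lower():
            continue
        if all(article["dc"].get(name) == value
               for name, value in required.items()):
            hits.append(article["id"])
    return hits


# "rdf" in Feature Articles only: the URL plays no part in the filter.
feature_hits = search("rdf", {"DC.Type": "text.article.feature"})
# "rdf" in articles about projects funded by Telematics For Libraries.
funder_hits = search("rdf", {"DC.Subject": "tfl"})
```

Because the filter works on field values rather than on URL patterns, adding a new category (such as a funding programme) only requires a new metadata field, not a change to the site's directory structure.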

The interface illustrated in Figures 1 and 2 is dependent on the browser providing support for frames and JavaScript. Although support for frames and JavaScript is common, it is by no means universal. In addition browsers from different suppliers may not be compatible. In order to overcome these problems we have also developed an interface which uses simple HTML. This is illustrated in Figure 3.

Figure 3: Simple interface for advanced searching

An additional advantage with this interface is that it may be possible to allow multiple categories to be searched (for example it may be possible to search for articles in issue 1 and issue 3 which have been funded by the Telematics For Libraries programme). We are currently investigating functionality provided by the indexing software to see if this can be done.
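
The logic of such a multiple-category query can be sketched as follows. This is a hedged illustration in Python of one plausible interpretation (any of the selected values within a field, all of the selected fields together); it says nothing about how the indexing software would evaluate the query. The ARTICLES data, the identifiers and the matches function are assumptions made for the example.

```python
# Hypothetical index: each article maps a Dublin Core field to a set of values.
ISSUE = "http://www.exploit-lib.org/issue{}/"
ARTICLES = {
    "a": {"DC.Relation.IsPartOf": {ISSUE.format(1)}, "DC.Subject": {"tfl"}},
    "b": {"DC.Relation.IsPartOf": {ISSUE.format(2)}, "DC.Subject": {"tfl"}},
    "c": {"DC.Relation.IsPartOf": {ISSUE.format(3)},
          "DC.Subject": {"tfl", "digicult"}},
    "d": {"DC.Relation.IsPartOf": {ISSUE.format(3)},
          "DC.Subject": {"digicult"}},
}


def matches(article_dc, query):
    """`query` maps a field name to a set of acceptable values.
    An article matches when, for every field in the query, at least one of
    its own values is acceptable (OR within a field, AND across fields)."""
    return all(article_dc.get(field, set()) & wanted
               for field, wanted in query.items())


# Articles in issue 1 or issue 3 funded by Telematics For Libraries.
query = {
    "DC.Relation.IsPartOf": {ISSUE.format(1), ISSUE.format(3)},
    "DC.Subject": {"tfl"},
}
selected = sorted(key for key, dc in ARTICLES.items() if matches(dc, query))
```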

A Description Of The Metadata

A summary of the metadata used is given in Table 1.

Table 1: Dublin Core Metadata Used In Exploit Interactive
| Description | Function | Example |
| --- | --- | --- |
| Issue number (e.g. 1) | Searching in a particular issue (or range of issues) | <meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue4/"> |
| Type of article (Regular, Feature, News, etc.) | Searching for a particular article type or types (e.g. Regular or Feature article, but not News) | <meta name="DC.Type" content="text.article.feature" scheme="Exploit-categories"> |
| Funding body for article, such as "tfl" (Telematics For Libraries) | Searching for articles about projects funded by a particular funding body | <meta name="DC.Subject" content="tfl" scheme="Exploit-article-funders"> |
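
To show how an indexing tool might pick up the fields in Table 1, the sketch below reads the DC.* meta elements from a page using Python's standard html.parser module. It is an illustration only and is not how the search service described in this article is implemented. The class name DublinCoreParser and the function extract_dublin_core are assumptions made for the example.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect the content of every <meta> element whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC.") and "content" in attributes:
            # A field may be repeated, so values are kept in a list.
            self.fields.setdefault(name, []).append(attributes["content"])


def extract_dublin_core(html):
    """Return a dict mapping each Dublin Core field name to its list of values."""
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# The three example elements from Table 1.
page = """
<meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue4/">
<meta name="DC.Type" content="text.article.feature" scheme="Exploit-categories">
<meta name="DC.Subject" content="tfl" scheme="Exploit-article-funders">
<meta name="keywords" content="EXPLOIT, TAP, Telematics for Libraries">
"""
fields = extract_dublin_core(page)
```

The non-Dublin Core keywords element is ignored, so only the three fields from Table 1 appear in the result.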

 

Managing The Metadata

The metadata included in articles is not embedded directly. Instead it is defined as a set of simple VBScript variables in the article_defaults.ssi file. Every article has an article_defaults.ssi file, which defines the author, title, etc. as illustrated below.

<%
doc_title = "Using Metadata To Improve Local Searching"
author = "Kelly, B."
description = "Brian Kelly describes how metadata is used in Exploit Interactive ..."
keywords = "EXPLOIT, TAP, Telematics for Libraries"

' Give the article type : either feature, regular, news, editorial or etc
article_type = "regular"

' Give the funding body : either tfl, tap, elib, institution, national or other
' tfl = funded by Telematics For Libraries;  tap = funded by (other) Telematics Application Programme
' institution = funded by institution;  national = funded by national level; other = other funding body
funding_body = "tfl, digicult"

%>
Figure 4: Definition Of The Metadata Values

The metadata is read in by a default.asp file. This file calls another file which transforms the variables into Dublin Core metadata. The output, which is embedded in the HTML for each article, is shown below.

<meta name="DC.Title" content="Using Metadata To Improve Local Searching">
<meta name="DC.Creator" content="Kelly, B.">
<meta name="DC.Description" content="Brian Kelly describes how metadata is used on the Exploit Interactive web magazine to provide advanced searching capabilities">
<meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue5/">
<meta name="DC.Type" content="text.article.regular" scheme="Exploit-categories">
<meta name="DC.Subject" content="tfl" scheme="Exploit-article-funders">
<meta name="description" content="Brian Kelly describes how metadata is used on the Exploit Interactive web magazine to provide advanced searching capabilities">
<meta name="keywords" content="EXPLOIT, TAP, Telematics for Libraries">
Figure 5: HTML Representation Of The Dublin Core Metadata
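
The transformation from the variables in Figure 4 to the elements in Figure 5 can be sketched as follows. The site itself uses VBScript in ASP files; this Python version is only a stand-in to show the shape of the mapping. The function names meta and dublin_core_tags are assumptions, as is the issue_url parameter (Figure 4 does not show where the issue number comes from) and the decision to emit one DC.Subject element per comma-separated funder.

```python
from html import escape


def meta(name, content, scheme=None):
    """Build one <meta> element, escaping attribute values for HTML."""
    scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
    return f'<meta name="{name}" content="{escape(content)}"{scheme_attr}>'


def dublin_core_tags(doc_title, author, description, issue_url,
                     article_type, funding_body):
    """Turn the per-article variables into a list of Dublin Core <meta> elements."""
    tags = [
        meta("DC.Title", doc_title),
        meta("DC.Creator", author),
        meta("DC.Description", description),
        meta("DC.Relation.IsPartOf", issue_url),
        meta("DC.Type", f"text.article.{article_type}", "Exploit-categories"),
    ]
    # Assumption: a comma-separated funding_body yields one element per funder.
    for funder in funding_body.split(","):
        funder = funder.strip()
        if funder:
            tags.append(meta("DC.Subject", funder, "Exploit-article-funders"))
    return tags


tags = dublin_core_tags(
    doc_title="Using Metadata To Improve Local Searching",
    author="Kelly, B.",
    description="Brian Kelly describes how metadata is used in Exploit Interactive ...",
    issue_url="http://www.exploit-lib.org/issue5/",
    article_type="regular",
    funding_body="tfl",
)
html_head = "\n".join(tags)
```

Keeping the element syntax in a single helper such as meta is what makes it straightforward to change the output format in one place.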

This approach has a number of advantages. The data can be reused (for example the title is used in the HTML <TITLE> element, and the title and author name are used in the citation information). In addition, if an alternative syntax for storing the metadata is required (e.g. RDF) this can be provided by simply changing a single script.
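
As an illustration of the last point, the sketch below writes the same title, author and description as RDF/XML instead of HTML meta elements, using Python's standard xml.etree.ElementTree module. It is not taken from the Exploit Interactive scripts. The function name to_rdf is an assumption, and the Dublin Core namespace shown is the element set namespace in common use today, which may differ from what would have been chosen at the time of writing.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed namespace, see above


def to_rdf(url, doc_title, author, description):
    """Serialise the per-article variables as an RDF/XML description of `url`."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    resource = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
    )
    for element, value in (
        ("title", doc_title),
        ("creator", author),
        ("description", description),
    ):
        ET.SubElement(resource, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = to_rdf(
    url="http://www.exploit-lib.org/issue5/metadata/",
    doc_title="Using Metadata To Improve Local Searching",
    author="Kelly, B.",
    description="Brian Kelly describes how metadata is used in Exploit Interactive ...",
)
```

The inputs are the same variables as before; only the serialisation step differs.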

Further Information

The search service described in this article has been implemented using Microsoft's SiteServer software. A short paper about this work [3] has been accepted at the Ninth International World Wide Web Conference, to be held in Amsterdam in May 2000 [4].

References

  1. Dublin Core Metadata Initiative
    URL: <http://www.purl.org/dc/>
  2. Category Search, Exploit Interactive
    URL: <http://www.exploit-lib.org/cat_search/>
  3. A Lightweight Approach To Support Of Resource Discovery Standards, Brian Kelly, Poster Presentation at the WWW 9 conference
    URL: <http://www.ukoln.ac.uk/web-focus/papers/www9/resource-discovery/>
  4. Ninth International World Wide Web Conference
    URL: <http://www9.org/>

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
England
BA2 7AY

URL: <http://www.ukoln.ac.uk>
Email: <b.kelly@ukoln.ac.uk>

Brian Kelly is UK Web Focus. He works for UKOLN, which is based at the University of Bath.

For citation purposes:
Brian Kelly, "Using Metadata To Improve Local Searching", Exploit Interactive, issue 5, April 2000
URL: <http://www.exploit-lib.org/issue5/metadata/>

