Exploit Interactive

BUILDER:
The Hybrid Library Projects Search Engine

Ian Upton introduces the Hybrid Library Projects Search Engine [1]. The aim of the search engine is to demonstrate the potential of a seamless interface cross-searching a number of web servers and document types.

Introduction

The Hybrid Library Projects Search Engine is a web application developed using Microsoft SiteServer [2]. SiteServer is a BackOffice component [3] that sits on top of Microsoft Internet Information Server (IIS) and provides personalisation, analysis and indexing facilities. Indexes can be built by scanning files directly, by crawling web servers over HTTP, or by a combination of the two.

Figure 1: The Hybrid Library Search Interface

BUILDER [4] first encountered SiteServer when researching ways to index two electronic journal products, Midland History [5] and Forensic Linguistics [6], which are available in Adobe Acrobat (PDF) format. As well as indexing HTML and Microsoft Office documents, SiteServer can be coerced into indexing PDF files using a plug-in [7], which made it an attractive proposition for supporting these deliverables. SiteServer can be obtained for less than £150 through the Microsoft Select Scheme. Even though BUILDER has so far exploited only SiteServer's indexing facilities (ignoring personalisation and analysis), the software has been a cost-effective purchase.

Having implemented journal searching and a search facility for the BUILDER Web site, we became aware of SiteServer's ability to cross-search different web resources and present the results as a single ranked list. Given current discussions, cross-searching web sites hosted on different platforms is clearly a hot topic. With a tool that offered a possible solution, we began to look for a demonstrator, and the Hybrid Library Projects Search Engine was born!

Hybrid Library Projects Search Engine: A Cross-searching Application From Scratch

So what would be involved in setting up a similar product from scratch? The task can be divided into the following areas:

1. Infrastructure

SiteServer runs on Windows NT Server with IIS 4.0, and the software can be installed in a matter of minutes. Warning! In our experience SiteServer is very processor-intensive and, given the choice, we would advise setting it up on a dedicated box. For industrial-strength applications the software can be configured so that database functionality is handled by SQL Server rather than SiteServer's native Access database. For our BUILDER applications we did not feel that such measures were necessary.

2. Administration

All of SiteServer's functionality is controlled through a straightforward graphical user interface (GUI), which includes test search pages so that you can check indexes before developing your own interfaces to them. We spent about a day familiarising ourselves with the administration interface.

Once a search application has been constructed, maintenance becomes no more than an administrative task. We estimate it would now take one to two hours of training to bring a typical departmental administrator to the point where they could create, modify and check searches in an existing search engine application. Experience of everyday office IT (databases, spreadsheets, email) is more than adequate for this role.

Once set up, the search process looks after itself. Crawls can be scheduled (the Hybrid Library Projects Search Engine crawls at 2 am each night) and the intensity of each crawl can be controlled. Warning! It is possible to let the SiteServer spider loose and bring a remote web server to its knees. By default SiteServer retrieves five documents concurrently. For an application such as the Hybrid Library Projects Search Engine this is clearly overkill and likely to upset web server administrators, so we reduced the rate to a single document every two seconds during off-peak periods.
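SiteServer's crawl schedule and throttling are set through its administration GUI rather than in code. As a rough illustration of the policy described above (one request at a time, a fixed two-second pause between requests, crawling only inside an off-peak window), here is a minimal Python sketch. It is not SiteServer's implementation: the function names, the caller-supplied `fetch` and `sleep` callables and the 02:00 to 06:00 window are all hypothetical assumptions made for the example.

```python
import time
from datetime import datetime
from datetime import time as dtime
from typing import Callable, Iterable, List

# Hypothetical off-peak window (an assumption; SiteServer schedules crawls via its GUI).
OFF_PEAK_START = dtime(2, 0)
OFF_PEAK_END = dtime(6, 0)


def in_off_peak(now: datetime) -> bool:
    """Return True if `now` falls inside the assumed off-peak window."""
    return OFF_PEAK_START <= now.time() < OFF_PEAK_END


def polite_crawl(urls: Iterable[str],
                 fetch: Callable[[str], str],
                 delay_seconds: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch URLs strictly one at a time, pausing between requests.

    `fetch` is a hypothetical caller-supplied download function; `sleep` is
    injectable so the throttling can be tested without real waiting.
    """
    pages: List[str] = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # fixed gap between consecutive requests
        pages.append(fetch(url))  # only one request is ever in flight
    return pages


def crawl_hours(n_docs: int, delay_seconds: float = 2.0) -> float:
    """Rough crawl duration from the throttle alone (ignores transfer time)."""
    return n_docs * delay_seconds / 3600
```

At one document every two seconds a crawl fetches at most 1,800 documents an hour, so a 5,000-page site would need roughly 2.8 hours (`crawl_hours(5000)`), which still fits inside the assumed four-hour window. Running requests sequentially, rather than five at a time, is what keeps the load on each remote server predictable.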

3. Web Interface

Although you can use the supplied interface to access the indexes you create, SiteServer also exposes its functionality as a number of IIS components. These components can easily be accessed from within the Active Server Pages (ASP) environment and manipulated using server-side VBScript, JavaScript or Perl. Because the environment is component-based, scripting search pages is largely a matter of gluing these components together to create the required functionality. Developing the Hybrid Library Projects Search Engine web interface took approximately two days. Such development may take longer in other contexts, since BUILDER already had significant ASP experience.

Example Code:

This example VBScript code snippet performs a query and stores the results in a session variable as a two-dimensional array, which the search page (index.asp) then displays. This snippet represents about 90% of the code involved in implementing the Hybrid Library Projects Search Engine.

' Get the raw query string passed to this page.
qs = Request.QueryString

' Create the Query object and set properties for the search.
Set Q = Server.CreateObject("MSSearch.Query")
Q.SetQueryFromURL qs
Q.Catalog = "HybridLibrarySearch"
Q.SortBy = "rank[d], Title"
Q.Columns = "DocTitle, DocAddress, FileName, Description, Rank"

' Execute the query and create the recordset holding the search results.
' Trap run-time errors only around this call so they can be reported below.
On Error Resume Next
Set RS = Q.CreateRecordSet("sequential")
errNum = Err.Number
errDesc = Err.Description
On Error GoTo 0

' Generate the response, stored in a session variable for index.asp.
If errNum <> 0 Then
    Session("!hlsresults!") = "Error: " & errDesc
ElseIf RS.BOF And RS.EOF Then
    If Q.QueryIncomplete Then
        Session("!hlsresults!") = "Error: Too complex!"
    Else
        Session("!hlsresults!") = "Error: No documents!"
    End If
Else
    Dim results()
    c = 0
    ' One row per hit, four fields per row (VBScript upper bounds are inclusive).
    ReDim results(RS.Properties("RowCount") - 1, 3)

    ' Copy each record into the array.
    Do While Not RS.EOF
        results(c, 0) = RS("DocTitle")
        results(c, 1) = RS("FileName")
        results(c, 2) = RS("DocAddress")
        results(c, 3) = RS("Description")
        c = c + 1
        RS.MoveNext
    Loop
    Session("!hlsresults!") = results
End If

' Return to the search page.
Response.Redirect "index.asp"

Figure 2: Code Fragment
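The VBScript above leaves ranking to SiteServer: `SortBy = "rank[d], Title"` asks for descending rank with the title as a tie-breaker, and the loop then copies four fields per hit into an array. Because every crawled site feeds the one `HybridLibrarySearch` catalog, hits from different servers come back as a single ranked list. The following Python sketch mimics that post-processing on made-up rows so the ordering and the error branches are easy to see. It does not call SiteServer, the sample data and server names are invented, and it assumes that `Title` in the sort specification refers to the same document title returned in `DocTitle`.

```python
from typing import Any, Dict, List, Tuple, Union

Row = Dict[str, Any]
Record = Tuple[Any, Any, Any, Any]


def sort_hits(rows: List[Row]) -> List[Row]:
    """Rank descending, then title ascending (mirrors "rank[d], Title")."""
    return sorted(rows, key=lambda r: (-r["Rank"], r["DocTitle"]))


def build_response(rows: List[Row], incomplete: bool = False) -> Union[str, List[Record]]:
    """Return an error message, or one 4-field record per hit, like the ASP page."""
    if not rows:
        # Same two messages as the VBScript branch for an empty recordset.
        return "Error: Too complex!" if incomplete else "Error: No documents!"
    return [(r["DocTitle"], r["FileName"], r["DocAddress"], r["Description"])
            for r in sort_hits(rows)]


# Invented sample hits from two different (hypothetical) servers.
SAMPLE: List[Row] = [
    {"DocTitle": "Zeta", "FileName": "z.htm", "DocAddress": "http://www.example.org/z.htm",
     "Description": "", "Rank": 900},
    {"DocTitle": "Alpha", "FileName": "a.pdf", "DocAddress": "http://www.example.com/a.pdf",
     "Description": "", "Rank": 900},
    {"DocTitle": "Beta", "FileName": "b.htm", "DocAddress": "http://www.example.org/b.htm",
     "Description": "", "Rank": 500},
]
```

Calling `build_response(SAMPLE)` puts the two equally ranked hits first in title order (Alpha, then Zeta) followed by Beta, regardless of which server each came from. Python compares strings case-sensitively, so SiteServer's own title collation may order mixed-case titles differently.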

Reader Response

If you have any comments on this article, please contact the editors (exploit-editor@ukoln.ac.uk).

References

  1. Hybrid Library Projects Search Engine
    URL: <http://www.builder.bham.ac.uk/hls/>
  2. Microsoft SiteServer (including free trial download)
    URL: <http://www.microsoft.com/siteserver/site/>
  3. Windows NT Explorer: The Microsoft SiteServer Search Facility, Brett Burridge, Ariadne, issue 19, March 1999
    URL: <http://www.ariadne.ac.uk/issue19/nt/>
  4. The BUILDER Project
    URL: <http://builder.bham.ac.uk/>
  5. Midland History
    URL: <http://www.bham.ac.uk/midlandhistory/>
  6. Forensic Linguistics
    URL: <http://www.bham.ac.uk/forensiclinguistics/>
  7. Adobe Acrobat PDF IFilter Plug-in
    URL: <http://www.adobe.com/supportservice/custsupport/SOLUTIONS/12b42.htm>

Author Details

Ian Upton
Technical Development Officer
Information Services
Main Library
The University of Birmingham
Birmingham

URL: <http://builder.bham.ac.uk/>
URL: <http://www.idsolutions.co.uk/ian/>
Email: i.p.upton@bham.ac.uk
Tel: +44 (0)121 414 6380


For citation purposes:
Ian Upton, "BUILDER: the Hybrid Library Projects Search Engine", Exploit Interactive, issue 3, 25 October 1999
URL: <http://www.exploit-lib.org/issue3/builder/>