

Ian Upton introduces the Hybrid Library Projects Search Engine [1]. The search engine aims to demonstrate the potential of a single, seamless interface for cross-searching several web servers and document types.
The Hybrid Library Projects Search Engine is a web application developed using Microsoft SiteServer [2]. SiteServer is a BackOffice component [3] that sits on top of Microsoft Internet Information Server (IIS) and provides personalisation, analysis and indexing facilities. Content can be indexed by scanning file systems, by crawling over HTTP, or both.
Figure 1: The Hybrid Library Search Interface
BUILDER [4] first encountered SiteServer while researching ways to index two electronic journal products, Midland History [5] and Forensic Linguistics [6], which are published in Adobe Acrobat (PDF) format. As well as HTML and Microsoft Office documents, SiteServer can be coerced into indexing PDF files using a plug-in [7]. This made it an attractive choice for supporting these deliverables. SiteServer can be obtained for less than £150 through the Microsoft Select Scheme. BUILDER has so far exploited only SiteServer's indexing facilities, ignoring personalisation and analysis, but even so the software has been a cost-effective purchase.
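To make the idea of a file scan concrete, here is a minimal Python sketch of the selection step: walk a directory tree and keep only the document types an indexer can read, admitting PDF only when a plug-in is present. This is not SiteServer's code, which does this internally. The extension lists and the `pdf_plugin_installed` flag are hypothetical illustrations.

```python
import os
from pathlib import Path

# Hypothetical extension lists: HTML and Office formats are assumed to be
# indexable out of the box, while PDF needs a separate plug-in.
BASE_TYPES = {".htm", ".html", ".doc", ".xls", ".ppt"}
PDF_TYPES = {".pdf"}


def is_indexable(filename, pdf_plugin_installed=False):
    """Return True if a file scan would pass this file to the indexer."""
    allowed = (BASE_TYPES | PDF_TYPES) if pdf_plugin_installed else BASE_TYPES
    return Path(filename).suffix.lower() in allowed


def file_scan(root, pdf_plugin_installed=False):
    """Walk a directory tree and list the files a scan would index, in a stable order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-directories in a predictable order
        for name in sorted(filenames):
            if is_indexable(name, pdf_plugin_installed):
                found.append(os.path.join(dirpath, name))
    return found
```

For example, `file_scan("/inetpub/wwwroot/journals", pdf_plugin_installed=True)` would list the HTML, Office and PDF files under that (hypothetical) path.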
After implementing journal searching and a search facility for the BUILDER Web site, we became aware of SiteServer's ability to cross-search different web resources and present the results as a single ranked list. Given current discussions, cross-searching web sites on different platforms is clearly a hot topic. With a tool that offered a possible solution, we began to look for a demonstrator, and the Hybrid Library Projects Search Engine was born!
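One way to get a single ranked list from several sites is to put every site's documents into one index, so that ranking happens once across all of them. The toy Python sketch below illustrates that idea. It uses plain term-frequency scoring, which is an assumption made for simplicity and not SiteServer's ranking method. The class and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Catalog:
    """Toy inverted index holding documents crawled from several web servers."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {url: term frequency}

    def add(self, url, text):
        """Index one document; each URL is assumed to be added only once."""
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][url] = tf

    def search(self, query):
        """Score every matching document and return one list ranked across all sites."""
        scores = Counter()
        for term in tokenize(query):
            for url, tf in self.postings.get(term, {}).items():
                scores[url] += tf
        # Highest score first; ties broken by URL so the order is stable.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(url, urlparse(url).netloc, score) for url, score in ranked]
```

Each result carries its host name, so a results page can show which server a hit came from while still presenting one merged ranking.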
So what would be involved in setting up a similar product from scratch? The task divides into installation, administration, maintenance, crawling and interface development.
SiteServer runs on Windows NT Server with IIS 4.0 and can be installed in a matter of minutes. Warning! In our experience SiteServer is very processor-intensive and, given the choice, we would advise installing it on a dedicated machine. For industrial-scale applications, SiteServer can be configured so that SQL Server, rather than its native Access database, provides the database functionality. For our BUILDER applications we did not feel such measures were necessary.
All of SiteServer's functionality is controlled through a straightforward graphical user interface (GUI), which includes test search pages so you can check indexes before developing your own interfaces to them. We spent about a day familiarising ourselves with the administration interface.
Once a search application has been built, maintaining it is no more than an administrative task. We estimate it would take one to two hours of training to bring a typical departmental administrator to the point where they could create, modify and check searches in an existing search engine application. Experience of everyday office IT (databases, spreadsheets, email) is more than adequate for this role.
Once set up, the search process looks after itself. Crawls can be scheduled (the Hybrid Library Projects Search Engine crawls at 2 am each night) and their intensity can be controlled. Warning! It is possible to let the SiteServer spider loose and bring a remote web server to its knees. By default SiteServer retrieves five documents concurrently. For an application such as the Hybrid Library Projects Search Engine this is clearly overkill and likely to upset web server administrators. We therefore reduced the rate to a single document every two seconds and crawl only during off-peak periods.
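The throttling and scheduling described above can be sketched in a few lines of Python. The sketch fetches documents one at a time with a fixed pause, rather than five at once, and works out the next 2 am start. This is not how SiteServer is configured (that is done through its GUI). The function names are hypothetical, and `fetch` and `sleep` are passed in as plain callables so the logic can be exercised without a network.

```python
import time
from datetime import datetime, timedelta


def polite_crawl(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing between requests.

    `fetch` is any callable that takes a URL and returns its content.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results[url] = fetch(url)
    return results


def crawl_duration(n_documents, delay_seconds=2.0):
    """Total pause time, in seconds, between n sequential fetches."""
    return max(n_documents - 1, 0) * delay_seconds


def next_run(now, hour=2):
    """Next scheduled start at `hour`:00; a crawl due exactly now moves to tomorrow."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= now:
        run += timedelta(days=1)
    return run
```

As a rough check of the arithmetic, at one document every two seconds a 1,000-document site spends at least 1,998 seconds (about 33 minutes) pausing, which fits easily into an overnight window.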
Although you can use the supplied interface to query the indexes you create, SiteServer also exposes its functionality as a number of IIS components. These components are easily accessed from within the Active Server Pages (ASP) environment and can be manipulated using server-side VBScript, JavaScript or Perl. Because the environment is component-based, scripting search pages amounts to gluing these components together to create the required functionality. Developing the Hybrid Library Projects Search Engine web interface took approximately two days. This may take longer elsewhere, as BUILDER already had significant ASP experience.
This example VBScript code snippet performs a query and returns the results as an array for display within a web page. It represents about 90% of the code involved in implementing the Hybrid Library Projects Search Engine.
```vbscript
' get querystring
' Create the Query Object, and set properties for the search.
' Execute the query and create the recordset holding the search results.
' Generate response.
' construct records
' return to search page
```
Figure 2: Code Fragment
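The Figure 2 listing survives only as its comments. As a rough guide to what those steps involve, here is a Python sketch that follows the same sequence: read the query string, run the query, build an array of records, and generate the response. It is not BUILDER's code. `run_query` is a hypothetical stand-in for SiteServer's query component. The `q` parameter name and the `title`/`url`/`score` fields are also assumptions.

```python
from html import escape
from urllib.parse import parse_qs


def handle_search(query_string, run_query, max_records=20):
    """Hypothetical analogue of the steps commented in Figure 2."""
    # get querystring (the "q" parameter name is an assumption)
    params = parse_qs(query_string)
    text = params.get("q", [""])[0].strip()
    if not text:
        return []  # nothing to search for
    # create the query, set its properties and execute it (stand-in callable)
    rows = run_query(text, max_records=max_records)
    # construct records: one tuple per hit, ready for display
    return [(row["title"], row["url"], row["score"]) for row in rows]


def render(records):
    """Generate the response as a minimal, HTML-escaped ordered list."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a> ({score})</li>'
        for title, url, score in records
    )
    return f"<ol>{items}</ol>"
```

The search function is passed in as a parameter, so the glue code can be tested without a live index. This echoes the article's point that the page mostly wires existing components together.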
If you have any comments on this article, please contact the editors (exploit-editor@ukoln.ac.uk).
Ian Upton
Technical Development Officer
Information Services
Main Library
The University of Birmingham
Birmingham
URL: <http://builder.bham.ac.uk/>
URL: <http://www.idsolutions.co.uk/ian/>
Email: i.p.upton@bham.ac.uk
Tel: +44 (0)121 414 6380
For citation purposes:
Ian Upton, "BUILDER: the Hybrid Library Projects Search Engine",
Exploit Interactive, issue 3, 25 October 1999
URL: <http://www.exploit-lib.org/issue3/builder/>
A UKOLN Service. Copyright © 1999-2006
Last Updated: 25 October 1999