The DIEPER Project (DIgitised European PERiodicals)

Thanos Massias introduces the DIEPER project.

Introduction

Digitising existing printed material (retrodigitising) has become a very important topic in recent years. Advances in the relevant technologies have made the task more affordable in terms of cost, time and labour. At the same time, the demand for digitised material is growing rapidly, and this trend is expected to continue. As a consequence, many initiatives have been undertaken in this field by various parties, including publishers, libraries and other institutions.

An important fact to note is that there is no standard approach to the digitisation of printed material. This is true not only for the technical aspects (e.g. digitising equipment, output formats, type of storage, methods of access) but also for things like the motivation and objectives, the selection of the original material, the targeted audience, and so on.

Why are users interested in digitised documents? There are a number of reasons:

It is easier to locate wanted material
The means of indexing and searching electronic information are now very sophisticated. Depending upon the digitising approach, considerable information is usually available, and in many cases so is the full text, or at least a considerable portion of it. Very powerful search mechanisms can therefore be employed. The interface to those mechanisms usually works over a local area network or the Internet, so users can search remotely. In fact, in many cases where access to the digitised material is restricted to a special group of users (e.g. the faculty members of a university or the customers of a publisher), the search facility itself may be open to everyone.
The access is faster and easier
The user no longer needs to be physically present where the material is held. A user can remotely access the electronic form of the material in a very short time. The explosive growth of the Internet has given a whole new perspective to this. Even if the material is not accessible through the Internet but only through a local network, retrieving the digitised material is usually easier and faster than retrieving the printed original. The enhanced search capabilities discussed above and the ease of storage and reproduction in electronic and/or printed form are some of the reasons. Apart from this, retrodigitising is a way to access rare materials that are difficult to borrow, if they can be borrowed at all. Finally, an electronic document is always available, unlike a printed one, which might be on loan, misplaced somewhere in the library, etc.
The retrieved material is more usable
We have already mentioned storage and reproduction, but electronic documents can offer even more. Depending on the format of a retrieved electronic document, users may be able to incorporate parts of it (e.g. portions of text, images, figures) in their own documents. Of course, this is a side effect that the providers of the electronic documents or their authors do not always want. The same applies to the reproduction of the material.
More services are offered
An electronic document can be richer in content than its printed counterpart. For example, useful shortcuts (hypertext, hyperlinks) may be provided, making it easy to locate the references, find information about the authors or citations, or be redirected to other relevant documents. Databases or electronic dictionaries (e.g. etymological, technical, terminological) could be linked to them. Automatic translation into other languages or into Braille is another possibility. Hypermedia (video, sound, etc.), software, data files, etc. can also be incorporated. Most of the above are not yet commonplace, but providers of electronic information are on a path towards extending the content and services associated with the final product.

Why are libraries interested in digitised documents? We can note the following:

Services to the users are improved
As discussed above, a user can benefit in a number of ways from using the digitised versions and the facilities and services that come with them. But there is more to it. Because users carry out the document location and retrieval process themselves, library staff are relieved of the associated workload. If remote access to digitised material is provided, the library also becomes less crowded.
Better handling of the collection of printed material
There is no need for multiple printed copies of material. Cases of material being lost or damaged, and therefore having to be re-acquired, are eliminated. Labour-intensive tasks, like the conventional loan process or reshelving items afterwards, are also cut down. Another important consideration for many libraries is that usage statistics are easier to extract. It becomes easy to find out what is and what is not of interest to the users just by recording access data. This can lead to a more rational structure of the collection.
Preservation of the originals
Printed material is subject to aging, damage, theft, etc. This usually leads to strict policies on access to the library's holdings. In fact, rare items are usually unavailable to most users. Retrodigitising eliminates wear on the originals and, unlike other ways of providing copies of the originals, guarantees that the fidelity of the reproduction will not degrade over time. Microforms are actually still considered the best means of long-term preservation, but they are more difficult to handle, and the process of finally retrieving the required document is usually slow and expensive.

It is time to see how DIEPER fits into the picture. The DIEPER partners are convinced that digitising is going to continue at an increasing rate. In Europe in particular, although much hard work is being done in this area, there is little coordination. Many different initiatives of varying scales are undertaken by the private and public sectors at levels from local to European, resulting in overlap, poor interoperability and other unwanted effects. This situation is expected to improve in the future. In the meantime there are things which should be done:

Those are exactly the points that DIEPER intends to address. To keep the program as realistic as possible, a hands-on approach was selected. The entire process of digitisation will be explored by focusing on a carefully selected set of European periodicals.

Technical concept

The objectives of the DIEPER project are:

Building a virtual library of periodicals
A central access point for all digitised periodicals will be set up as a register modelled on the European Register of Microform Masters. Records in the register will be linked to reliable and comprehensive archives of periodical literature operating at different sites throughout Europe. This means that only location information, along with some cataloguing information, will be recorded for each item. The actual digital documents will remain at their respective locations and will not be mirrored. In addition, a search engine will allow keyword searches against as much of the text (full text or tables of contents) of the digitised periodicals as possible.
Provision of (originally) printed journals in electronic format on the Internet
To provide users with retrospectively digitised periodicals, an optimised document management system will be developed in the course of the project.
Digitisation (image capturing) of the printed material
Digitisation (400-600 dpi, 1-8 bit) of the paper material using a special book scanner or from microfilm using a microfilm scanner. Input of basic bibliographic data to the relevant categories of the TIFF header. Storage as image files in the TIFF format in the highest quality as the digital master file, and archiving on optical storage media (CD-R, DVD).
Conversion of the digitised material (in part) to searchable full text
This conversion will be done by OCR software without subsequent manual correction of the text. For documents printed in Gothic type, current OCR technology is not expected to be sufficient. A text file will be prepared for full-text search; there are no plans to show the full text to the user. Tables of contents files will also be prepared.
Description of the document structure
Description of the physical document structure using an XML-based DTD (eXtensible Markup Language, Document Type Definition). This XML description of a periodical contains the document (images, full text) together with its bibliographic, document structure, article structure, page numbering, index, and TIFF header information, applying the RDF (Resource Description Framework) model.
Provision of access to the articles via library i.e. international registers
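
The register described under "Building a virtual library of periodicals" stores only cataloguing and location information, while keyword search runs against uncorrected OCR text or tables of contents. The following Python sketch illustrates that division of labour under stated assumptions: the class names, field names and example URLs are hypothetical and are not taken from the DIEPER design, and the index is a simple in-memory word list rather than a real search engine.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterRecord:
    """One register entry: cataloguing data plus a pointer to the remote archive."""
    title: str
    year: int
    location_url: str  # hypothetical address of the archive that holds the images


class Register:
    """Central register: keeps records and a keyword index, never the documents."""

    def __init__(self):
        self.records = []
        self._index = defaultdict(set)  # keyword -> positions in self.records

    def add(self, record, ocr_text):
        """Store the record and index its uncorrected OCR text or table of contents."""
        position = len(self.records)
        self.records.append(record)
        for word in re.findall(r"\w+", ocr_text.lower()):
            self._index[word].add(position)

    def search(self, query):
        """Return the records whose indexed text contains every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return [self.records[i] for i in sorted(hits)]


# Usage: the search result points to the remote site; the full text is not shown.
register = Register()
register.add(
    RegisterRecord("Journal A", 1895, "https://archive-a.example/journal-a/1895"),
    "On the theory of numbers",
)
register.add(
    RegisterRecord("Journal B", 1901, "https://archive-b.example/journal-b/1901"),
    "Theory of heat",
)
for record in register.search("theory numbers"):
    print(record.title, record.year, record.location_url)
```

Because each record carries only a location, the images themselves can stay at the partner sites, which matches the decision not to mirror the documents.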
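
To give a sense of scale for the scanning parameters listed under "Digitisation (image capturing)", the following Python sketch estimates the size of one uncompressed master image. The A4 page size is an assumption made for illustration only (the article does not state page dimensions), and compressed TIFF files would be smaller than these figures.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw image size of one scanned page, without compression.

    Each pixel row is padded to a whole number of bytes, which matters
    for 1-bit (black and white) scans.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


# Assumed page size: A4, about 8.27 x 11.69 inches (not stated in the article).
A4_WIDTH_IN, A4_HEIGHT_IN = 8.27, 11.69

# The two extremes of the quoted range: 400 dpi at 1 bit, 600 dpi at 8 bit.
smallest = uncompressed_page_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, 400, 1)
largest = uncompressed_page_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, 600, 8)

print(f"400 dpi, 1 bit: {smallest / 1e6:.1f} MB per page")
print(f"600 dpi, 8 bit: {largest / 1e6:.1f} MB per page")
```

Under these assumptions, the two ends of the range differ by a factor of about eighteen, which is why the choice of resolution and bit depth largely determines how many pages fit on each optical disc.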

Schedule

The DIEPER project started in November 1998 and will last 26 months. It is managed and co-ordinated by the State and University Library of Lower Saxony (Goettingen/Germany).

Project Partners

The following partners participate in the DIEPER project:

Reader Response

If you have any comments on this article, please contact the editors (exploit-editor@ukoln.ac.uk).

Further Information

Author Details


Thanos Massias
Dipl. Mechanical Engineer
Library & Information Service
University of Patras
Greece

Email: thanos@lis.upatras.gr

For citation purposes:
Massias, T., "The DIEPER Project (DIgitised European PERiodicals)", Exploit Interactive, issue 4, January 2000
<URL: http://www.exploit-lib.org/issue4/dieper/>

