The DIEPER Project (DIgitised European PERiodicals)
Thanos Massias introduces the DIEPER project.
Digitising existing printed material (retrodigitising) has become a very important
topic in recent years. Advances in the relevant technologies have made the task more
affordable in terms of cost, time and labour. At the same time, demand for digitised
material is growing rapidly, and this trend is expected to continue. As a consequence,
many initiatives have been undertaken in this field by various parties, including
publishers, libraries and other institutions.
An important fact to note is that there is no standard approach to the digitisation of
printed material. This is true not only for the technical aspects
(e.g. digitising equipment, output formats, type of storage, methods of access)
but also for matters such as the motivation and objectives, the selection of the original
material, the target audience, and so on.
Why are users interested in digitised documents? There are a number of reasons:
- It is easier to locate wanted material
- The means of indexing and searching electronic information are now very sophisticated.
Depending on the digitising approach, a considerable amount of descriptive information
(e.g. bibliographic data, tables of contents) is usually available, and in many cases the
full text, or at least a large portion of it, is available as well. Very powerful search
mechanisms can therefore be employed. These mechanisms are usually offered over a local area
network or the Internet, so users can use them remotely. In fact, in many cases where access
to the digitised material itself is restricted to a special group of users
(e.g. the faculty members of a university or the customers of a publisher),
the search facility may still be open to everyone.
- The access is faster and easier
- The physical presence of the user at the place where the material is held is no longer
required: a user can access the electronic form of the material remotely, in a very short time.
The explosive growth of the Internet has given a whole new perspective to this.
Even if the material is accessible only through a local network rather than the Internet,
retrieving the digitised version is usually easier and faster than retrieving the printed
original. The enhanced search capabilities discussed above and the ease of storage
and reproduction in electronic and/or printed form are among the reasons.
Apart from this, retrodigitising gives access to rare materials which are difficult, if not
impossible, to borrow. Finally, an electronic document is always available, unlike
a printed one, which might be on loan or misplaced somewhere in the library.
- The retrieved material is more usable
- We have already mentioned storage and reproduction, but electronic documents can offer even more.
Depending on the format of a retrieved electronic document, users may be able to incorporate
parts of it (e.g. portions of text, images, figures) in their own documents. Of course, this
side effect is not always welcomed by the providers of the electronic documents
or by their authors. The same applies to the reproduction of the material.
- More services are offered
- An electronic document can be richer in content than its printed counterpart.
For example, useful shortcuts (hypertext, hyperlinks) may be provided, making it easy
to locate references, find out about the authors or citations, or move on to other
relevant documents. Databases or electronic dictionaries (e.g. etymological,
technical, terminology) could be linked to them. Automatic translation into other
languages or into Braille is another possibility. Hypermedia (video, sound, etc.), software,
data files, etc. can also be incorporated. Most of these features are not yet commonplace, but
providers of electronic information are clearly moving towards extending the content
and services associated with the final product.
Why are libraries interested in digitised documents? We can note the following:
- Services to the users are improved
- As discussed above, users can benefit in many ways from the digitised versions and from
the facilities and services that come with them. But there is more to it. Because users
carry out the document location and retrieval process themselves, library staff are
relieved of the associated workload. If remote access to digitised material is provided,
the library also becomes less crowded.
- Better handling of the collection of printed material
- There is no need for multiple printed copies of material. Cases of material being
lost or damaged, and therefore having to be re-acquired, are eliminated. Labour-intensive
tasks, such as the conventional loan process or returning items to their shelf positions
afterwards, are also cut down. Another important point for many libraries is that usage
statistics are easier to extract: simply by recording access data, it becomes easy to find
out what is and what is not of interest to the users.
This can lead to a more rational structure of the collection.
- Preservation of the originals
- Printed material is subject to ageing, damage, theft, etc. This usually leads to strict
access policies for the library's holdings; in fact, rare items are usually
unavailable to most users. Retrodigitising eliminates the wear on the originals
and, unlike other ways of copying them, guarantees that the fidelity of the
reproduction will not degrade over time. It should be noted, though, that microforms
are still considered the best medium for long-term preservation, but they are more difficult
to handle, and retrieving the required document from them is usually slow.
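The usage statistics mentioned under "Better handling of the collection of printed material"
need little more than a log of accesses. The following is a minimal Python sketch, assuming a
hypothetical log with one `timestamp,user,item_id` record per line; the log format, the
`usage_report` function and the item identifiers are illustrative only and are not taken from
any DIEPER system. It ranks items by number of accesses and lists holdings that were never
accessed:

```python
from collections import Counter


def usage_report(log_lines, holdings):
    """Rank items by number of accesses and list holdings never accessed.

    log_lines: hypothetical "timestamp,user,item_id" records, one per line.
    holdings:  identifiers of all items in the collection.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _timestamp, _user, item_id = line.split(",", 2)
        counts[item_id] += 1
    # Items held but absent from the log were never accessed.
    never_used = sorted(set(holdings) - set(counts))
    return counts.most_common(), never_used


# Hypothetical access log and holdings.
log = [
    "1999-10-01T09:12,u17,A-1898-03",
    "1999-10-01T09:40,u02,A-1898-03",
    "1999-10-02T14:05,u17,B-1901-01",
]
ranking, never_used = usage_report(log, ["A-1898-03", "B-1901-01", "C-1899-02"])
print(ranking)     # [('A-1898-03', 2), ('B-1901-01', 1)]
print(never_used)  # ['C-1899-02']
```

Items that never appear in the log, like the hypothetical `C-1899-02` here, are the ones a
library might review when restructuring its collection.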
It is time to see how DIEPER fits into the picture. The DIEPER partners are convinced
that digitising will continue at an increasing rate. In Europe in particular, although
much hard work is being done in this area, there is little coordination. Many different
initiatives of varying scale are undertaken by the private and public sectors, at levels
ranging from local to European. As a result, the overall effort suffers from overlapping
work, poor interoperability and other unwanted effects. This situation is expected to improve
in the future. In the meantime, two things should be done:
- Provide a consistent central access point to existing digitised material
- Create a solid set of guidelines for future digitising projects.
These are exactly the points that DIEPER intends to address. To keep the project as
realistic as possible, a hands-on approach was chosen: the entire process of digitisation
will be explored by focusing on a carefully selected set of European periodicals.
The objectives of the DIEPER project are:
- Building a virtual library of periodicals
- A central access point for all digitised periodicals will be devised as a register built
on the model of the European Register of Microform Masters. Records in the register will
be linked to reliable and comprehensive archives of periodical literature operating at
different sites throughout Europe. This means that only location information, along with
some cataloguing information, will be recorded for each item; the digital documents
themselves will remain at their respective sites and will not be mirrored. In addition,
a search engine will allow users to run keyword searches against as much of the text
of the digitised periodicals (full text or tables of contents) as possible.
- Provision of (originally) printed journals in electronic format on the Internet
- To provide users with retrospectively digitised periodicals, an optimised document
management system will be developed in the course of the project.
- Digitisation (image capturing) of the printed material
- Digitisation (400-600 dpi, 1-8 bits per pixel) of the paper originals using a special
book scanner, or of microfilm using a microfilm scanner. Entry of basic bibliographic data
into the relevant fields of the TIFF header. Storage of the images as TIFF files in the
highest quality, as the digital master files, and archiving on optical storage media (CD-R, DVD).
- Conversion of (part of) the digitised material to searchable text
- This conversion will be done by OCR software, without subsequent manual correction of the text.
For documents printed in gothic type, current OCR technology is not expected to be sufficient.
A text file will be prepared for full-text searching; it is not planned to display this full
text to the user. Table-of-contents files will also be prepared.
- Description of the document structure
- Description of the physical document structure using an XML-based DTD
(eXtensible Markup Language, Document Type Definition). This XML description of a periodical
covers the document (images, full text) together with its bibliographic data, document
structure, article structure, page numbering, index and TIFF header information, applying
the RDF (Resource Description Framework) model.
- Provision of access to the articles via library i.e. international
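The register and search engine described under "Building a virtual library of periodicals",
together with the uncorrected OCR text prepared for full-text search, could be combined roughly
as in the minimal Python sketch below. It is not the DIEPER implementation: the
`RegisterRecord` fields, the `Register` class, the example titles and the example URLs are all
hypothetical. Each record holds only cataloguing data and the location of the digitised copy at
the holding site; the search text (uncorrected OCR output or table of contents) is placed in an
inverted index but never shown to the user. For simplicity the sketch indexes that text
centrally; forwarding queries to each archive would fit the description equally well.

```python
import re
from dataclasses import dataclass


@dataclass
class RegisterRecord:
    """Hypothetical register entry: cataloguing data plus location only."""
    title: str
    volume: str
    year: int
    holding_site: str
    location_url: str      # where the digitised copy lives; it is not mirrored
    search_text: str = ""  # uncorrected OCR text or table of contents; never displayed


def tokenize(text):
    """Split text into lower-case word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


class Register:
    def __init__(self):
        self.records = []
        self.index = {}  # inverted index: word -> set of record positions

    def add(self, record):
        position = len(self.records)
        self.records.append(record)
        for word in set(tokenize(record.search_text)):
            self.index.setdefault(word, set()).add(position)

    def search(self, query):
        """Return the records whose search text contains every query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.records[i] for i in sorted(hits)]


reg = Register()
reg.add(RegisterRecord(
    "Example Journal of Mathematics", "12", 1899, "Site A",
    "http://archive-a.example.org/ejm/12",
    "On the theory of elliptic functions"))
reg.add(RegisterRecord(
    "Example Review of Physics", "3", 1901, "Site B",
    "http://archive-b.example.org/erp/3",
    "Notes on the measurement of heat"))
for rec in reg.search("Elliptic functions"):
    # Only the location is reported; the user is sent to the holding site.
    print(rec.holding_site, rec.location_url)
# Site A http://archive-a.example.org/ejm/12
```

Because the OCR text is not corrected, a misrecognised word simply fails to match; the sketch
makes no attempt to repair it.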
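The entry of bibliographic data into the TIFF header, listed under "Digitisation (image
capturing) of the printed material", can be illustrated with a minimal, standard-library-only
Python sketch that writes an uncompressed 8-bit greyscale TIFF. Which header fields DIEPER
actually uses is not stated here; the sketch assumes the baseline ASCII tags DocumentName (269)
and PageName (285), with ImageDescription (270) available as well, and records the scanning
resolution in XResolution/YResolution. Bilevel (1-bit) images, compression and multi-page files
are left out, and `build_tiff` and `read_ascii_tags` are hypothetical helpers, not part of any
library.

```python
import struct

# TIFF 6.0 field types used below.
ASCII, SHORT, LONG, RATIONAL = 2, 3, 4, 5
# Baseline ASCII tags assumed here to carry bibliographic data.
DOCUMENT_NAME, IMAGE_DESCRIPTION, PAGE_NAME = 269, 270, 285


def build_tiff(pixels, width, height, dpi, metadata):
    """Return an uncompressed 8-bit greyscale, single-strip TIFF as bytes.

    pixels:   width * height bytes, one per pixel, row by row.
    metadata: {ascii_tag: text}. TIFF ASCII fields are 7-bit, so text with
              non-ASCII characters raises UnicodeEncodeError.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match image size")
    image = pixels + b"\0" * (len(pixels) % 2)  # pad so later offsets are even
    base = 8 + len(image)   # out-of-line values follow header and image
    extra = bytearray()     # strings longer than 4 bytes, rationals
    entries = {}            # tag -> 12-byte IFD entry

    def out_of_line(data):
        offset = base + len(extra)
        extra.extend(data + b"\0" * (len(data) % 2))
        return offset

    def short(tag, value):
        entries[tag] = struct.pack("<HHIHH", tag, SHORT, 1, value, 0)

    def long_(tag, value):
        entries[tag] = struct.pack("<HHII", tag, LONG, 1, value)

    def rational(tag, num, den=1):
        offset = out_of_line(struct.pack("<II", num, den))
        entries[tag] = struct.pack("<HHII", tag, RATIONAL, 1, offset)

    def ascii_(tag, text):
        data = text.encode("ascii") + b"\0"
        if len(data) <= 4:
            value = data.ljust(4, b"\0")  # short strings fit in the entry
        else:
            value = struct.pack("<I", out_of_line(data))
        entries[tag] = struct.pack("<HHI", tag, ASCII, len(data)) + value

    long_(256, width)         # ImageWidth
    long_(257, height)        # ImageLength
    short(258, 8)             # BitsPerSample
    short(259, 1)             # Compression: none
    short(262, 1)             # PhotometricInterpretation: BlackIsZero
    long_(273, 8)             # StripOffsets: pixels start right after header
    short(277, 1)             # SamplesPerPixel
    long_(278, height)        # RowsPerStrip: whole image in one strip
    long_(279, len(pixels))   # StripByteCounts
    rational(282, dpi)        # XResolution
    rational(283, dpi)        # YResolution
    short(296, 2)             # ResolutionUnit: inch
    for tag, text in metadata.items():
        ascii_(tag, text)

    ifd = struct.pack("<H", len(entries))
    ifd += b"".join(entries[tag] for tag in sorted(entries))  # ascending tags
    ifd += struct.pack("<I", 0)  # offset of next IFD: none
    header = b"II" + struct.pack("<HI", 42, base + len(extra))
    return header + image + bytes(extra) + ifd


def read_ascii_tags(data):
    """Return {tag: text} for the ASCII fields in the first IFD."""
    if data[:4] != b"II*\0":
        raise ValueError("not a little-endian TIFF")
    (ifd,) = struct.unpack_from("<I", data, 4)
    (count,) = struct.unpack_from("<H", data, ifd)
    tags = {}
    for i in range(count):
        entry = ifd + 2 + 12 * i
        tag, ftype, n, value = struct.unpack_from("<HHII", data, entry)
        if ftype == ASCII:
            start = entry + 8 if n <= 4 else value
            tags[tag] = data[start:start + n - 1].decode("ascii")
    return tags


page = build_tiff(bytes(12), width=4, height=3, dpi=400, metadata={
    DOCUMENT_NAME: "Example Journal, vol. 12 (1899)",  # hypothetical data
    PAGE_NAME: "145",
})
print(read_ascii_tags(page))
# {269: 'Example Journal, vol. 12 (1899)', 285: '145'}
```

A production workflow would more likely rely on the scanner software or an established imaging
library; the point here is only that the bibliographic data travels inside the master file itself.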
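The XML description of the document structure could be generated along the lines of the
minimal Python sketch below, which uses the standard `xml.etree.ElementTree` module. The
`describe_issue` function, the element and attribute names (`issue`, `bibliographic`,
`article`, `page`) and the file naming scheme are hypothetical: the sketch does not reproduce
the DIEPER DTD and does not attempt the RDF model, but it shows how bibliographic data, article
structure, page numbering and references to the page images and full text can be kept in one
document.

```python
import xml.etree.ElementTree as ET


def describe_issue(biblio, articles):
    """Build a hypothetical XML description of one periodical issue.

    biblio:   bibliographic fields, e.g. {"title": ..., "volume": ..., "year": ...}
    articles: dicts with "title", "first_page" and "last_page" keys.
    """
    issue = ET.Element("issue")
    bib = ET.SubElement(issue, "bibliographic")
    for name, value in biblio.items():
        ET.SubElement(bib, name).text = str(value)
    for number, art in enumerate(articles, start=1):
        article = ET.SubElement(issue, "article", id=f"a{number}")
        ET.SubElement(article, "title").text = art["title"]
        # One <page> per printed page, pointing at the TIFF master file and
        # the OCR text file (hypothetical naming scheme).
        for page in range(art["first_page"], art["last_page"] + 1):
            ET.SubElement(article, "page", number=str(page),
                          image=f"p{page:04d}.tif", fulltext=f"p{page:04d}.txt")
    return issue


issue = describe_issue(
    {"title": "Example Journal of Mathematics", "volume": 12, "year": 1899},
    [{"title": "On the theory of elliptic functions", "first_page": 145, "last_page": 146}],
)
ET.indent(issue)  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(issue, encoding="unicode"))
```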
The DIEPER project started in November 1998 and will last 26 months.
It is managed and co-ordinated by the State
and University Library of Lower Saxony (Goettingen/Germany).
The following partners participate in the DIEPER project:
- Copenhagen/Denmark, Royal Library
- Frankfurt/Germany, ABC-Datenservice (subcontractor)
- Goettingen/Germany, State and University Library of Lower Saxony
- Graz/Austria, University Library/Karl-Franzens-University
- Heidelberg/Germany, Springer-Verlag Ltd & Co. KG
- Helsinki/Finland, Helsinki University Library/University of Helsinki
- Leuven/Belgium, University Library/Catholic University of Leuven
- Paris/France, University Library/Rene Descartes - Paris V University
- Patras/Greece, Library & Information Service/University of Patras
- Siena/Italy, Department of Information Engineering/University of Siena
- Tartu/Estonia, University Library/Tartu University
Dipl. Mechanical Engineer
Library & Information Service
University of Patras
For citation purposes:
Massias, T., "The DIEPER Project (DIgitised European PERiodicals)", Exploit Interactive, issue 4, January 2000