

Günter Mühlberger introduces the LAURIN Project, in which 18 partners in 7 European countries are producing a modular system that, on the one hand, provides newspaper clipping archives with digital technology and, on the other hand, gives users on-line access to these fine collections. The project is co-funded by the European Union and sponsoring partners. LAURIN is co-ordinated by the University of Innsbruck. Its official duration is from May 1998 to August 2000.
For more than 100 years an enormous number of libraries, archives, companies, NGOs and governmental institutions have been collecting newspaper clippings and maintaining clipping archives[1]. With the emergence of online editions of newspapers and networks of such electronic editions, some of them have closed their paper-based collections. The newspaper publishing houses themselves have ceased to clip their newspapers. But most of the other institutions still maintain their paper-based clipping collections, and they will continue to do so in the future. The reasons for this are simple: their users are not only interested in the pure text of a newspaper article but demand a higher level of authenticity. The typography, the layout, the pictures that come with it, the impression of the whole page - all this information is lost in the full-text database. As long as print editions are the main distribution medium of newspapers, it will make sense to collect clippings of the print edition - either by cutting them out manually, or by scanning them and cutting them electronically - because they have this added value of information. But this is not the only reason why we are optimistic about the future of clipping archives. Another reason lies in the general role research libraries will play in the near future: the information society of the future will not suffer from a lack but from a surplus of information. Distinguishing relevant from non-relevant information will, therefore, be one of the main tasks of librarians in the future. Since the very beginning it has been the job of archivists to select from a wide range of newspapers specific articles related to the subject of the collection.
When accessing a clipping archive of one of our consortium members, such as the clipping archive of the University Library of Uppsala [2], one can be sure that the archive has recorded every important article on Swedish politics, culture and contemporary history from all Swedish newspapers - and this simple fact gives the user an enormous amount of confidence. He can be sure that, in principle, all of the more than 3 million articles are "relevant" articles that have been selected by specialists. This argument becomes even more important if we do not think in months or years, but in decades. Searching a database containing the full text of all Swedish newspapers from 1990 to 2010 will produce such a huge number of hits that working with such a database will be a task for information specialists only. The majority of users, who may be able to use a database but who are no specialists, will prefer a system where librarians have reduced complexity to a degree which allows non-professional users to access relevant information in a comfortable way.
These considerations have been the starting point for the LAURIN project: It has been initiated in order to equip paper-based clipping collections with the main tools for doing their job in a digital environment. The project pursues three objectives:
The LAURIN project started in May 1998 and will last until August 2000. It is managed and co-ordinated by the Department of German Language and Literature of the University of Innsbruck. The German Department hosts the Innsbrucker Zeitungsarchiv [3], one of the most comprehensive collections of newspaper clippings pertaining to literature and criticism in Austria, Germany, and Switzerland. Seven other clipping collections from Italy, Spain, Germany, Norway, Sweden and Finland are in the consortium as well. Their collections mainly comprise articles about cultural, political and economic issues. Altogether they boast a total of some million clippings from the end of the nineteenth century up to the present. Six archives will implement the LAURIN system and use it in their daily work. Some tens of thousands of articles will be recorded in the LAURIN database and made accessible via the LAURIN OPAC. Depending on the outcome of our negotiations with copyright holders, users will, in principle, be able to view and print the electronic facsimiles of the articles. For the last phase of the project we plan to set up a non-profit successor to the network. This organisation will devote itself to the management and expansion of the future LAURIN network, and to the task of supporting the associated archives in their work.
The overall LAURIN architecture provides a set of nodes connected by the Internet: one node for each of the participating libraries, plus a central node collecting data from local nodes and providing the end user with a uniform query environment. The central node hosts a relational database in which metadata from the local nodes are stored, such as titles of the articles, author names, dates, newspaper names, and keywords. Local nodes are in charge of clipping, scanning, and indexing; in addition, they store all the information available about the clippings: index data, the OCR-processed full text, and the electronic facsimile (image) of the article. Whenever a user formulates a query, the central node uses the central data as a basis for its response, involving local nodes only when specific full-text queries are issued. In that case the central node is in charge of collecting the responses coming in from the local nodes and provides the user with the final result. The central node also contains a Z39.50 interface allowing it to operate as a Z39.50 server, exporting all LAURIN index data to existing library networks. Depending on the decisions of the library, each local node may be directly queried by the end users through a Web interface and/or a Z39.50 interface.
LAURIN clippings are linked with the LAURIN thesaurus, which is stored in the central and local nodes. There is a constant flow of information from local nodes to the central node, continuously updating the central database with new index data and new thesaurus entries. Periodically, the thesaurus administrator validates the proposed thesaurus entries (candidates), and the central node forwards these validations to each local node. The LAURIN thesaurus is organised by concept. This is done by linking names stemming from different languages but expressing the same concept, and by identifying functional and useful relationships between these different concepts.
Every single concept will have a unique key but will be represented by several names (each including name string, normalised name string, language flag, and preferred flag). The thesaurus will also contain information about the relationships between concepts (e.g. broader term, related term, ...) and some administrative information (who changed or added what in the thesaurus, and when). The LAURIN system not only provides the infrastructure but also contains some authority data. Among others, the Getty Thesaurus of Geographic Names [4] as well as the NUTS [5] code have been incorporated.
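The concept-oriented structure described above can be illustrated with a small data model. This is only a sketch under stated assumptions, not the LAURIN schema: the class names, the `normalise` rule (lower-casing and collapsing whitespace) and the two sample concepts with their keys are invented for illustration, and the administrative information is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


def normalise(text: str) -> str:
    """Hypothetical normalisation rule: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


@dataclass(frozen=True)
class Name:
    """One name of a concept, carrying the four attributes listed in the text."""
    string: str       # name string as displayed
    normalised: str   # normalised name string used for matching
    language: str     # language flag, e.g. "en" or "de"
    preferred: bool   # preferred flag for this language


@dataclass
class Concept:
    key: int                                          # unique key of the concept
    names: List[Name] = field(default_factory=list)
    broader: Set[int] = field(default_factory=set)    # keys of broader terms
    related: Set[int] = field(default_factory=set)    # keys of related terms


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[int, Concept] = {}
        self._index: Dict[str, Set[int]] = {}  # normalised name -> concept keys

    def add(self, concept: Concept) -> None:
        self.concepts[concept.key] = concept
        for name in concept.names:
            self._index.setdefault(name.normalised, set()).add(concept.key)

    def lookup(self, text: str) -> Set[int]:
        """Find the concepts that any name, in any language, refers to."""
        return set(self._index.get(normalise(text), set()))

    def preferred_name(self, key: int, language: str) -> Optional[str]:
        for name in self.concepts[key].names:
            if name.language == language and name.preferred:
                return name.string
        return None


def make_name(string: str, language: str, preferred: bool = True) -> Name:
    return Name(string, normalise(string), language, preferred)


# Invented sample entries; the keys 1 and 2 are arbitrary.
thesaurus = Thesaurus()
thesaurus.add(Concept(key=2, names=[make_name("Austria", "en"),
                                    make_name("Österreich", "de")]))
thesaurus.add(Concept(key=1, names=[make_name("Vienna", "en"),
                                    make_name("Wien", "de")],
                      broader={2}))

for key in thesaurus.lookup("Wien"):
    print(key, thesaurus.preferred_name(key, "en"))
```

Because names in different languages share one concept key, a search term entered in German resolves to the same concept that an English-speaking user sees under its English preferred name.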
Apart from working on the detailed design of the overall architecture during the first year, the focus of the LAURIN project was on the workflow in the clipping departments. The Innsbruck Press Archive (IZA) can serve as an example of the typical workflow within a clipping department. Since the 1960s articles concerning literature and language have been clipped from about 35 daily and weekly Austrian, German, Italian, and Swiss newspapers. 120 articles are processed per day, amounting to an annual output of some 30,000 articles. To facilitate convenient filing, the articles are clipped and pasted onto A4 sheets. They are classified according to subject, copied and, if necessary, multiplied, in order to file all the relevant articles under different subject headings. The articles can be accessed locally, and they are delivered as paper copies by snail mail to off-site users. The standard workflow of an electronic clipping archive will be organised in a similar way and will consist of the following steps:
The standard workflow will be organised by using libClip 1.0 [6] clipping software (see Figure 1).
Figure 1: libClip 1.0: Distinction of Articles
The scanning module of libClip enables us to scan newspapers with common scanners from A4 to A2, both in black and white and in grey scale. Grey scale scanning is indispensable for achieving good quality pictures and graphs. It is also a valuable feature for retrodigitisation, where image quality can be improved by deskewing and applying noise reduction tools. The clipping module comprises an automatic layout analysis of the source page. This automatic layout analysis selects articles on a page as well as textual objects within the article. These textual objects are: title, subtitle, author, abstract, body text, picture, and caption. libClip highlights the recognised objects, and the clipper corrects the suggested result. In connection with the automated OCR-processing of the article, the recognised textual objects are used to automatically transfer the main bibliographic data into the LAURIN local database system. In order to facilitate the viewing and printing of the clippings, the article is automatically arranged on an A4 page. This target page contains the logo of the newspaper, related bibliographic data, the address of the clipping archive and, in case the archive uses a large-format scanner, thumbnails of the source page. In the end, three outputs of the clipping process are exported to the LAURIN local database system: (1) the electronic facsimile of the article, (2) the OCR-processed full text of the article, and (3) the prime and bibliographic indexes (metadata). Adhering to common standards, the electronic facsimile will be scanned at 300-400 dpi and stored in TIFF (CCITT Group 4). Once the article has been exported to the LAURIN local database, the clipping process is completed. The last step of the acquisition process consists in finishing the indexing. This is done through a separate user interface, provided by the LAURIN local database in connection with the LAURIN central database.
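To give a feeling for what these scanning parameters mean in practice, the following sketch converts paper sizes and resolutions into pixel dimensions. It is a back-of-the-envelope calculation, not part of libClip: the function names are invented, and the size estimate assumes a bilevel (1 bit per pixel) image, which is what CCITT Group 4 compression operates on. The compressed file size depends on the page content and is not estimated here.

```python
from typing import Tuple

MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres (width, height).
PAPER_MM = {"A4": (210, 297), "A3": (297, 420), "A2": (420, 594)}


def pixel_size(paper: str, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given size scanned at `dpi`."""
    width_mm, height_mm = PAPER_MM[paper]
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def raw_bilevel_bytes(paper: str, dpi: int) -> int:
    """Uncompressed size of a 1-bit image, each row padded to whole bytes."""
    width, height = pixel_size(paper, dpi)
    return ((width + 7) // 8) * height


for paper in ("A4", "A2"):
    for dpi in (300, 400):
        w, h = pixel_size(paper, dpi)
        size_mb = raw_bilevel_bytes(paper, dpi) / 1_000_000
        print(f"{paper} at {dpi} dpi: {w} x {h} px, {size_mb:.2f} MB uncompressed")
```

An A4 target page at 300 dpi therefore comes to 2480 x 3508 pixels, or roughly 1.1 MB before compression, while an A2 source page at the same resolution has four times the area.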
In a first step, the LAURIN local node database matches the OCR-processed full text of the article with the controlled vocabulary of the thesaurus. A list of matching keywords is produced, and the indexer has to confirm or delete the suggested keywords. In a further step the indexer can choose adequate thesaurus entries and connect them with the article. If the thesaurus contains no suitable entry for the article, the indexer may use freely chosen vocabulary (in his own language) as well. This free index will be translated into English and updated regularly by the thesaurus administrator. After having indexed the article, the librarian confirms the data (bibliographic data and keywords) in a one-stop quality check, and the article is finally stored.
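The first indexing step, matching the full text against the controlled vocabulary, can be sketched as follows. This is a deliberately simple illustration and not the matching procedure used by the LAURIN local node: the function names, the whole-word matching rule and the sample vocabulary with its concept keys are assumptions made for the example.

```python
import re
from typing import Dict, List, Set


def normalise(text: str) -> str:
    """Lower-case the text and reduce it to single-spaced word tokens."""
    return " ".join(re.findall(r"\w+", text.lower()))


def suggest_keywords(full_text: str, vocabulary: Dict[str, int]) -> List[str]:
    """Return the vocabulary terms that occur as whole words or phrases.

    `vocabulary` maps a thesaurus term to a (hypothetical) concept key.
    """
    haystack = f" {normalise(full_text)} "
    hits = []
    for term in vocabulary:
        needle = normalise(term)
        # Surrounding spaces make sure only whole words and phrases match.
        if needle and f" {needle} " in haystack:
            hits.append(term)
    return sorted(hits)


def confirm(suggestions: List[str], rejected: Set[str]) -> List[str]:
    """The indexer keeps every suggestion that was not explicitly deleted."""
    return [term for term in suggestions if term not in rejected]


# Invented sample data for illustration only.
vocabulary = {"Ingeborg Bachmann": 101, "Wien": 1, "Roman": 202, "Lyrik": 303}
ocr_text = "Der neue Roman über Ingeborg Bachmann wurde in Wien vorgestellt."

suggestions = suggest_keywords(ocr_text, vocabulary)
print(suggestions)
print(confirm(suggestions, rejected={"Wien"}))
```

Plain string matching of this kind will miss inflected forms and OCR errors, which is one reason why the suggested list is confirmed or corrected by a human indexer.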
The LAURIN data will be accessible in different contexts. First of all, the LAURIN OPAC will provide access to the metadata of all newspaper articles processed in the course of the project. At least two kinds of interfaces will provide occasional, as well as expert, users with the possibility to retrieve the LAURIN records. Access to the collection will be supported not only by providing a set of different search fields but also by the multilingual thesaurus. In fact, the thesaurus will be one of the main tools to navigate through the information space of the LAURIN database. Since all keywords of the LAURIN thesaurus are also represented in English, users will be able to search and find interesting articles even if they have only a rudimentary knowledge of the English language.
In addition to the LAURIN central database, the LAURIN data will be accessible via a Z39.50 gateway as well. This feature facilitates the integration of the LAURIN data into established library networks.
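The division of labour between the central node and the local nodes when answering a query, as outlined in the description of the architecture, can be sketched in a few lines. This is a simplified model under several assumptions: network requests are replaced by ordinary method calls, article identifiers are assumed to be unique across all nodes, and all class names, field names and sample records are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Query:
    keyword: Optional[str] = None    # answered from the central index data
    full_text: Optional[str] = None  # needs the OCR text held by local nodes


class LocalNode:
    def __init__(self, name: str, texts: Dict[int, str]) -> None:
        self.name = name
        self.texts = texts  # article id -> OCR-processed full text
        self.calls = 0      # how often the central node contacted this node

    def search_full_text(self, phrase: str) -> List[int]:
        self.calls += 1
        return sorted(article for article, text in self.texts.items()
                      if phrase.lower() in text.lower())


class CentralNode:
    def __init__(self, index: Dict[int, List[str]],
                 local_nodes: List[LocalNode]) -> None:
        self.index = index  # article id -> keywords (central metadata)
        self.local_nodes = local_nodes

    def search(self, query: Query) -> List[int]:
        results: Optional[Set[int]] = None
        if query.keyword is not None:
            results = {article for article, keywords in self.index.items()
                       if query.keyword in keywords}
        if query.full_text is not None:
            # Only full-text queries involve the local nodes; the central
            # node collects their responses and merges them.
            hits: Set[int] = set()
            for node in self.local_nodes:
                hits.update(node.search_full_text(query.full_text))
            results = hits if results is None else results & hits
        return sorted(results or [])


# Invented sample data for illustration only.
node_a = LocalNode("archive-a", {1: "Parliament debates the budget",
                                 2: "A new novel was reviewed"})
node_b = LocalNode("archive-b", {3: "The novel won a prize"})
central = CentralNode({1: ["politics"], 2: ["literature"], 3: ["literature"]},
                      [node_a, node_b])

print(central.search(Query(keyword="literature")))
print(central.search(Query(keyword="literature", full_text="prize")))
```

A keyword query is answered entirely from the central metadata, so a local node that is temporarily unreachable only affects full-text searches in this model.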
Apart from the electronic catalogue, the LAURIN system will also be able to provide users with the electronic facsimile of the articles. Obviously, this feature is closely related to copyright issues. The LAURIN project is affected by copyright law in two respects: (1) the digitisation of newspaper articles and (2) the making available of these articles to the public via an electronic network. In the case of reproducing the articles, it has to be seen that most European countries have put special limitations on their copyright acts in favour of libraries and archives. The European directive on copyright, which will probably be ratified by the European Parliament in 2000, follows these traditions and allows member states to set up limitations to the reproduction right. Article 5(2)c says that "member states may provide for limitations to the exclusive right of reproduction [...] in respect of specific acts of reproduction made for archiving or conservation purposes by establishments which are not for direct or indirect economic or commercial advantage" [7]. It is very likely that countries that already have similar regulations will implement this article. After the ratification of the directive, the digitisation of newspaper clippings for archival and preservation purposes should therefore be possible in most European countries.
The delivery of the electronic facsimiles of the articles via the LAURIN network, however, is a much more complicated matter. At least three aspects have to be taken into account: (1) existing library privileges, (2) the differences between closed and still growing collections, and (3) the question of "who is the real owner of a (paper-based) digitised newspaper article?"
Similar to the reproduction right, many European countries make certain exceptions for the use of copyright protected material for educational and research purposes. The EU copyright directive also follows these exceptions, and in Article 5(3)a it is said that "Member States may provide for limitations" in the case of "use for the sole purpose of illustration for teaching or scientific research". In the comments on the article it is said that member states are allowed "to exempt the use of a work [...] for instance for a compilation of an anthology" and that the article "might also serve to exempt certain uses in the context of on-demand delivery of works" (comments to article 5, no. 8, p. 40). From this point of view, it seems to be quite clear that, for instance, the use of scanned and archived clippings for producing an electronic online schoolbook that is accessible only to teachers and students may be possible in those countries which will implement this article. Nevertheless, the objective of the LAURIN network is that the whole database is accessible to researchers, teachers, and students. Therefore, separate solutions on the basis of license agreements will be necessary to guarantee broader access to the articles. But in order to enter into negotiations one has to find the copyright owner of an article - and this might be more difficult than one thinks at first glance. Finding the addressee for the licensing agreement is a question of rights management, and in this context the difference between closed and still growing collections is considerable. The two main characteristics of closed newspaper clipping collections are that (1) the digital kind of use was unknown at the time when the articles were published and (2) it is impossible to identify every single author of the articles collected. It is obvious that a contract between a publisher and a journalist, negotiated in the 1960s, covered only the right to publish that article in paper form [8]. 
The digitisation and distribution of this article via an electronic network is a new type of distribution and needs, therefore, the separate agreement of the originator. But how are we to find the authors of tens, or even hundreds, of thousands of articles from the 1920s up to the present? The problems that arise may be exemplified by the collection of the National Library of Norway. The National Library holds some 700,000 articles on Norwegian people published in Norwegian newspapers from the 1920s to 1985. These articles, mainly written on the occasion of birthdays, obituaries, advancements, etc., were very rarely written by regular journalists working for those newspapers, but more often by free-lancers and even friends of the people in question, and some of the articles do not contain the name of their author. Even if the library spares no effort to find the originators of the articles, there will always be authors (or heirs) that cannot be traced. Do we necessarily have to conclude from this that future use of the material will be impossible? We do think that there is a feasible solution which might also be of interest to similar collections, such as archives of photographs, illustrations, etc.: more than one hundred years ago Reproduction Rights Organisations (RROs) were founded in order to solve the problem of collective rights management [9]. They are a link between the large number of copyright owners and an enormous number of users. The German collecting society for authors, VG-WORT, has recently obtained the right to license collected editions, such as newspapers, journals, anthologies, etc., which were published at a time when their digital rebirth could not possibly have been foreseen. The LAURIN team is therefore optimistic that other RROs will follow suit and that our chances of receiving a license for the retrospective digitisation of newspaper articles throughout Europe will increase in the future.
When considering clipping collections that are still growing, however, we are confronted with a completely different situation. This is particularly due to the new role of newspaper articles after the digital revolution. It is only recently that, under the influence of the Internet, newspaper articles have become a completely new product. Not too many years ago they were just written for the day, and their value diminished after a very short time. By contrast, the Internet permits various kinds of re-publication of electronic newspaper articles. Online editions of newspapers, online archives, networks of newspapers, homepages of electronic bookshops, companies, and information services especially appreciate the availability of news articles. Obviously, the writers and publishers of these articles are fascinated by the new possibilities and have a strong interest in the exclusive exploitation of the new value of newspaper articles. On the other hand, the "new" newspaper article and the possibilities of its dissemination through the Internet are more akin to the broadcasting of songs on the radio than to the traditional form of paper publishing. And what would radio be like if songwriters had to exploit their songs without the support of collecting societies? It is quite likely that collecting societies will play an important role in the retrodigitisation of clippings, and they might play a similar role in the case of current articles, too [10]. There is a strong need to compare various possibilities in order to solve this problem. The LAURIN project will therefore organise a public copyright meeting in spring 2000 to discuss various approaches.
In the course of the next few months, the LAURIN clipping software and the local node software will be installed and validated at the participating libraries and archives. In January 2000 this process will be finished, and a first prototype of the LAURIN OPAC will be available on the Internet. During the demonstration phase of the project, new partners will be welcome to join the consortium.
If you have any comments on this article, please contact the editors (exploit-editor@ukoln.ac.uk).
Günter Mühlberger
Project manager
Department of German Language and Literature
Innsbrucker Zeitungsarchiv
University of Innsbruck, Austria
Email: guenter.muehlberger@uibk.ac.at
URL: http://laurin.uibk.ac.at/
Günter Mühlberger, project manager of the LAURIN project, works at the Department of German Language and Literature of the University of Innsbruck. He is involved in several national digitisation projects (Austrian literature online, Censorship in Austria - 1792-1848). Further information is available at: <http://germanistik.uibk.ac.at/germ/leute/muehlberger.html>
For citation purposes:
Günter Mühlberger, "Newspaper Clippings in a Digital World: The LAURIN Project,"
Exploit Interactive, issue 2, 20 July 1999
URL:
<http://www.exploit-lib.org/issue2/laurin/>
A UKOLN Service. Copyright © 1999-2006
Last Updated: 20 July 1999