Web Technologies: The Development Of Web Protocols And Formats

Many project developers will be making use of the Web, either as an integral part of the project or to provide information about the project. In a regular column on Web Technologies, Brian Kelly reviews developments of Web protocols and formats. The article is intended to provide background information for projects which are about to make decisions on appropriate Web technologies to use.

Introduction

As all Web developers will know, the Web originally consisted of three architectural components:

  1. Data Formats: HTML
  2. Addressing: URLs
  3. Transport: HTTP

How have these developed since Tim Berners-Lee, the father of the Web, began development of the web in the early 1990s?

HTML Developments

HTML (Hypertext Markup Language) originally provided a simple mechanism for defining structural elements common to many documents, such as paragraphs (the <P> element), headings (<H1>, <H2>, etc.), and simple character formatting (bold, italics, etc.).

As the Web grew in popularity during the mid 1990s we began to see a battle for market share in the browser market, with Netscape and Microsoft releasing numerous HTML extensions (<FONT> and <CENTER> and, more controversially, <BLINK> and <MARQUEE>, etc.).

Although the frequent release of new browsers with new functionality (including not only support for extended HTML tag sets but also improved bookmark managers, email and news interfaces, etc.) initially appealed to many Web users, concerns were expressed as the set of HTML extensions grew. These concerns covered (a) the loss of browser independence, (b) the difficulties of developing authoring tools which could support new elements, (c) the dangers of using proprietary rather than open standards and (d) the architectural flaws in many of the extensions, which defined the appearance of Web resources rather than the underlying document structure.

Fortunately, pressure from commercial companies which were making large-scale use of the Web, together with, no doubt, an awareness among the browser vendors of the architectural flaws in their approaches to extending HTML, seems to have had some success. Both major browser vendors have now stated their commitment to two important developments in the area of data formats: Cascading Style Sheets and XML.

Cascading Style Sheets (CSS)

Cascading Style Sheets (CSS) have been developed to complement HTML. HTML, as an SGML application, was originally intended to define the structure of a document. CSS provides a mechanism for describing how an HTML document should appear. Although CSS level 1 was not widely deployed (due to the failure of Netscape, the browser vendor with the largest market share, to support it), CSS level 2 [1] is now supported (although admittedly not fully) by version 4 of Netscape and Internet Explorer.

The main advantages of using CSS rather than using HTML elements to describe the appearance of a page are:

XML

Although HTML 4.0 and CSS 2.0 are the currently recommended data format standards for the Web, they have their limitations. Adding new elements to HTML can be very time-consuming, unless you are Netscape or Microsoft - but as we have seen, introduction of new elements by browser vendors is unpopular. In addition there are many elements which we would not expect to become part of a future HTML standard. Mathematical elements, for example, (<INTEGRAL>) are too subject-specific. There are innumerable application-specific examples which could also be given (such as <STAFF-NUMBER> or <PART-NUMBER>).

XML, the Extensible Markup Language, addresses HTML's lack of extensibility. With XML, arbitrary new elements can be defined, as illustrated in Figure 1.

<MEMO>
<FROM>Jo Smith</FROM>
<TO>Hans Schmidt</TO>
<SUBJECT>XML</SUBJECT>
<CONTENT>Have you read the latest news about XML?
It seems <STRONG>very</STRONG> interesting!

</CONTENT>
</MEMO>
Figure 1: Example of an XML Document.
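
As a rough illustration of why such application-specific elements are useful, the memo in Figure 1 can be read by a generic XML parser with no prior knowledge of the MEMO vocabulary. The sketch below uses Python's standard xml.etree.ElementTree module; the function names summarise_memo and element_names are invented for this example and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# The memo from Figure 1, reproduced as a string.
MEMO = """<MEMO>
<FROM>Jo Smith</FROM>
<TO>Hans Schmidt</TO>
<SUBJECT>XML</SUBJECT>
<CONTENT>Have you read the latest news about XML?
It seems <STRONG>very</STRONG> interesting!
</CONTENT>
</MEMO>"""


def summarise_memo(xml_text):
    """Return the sender, recipient and subject of a memo as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "from": root.findtext("FROM"),
        "to": root.findtext("TO"),
        "subject": root.findtext("SUBJECT"),
    }


def element_names(xml_text):
    """List every element name in document order, starting with the root."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(summarise_memo(MEMO))
    print(element_names(MEMO))
```

The parser needs no special support for <FROM> or <SUBJECT>: any well-formed set of element names can be processed in the same way.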

XML appears to have a great deal of momentum behind it, as can be seen by looking at web sites such as XML.COM [2], The SGML/XML Web Page [3], the XMLINFO.COM [4] and XMLSOFTWARE.COM [5] pages, W3C's Extensible Markup Language (XML) pages [6] and Peter Flynn's Frequently Asked Questions about the Extensible Markup Language [7].

Other Data Format Developments

HTML 4.0 and CSS 2.0 would appear to provide the recommended data formats for today, with a watching brief needed for XML, which could have a role to play in storing structured resources that are delivered in HTML/CSS format. What other data format developments may be relevant?

XLink [8] and XPointer [9] are proposals which will provide richer hypertext functionality for XML resources. As described in the What Are ... XLink and XPointer article published in the Web version of Ariadne [10], XLink is intended to provide richer forms of hyperlinking (e.g. to allow hyperlinked resources to be embedded in the document or displayed in a new window, as well as to replace the existing document, which is all the <A> element can do in HTML). XPointer will enable arbitrary portions of an XML resource to be addressed, such as a link to the second sentence in the fourth paragraph.

In the area of graphics, we are seeing many developments including WebCGM [11], HGML [12] and PNG [13]. The W3C's User Interface domain is active in coordinating such developments and has released an activity statement [14]. A Scalable Vector Graphics Working Group has been set up which has produced a document on Scalable Vector Graphics Requirements [15]. In the related area of multimedia the main development has probably been the release of the SMIL specification [16].

Although not a data format, the DOM (Document Object Model) [17] defines an object model for HTML, CSS and XML which will enable elements, attributes and content to be manipulated by client-side languages such as JavaScript. Note that the term DHTML (Dynamic HTML) is sometimes used to refer to use of the DOM.
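
The kind of manipulation the DOM allows can be sketched outside the browser as well. The example below is a minimal illustration using Python's standard xml.dom.minidom module, which implements a subset of the DOM interfaces. The document string, the revise_memo function and the revised attribute are all invented for this example, and a browser-side script would typically be written in JavaScript rather than Python.

```python
from xml.dom.minidom import parseString

SOURCE = "<MEMO><SUBJECT>XML</SUBJECT></MEMO>"


def revise_memo(xml_text, new_subject, note_text):
    """Change content, set an attribute and add an element via DOM calls."""
    document = parseString(xml_text)

    # Content: replace the text held in the SUBJECT element's text node.
    subject = document.getElementsByTagName("SUBJECT")[0]
    subject.firstChild.data = new_subject

    # Attributes: mark the element as revised (attribute name is hypothetical).
    subject.setAttribute("revised", "yes")

    # Elements: create a new NOTE element and append it to the root.
    note = document.createElement("NOTE")
    note.appendChild(document.createTextNode(note_text))
    document.documentElement.appendChild(note)

    return document.documentElement.toxml()


if __name__ == "__main__":
    print(revise_memo(SOURCE, "DOM", "checked"))
```

The same three kinds of operation (changing content, setting attributes, adding elements) are what client-side scripts perform on a page when DHTML effects are used.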

Addressing

Resources on the web are referred to by use of a URL - Uniform Resource Locator [18]. For example the URL <http://www2.echo.lu/oii/en/oii-home.html> (normally) refers to a document held on a computer with the domain name www2.echo.lu with the file name oii-home.html which is stored in a directory /oii/en/ beneath the web root directory and which is accessed using the http protocol.
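
The decomposition described above can be reproduced with a few lines of code. The sketch below uses Python's standard urllib.parse module; the function name describe_url and the dictionary keys are chosen for this example only.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, host name, directory and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "directory": directory,
        "file": filename,
    }


if __name__ == "__main__":
    print(describe_url("http://www2.echo.lu/oii/en/oii-home.html"))
```

Note that the mapping from the path to a directory and file on the server is only a convention (hence "normally" above): a server is free to interpret the path in any way it chooses.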

A major problem with URLs is that they confuse the name of a resource with its location. We are familiar with the distinction in the library world between an ISBN, which identifies a document, and an accession number, which defines the location of a document within a library. In the Web, however, there is no way of, for example, easily referring to multiple locations of a resource.

Uniform Resource Names (URNs) [19] are a proposed mechanism for identifying a resource. The resource identified by a URN may reside in one or more locations, may move, or may not actually be available at a given time. The term URN has two interpretations. The first is as a globally unique and persistent identifier (achieved through an institutional commitment) for a resource that is accessible over a network. The second is as the specific "URN" scheme which will embody the requirements for a standardised URN namespace.
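
The separation of name from location can be sketched as follows. The example assumes the urn:<namespace>:<name> syntax used by the "URN" scheme and a purely illustrative in-memory resolver. The example.org locations, the LOCATIONS table and the function names are hypothetical, and a real resolution service would be a network service rather than a dictionary.

```python
def parse_urn(urn):
    """Split a URN into its namespace identifier and namespace-specific name."""
    pieces = urn.split(":", 2)
    if len(pieces) != 3 or pieces[0].lower() != "urn" or not pieces[1] or not pieces[2]:
        raise ValueError("not a URN: %r" % urn)
    return pieces[1].lower(), pieces[2]


# Hypothetical resolver data: one name, several locations.
LOCATIONS = {
    "urn:isbn:3-540-64285-4": [
        "http://library-a.example.org/books/wilde-www",
        "http://library-b.example.org/catalogue/3540642854",
    ],
}


def resolve(urn):
    """Return all known locations for a URN (possibly none at a given time)."""
    parse_urn(urn)  # reject malformed names before looking them up
    return LOCATIONS.get(urn, [])


if __name__ == "__main__":
    print(parse_urn("urn:isbn:3-540-64285-4"))
    print(resolve("urn:isbn:3-540-64285-4"))
```

If a copy moves, only the resolver's table changes; documents which cite the URN do not need to be edited.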

The Digital Object Identifier (DOI) is an example of a proposed URN application. The goals of the DOI system [20] are to provide a framework for managing intellectual content, link customers with publishers, facilitate electronic commerce, and enable automated copyright management. The components of the DOI system are: (A) an identifier, which consists of a prefix that is assigned to a publisher by a registration authority and a suffix that is assigned by the publisher (note that publishers may choose to use existing international standard identifiers, for example, ISBN numbers for books); (B) a directory, which forms the basis for a resolution system (the directory is centralised and provides the mapping of DOIs to URLs); and (C) a database in which detailed information on an object may be maintained by the publisher.
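
The three components can be sketched in a few lines. The example below assumes that the prefix and suffix are separated by a slash, as in published DOI syntax. The prefix 10.9999, the publisher URL and the DIRECTORY and DATABASE tables are made up for illustration and do not describe the real DOI directory.

```python
def split_doi(doi):
    """(A) Identifier: separate the registrant prefix from the publisher suffix."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("not a DOI: %r" % doi)
    return prefix, suffix


# (B) Directory: a central mapping of DOIs to URLs (hypothetical entry).
DIRECTORY = {
    "10.9999/3-540-64285-4": "http://publisher.example.org/books/3-540-64285-4",
}

# (C) Database: details of each object, maintained by the publisher (hypothetical).
DATABASE = {
    "10.9999/3-540-64285-4": {"title": "Wilde's WWW", "type": "book"},
}


def resolve_doi(doi):
    """Look up the current URL for a DOI, or None if it is not registered."""
    split_doi(doi)  # reject malformed identifiers
    return DIRECTORY.get(doi)


if __name__ == "__main__":
    print(split_doi("10.9999/3-540-64285-4"))
    print(resolve_doi("10.9999/3-540-64285-4"))
```

Here the suffix reuses an ISBN, which is the option open to publishers noted above.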

Transport

HTTP, the HyperText Transfer Protocol, governs the transfer of resources between a web browser and a web server. Typically, clicking on a hypertext link in a web browser will send an HTTP GET request to the server. The server will respond by sending the requested resource (if it exists) together with a series of headers.
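
The exchange can be sketched without a network connection by building the request text a browser might send and parsing a canned response. The function names and the sample response below are invented for this example. A real client would also have to handle binary content, redirects and errors.

```python
def build_get_request(host, path):
    """Return the text of a simple HTTP/1.0 GET request."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,  # optional in HTTP/1.0, but commonly sent
        "",
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a response into its status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A made-up response of the kind a server might return.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Last-Modified: Sat, 10 Apr 1999 09:00:00 GMT\r\n"
    "\r\n"
    "<P>Hello</P>\n"
)

if __name__ == "__main__":
    print(build_get_request("www2.echo.lu", "/oii/en/oii-home.html"))
    print(parse_response(SAMPLE_RESPONSE))
```

The headers carry information about the resource, such as its type and modification date, which browsers and caches use when deciding how to display or store it.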

Although HTTP/1.0 was used during the exponential growth of the web in the mid 1990s, there are a number of problems with it:

HTTP/1.1 [21] has been designed to address the deficiencies and to fix a number of bugs in HTTP/1.0. The HTTP/1.1 specification provides support for persistent connections (allowing multiple requests to be sent over a single TCP connection) and improved support for caching.

Although HTTP/1.1 provides performance benefits, it is still not scalable and it has not been designed to be extensible. HTTP/NG, the Next Generation of HTTP, [22] is a complete redesign of HTTP, which addresses these deficiencies.

Metadata

The original version of the Web, which was developed by Tim Berners-Lee at CERN in the early 1990s, was based on the three architectural components described above: Data Format (HTML 1.0, which has developed to HTML 4.0 and is now complemented by CSS and XML), Transport (originally HTTP/0.9, and now HTTP/1.1 with development work on HTTP/NG) and Addressing (originally URLs).

During the mid 1990s much development work began to take place in areas such as content filtering, digital signatures, enhanced navigation of web sites, resource discovery and web collections. This work included:

PICS
The development of PICS [23], coordinated by the W3C, as a means of filtering undesirable content, such as pornography.
MCF
The development of MCF [24] by Apple. MCF, the Meta Content Framework, was intended to provide a variety of enhanced user interfaces to websites.
Dublin Core
The development of the Dublin Core set of attributes to facilitate discovery of resources on the Web [25].
Web Collections
A proposal submitted by Microsoft to the W3C to provide a mechanism for grouping collections of related Web resources [26].

As it became apparent that these developments were all related to metadata, the World Wide Web Consortium set up a group to coordinate them [27].

Influenced by work such as Dublin Core, PICS and MCF, the Metadata Coordination Group developed a framework for metadata developments known as the Resource Description Framework (RDF) [28]. The importance of RDF to the Web infrastructure is illustrated in Figure 2.

Figure 2: Metadata, the missing architectural component from the web

RDF Applications

Although the RDF specifications are relatively new, we are already seeing a number of other specifications and applications being developed using RDF. As illustrated in Figure 3 (taken from "W3C Data Formats" [29]), W3C regards RDF as the key metadata format.

Figure 3: The Role of RDF in the Data Format Architecture for the Web

A number of examples of uses of RDF are given below.

Figure 5: Mozilla Support For RDF

In addition to these examples, we can expect to see a number of other specifications based on RDF being developed. For example, W3C are likely to be producing specifications in the areas of digital signatures and privacy.

What Next?

This article has given a brief introduction to recent developments in the web architecture, including developments in data formats, transport, addressing and metadata. A complete description of new web protocols and formats has not been given, and areas such as electronic commerce, internationalisation, privacy and other social aspects (including accessibility issues) have not been addressed.

The article also does not address deployment issues. How can new protocols and formats, which are needed in order to develop richly functional and efficient services, be deployed when the user community is largely using older browsers? And without the widespread deployment of the latest generation of browsers, there seems to be little motivation for service providers to make use of the technologies described in this article. These issues will be addressed in the next edition of Exploit Interactive.

Finding Out More About Standards For Projects

No doubt many projects funded by the European Union, as well as European and national funding bodies, carry out surveys of standards relevant to their work. A quick survey revealed the following:

European Commission's Open Information Interchange service
The Open Information Interchange (OII) service provides valuable overviews of existing and emerging standards related to the exchange of information in electronic form. See <http://www2.echo.lu/oii/en/oii-home.html>.
BIBLINK
Project BIBLINK is funded by DG XIII/E-4 under the Telematics Application Programme of the European Union Fourth Framework Programme. It aims to establish an electronic link between national bibliographic agencies and publishers of electronic material, in order to establish authoritative bibliographic information that will benefit both sectors.
The first phase of the project included delivery of reports on Metadata Formats, Identification and Transmission of Data. See <http://www.ukoln.ac.uk/metadata/biblink/>.
DESIRE
DESIRE: Development of a European Service for Information on Research and Education is a large project funded by the Telematics for Research Sector of the Fourth Framework Programme of the European Union. The project is looking at Web technology and the implementation of pilot information services on behalf of European researchers.
The first phase of the project included delivery of reports on resource discovery including A review of metadata: a survey of current resource description formats and The role of classification schemes in Internet resource description and discovery. See <http://www.ukoln.ac.uk/metadata/desire/>.
PIPER
Project PIPER specialises in providing the free tools and required by EC TAP projects to convert their deliverables into HTML. Their Report D12.1, Study of Web-Based Dissemination reviewed identification developments, such as URNs and DOIs. See <http://piper.ntua.gr/reports/r6.html>.
eLib
The UK's eLib Programme is involved in the development of a wide variety of services related to electronic libraries. A document on relevant standards is available at: <http://www.ukoln.ac.uk/services/elib/papers/other/standards/>.

In addition to these reports, an excellent book on web standards is "Wilde's WWW: Technical Foundations of the World Wide Web" [34]. Further details about the book can be obtained from Springer's [35] or Amazon.com's [36] web sites.

Reader Response

If you have any comments on this article, please contact the editors (exploit-editor@ukoln.ac.uk).

References

  1. Cascading Style Sheets, level 2 (CSS2) Specification, W3C
    URL: <http://www.w3.org/TR/REC-CSS2/>
  2. XML.COM, Organisational Home Page
    URL: <http://www.xml.com/>
  3. The SGML/XML Web Page, Organisational Home Page
    URL: <http://www.oasis-open.org/cover/xml.html>
  4. XMLINFO.COM, Organisational Home Page
    URL: <http://www.xmlinfo.com/>
  5. XMLSOFTWARE.COM, Organisational Home Page
    URL: <http://www.xmlsoftware.com/>
  6. Extensible Markup Language (XML), W3C
    URL: <http://www.w3c.org/XML/>
  7. Frequently Asked Questions about the Extensible Markup Language, Peter Flynn, University of Cork
    URL: <http://www.ucc.ie/xml/>
  8. XLink, W3C
    URL: <http://www.w3.org/TR/WD-xlink>
  9. XPointer, W3C
    URL: <http://www.w3.org/TR/WD-xptr>
  10. What Are ... XLink and XPointer, Ariadne issue 16
    URL: <http://www.ariadne.ac.uk/issue16/what-is/>
  11. WebCGM Profile, W3C
    URL: <http://www.w3.org/TR/NOTE-WebCGM/>
  12. Hyper Graphics Markup Language (HGML), W3C
    URL: <http://www.w3.org/TR/NOTE-HGML>
  13. PNG (Portable Network Graphics) Specification, W3C
    URL: <http://www.w3.org/TR/REC-png>
  14. Graphics Activity, W3C
    URL: <http://www.w3.org/Graphics/Activity>
  15. Scalable Vector Graphics (SVG) Requirements, W3C
    URL: <http://www.w3.org/TR/WD-SVGReq>
  16. Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C
    URL: <http://www.w3.org/TR/REC-smil/>
  17. Document Object Model (DOM) 1.0 Specification, W3C
    URL: <http://www.w3.org/TR/REC-DOM-Level-1/>
  18. Uniform Resource Locators (URL), W3C
    URL: <http://www.w3.org/Addressing/rfc1738.txt>
  19. URN, W3C
    URL: <http://www.w3.org/Addressing/>
  20. DOI, International DOI Foundation
    URL: <http://www.doi.org/>
  21. HTTP/1.1, W3C
    URL: <http://www.w3.org/Protocols/rfc2068/rfc2068>
  22. HTTP/NG, W3C
    URL: <http://www.w3.org/Protocols/HTTP-NG>
  23. PICS, W3C
    URL: <http://www.w3.org/PICS/>
  24. Meta Content Framework Using XML, Apple
    URL: <http://www.w3.org/TR/NOTE-MCF-XML/>
  25. Dublin Core, W3C
    URL: <http://purl.org/DC/>
  26. Web Collections, Microsoft
    URL: <http://www.w3.org/TR/NOTE-XMLsubmit.html>
  27. Metadata and Resource Description, W3C
    URL: <http://www.w3.org/Metadata/>
  28. RDF, W3C
    URL: <http://www.w3.org/RDF/>
  29. W3C Data Formats, W3C
    URL: <http://www.w3.org/TR/NOTE-rdfarch>
  30. Reggie, DSTC
    URL: <http://metadata.net/dstc/>
  31. DC-dot, UKOLN
    URL: <http://www.ukoln.ac.uk/metadata/dcdot/>
  32. Document Content Description for XML, W3C
    URL: <http://www.w3.org/TR/NOTE-dcd>
  33. Mozilla.org, Mozilla
    URL: <http://www.mozilla.org/rdf/doc/>
  34. Wilde's WWW: Technical Foundations of the World Wide Web, Erik Wilde, ISBN 3-540-64285-4, Springer
  35. Wilde's WWW: Technical Foundations of the World Wide Web, Springer
    URL: <http://www.springer.de/cgi-bin/bag_generate.pl?ISBN=3-540-64285-4>
  36. Wilde's WWW: Technical Foundations of the World Wide Web, Amazon.com
    URL: <http://www.amazon.co.uk/exec/obidos/ASIN/3540642854/qid%3D916932222/026-1791844-8088430>

Author Details

Brian Kelly
UK Web Focus
UKOLN: http://www.ukoln.ac.uk/
Tel: +44 1225 323943
Email: B.Kelly@ukoln.ac.uk
Address: UKOLN, University of Bath, Bath, UK, BA2 7AY

Brian Kelly is employed as UK Web Focus, at UKOLN (UK Office for Library and Information Networking) at the University of Bath, England. Brian's responsibilities include keeping the UK Higher Education community informed of web developments.

For citation purposes:
Brian Kelly, "The Development Of Web Protocols And Formats," Exploit Interactive, issue 1, 10 April 1999
URL: <http://www.exploit-lib.org/issue1/web/>