Exploit Interactive

In this issue's Web Technologies column we ask Brian Kelly to tell us more about XHTML.

The XHTML Interview

What Is XHTML?
It's the answer to the Webmaster's nightmares, and one of the technical highlights of the recent WWW 9 conference.
 
Can you be slightly more explicit, please! What does it stand for? How is it different from HTML? Who developed it? Why are you so excited about it?
XHTML stands for "Extensible HyperText Markup Language". It was developed by the World Wide Web Consortium (W3C) and is now a W3C Recommendation [1].
XHTML is a reformulation of HTML 4 in XML 1.0. This means that the benefits provided by XML will be available to XHTML.
 
But how does HTML differ from XHTML?
There are only a few differences. The most noticeable are that element names must be lowercase (e.g. <p> and not <P>) and that elements must be closed (e.g. paragraphs must end with a </p>).
 
That's a pain. I prefer to type my tags in uppercase, and I never bother closing my paragraphs. Why do I have to do this?
For reasons of internationalisation, XML element names are case-sensitive. A choice had to be made, and lowercase won the day.
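Case sensitivity is easy to see with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a helper of our own, not a library function. Note that a plain XML parser does not itself insist on lowercase (that rule comes from the XHTML DTD); it simply treats <P> and <p> as two different names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Element names are case-sensitive: <P> and <p> are two different elements.
upper = ET.fromstring("<P>Hello</P>")
lower = ET.fromstring("<p>Hello</p>")
print(upper.tag, lower.tag)            # P p

# Mixing the cases means the end tag no longer matches the start tag.
print(is_well_formed("<P>Hello</p>"))  # False
print(is_well_formed("<p>Hello</p>"))  # True
```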
 
What about the need for end tags?
Remember that XHTML is an XML application.
 
So?
Have a look at the markup fragments in the following table.
Markup | Comments
<part-number>273</part-number> wheel | Invalid XML
<part-number>273</part-number> <part-type>wheel</part-type> | Well-formed XML
<h1>Introduction</h1> Welcome to this document on XHTML. | Valid HTML but invalid XHTML
<h1>Introduction</h1> <p>Welcome to this document on XHTML.</p> | Valid HTML and well-formed XHTML
Since XML documents can use arbitrary elements, an XML application cannot know how a document is structured. Web browsers, however, do know something about HTML document structure. For example, text that occurs immediately after a heading is normally assumed to be part of a paragraph, so the browser behaves as if a <p> element were present. XML applications can't make such assumptions, so more rigorous markup is required.
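The difference shows up as soon as HTML-style markup with unclosed paragraphs is handed to an XML parser. This is a minimal sketch using Python's xml.etree.ElementTree; the parse_error helper and the sample fragments are our own illustrations.

```python
import xml.etree.ElementTree as ET


def parse_error(markup: str):
    """Return the parser's error message, or None if the markup is well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# HTML-style: a browser would infer where each paragraph ends.
html_style = "<body><h1>Introduction</h1><p>First paragraph.<p>Second paragraph.</body>"
# XHTML-style: every paragraph is closed explicitly.
xhtml_style = "<body><h1>Introduction</h1><p>First paragraph.</p><p>Second paragraph.</p></body>"

print(parse_error(html_style))   # a "mismatched tag" error
print(parse_error(xhtml_style))  # None

# With explicit end tags the structure is unambiguous.
print([child.tag for child in ET.fromstring(xhtml_style)])  # ['h1', 'p', 'p']
```

The XML parser makes no attempt to guess where the first paragraph ends; it simply reports that </body> does not match the innermost open element.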
 
OK. But what about elements that don't have a close tag, such as <IMG> (sorry, I mean <img>!) and <hr>?
There are two solutions. You could use a close tag (e.g. <img src="logo.gif" ...></img>). However, the best solution is simply to include a forward slash at the end of the element: <img src="logo.gif" ... />
 
Will this work?
As long as you include a space before the slash, it will cause no problems in most Web browsers, although there have been reports of problems with some embedded HTML viewers such as Java's Swing HTML editor.
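To an XML parser the two spellings of an empty element are equivalent, while an HTML-style <hr> with no end tag is not well-formed. A small sketch with Python's xml.etree.ElementTree; the alt value "Logo" is a placeholder of ours.

```python
import xml.etree.ElementTree as ET

# Both XHTML spellings of an empty element produce the same parsed element.
explicit = ET.fromstring('<img src="logo.gif" alt="Logo"></img>')
minimised = ET.fromstring('<img src="logo.gif" alt="Logo" />')
print(explicit.attrib == minimised.attrib, len(explicit), len(minimised))  # True 0 0

# An HTML-style <hr> with no end tag leaves the element open.
try:
    ET.fromstring("<body><hr></body>")
except ET.ParseError as err:
    print("not well-formed:", err)

# When serialising, ElementTree itself writes the minimised form with a space.
print(ET.tostring(ET.fromstring("<body><hr></hr></body>"), encoding="unicode"))
# <body><hr /></body>
```

Conveniently, ElementTree's serializer emits the "space before the slash" form recommended here.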
 
Are there any other differences between HTML and XHTML?
Attribute values must be in quotes (e.g. <img src="logo.gif" alt="University logo" height="50" width="75" />).
 
Sorry for pestering you, but why?
Remember that XML applications don't know what the tags mean. Do you know what <jnh tsd=logo.gif bmu=University logo ifjhiu=50 xjeui=75> means? Is "logo" part of the value of bmu, or a separate attribute? To avoid such confusion and ambiguity, all attribute values must be quoted.
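An XML parser rejects unquoted attribute values outright rather than guessing where each value ends. A minimal sketch with Python's xml.etree.ElementTree; the parses helper is our own.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


quoted = '<img src="logo.gif" alt="University logo" height="50" width="75" />'
unquoted = '<img src=logo.gif alt=University logo height=50 width=75 />'

img = ET.fromstring(quoted)
# Quoted values survive intact, including the space in "University logo".
print(img.get("alt"), img.get("height"))  # University logo 50
print(parses(unquoted))                   # False
```

Note that attribute values always come back as strings ("50", not the number 50); interpreting them is up to the application.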
 
Any other differences?
Some, but I've covered the main ones. I should also point out that an XHTML document should begin with an XML declaration (written in the form of an XML Processing Instruction), followed by a DOCTYPE declaration referencing the XHTML DTD. It will normally look something like this:
 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
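As a quick sanity check, a complete minimal document using the XHTML 1.0 Transitional DOCTYPE and the XHTML namespace (http://www.w3.org/1999/xhtml) can be parsed with Python's xml.etree.ElementTree. The title and body text are placeholders of ours. This only checks well-formedness: ElementTree does not fetch or apply the DTD, so validation against the DTD still needs a validating tool such as the W3C validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

document = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>XHTML example</title></head>
<body><p>Welcome to this document on XHTML.</p></body>
</html>
"""

# Parse bytes so the parser honours the encoding named in the XML declaration.
root = ET.fromstring(document.encode("utf-8"))

# ElementTree reports namespaced names as {namespace}localname.
print(root.tag)                                    # {http://www.w3.org/1999/xhtml}html
print(root.find("x:head/x:title", XHTML_NS).text)  # XHTML example
```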
 
You still haven't explained what the benefits of XHTML are.
As XHTML is an XML application, you will benefit from developments in the XML world. For example, XML tools such as editors, converters and browsers can be used with XHTML resources. In addition, there are developments in the XML family of protocols and formats which will provide additional functionality for XHTML.
 
Go on.
XLink [2] [3], for example, will provide richer hyperlinking functionality, and XML Namespaces [4] will support the deployment of modular XML DTDs. XHTML itself consists of a series of modular DTDs.
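Namespaces are what allow elements from different vocabularies to sit side by side in one document without clashing. The sketch below, using Python's xml.etree.ElementTree, embeds elements from a parts vocabulary in an XHTML paragraph; the namespace http://example.org/parts and its number and type elements are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "parts": "http://example.org/parts",  # hypothetical vocabulary
}

document = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:parts="http://example.org/parts">
  <body>
    <p>Part <parts:number>273</parts:number> is a <parts:type>wheel</parts:type>.</p>
  </body>
</html>"""

root = ET.fromstring(document)

# Every element is identified by a (namespace, local name) pair, so the two
# vocabularies cannot clash even if both defined an element with the same name.
for element in root.iter():
    namespace, _, local = element.tag[1:].partition("}")
    print(f"{local:8} {namespace}")

print(root.find(".//parts:number", NS).text)  # 273
```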
 
Why do I need modular DTDs?
An application may wish to support only a subset of XHTML. For example, a mobile phone, an Internet TV or even a Web-aware cooker may only require a subset of XHTML. Modularity also makes it easier to deploy new developments. Once XForms [5] [6], for example, has been finalised, it will be much easier to deploy documents which make use of the enhanced forms capabilities that this proposal will bring.
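To make the idea of a subset concrete, here is a hypothetical sketch of how software for a small device might check whether a page uses only the elements it supports. The DEVICE_SUBSET set and the unsupported_elements helper are invented for illustration and are not one of the official XHTML modules; with modular DTDs the check would normally be done by validating against the relevant DTD rather than with a hand-written list.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical subset a small device might support; not an official module list.
DEVICE_SUBSET = {"html", "head", "title", "body", "h1", "p", "a", "br"}


def unsupported_elements(xhtml: str, subset: set) -> list:
    """Return the XHTML element names used in the document but missing from subset."""
    root = ET.fromstring(xhtml)
    missing = set()
    for element in root.iter():
        if element.tag.startswith(XHTML_NS):
            local = element.tag[len(XHTML_NS):]
            if local not in subset:
                missing.add(local)
    return sorted(missing)


page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Menu</title></head>
<body><h1>Menu</h1><p>Soup</p><table><tr><td>Price</td></tr></table></body>
</html>"""

print(unsupported_elements(page, DEVICE_SUBSET))  # ['table', 'td', 'tr']
```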
 
Any other important new developments?
Yes: XSLT, XSL Transformations [7]. XSLT provides a language for transforming XML documents into other formats. It can be used to transform documents from one XML DTD to another, or even to convert an XML document into an alternative format such as RTF or (with the help of XSL formatting objects) PDF.
 
Why is this important?
You've heard all the hype about mobile phones and WAP, haven't you? How do you think the WAP world, which expects documents to be in WML format, will be populated? Rather than requiring WML markup to be created manually, XSLT will enable XHTML documents to be converted to WML automatically.
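Python's standard library has no XSLT processor, so the sketch below performs a comparable XHTML-to-WML tree transformation by hand with xml.etree.ElementTree, as a stand-in for an XSLT stylesheet. The mapping (a single card with the hypothetical id "main", headings rendered as bold paragraphs, other elements dropped) is a simplified assumption of ours, and the output omits the WML DOCTYPE that a complete deck would normally carry.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_wml(xhtml: str) -> str:
    """Map a simple XHTML page onto a single WML card (simplified mapping)."""
    root = ET.fromstring(xhtml)
    title = root.findtext(f"{XHTML}head/{XHTML}title", default="")
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", id="main", title=title)
    for element in root.findall(f"{XHTML}body/*"):
        text = "".join(element.itertext())
        if element.tag == f"{XHTML}h1":
            # WML cards hold text in <p>; a heading becomes a bold paragraph.
            para = ET.SubElement(card, "p")
            ET.SubElement(para, "b").text = text
        elif element.tag == f"{XHTML}p":
            ET.SubElement(card, "p").text = text
        # Any other element is silently dropped in this sketch.
    return ET.tostring(wml, encoding="unicode")


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Welcome</title></head>'
        '<body><h1>Introduction</h1><p>Welcome to this document on XHTML.</p></body></html>')
print(xhtml_to_wml(page))
```

In an XSLT version, each branch of the if/elif would become a template matching the corresponding XHTML element.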
 
So XHTML should be the master storage format for my resources?
NO! XHTML still lacks semantics: its elements describe the structure of a document (headings, paragraphs, lists) rather than what the content means. Ideally your resources should be stored in an appropriate XML format. XSLT can then be used to convert the resources to XHTML (for Web browsers), WML (for mobile phones), etc. XHTML is a useful intermediate stage.
 
Can we get down to practicalities. How do I create XHTML pages?
The eGroups XHTML-L Web site provides links to XHTML tools, including conversion tools and editors [8]. A couple of free tools are available (HTML-Kit, 1st Page 2000). Mozquito Factory appears to be the first licensed package on the market.
 
Hmm. So there aren't many authoring tools, and none I've heard of.
That's true. But you can expect the usual suspects (Microsoft, Macromedia, etc.) to bring out new versions of their products with XHTML support.
 
What about conversion of existing HTML pages - especially bulk conversion, as I have many thousands of HTML files!
Dave Raggett of the W3C has written a utility program called Tidy [9] which can be used to convert HTML pages to XHTML. Tidy can be run in batch mode to bulk-convert documents. It is an open source program and has been incorporated into a number of authoring tools, most notably HTML-Kit [10], which is illustrated below.
 
Figure 1: HTML-Kit
Are there any problems you haven't mentioned?
XHTML documents should start with an XML Processing Instruction: <?xml version="1.0" encoding="UTF-8"?>. Note, however, that some browsers (e.g. Netscape versions 1-3 and Mosaic 3 [11]) will display the Processing Instruction as text on the page.
 
Is this a problem?
Probably not. If you are concerned, you could use "user-agent negotiation" so that the processing instruction is not sent to those browsers.
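User-agent negotiation means the server inspects the User-Agent request header and varies its response accordingly. The Python sketch below decides whether to prefix the XML declaration; the matching rules (wants_xml_declaration, the OLD_NETSCAPE and MOSAIC patterns) are our own rough assumptions based on the browsers named above, and User-Agent sniffing is fragile in practice.

```python
import re

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'

# Hypothetical patterns for browsers reported to display the declaration as text.
OLD_NETSCAPE = re.compile(r"^Mozilla/[1-3]\.")
MOSAIC = re.compile(r"Mosaic", re.IGNORECASE)


def wants_xml_declaration(user_agent: str) -> bool:
    """Return False for user agents assumed to display the declaration as text."""
    if MOSAIC.search(user_agent):
        return False
    # Internet Explorer also identified itself as "Mozilla/x.0 (compatible; MSIE ...)",
    # so the "compatible" token is used to avoid mistaking it for old Netscape.
    if OLD_NETSCAPE.match(user_agent) and "compatible" not in user_agent:
        return False
    return True


def prepare_page(body: str, user_agent: str) -> str:
    """Prefix the XML declaration only for browsers that will not display it."""
    if wants_xml_declaration(user_agent):
        return XML_DECLARATION + "\n" + body
    return body


print(wants_xml_declaration("Mozilla/3.01 (Win95; I)"))      # False
print(wants_xml_declaration("Mozilla/4.7 [en] (WinNT; I)"))  # True
```

If such pages pass through shared caches, the response should also carry a "Vary: User-Agent" header so that a cache does not serve one browser's variant to another.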
 
The 64 thousand dollar question: Should I be using XHTML?
It is the approved W3C Recommendation, so if you are committed to supporting standards you should be using it. However, telling your users that they should stop using FrontPage, HoTMetaL and Dreamweaver and start using HTML-Kit is probably not a sensible idea. I would recommend XHTML if your users do not depend on current HTML authoring tools. It should definitely be used by software developers who generate HTML on the fly.
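For developers generating pages on the fly, building the page as a tree and letting a serializer write it out is one way to guarantee well-formed XHTML: every element is closed and special characters are escaped. A minimal sketch with Python's xml.etree.ElementTree; render_list_page is a hypothetical helper, and the output omits the XML declaration and DOCTYPE, which could be prefixed as a string.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_list_page(title: str, items: list) -> str:
    """Build a small XHTML page; the serializer closes every element and escapes text."""
    # Writing xmlns as a plain attribute keeps the output free of namespace prefixes.
    html = ET.Element("html", xmlns=XHTML_NS)
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ul = ET.SubElement(body, "ul")
    for item in items:
        ET.SubElement(ul, "li").text = item
    return ET.tostring(html, encoding="unicode")


page = render_list_page("Parts & prices", ["Wheel <273>", "Axle"])
print(page)
```

Characters such as "&" and "<" in the data are escaped automatically, which is exactly where hand-built string concatenation tends to produce broken markup.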
 
How do I find out more?
XHTML books are being written. One of the first to be published is "Beginning XHTML" [12]. The book is available from Amazon for £21.74 [13]. Note that one of the authors is Dave Raggett, a W3C employee who has been involved in HTML developments since the early days.
 
Another very useful resource is eGroups' XHTML-L mailing list and accompanying Web site [14]. Although the mailing list is active and is a useful source of advice, the best feature of this resource is the accompanying Web site, which provides many links to additional resources, as shown below.
Figure 2: eGroups XHTML Web Site
Another useful resource is W3Schools, which provides useful information not only about XHTML [15] but also about technologies such as XML and WML.
 
Thank you
You're welcome.

References

  1. XHTML™ 1.0: The Extensible HyperText Markup Language, W3C
    URL: <http://www.w3.org/TR/xhtml1/>
  2. XML Linking Language (XLink), W3C
    URL: <http://www.w3.org/TR/xlink/>
  3. What Are .. XLink and XPointer?, Ariadne issue 16
    URL: <http://www.ariadne.ac.uk/issue16/what-is/>
  4. Namespaces in XML, W3C
    URL: <http://www.w3.org/TR/REC-xml-names/>
  5. XForms 1.0: Data Model, W3C
    URL: <http://www.w3.org/TR/xforms-datamodel/>
  6. XForms Requirements, W3C
    URL: <http://www.w3.org/TR/xhtml-forms-req>
  7. XSL Transformations (XSLT) Version 1.0, W3C
    URL: <http://www.w3.org/TR/xslt>
  8. XHTML - Links : Tools, eGroups
    URL: <http://www.egroups.com/links/XHTML-L/Tools_000957360438/>
  9. Tidy, W3C
    URL: <http://www.w3.org/People/Raggett/tidy/>
  10. HTML-Kit, Chami
    URL: <http://www.chami.com/html-kit/>
  11. XML Declaration test results, Robin Lionheart, posting to XHTML-L list, 3 June 2000
    URL: <http://www.egroups.com/message/XHTML-L/288?&start=266>
  12. Beginning XHTML, Boumphrey, Greer, Raggett, Raggett, Schnitzenbaumer and Wugofski, Wrox Press Ltd
  13. A Glance: Beginning XHTML, Amazon.co.uk
    URL: <http://www.amazon.co.uk/exec/obidos/ASIN/1861003439/o/qid=961063119/sr=8-1/026-2492660-4333201>
  14. XHTML-L, eGroups
    URL: <http://www.egroups.com/group/XHTML-L>
  15. Welcome to XHTML School, W3Schools
    URL: <http://www.w3schools.com/xhtml/>

Author Details

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath
England
BA2 7AY

URL: <http://www.ukoln.ac.uk>
Email: b.kelly@ukoln.ac.uk

For citation purposes:
Brian Kelly, "The XHTML Interview", Exploit Interactive, issue 6, 26th June 2000
URL: <http://www.exploit-lib.org/issue6/xhtml/>

