Q. & A.: The X(HTML) Files
Last time around ("Looking Good"), we surveyed the challenges of delivering consistent Web pages in an environment that seems to have no standards. From all indications, that challenge is going to greatly intensify soon.
By 2002, it is estimated that as much as 75 percent of Internet document viewing will be done on televisions, cell phones, palm-sized computers and other alternate devices. Granted, few people will want to read the entire text of a scientific journal on a Palm Pilot, but many may use such devices to locate citations, look over the latest Table of Contents, and search for archived material.
When the first Palm Pilot user calls up your site, will you be ready for the challenge? If you're coding your pages with HTML, the answer likely will be an emphatic "No." Unlike Web browsers, these new devices will have a very limited amount of horsepower and will not be able to cope with the poorly written HTML that's rampant on many Web sites.
Fortunately, there is an easy way to prepare for this next wave of electronic content distribution — and to address a few other problems in the process. The solution is dipping a toe into the waters of XML by moving your documents to the hybrid XHTML. The good news is that creating XHTML documents is almost easier than saying "XHTML."
XML: Gaining momentum
Unless you've been on a deserted island the last few years, you've at least heard of XML. The endlessly hyped markup language provides a means of representing and describing data. The "X" stands for extensible; that is, users can define their own markup tags to describe data. While much has been made of XML's role in e-commerce, it also offers tremendous benefits for publishers.
For starters, once you create an XML document, it's easy to use the companion Extensible Stylesheet Language (XSL) to reformat it to ASCII, RTF, LaTeX, or any number of browser-specific versions of HTML. XML is at the heart of the integration of QuarkXPress and Vignette's StoryServer (by way of another Quark product, avenue.quark), a union that has made it easier for publishers to move print information to the Web. In addition, XSL makes it possible to offer database-like functionality, such as sorting and filtering, from XML documents, with no actual database needed.
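The single-source idea can be illustrated with a minimal sketch. It assumes a made-up set of publisher-defined tags (article, headline, byline, para) and uses Python's standard ElementTree module rather than XSL, so it shows the principle of one XML source feeding several output formats, not an actual XSL stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical publisher-defined markup; these tag names are illustrative only.
SOURCE = """<article>
  <headline>Sample Headline</headline>
  <byline>A. Writer</byline>
  <para>First paragraph of the story.</para>
  <para>Second paragraph, with an ampersand: R&amp;D.</para>
</article>"""


def to_plain_text(article):
    """Render the article as plain text, one line per block."""
    lines = [
        article.findtext("headline", ""),
        "by " + article.findtext("byline", ""),
        "",
    ]
    lines += [para.text or "" for para in article.findall("para")]
    return "\n".join(lines)


def to_html(article):
    """Render the same article as a simple HTML fragment."""
    parts = [
        "<h1>%s</h1>" % escape(article.findtext("headline", "")),
        "<p><i>%s</i></p>" % escape(article.findtext("byline", "")),
    ]
    parts += ["<p>%s</p>" % escape(para.text or "") for para in article.findall("para")]
    return "\n".join(parts)


article = ET.fromstring(SOURCE)
print(to_plain_text(article))
print(to_html(article))
```

Both outputs come from the same source, so a correction made once in the XML shows up in every format.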
For all its appeal, however, XML will not sweep the electronic-publishing industry anytime soon. Too many publishers are too heavily invested in creating HTML documents to make the switch immediately. It is possible, however, to begin the transition rather painlessly by moving to the intermediate XHTML.
XHTML: Ready for prime time
XHTML is an XML application. Documents coded in XHTML share the overall structure of XML documents but forgo user-defined tags for describing data. What sets XHTML apart from generic XML is that its tags are those of HTML 4.01. As such, XHTML documents will work well not only on XML-ready browsers (such as Microsoft Internet Explorer 5.0 and Netscape Navigator 6.0) and other devices (such as the cell phones, palm computers and even that fabled Web-enabled toaster) but also on most older HTML-only browsers and WebTV.
As with XML, a key aspect of XHTML is its insistence on well-formed and consistent code. During the years of browser battles, HTML browsers were built to forgive messy markup: Get the ball into the park and you'd have the equivalent of a home run. But different browsers forgave different sins. For example, leave the closing </table> tag out of a document, and Internet Explorer takes it in stride. Browse the same page with Navigator, and all you'll see is a blank window. Such coding errors are always fatal in XHTML: Either your code is correct or the document will not render. By cracking down on sloppy code, XHTML promises publishers that their files will render more consistently from one browser to the next and not tax the limited processors of non-PC-based browsers. (At least that's the theory. Microsoft has already created several XHTML and XML "extensions" that don't conform to the official specifications, and if history is any indication, Netscape will follow suit. There is a ray of hope, however: Microsoft has been getting blasted for its lack of support for standards in the latest versions of MSIE, and Netscape, the company that introduced non-standard extensions, has taken a strong stance in support of standards.)
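To see what "fatal" means in practice, here is a minimal sketch that feeds two made-up fragments to the XML parser in Python's standard library. The function name and the fragments are assumptions for illustration. A real browser would report the error in its own way, but the all-or-nothing behavior is the same.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: the second one is missing its closing </table> tag.
complete = "<body><table><tr><td>cell</td></tr></table></body>"
unclosed = "<body><table><tr><td>cell</td></tr></body>"

print(is_well_formed(complete))  # prints True
print(is_well_formed(unclosed))  # prints False
```

An XML parser has no "best guess" mode, so one missing tag is enough to reject the whole document.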
A look at the code
Perhaps the best place to begin a discussion of how XHTML differs from HTML is by looking at a short document coded both ways. First, we'll see the code for good old HTML:
<title>HTML 4.0 Sample File</title>
Welcome to the sample file for my latest <B><I>JEP</B></I> column.
<A NAME="picture">Here is a picture.</A>
<BR> <IMG SRC="test.gif" width=130 height=200>
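And now in XHTML:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>XHTML 1.0 Sample File</title></head>
<body>
<p>Welcome to the sample file for my latest <b><i>JEP</i></b> column.</p>
<p><a id="picture">Here is a picture.</a>
<br /> <img src="test.gif" width="130" height="200" /></p>
</body>
</html>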
Some of the changes are obvious, but some are more subtle. Taking it from the top:
- The document starts with an XML declaration. This declaration is optional but is "strongly encouraged" by the World Wide Web Consortium.
- The DOCTYPE declaration is now mandatory, and it must appear before the <html> tag. The main difference between HTML and XHTML is that they use different Document Type Definitions (DTDs). XHTML documents can use one of three DTDs:
  - XHTML-1.0-Strict (for documents that leave presentation to style sheets; presentational elements such as <font> and <center> are not allowed): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  - XHTML-1.0-Transitional (used in this example, and the most likely candidate for most publishers to use for the foreseeable future): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  - XHTML-1.0-Frameset (for documents that use frames): <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
- The <html> element must open and close the document (only the previous two items can appear before it) and must designate the XHTML namespace, http://www.w3.org/1999/xhtml. Opening and closing the document with the <html> tag is required because XML insists on a single root element, that is, an element in which all other elements nest. In contrast, an HTML document will render properly even if it lacks the <html> tags and therefore has, in effect, two root elements: the head element and the body element.
- Element and attribute names must be lowercase. XML is case-sensitive, so <b> and <B> are different tags.
- Documents must be well formed. In part, that means that all elements must nest properly: An end tag must have the same name as the most recent unmatched start tag. When elements are nested, as the bold and italic tags are here, they must be closed in the reverse of the order in which they were opened.
- The id attribute replaces the name attribute as the way to identify an element. Wherever you would have used name for that purpose in the past (on a, applet, form, frame, iframe, img and map), you should substitute id. For the sake of older browsers, the XHTML 1.0 specification suggests supplying both attributes with the same value during the transition.
- Non-empty elements, such as the paragraph and list item tags, must have end tags. In HTML, you can omit the end tag and the browser will assume you meant to include it.
- Empty elements (those that do not enclose any data, such as the horizontal rule tag, the image tag and the break tag) must be terminated with a slash preceding the closing bracket, as in <br />. A space before the slash helps older HTML browsers cope with the new syntax.
- Attribute values must always be quoted. In the past, browsers permitted coders to leave quote marks off numeric values.
There are, of course, more rules in the XHTML specification, but for most publishers, these are the ones that count.
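Several of the rules above can be checked mechanically. The following is a rough sketch, not a validator: it assumes Python's standard ElementTree parser, the function name check_xhtml and the sample strings are made up for illustration, and it does not check a document against any of the three DTDs. The parser itself catches bad nesting, missing end tags and unquoted attribute values; the remaining code looks at the root element and at the case of names.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def local_name(qualified):
    """Strip the '{namespace}' prefix that ElementTree adds to names."""
    return qualified.rsplit("}", 1)[-1]


def check_xhtml(markup):
    """Return a list of rule violations; an empty list means none were found."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Covers bad nesting, missing end tags and unquoted attribute values.
        return ["not well formed: %s" % err]
    problems = []
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root element must be <html> in the XHTML namespace")
    for element in root.iter():
        name = local_name(element.tag)
        if name != name.lower():
            problems.append("element name is not lowercase: %s" % name)
        for attribute in element.attrib:
            attr = local_name(attribute)
            if attr != attr.lower():
                problems.append("attribute name is not lowercase: %s" % attr)
    return problems


# Hypothetical test documents.
GOOD = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>XHTML 1.0 Sample File</title></head>
<body>
<p><a id="picture">Here is a picture.</a>
<br /> <img src="test.gif" width="130" height="200" /></p>
</body>
</html>"""

UPPERCASE = '<html xmlns="%s"><BODY><P>text</P></BODY></html>' % XHTML_NS
UNQUOTED = '<html xmlns="%s"><body><img src="test.gif" width=130 /></body></html>' % XHTML_NS

for sample in (GOOD, UPPERCASE, UNQUOTED):
    print(check_xhtml(sample))
```

A check like this is a quick first pass before a full validation against the DTD.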
Creating the code and checking it twice
Obviously, you do not have to reinvent the wheel to craft XHTML documents. Most of the changes from HTML are minor and readily incorporated into your document coding. If you're unwilling to make even a little effort, however, fear not. Several very nice XHTML editors are yours for the taking. The first, Amaya, was developed by the World Wide Web Consortium and lets you create XHTML documents graphically (with ready access to the code). Another, HTML-Kit, will convert existing HTML documents to XHTML (or XML, for that matter) with two clicks. (I used it to create an XHTML version of this document.) Both programs include XML-compliant browsers.
As noted earlier, XHTML is picky. Browsers will simply refuse to render XHTML documents that are not perfectly coded. The HTML Tidy utility checks for and corrects markup errors. HTML Tidy is available free as a standalone command line program, and it's also included in HTML-Kit.
In a future column, we'll move deeper into XML itself and its promises for online publishers. In the meantime, if you're an early adopter, I'd appreciate hearing about your experiences with XML. Please send me your gripes, praise, tips, etc.
The author thanks Maureen Sullivan of Turner Consulting Group for editorial review.
Thom Lieb is an associate professor of journalism and new media at Towson University in Baltimore. Among his courses is Writing for the Web. He is the author of Editing for Clear Communication and has written and edited for magazines, newspapers, newsletters and online publications. He holds a Ph.D. in Public Communication from the University of Maryland at College Park and a Master of Science in Magazine Journalism from Syracuse University. You may contact him by e-mail at email@example.com.
Links from this article:
Thom Lieb, "Looking Good," Journal of Electronic Publishing, December 1999.
"Web Standards Project Calls On Public to Pressure Microsoft to Fully Support Web Standards in their Browser" http://www.webstandards.org/press/releases/2000-get-it-right
"W3C Standards Support in IE and the Netscape Gecko Browser Engine" http://wp.netscape.com/browsers/future/standards.html
XHTML 1.0 Specification, http://www.w3.org/TR/xhtml1/
Amaya editor and browser, http://www.w3.org/Amaya/
HTML Tidy, http://www.w3.org/People/Raggett/tidy/