Archive for May, 2007

Writing Good XHTML

Tuesday, May 22nd, 2007

A recent project forced me to take a closer look at how valid HTML code really is. My task was to improve performance and to validate and standardize the code. In later articles I will discuss my research, development, and conclusions on improving the company’s site performance. For now, I am going to focus on how to write the perfect XHTML document.

XHTML is a family of document types that reproduce and extend HTML 4, are XML-based, and are designed to work with both XML-based and HTML-based user agents. In other words, an XHTML document must not only use the HTML vocabulary but also conform strictly to the rules of XML.

The main difference between HTML and XHTML is strict conformance. A best practice for both standard HTML and XHTML is to conform to one of three DTDs (Strict, Transitional, or Frameset) and to declare it in the DOCTYPE, which can be written as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The root element of the document must be html, and must contain the XML namespace (xmlns) declaration.

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
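
If you want to confirm that a document really has the right root element, a small script can do it. This is only a sketch in Python using the standard library’s ElementTree parser; the helper name has_xhtml_root and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def has_xhtml_root(document: str) -> bool:
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(document)
    # ElementTree spells a namespaced tag as "{namespace-uri}local-name".
    return root.tag == "{%s}html" % XHTML_NS

# Hypothetical sample documents, one with and one without the xmlns declaration.
with_ns = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<head><title>t</title></head><body></body></html>"
)
without_ns = '<html lang="en"><head><title>t</title></head><body></body></html>'

print(has_xhtml_root(with_ns))     # True
print(has_xhtml_root(without_ns))  # False
```

Without the xmlns attribute the parser still sees an element called html, but it is not in the XHTML namespace, so an XML-based user agent has no reason to treat it as XHTML.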

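According to the W3C, it’s a good idea to include an XML declaration, but it is not required unless the document uses an encoding other than UTF-8 or UTF-16 and no encoding is supplied by a higher-level protocol such as HTTP. I personally leave XML declarations for XML-only documents.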

As said earlier, the major difference between HTML and XHTML is strict conformity: documents must be well-formed. Among other things, that means elements must be properly nested.

Correctly nested element:

<p>Lorem ipsum dolor sit amet, <i>consectetuer</i> adipiscing elit.</p>

Incorrectly nested element:

<p>Lorem ipsum dolor sit amet, <i>consectetuer adipiscing elit.</p></i>
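
An XML parser is the quickest judge of nesting. The sketch below feeds both paragraphs to Python’s standard ElementTree parser; the function name well_formedness_error is my own invention, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(fragment: str):
    """Return None if the fragment parses as XML, else the parser's message."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as exc:
        return str(exc)

good = "<p>Lorem ipsum dolor sit amet, <i>consectetuer</i> adipiscing elit.</p>"
bad = "<p>Lorem ipsum dolor sit amet, <i>consectetuer adipiscing elit.</p></i>"

print(well_formedness_error(good))  # None
print(well_formedness_error(bad))   # the parser's error message, with line and column
```

An HTML browser will quietly repair the second paragraph, while an XML parser stops at the first misplaced end tag.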

Because XHTML documents are interpreted as XML, and XML is case-sensitive, all element names must be lowercase. The same applies to attribute names. It is best practice to write all markup in lowercase whenever possible.

Correct

<strong>Hello World</strong>

Incorrect

<STRONG>Hello World</STRONG>
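
Uppercase markup is still well-formed XML, so a parser will not complain about it; it simply treats STRONG and strong as different names. A rough way to hunt for offenders, assuming Python and a fragment without namespaces (non_lowercase_names is a hypothetical helper, not part of any library):

```python
import xml.etree.ElementTree as ET

def non_lowercase_names(fragment: str) -> list:
    """List element and attribute names that are not entirely lowercase."""
    found = []
    for element in ET.fromstring(fragment).iter():
        if element.tag != element.tag.lower():
            found.append(element.tag)
        for attribute in element.attrib:
            if attribute != attribute.lower():
                found.append(attribute)
    return found

print(non_lowercase_names('<p class="x"><strong>Hello World</strong></p>'))
# []
print(non_lowercase_names('<p CLASS="x"><STRONG>Hello World</STRONG></p>'))
# ['CLASS', 'STRONG']
```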

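XML does not allow end tags to be omitted, so every non-empty element must have a closing tag. An empty element must either have an end tag or be closed in its start tag with /> (put a space before the slash, as in <br />, so that older HTML browsers cope with it). The only tag-like construct that is never closed is the DOCTYPE declaration, because it is a declaration rather than an element of the XHTML document.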

Good

<br />

Bad

<br>
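
To an XML parser, an empty element closed in its start tag and one with an explicit end tag are the same thing, while a bare start tag is an error. A small Python sketch to illustrate (normalize is a made-up helper that parses a fragment and writes it back out):

```python
import xml.etree.ElementTree as ET

def normalize(fragment: str) -> str:
    """Parse a fragment as XML and serialize it back to text."""
    return ET.tostring(ET.fromstring(fragment), encoding="unicode")

print(normalize("<p>one<br />two</p>"))     # <p>one<br />two</p>
print(normalize("<p>one<br></br>two</p>"))  # <p>one<br />two</p>

try:
    normalize("<p>one<br>two</p>")
except ET.ParseError as exc:
    # The unclosed br swallows the rest of the paragraph and the end tags no longer match.
    print("not well-formed:", exc)
```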

All attribute values must be enclosed in quotes, and attribute minimization is not supported: an attribute such as checked must be written out in full as checked="checked".

Good

<input checked="checked" type="checkbox" />

Bad

<input checked type=checkbox />
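
The same parser trick shows why the second form fails. In this Python sketch (read_attributes is a hypothetical helper), the quoted version yields its attributes and the minimized, unquoted version is rejected outright:

```python
import xml.etree.ElementTree as ET

def read_attributes(tag: str):
    """Return the attributes of a single element, or None if it is not well-formed."""
    try:
        return dict(ET.fromstring(tag).attrib)
    except ET.ParseError:
        return None

print(read_attributes('<input checked="checked" type="checkbox" />'))
# {'checked': 'checked', 'type': 'checkbox'}
print(read_attributes("<input checked type=checkbox />"))
# None
```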

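It is best practice to wrap your script content in a CDATA section so that characters such as < and & are not parsed as markup. Commenting out the CDATA markers with // keeps the script from breaking in browsers that parse the page as HTML rather than XML.

<script type="text/javascript">
//<![CDATA[
// script content here
//]]>
</script>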
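
To see what the CDATA section buys you, here is a small Python sketch. The script body and the helper script_text are invented for the example; the point is only that the same JavaScript parses as XML when wrapped and fails when it is not.

```python
import xml.etree.ElementTree as ET

def script_text(markup: str):
    """Return the text inside a script element, or None if it is not well-formed XML."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None

# Hypothetical script body containing both < and &.
SCRIPT_BODY = "if (a < b && b < c) { ready(); }"

wrapped = '<script type="text/javascript"><![CDATA[' + SCRIPT_BODY + "]]></script>"
unwrapped = '<script type="text/javascript">' + SCRIPT_BODY + "</script>"

print(script_text(wrapped) == SCRIPT_BODY)  # True
print(script_text(unwrapped))               # None
```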

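The id attribute is replacing the name attribute as the way to identify elements such as a, form, img, and map: XHTML 1.0 deprecates name on these elements, and future versions of XHTML are expected to remove it. For now it is best practice to give both attributes the same value; once name is removed, the best practice will be to drop it altogether. This does not affect the name attribute on form controls such as input, which is still needed to submit their values.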

<form id="commentform" name="commentform" method="post"></form>
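
Keeping id and name in sync by hand is easy to get wrong, so a quick audit can help. The following is a sketch under my own assumptions: the set of element names is the list for which XHTML 1.0 deprecates name, and name_id_mismatches is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Elements whose name attribute XHTML 1.0 deprecates in favour of id.
DEPRECATED_NAME_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}

def name_id_mismatches(fragment: str) -> list:
    """List elements that carry a name attribute without an identical id."""
    problems = []
    for element in ET.fromstring(fragment).iter():
        if element.tag not in DEPRECATED_NAME_ELEMENTS:
            continue
        name = element.get("name")
        if name is not None and element.get("id") != name:
            problems.append(element.tag)
    return problems

ok = '<div><form id="commentform" name="commentform" method="post"></form></div>'
bad = '<div><form name="commentform" method="post"></form><a name="top"></a></div>'

print(name_id_mismatches(ok))   # []
print(name_id_mismatches(bad))  # ['form', 'a']
```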

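Today it is standard to provide alt text (often loosely called an “alt tag”) for images, image-map areas, and image buttons, and XHTML requires the alt attribute on img and area elements. Browsers differ in whether they show alt text as a tooltip, so if you want a tooltip, add a title attribute as well as alt, not instead of it.

<img src="image.jpg" alt="My Image" title="My Image" />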
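
Since XHTML 1.0 requires alt on every img, it is worth scanning a page for images that lack it. A minimal Python sketch, with made-up file names and a hypothetical helper called images_missing_alt:

```python
import xml.etree.ElementTree as ET

def images_missing_alt(fragment: str) -> list:
    """Return the src of every img element that has no alt attribute."""
    root = ET.fromstring(fragment)
    return [img.get("src") for img in root.iter("img") if img.get("alt") is None]

# Hypothetical markup: the second image has a title but no alt.
markup = '<div><img src="a.jpg" alt="A" title="A" /><img src="b.jpg" title="B" /></div>'

print(images_missing_alt(markup))  # ['b.jpg']
```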

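As long as you follow these rules throughout your entire document, you will have a well-formed XHTML document that is well on its way to validating. Running it through the W3C Markup Validation Service will confirm it against the DTD you declared.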