Graham Bancroft is not the only one who has been really confused lately about whether XHTML or HTML is actually better/preferred/more correct/The Right Choice (that’s not to say that it’s a new debate). Ever since I had a chat with Anne about it a week or so ago, this issue has been playing on my mind more and more. I’ve invested some time in reading lots of articles and bits of specs to try to find the answer, but I’m still confused. The intention of this post is to take a balanced look at the advantages and disadvantages of each, in order to collect my thoughts and come to some sort of decision.

Start with an assumption

Serving XHTML with a MIME type of text/html is wrong. The whole point of XHTML is that it’s XML so that you can benefit from namespaces and the like. If you serve it as text/html, you can’t:

In particular, ‘text/html’ is NOT suitable for XHTML Family document types that adds elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].
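To make the namespace point concrete, here is a minimal sketch in Python (my choice for illustration; nothing in the specs calls for it). It parses a small XHTML+MathML document with a namespace-aware XML parser and lists the namespaces it finds. The sample document and the `namespaces_used` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample document: XHTML with an embedded MathML fragment.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>Inline maths:
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <msup><mi>x</mi><mn>2</mn></msup>
      </math>
    </p>
  </body>
</html>"""


def namespaces_used(markup):
    """Return the set of namespace URIs of all elements in the markup."""
    root = ET.fromstring(markup)
    found = set()
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}localname".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
        else:
            found.add("")  # element in no namespace
    return found


print(namespaces_used(DOCUMENT))
```

An XML parser keeps the `math` subtree in its own namespace, which is exactly the information that is lost when the same bytes are handed to an HTML parser as text/html.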

You will also be serving your “XHTML” document as tag soup; it will be parsed as HTML:

XHTML documents served as ‘text/html’ will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see C.11 and C.13 of [XHTML1] respectively).

This is covered in greater detail in Ian Hickson’s excellent article, Sending XHTML as text/html Considered Harmful.

I can already hear someone shouting “backwards compatibility!” at me, so I will state that I do consider Content Negotiation an acceptable solution to the problem of Internet Explorer:

Authors who wish to support both XHTML and HTML user agents MAY utilize content negotiation by serving HTML documents as ‘text/html’ and XHTML documents as ‘application/xhtml+xml’.
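As a rough illustration of what that negotiation might look like on the server, here is a sketch in Python. The function names and the tie-breaking policy are my own assumptions, not anything the W3C prescribes: XHTML is only sent to a UA that explicitly lists application/xhtml+xml in its Accept header and does not rate text/html higher.

```python
def parse_accept(header):
    """Parse an Accept header into a {media_type: q} mapping."""
    prefs = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_mime_type(accept_header):
    """Pick the Content-Type for a page available as both XHTML and HTML."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    # Wildcards count towards HTML only; XHTML must be named explicitly,
    # because a UA that only sends "*/*" may not cope with XHTML at all.
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

A UA that sends only `*/*` therefore gets text/html, while one that lists application/xhtml+xml at least as high as text/html gets the XML version. A response chosen this way should also carry a `Vary: Accept` header so that caches do not hand the wrong variant to the next visitor.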

So here’s the assumption. As it stands, at the moment, we, as web page authors, have two choices:

  1. XHTML 1.0 served as application/xhtml+xml to conforming UAs, and text/html to Internet Explorer
  2. HTML 4.01, served as text/html (of course)

XHTML 1.1 is not an option because it should not be served as text/html, which in practice leaves only application/xhtml+xml, and that is incompatible with Internet Explorer (I’m assuming you want to support Internet Explorer; I certainly do).

XHTML

Let’s start by looking at the XHTML option. It has the advantages of:

  • Mixed namespaces
  • Much simpler to work with (for programs, at least) than HTML
  • You will immediately know when your document is not well-formed due to an error from your UA. (Yes; I am assuming you use a real browser)

Notice how the first advantage doesn’t count because you are trying to maintain “HTML compatibility”.

Then there are some disadvantages:

  • You are essentially maintaining two different documents: an HTML one, and an XML one; JavaScript works a bit differently, CSS works a bit differently, etc.
  • As Anne notes, there will not be incremental loading for Gecko browsers

Some further notes:

  • I don’t consider parse errors a problem; well-formedness should be checked on the server-side
  • XHTML 1 has no more/different semantics than HTML
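The server-side well-formedness check mentioned in the notes above could be as small as the following Python sketch. The `well_formedness_error` helper is hypothetical, and the sketch makes a simplifying assumption: no DTD is loaded, so named entities such as `&nbsp;` would be reported as errors unless the page sticks to numeric character references.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the XML parser.
        line, column = err.position
        return (line, column, str(err))
    return None


if __name__ == "__main__":
    page = "<p>An unclosed line break<br></p>"
    problem = well_formedness_error(page)
    if problem is None:
        print("well-formed: safe to send as application/xhtml+xml")
    else:
        print("not well-formed at line %d, column %d: %s" % problem)
```

Running something like this before publishing (or before choosing the XML MIME type for a response) means visitors never see the browser’s parse error; only the author does.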

HTML

Advantages of using HTML:

  • None particularly

Disadvantages of using HTML:

  • None particularly

Make a decision and go with it

I bet you’ve already guessed what my opinion is now. I believe the logical choice has got to be HTML. Not because it’s particularly amazing in any way, but because I feel there are no particular advantages or disadvantages of using it, whereas there are disadvantages of using XHTML (which outweigh the advantages). The one thing that would make XHTML 1 worthwhile is namespaces. But you can’t do that if you want to support Internet Explorer. I don’t need to do that anyway.

My redesign will be done with HTML, served as HTML, validating as HTML.

Anne has seen quite a lot of hostility due to his opinions on this subject, so I ask that you respect my decision here; I certainly respect yours.

Right now I feel like an atheist who set out to prove Christianity wrong, only to end up believing in it (actually it’s nearer the other way around for me; I used to be Christian). I always thought that XHTML was the Right Thing to use. Hell, I didn’t even know what I thought when I started writing this article. If you are unsure yourself, I strongly urge you to investigate. Read the specs, read what people are saying on the subject, and if you can’t justify using whatever combination of markup and MIME type you currently use, something is up.

I don’t consider XHTML to be useless

Not by a long way. If you want to mix namespaces (and in doing so, sacrifice IE support), then it is useful now. The forthcoming XHTML 2.0 also looks very promising, although in order to use it we’ll need support from IE—I sincerely hope that Microsoft make that one of their bug fixes for IE 7.

(Yes, I know XHTML 2.0 isn’t backwards compatible. The web needs to keep moving forward. One day (in the very distant future), non-conforming UAs will be a minority, and we will need to sacrifice backwards compatibility. I think that’s acceptable.)

When you feel frustrated about all this…

Remember that you are writing semantic markup. Remember that you are applying your page presentation with CSS. Remember that you give a damn about accessibility. Remember that your websites are better than 99% of the crap out there—whether you decide to use XHTML or HTML.

(You are, aren’t you?)

Acknowledgements

Thanks to:

  • Mathias for making me say I’d sort out my MIME type when I redesign
  • Anne for giving me doubts

Some more links