Wednesday 22nd June 2005
Note: This guide is intended for those who already have an understanding of HTML and CSS, and want to know how to convert from HTML to XHTML. Those looking to learn (X)HTML from scratch should take a look at the Beginner's Guide to XHTML.
XHTML can be considered the next version of HTML, but there are quite a few differences. Rather than just being HTML with bits stuck on, XHTML changes the way HTML must be written, and what code goes where. The main idea of XHTML is to make HTML cleaner - there are stricter rules, meaning that the look of webpages is more consistent, and the code is easier to edit and understand. The purpose of this guide is to help you learn XHTML and be able to convert existing web pages into XHTML.
First of all, you have to choose which XHTML version you want to use. At this point in time, the main versions are XHTML1.0 Strict, Transitional and Frameset, and XHTML1.1, which is essentially XHTML1.0 Strict. Frameset is, funnily enough, for use if you have frames in your webpage. However, it's probably a good idea to get rid of frames and use an alternative, such as divs. Strict is just that: it cuts out many features of HTML, most of them presentational. The idea is that, instead of defining how the page looks in HTML, you define the appearance of the page in CSS, while the HTML deals with the actual content. As such, you will almost certainly want to use style sheets with XHTML1.0 Strict and XHTML1.1. Transitional allows some presentational features, such as background colour, in HTML. For most of the rest of this guide, I will be assuming XHTML1.0 Strict or XHTML1.1 is to be used. If designing a new website, it is probably best to head straight for XHTML1.0 Strict. I don't recommend using XHTML1.1, since it should not be served with the MIME type text/html, which can cause problems.
So, once you have decided your XHTML version, it's time to actually start editing the code. Firstly, every tag (apart from the doctype, which is mentioned later) must be in lower case, as must every attribute. So, instead of
<P CLASS="DATE">, you must now use
<p class="date">. You must also make sure all of the attributes are in quotation marks. Instead of
<p class=date>, you want to use
<p class="date">.
Every tag must also be closed. This means that, for tags such as
<table>, you must also have the closing tag, in this case
</table>. Some tags, such as
<img>, don't need a closing tag - instead, there is a forward slash at the end of the tag (with a space beforehand for compatibility) e.g.
<br />.
The various tags must also be in the correct order i.e. nested properly. For example, instead of
<p><code>This is code.</p></code>, use
<p><code>This is code.</code></p>. In other words, tags must be closed in the reverse of the order in which they were opened.
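As a rough sketch of the rules so far, here is a made-up HTML fragment followed by one way it could be rewritten as XHTML. The text, the class name "intro" and the file name "logo.png" are invented purely for illustration.

```html
<!-- Before: upper-case tags, an unquoted attribute, unclosed and badly nested tags -->
<P CLASS=intro>Some <STRONG><EM>important</STRONG></EM> text<BR>
<IMG SRC="logo.png" ALT="Logo">

<!-- After: lower case, quoted attributes, every tag closed, proper nesting -->
<p class="intro">Some <strong><em>important</em></strong> text<br />
<img src="logo.png" alt="Logo" /></p>
```

Note that the closing tags in the second version appear in the reverse of the order in which the tags were opened.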
You must state that the page is using XHTML. This means placing a doctype at the top of the code, before the
<html> tag.
For XHTML1.0 Frameset, you need:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
For XHTML1.0 Transitional, you need:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
For XHTML1.0 Strict, you need:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
For XHTML1.1, you need:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Next, you need to change the
<html> tag to this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">, changing the "en" to whichever language you are using.
There is another part that is not required, but recommended. Since this is now an XML document, you should add the XML declaration at the top:
<?xml version="1.0" encoding="ISO-8859-1"?>
Replace ISO-8859-1 with whatever character encoding you want to use. I'm in Western Europe, so ISO-8859-1 is the right one for me. If you have already defined the character encoding elsewhere in the document, you should use the same encoding here.
The structure of the document must be correct as well. This means you need
<html>, <head>, <title> and
<body> tags, along with their closing counterparts.
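Putting the pieces together, a minimal page might look something like the sketch below. It assumes XHTML1.0 Strict, English content and the ISO-8859-1 encoding; the title and paragraph text are placeholders, not part of any required markup.

```html
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<!-- The title is required; this text is a placeholder -->
<title>Example page</title>
</head>
<body>
<!-- Placeholder content -->
<p>Page content goes here.</p>
</body>
</html>
```

If you use the XML declaration, it has to be the very first thing in the file, which is why the comments in this sketch sit further down.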
Thus far, the changes have been largely to make the code clean. However, as mentioned, XHTML also means that presentational features should be moved into CSS. Due to this, tags such as
<body bgcolor="#ffffff"> and
<font face="arial"> are no longer allowed. Things such as the background colour can be handled by CSS.
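As an illustration, the fragments below sketch how the two examples above might be moved into CSS. The paragraph text is invented, and the sans-serif fallback font is an addition of mine; in practice the rules could equally live in an external style sheet.

```html
<!-- Before: presentation mixed into the markup -->
<body bgcolor="#ffffff">
<p><font face="arial">Some text</font></p>
</body>

<!-- After: the markup holds only the content... -->
<body>
<p>Some text</p>
</body>

<!-- ...and a style element in the head sets the appearance -->
<style type="text/css">
body { background-color: #ffffff; }
p { font-family: arial, sans-serif; }
</style>
```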
One area where you may have difficulty is formatting short pieces of text. Changing the appearance of all headings or paragraphs is simple, but a different method is required when you want to change the appearance of just a short piece of text. The solution is to use classes. In the CSS file, you define a class with a full stop followed by the class name and then its properties in braces, e.g. .highlight followed by the declarations for that class.
If the changes are to be made to entire sections between tags, use the attribute
class="classname". For example, if you wanted an entire paragraph to be formatted with the class 'highlight', you would type
<p class="highlight">This text is highlighted</p>. If there aren't existing tags where you require them, you can use the span tags instead, like so:
<span class="highlight">This text is highlighted</span>.
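To show the class definition and both uses side by side, here is a small sketch. This guide doesn't say what the 'highlight' class should actually look like, so the yellow background is an assumption, as are the title and the paragraph text.

```html
<head>
<title>Class example</title>
<style type="text/css">
/* A full stop, then the class name, then the declarations in braces */
.highlight { background-color: #ffff00; }
</style>
</head>
<body>
<!-- The class applied to a whole paragraph -->
<p class="highlight">This whole paragraph is highlighted.</p>
<!-- The class applied to a few words using span -->
<p>Only <span class="highlight">these words</span> are highlighted.</p>
</body>
```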
That's it (for this guide at least)! Remember, this is not a complete list of changes from HTML to XHTML. However, this guide has covered the main points, so the remaining problems can be found and fixed easily using the validator. To summarise:
- All tags and attributes in lower case
- All attributes in quotation marks
- Every tag must be closed
- Tags must be nested properly
- A doctype must be stated
- Presentational features should be moved into CSS