XML Syntax Rules
The syntax rules of XML are simple and strict. They are
easy to learn and easy to use.
Because the rules are strict, software that reads and manipulates
XML is comparatively easy to write.
An Example XML Document
XML documents use a self-describing and simple syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document - the XML declaration - defines
the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification
of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
The next line describes the root element of the document (as if it were saying
"this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and
body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
Can you detect from this example that the XML document contains a Note to
Tove from Jani? Don't you agree that XML is pretty self-descriptive?
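To see how little work an XML-aware program needs to do, here is a minimal sketch that reads the note document with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names (NOTE_XML, fields) are assumptions made for illustration; they are not part of the original example.

```python
import xml.etree.ElementTree as ET

# The note document from above, passed as bytes so that the parser
# honours the encoding named in the XML declaration.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
"""

root = ET.fromstring(NOTE_XML)

# Collect each child element's tag name and text content.
fields = {child.tag: child.text for child in root}

print(root.tag)
for tag, text in fields.items():
    print(f"{tag}: {text}")
```

The tag names alone tell the program which piece of text is the recipient, the sender, the heading, and the body.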
All XML Elements Must Have a Closing Tag
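With XML, it is illegal to omit the closing tag. (An element with no content may use the self-closing form, such as <br/>, which opens and closes the element in a single tag.)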
In HTML some elements do not have to have a closing tag. The following code
is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag, like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Note: You might have noticed from the previous example that the XML declaration
did not have a closing tag. This is not an error. The declaration is not an
XML element. It belongs to the prolog that comes before the root element,
and it should not have a closing tag.
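A parser enforces the closing-tag rule for you. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the wrapping <div> element are hypothetical, added only so that each test string has a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# HTML-style paragraphs without closing tags: rejected by an XML parser.
html_style = "<div><p>This is a paragraph<p>This is another paragraph</div>"

# Every element closed: accepted.
xml_style = "<div><p>This is a paragraph</p><p>This is another paragraph</p></div>"

# The self-closing form counts as both the opening and the closing tag.
empty_element = "<div><br/></div>"

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
print(is_well_formed(empty_element))
```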
XML Tags are Case Sensitive
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from
the tag <letter>.
Opening and closing tags must therefore be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
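The following sketch, again assuming Python's xml.etree.ElementTree, shows both consequences of case sensitivity: <Letter> and <letter> are looked up as two different elements, and a closing tag written in a different case is rejected. The <mail> wrapper element and the helper name parses are hypothetical.

```python
import xml.etree.ElementTree as ET

# <mail> is a hypothetical wrapper used only for this illustration.
doc = ET.fromstring("<mail><Letter>first</Letter><letter>second</letter></mail>")

upper_text = doc.find("Letter").text   # matches only <Letter>
lower_text = doc.find("letter").text   # matches only <letter>
missing = doc.find("LETTER")           # no such element, so None

def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(upper_text, lower_text, missing)
print(parses("<Message>This is incorrect</message>"))
print(parses("<message>This is correct</message>"))
```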
XML Elements Must be Properly Nested
Improperly nested tags are not allowed in XML.
In HTML, browsers tolerate some elements that are improperly nested within each other, like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
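When nesting is wrong, a parser stops and reports where it gave up. This is a small sketch assuming Python's xml.etree.ElementTree; the helper name nesting_error is hypothetical, and the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def nesting_error(xml_text):
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

improper = "<b><i>This text is bold and italic</b></i>"
proper = "<b><i>This text is bold and italic</i></b>"

print(nesting_error(improper))  # a message describing the problem and its position
print(nesting_error(proper))    # None
```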
XML Documents Must Have a Root Element
All XML documents must contain a single tag pair to define a root element.
All other elements must be within this root element.
All elements can have sub elements (child elements). Sub elements must be
correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
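The single-root rule and the parent/child structure can be checked with a short sketch. It assumes Python's xml.etree.ElementTree; the helper names outline and has_single_root are hypothetical.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

def has_single_root(xml_text):
    """Return True if the parser accepts the text as one rooted document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

one_root = "<root><child><subchild>.....</subchild></child></root>"
two_roots = "<to>Tove</to><from>Jani</from>"   # two top-level elements

# Print the tree with one level of indentation per nesting depth.
for depth, tag in outline(ET.fromstring(one_root)):
    print("  " * depth + tag)

print(has_single_root(two_roots))
```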
XML Attribute Values Must be Quoted
With XML, it is illegal to omit quotation marks around attribute
values.
XML elements can have attributes in name/value pairs just like in HTML. In
XML the attribute value must always be quoted. Study the two XML documents below.
The first one is incorrect, the second is correct:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
<to>Tove</to>
<from>Jani</from>
</note>
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note
element is not quoted.
This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
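Here is a brief sketch of how the two documents behave in a parser, assuming Python's xml.etree.ElementTree. The XML declaration is left out to keep the strings short, and the helper name parses is hypothetical. The sketch also shows that single quotes are accepted in place of double quotes.

```python
import xml.etree.ElementTree as ET

quoted = '<note date="12/11/2002"><to>Tove</to><from>Jani</from></note>'
unquoted = '<note date=12/11/2002><to>Tove</to><from>Jani</from></note>'
single_quoted = "<note date='12/11/2002'><to>Tove</to><from>Jani</from></note>"

def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# The attribute value is available as a plain string, without the quotes.
date = ET.fromstring(quoted).get("date")

print(date)
print(parses(unquoted))
print(parses(single_quoted))
```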
With XML, White Space is Preserved
With XML, the white space in your document is preserved: consecutive white space characters are not collapsed into one.
This is unlike HTML. With HTML, a sentence like this:
Hello           my name is Tove,
will be displayed like this:
Hello my name is Tove,
because HTML reduces multiple, consecutive white space characters to a single
white space.
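The difference can be illustrated with a small sketch, assuming Python's xml.etree.ElementTree. The <greeting> element name is hypothetical, and the join/split line only imitates the way HTML rendering collapses white space; it is not how a browser is implemented.

```python
import xml.etree.ElementTree as ET

# Eleven spaces between "Hello" and "my", written out explicitly.
sentence = "Hello" + " " * 11 + "my name is Tove,"

# <greeting> is a hypothetical element name used for this illustration.
text = ET.fromstring("<greeting>" + sentence + "</greeting>").text

# Imitation of HTML display: runs of white space become one space.
collapsed = " ".join(text.split())

print(repr(text))
print(repr(collapsed))
```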
With XML, CR / LF is Converted to LF
With XML, a new line is always passed to the application as a single LF, whatever the file used on disk.
Do you know what a typewriter is? Well, a typewriter is a mechanical device
which was used last century to produce printed documents. :-)
After you have typed one line of text on a typewriter, you have to manually
return the printing carriage to the left margin position and manually feed the
paper up one line.
In Windows applications, a new line is normally stored as a pair of
characters: carriage return (CR) and line feed (LF). The character pair bears
some resemblance to the typewriter actions of setting a new line. In Unix
applications (including modern macOS), a new line is normally stored as a LF
character. Classic Mac OS applications used only a CR character to store a new line.
An XML parser converts both the CR LF pair and a lone CR to a single LF before
handing the text to the application.
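The conversion is easy to observe. The sketch below assumes Python's xml.etree.ElementTree; the <text> element name is hypothetical. The input mixes all three line-ending styles in one document.

```python
import xml.etree.ElementTree as ET

# One document containing CR LF, a lone CR, and a lone LF.
raw = b"<text>line one\r\nline two\rline three\nline four</text>"

text = ET.fromstring(raw).text

print(repr(text))
print(text.split("\n"))
```

After parsing, the application sees only LF characters, so it does not need to know which system produced the file.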
Comments in XML
The syntax for writing comments in XML is
similar to that of HTML.
<!-- This is a comment -->
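A comment is not an element, but a parser can still expose it. This sketch assumes Python's standard xml.dom.minidom module, which keeps comments as separate nodes in the tree; the surrounding <note> document is reused from earlier for illustration.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<note><!-- This is a comment --><to>Tove</to></note>")

# Pick out the comment nodes among the children of the root element.
comments = [
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.COMMENT_NODE
]

# Element children are a different node type and are listed separately.
elements = [
    node.tagName
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.ELEMENT_NODE
]

print(comments)
print(elements)
```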
There is Nothing Special About XML
There is nothing special
about XML. It is just plain text with the addition of some XML tags
enclosed in angle brackets.
Software that can handle plain text can also handle XML. In a simple text
editor, the XML tags will be visible and will not be
handled specially.
In an XML-aware application, however, the XML tags can be handled specially. Depending on the nature of the application,
the tags may or may not be visible, and may or may not have a functional meaning.
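The contrast between the two kinds of software can be sketched in a few lines, assuming Python's xml.etree.ElementTree for the XML-aware side. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

document = "<note><to>Tove</to><from>Jani</from></note>"

# Plain-text handling: the tags are just characters in a string.
tag_is_visible = "<to>" in document
character_count = len(document)

# XML-aware handling: the tags give structure, and only the content is returned.
recipient = ET.fromstring(document).findtext("to")

print(tag_is_visible, character_count)
print(recipient)
```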