8
XML document types
Introduction
This chapter goes beyond the basic knowledge that you need in order to write an XHTML page, and if you wish you can give it a miss on first reading. We can in fact write XHTML documents, or HTML documents, without understanding a thing about the underlying grammars that govern them. A large proportion of pages on the web today are written by authors who couldn't tell an ATTLIST declaration from an ENTITY declaration. However, if we want to write good XHTML pages, and particularly if we want to exploit the power of modularization, we should understand something about DTD's and validation.
We had a brief look at XML in chapter 1. We looked at what is meant by the term "well-formed" document, and briefly at the requirements that such a document must meet. We also talked about Document Type Definitions in XML and in SGML, and how a Document Type Definition lays out the rules that a document's markup must follow: what types of elements are allowed, what they may contain, and what attributes they may take. We can say that a DTD defines a 'class' of documents, and that every document written according to the rules of a particular DTD belongs to that 'class'. In other words a document written in XHTML is a member of the XHTML class of documents.
We could also use the language of object-oriented programming. We could say that a given document is an 'object', and that a document written to the rules laid out in a certain DTD is an 'instance' of that class of document.
In this chapter we will take a slightly broader look at well-formed documents, and then we will look at DTD's and Valid documents. We will look at the syntax of a DTD, and we will look at how to write one. We will in fact make up our own XML markup language called 'html-lite', which will contain some of the elements found in XHTML.
We will also look at how to validate a document using a validating parser. In fact we will build our own validating parser using the parser that comes with IE5.
We will finish off this chapter with a brief look at how we can modularize a DTD. In other words we can take a large DTD and break it up into smaller parts. This will be a prelude to the next chapter where we actually look at XHTML's modules.
What we will cover
* We will review the concept of well formedness, and what is necessary for a document to be well formed.
* We will look at what is meant by a valid document, and validation.
* We will build our own validator using the parser built into IE5.
* We will write our own DTD for a 'lite' version of XHTML
Well Formed XML Documents
We had a look at what constitutes a well-formed XML document in chapter 1. Let's just review this now.
* There must be a root element that contains the whole document.
* All the elements must nest.
* All the elements must be of the correct form
* The element names must match the 'name production'.
* All the attributes must match the name production
* The attribute values must all be quoted
Let's examine this in a bit more detail
Root elements
There must be a root element that encloses the whole document. Note that this element can contain itself (that is, another element of the same name), a process known as recursion.
The following are legal XML documents
this is a legal XML document
this is a legal XML document
This however is not
this is NOT a legal XML document
there is no enclosing root element
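As a minimal sketch of the root element rule, here are three separate documents shown one after the other. The element names greeting and section are invented for illustration and are not taken from the original listings.

```xml
<!-- Legal: one root element encloses everything -->
<greeting>this is a legal XML document</greeting>

<!-- Legal: an element may contain another element of the same name -->
<section>
  <section>this is a legal XML document</section>
</section>

<!-- NOT legal as a single document: two top-level elements, no enclosing root -->
<greeting>this is NOT a legal XML document</greeting>
<greeting>there is no enclosing root element</greeting>
```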
The tags and elements must be of the correct form
The rules governing the forms of tags and elements can be summed up as follows
* All tags must open with a 'less than' angle bracket, and close with a 'greater than' angle bracket.
* For every opening tag there must be a matching closing tag (except for empty elements).
* Empty elements must take a special form.
Tag delimiters
The general syntax for an opening tag is <[element name]>.
The general syntax for a closing tag is </[element name]>
Tags must match
Except for the special case of empty elements there MUST be a closing tag, and the tag name must be the same. e.g.
<atag>...</atag>
XML is case sensitive so atag is not the same as Atag, or ATAG.
The following constructs are illegal
<atag>...</Atag>
<atag>....</ATAG>
Empty elements
Empty elements must take a special form
<[element name]/>
e.g. the line-break element in XHTML, which is written <br/>.
Note that for backward compatibility with old browsers it is recommended that one leave a space before the forward slash, e.g. <br />. Leaving white space after the name (but not before) is legal XML.
Nesting
All the elements must nest. By this we mean that every element must be completely contained within another.
These elements nest
These elements DO NOT nest
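A short sketch of the difference, assuming the p and em elements that html-lite borrows from XHTML later in this chapter:

```xml
<!-- These elements nest: em opens and closes inside p -->
<p>These <em>elements</em> nest</p>

<!-- These elements DO NOT nest: p closes while em is still open -->
<p>These <em>elements DO NOT</p> nest</em>
```

A parser working in well-formed mode should reject the second fragment as soon as it meets the closing p tag.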
Name productions
All the element and attribute names must match the so called XML "name production"
In order to match the name production, the name must begin with either a colon, an underscore, or a letter, and the remaining characters must be alphanumeric characters, the underscore (_), the colon (:), the dash (-), or the period (.).
The use of the colon, however, is reserved by the W3C (for namespaces), so don't YOU use it!
There are numerous other non-latin characters that are also legal. This is out of scope for this book, but if you are writing in a non-latin tongue and not using the western alphabet, consult the XML recommendation.
Note the following are legal names
<_firstname>
<:first.name>
The following are not
<1stname>
< firstname> (space between angle bracket and f)
Attributes
Attributes must be preceded by white space, and their names must follow the name production. The name must be followed by an 'equals' sign, and then a quoted value.
The general syntax is
[white space][attribute name]='[quoted value]'
White space is allowed on either side of the 'equals' sign, although it is usually left out.
There can only be one attribute of the same name on any given element.
Attribute values must be quoted
The values of an attribute must be quoted with matching single '[att value]' or double "[att value]" quotes.
A value quoted with double quotes cannot contain a literal ampersand (&) character, less than (<) character or double quote ("). Use the entity references &amp;, &lt; and &quot; instead.
A value quoted with single quotes cannot contain a literal ampersand (&) character, less than (<) character or single quote ('). Use the entity references &amp;, &lt; and &apos; instead.
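The quoting rules can be sketched as follows. The img element and its alt attribute are borrowed from later in the chapter, and the attribute values are made up for illustration.

```xml
<!-- A double-quoted value may contain a single quote -->
<img alt="the author's photo"/>

<!-- A single-quoted value may contain double quotes -->
<img alt='a "quoted" word'/>

<!-- An ampersand or a less than sign is written as an entity reference -->
<img alt="fish &amp; chips for &lt; 5 pounds"/>

<!-- Not well formed (shown inside a comment): the same attribute appears twice -->
<!-- <img alt="one" alt="two"/> -->
```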
Prologs
All XML documents must have a prolog; however, the prolog can contain absolutely nothing!
When the prolog does contain content it can contain the following
* A Version declaration
* A document type declaration
* Comments
* Processing instructions
Version declarations
When a version declaration is present it must be the first thing in the document. The syntax is very exact
<?xml version="1.0"?>
A version declaration can also include an encoding declaration, but this is out of scope for this book.
There can be only one version declaration.
At present there is only one version of XML.
Document type declaration
We will be looking at document type declarations when we look at valid XML documents and DTD's. An XML document can only have one Document Type Declaration.
Processing Instructions
Processing instructions (PIs) can occur anywhere in either the prolog or the body of the document. They take the general syntax
<?[target] white-space [data]?>
The target is the application at which the instruction is directed, and the data are the instructions. The user-agent must of course understand the processing instruction in order to implement it!
The only PI that is widely implemented at the moment is the processing instruction that connects an XML document to a style sheet. We saw this in chapter 1 but here it is again.
<?xml-stylesheet type="text/css" href="[style sheet URL]"?>
Let's try linking a simple xml document with a 'lite' version of XHTML to a style sheet.
Try it out. Linking an 'html-lite' document to a style sheet.
Our html-lite document only has a few of XHTML's elements. Type up this well-formed XML document and save it as html-lite1.xml
HTML-lite example 1
HTML-lite example 1
This is a paragraph with some emphasized text.
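The listing can be sketched as follows. The element structure follows the description of html-lite given later in this chapter (a root html-lite element holding head with title, and body with h1 and p, where p may contain em). The version declaration and the choice of which word sits inside em are assumptions.

```xml
<?xml version="1.0"?>
<html-lite>
  <head>
    <title>HTML-lite example 1</title>
  </head>
  <body>
    <h1>HTML-lite example 1</h1>
    <p>This is a paragraph with some <em>emphasized</em> text.</p>
  </body>
</html-lite>
```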
This is what you will see when you run it in IE5
What's going on
We have a well formed xml document not associated with any style sheet. The default behavior for IE5 is to display this document as a tree view.
Now let's add a reference to a style sheet. Add the following line to the document.
HTML-lite example 1
HTML-lite example 1
This is a paragraph with some emphasized text.
Note that we have not yet written a style sheet!
Run this again in IE5. This is what you will see.
What's going on
Here are the relevant parts of the processing instruction we have added.
* 'xml-stylesheet'. This is the target of the PI. It tells the user agent that we are dealing with an xml style sheet
* ' type="text/css" '. This tells the user agent the type of style sheet it is dealing with. It is a style sheet in text format written to the rules of CSS. This is part of the 'data' of the PI.
* 'href="html-lite1.css" '. This is also part of the data of the PI. It tells the user agent where to find the style sheet.
Because of the processing instruction the conforming user agent (IE5) knows that it is meant to display the document rather than its source tree. In other words it is to discard the markup and display the content without the tags. However it can't find the style sheet because we have not yet written it. It therefore displays the content of every element in its default style, which happens to be as an inline flow object in 12-point 'Times New Roman' (on my PC).
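Putting the parts of the processing instruction together, the document would look roughly like this. The placement of the PI directly after the version declaration and the element content are a sketch based on the description in this chapter, not a copy of the original listing.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="html-lite1.css"?>
<html-lite>
  <head>
    <title>HTML-lite example 1</title>
  </head>
  <body>
    <h1>HTML-lite example 1</h1>
    <p>This is a paragraph with some <em>emphasized</em> text.</p>
  </body>
</html-lite>
```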
Let's now write a style sheet.
Try it out. Adding the style sheet.
Type this up and save it as 'html-lite1.css' in the same folder as the XML file. Don't worry if you don't understand the details of the style sheet; it will be explained in chapter 11. However the syntax of CSS is almost intuitive!
Note however that we have set the display property of our head element to none, which means that none of its content, and this includes the child element 'title' will be displayed.
/*This is html-lite1.css*/
head{
display:none;
}
body{
display:block;
background-color:white;
font-size:12pt;
}
h1{
display:block;
font-size:24pt;
}
p{
display:inline;
}
em{
font-style:italic;
display:inline;
}
pre{
font-family:courier new,monospace;
white-space:pre;
display:block;
}
Now when we refresh our page we will see the following
What's going on
The user agent (IE5) now goes to the style sheet, looks up the appropriate style for each element, and then decorates the parse tree with the appropriate style. It then displays it. All this thanks to a processing instruction!
Comments
Comments are very important in any programming language. They are notes to ourselves, and other interested parties, explaining why we have done what we have done. They are particularly important when we write more than the most trivial DTD. At a minimum we should explain the rationale for every element we add and why we are selecting the attributes that we are selecting.
In XML they take the form
<!-- [comment text] -->
Comments form part of the parse tree but are never displayed or interpreted. Because they form part of the parse tree it does mean that they can be manipulated using the DOM. See chapter ?
CDATA Sections
There will be occasions in our pages when we will not want our markup to be processed, we want to pass it right through the user agent and display it. If you ever get round to writing a book on XML or XHTML you will use this feature all the time!
To do this we wrap it in a CDATA section. CDATA stands for character data. A CDATA section tells the User Agent "Treat every character in this section as a character to be displayed, and not as markup".
The general syntax to create a CDATA section is
<![CDATA[ ..character data.. ]]>
The only thing you can't put in a CDATA section is ']]>' (For obvious reasons, otherwise when the parser saw it, it would end the CDATA section!)
Let's create a CDATA section now.
Try it out. Creating a CDATA section
Make the following alterations to html-lite1.xml, and save it as cdata1.xml
HTML-lite example 1
HTML-lite example 1
This is a paragraph with some emphasized text.
Here is how the previous three flow objects were marked up
HTML-lite example 1
This is a paragraph with some emphasized text.
]]>
This is what you will see when you run it in IE5
What's going on
As you can see the entire markup that was enclosed in the CDATA section was not parsed but was passed on to the screen for display.
We placed this in 'pre' tags and asked the style sheet to preserve the white space, but unfortunately IE5 does not support this CSS property.
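A sketch of what cdata1.xml might look like is given below. The use of a p element for the explanatory sentence and the exact lines placed inside the CDATA section are assumptions; only the pre wrapper and the text content come from the description above.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="html-lite1.css"?>
<html-lite>
  <head>
    <title>HTML-lite example 1</title>
  </head>
  <body>
    <h1>HTML-lite example 1</h1>
    <p>This is a paragraph with some <em>emphasized</em> text.</p>
    <p>Here is how the previous three flow objects were marked up</p>
    <pre><![CDATA[
<h1>HTML-lite example 1</h1>
<p>This is a paragraph with some <em>emphasized</em> text.</p>
]]></pre>
  </body>
</html-lite>
```

Everything between the opening of the CDATA section and the closing ]]> is treated as character data, so the tags inside it appear on screen instead of being parsed.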
That's all we want to say about well-formed XML, so let's move on to look at Valid XML
Valid XML documents
A valid XML document is a document that is not only well-formed but conforms to the structure laid out in its Document Type Definition (DTD). The DTD is contained or referenced in the Document Type Declaration that is found in the prolog. Let's have a look at the document type declaration right now.
It is unfortunate that Document Type Definition and Document Type Declaration share the same initials and sound so similar, and this can cause much confusion to beginning authors. If you remember that the Declaration declares the document type, and contains or points to the definition, while the Definition defines the structure of the document you are writing, you will be OK.
Document Type Declarations
As we saw earlier, a document type declaration is part of the prolog of the document. It must come after the version declaration (if any), and before the root element.
We announce a document type declaration with the following syntax
<!DOCTYPE ..root element name.. [..declarations..]>
For a document with the root element called greeting this becomes
<!DOCTYPE greeting [..declarations..]>
Note that the normal convention is to put variable names in square brackets, but here square brackets are part of the syntax so we have used '..' instead.
In order to validate a document of any size you must either be a computing genius, or have a validating parser. As few of us can lay claim to the first accolade, let's look at how we can build our own validating parser.
Making a Validating Parser
We can construct a validating parser from the ActiveX control (MSXML.DLL) that comes with IE5. We have done this for you. You can either download the file as an HTML file from www.wrox.com or copy it from the listing given in appendix ? Just as you don't have to have any knowledge of programming to use an application, you don't have to have any knowledge of scripting languages to use this HTML file as a parser.
Using the Parser
The diagram below shows a screen shot of the parser in operation. Note the following points
* As the parser uses the IE5 DLL MSXML.DLL as an ActiveX object it will only work in IE5 or better.
* You have the choice of parsing in the validating or Well-formed mode.
* You can choose to expand any external entities or not.
* Either browse for a file on your computer, or type in a URL.
* The 'browse' function is designed to be run on your desktop. If you run it from the web and try and access a file on your own machine you will more than likely get an "Access denied" error message.
* The result of the parsing operation is given in the text box.
The above shot shows that the document greeting.xml has been successfully validated.
Document Type Definitions
The Document Type Definition sets out the rules of structure for any given XML document. A DTD will contain some or all of the following.
1. A listing of all the allowed element types in the document.
2. A listing of all the allowed content for any given element.
3. 1 and 2 are contained in a construct called the ELEMENT declaration
4. A listing of all the allowed attributes for an element.
5. A description of the attribute type and whether or not the attribute is required.
6. Optionally the default value for the attribute.
7. 4,5, and 6 are contained in the ATTLIST declaration
8. A declaration of all the entities in the document. These can be parameter entities or general entities.
9. A declaration of all the Notations and their attributes
We will look at how these are declared and the syntax employed in the sections below.
First however let's have a look at where in the document we actually place our DTD.
Internal and external DTD's
The DTD may either be contained in a separate document, or it may be embedded in the document type declaration. When the latter method is employed it is called an internal DTD, when the former is employed it is called an external DTD.
For anything other than the most trivial document we will be using an External DTD. This has several advantages, chief of which is that we can reference a single DTD from several documents.
Let's have a look at a DTD for our simplest well-formed document, 'greeting.xml'.
Internal DTD's
Here is greeting.xml with an internal DTD that consists of a single element, and a single element declaration.
Try it out. Writing a simple Internal DTD
Type up the following, then run it through your parser to make sure it validates. If it doesn't validate, correct the errors (which will be typos, I promise you!) and run it again. The screen shot above shows the results of running this document.
]>
Hello Valid XML!
What's going on
First we have our version declaration,
followed by a comment.
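Now we have our document type declaration, which runs from <!DOCTYPE to the closing ]>.
Note the syntax. We announce the beginning of our declaration with the keyword <!DOCTYPE, followed by the name of the root element. The keyword MUST be in upper case.
Everything from <!DOCTYPE to ]> is the Document Type Declaration. The declarations between the square brackets make up the internal DTD.
We will look in more detail at element declarations shortly. For now just note the keyword <!ELEMENT that starts an element declaration.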
Hello Valid XML!
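Taken as a whole, greeting.xml with its internal DTD might look like the sketch below. The wording of the comment and the (#PCDATA) content model are assumptions; the root element name and its text content come from the chapter.

```xml
<?xml version="1.0"?>
<!-- greeting.xml: a document with an internal DTD -->
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello Valid XML!</greeting>
```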
The DTD is contained in a separate file. Note that this file must contain nothing except comments and declarations.
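As a sketch of the external form, the same greeting document can point at a separate DTD file with the SYSTEM keyword. The file name greeting.dtd and the content model are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE greeting SYSTEM "greeting.dtd">
<greeting>Hello Valid XML!</greeting>

<!-- Contents of the separate file greeting.dtd: comments and declarations only -->
<!ELEMENT greeting (#PCDATA)>
```

The first three lines belong in greeting.xml and the element declaration belongs in greeting.dtd. Keeping the declarations in their own file is what lets several documents share one DTD.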
Now that we've seen where to put our DTD, let's jump right in and build a simple DTD for a 'lite' version of XHTML. We will start off with a simple XML file and add to it as we go along.
Element declarations
We are going to define the allowed elements for our first pass at a document of the type 'html-lite'.
Here is the first version of a file of the document type 'html-lite'. We have a root element which contains a head and a body element which also have child elements. We have indented the children to make things clearer.
HTML-lite example 1
HTML-lite example 1
This is a paragraph with some emphasized text.
Note in particular the structure of this document. The following diagram lays out the relationship of the elements one to the other.
We will declare each element in turn, and for each element we will declare the allowed content.
Try it out. Writing a DTD for html-lite1.xml
Here is the DTD for html-lite. Type it up and save it in the same folder as html-lite1.xml
Now make the following changes to 'html-lite1.xml' and save it as 'html-lite2.xml'
HTML-lite example 1
HTML-lite example 1
This is a paragraph with some emphasized text.
Run it in your validating parser. It should validate. If not fix any typos in your files (I promise you the above is valid XML!), and run again.
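A reconstruction of the two pieces, based on the content models discussed in the rest of this section, might look like this. The file name html-lite.dtd is taken from a later exercise in this chapter; the exact layout is an assumption.

```xml
<!-- html-lite.dtd -->
<!ELEMENT html-lite (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (h1|p)*>
<!ELEMENT h1 (#PCDATA)>
<!ELEMENT p (#PCDATA|em)*>
<!ELEMENT em (#PCDATA)>

<!-- Line added to html-lite2.xml, after the version declaration and before the root element -->
<!DOCTYPE html-lite SYSTEM "html-lite.dtd">
```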
What's going on
First of all we have to add a document type declaration to our XML file, and reference our DTD.
Now we declare each of the elements in turn in our DTD, 'html-lite.dtd'. Here is the general syntax for declaring an element in a DTD.
General Syntax for element declarations.
<!ELEMENT [element name] [element content]>
Note there must be white space between the keyword ELEMENT and the element name, and between the element name and the element content.
Element content
Here we declare the root element html-lite.
<!ELEMENT html-lite (head, body)>
Note the content (head, body) separated by a comma.
This means that the html-lite element can contain only these two elements. They can only occur once, and they MUST appear in this order.
Here is the declaration for the head element
<!ELEMENT head (title)>
Here we have a single allowed element, title. The syntax says that the head element MUST have a title, and it must occur once and only once.
Constraining element numbers.
The above construct, (title),
severely constrains the content of head to one and only one element called 'title'. We can in fact allow more than one element in the content model.
We could have written this as (title?) or as (title)?
Both of these constructs mean the same. This would mean that a title is optional, but if we do have one we can only have one title.
The construct (title+) means that we must have at least one title element in our head element, but we can have as many title elements as we want.
The construct (title*) means that we needn't have any title elements in our head element, but we can have as many title elements as we want.
Obviously we want only one title, and we want to make a title compulsory, and that is why we have chosen the form we have chosen.
Constraining element order
Similarly with the root element we have severely constrained the element content to (head, body).
This construct means that we can only have one head element, and one body element, and they must be in that order. This is in fact what we want, but consider the body element. What would have happened if the allowed content was written as (h1, p)?
This would mean that we could only have one h1 element, and one p element and they must occur in that order. Obviously we would like to have as many of these elements as we want, and they should be able to occur in any order. To do this we must make use of the 'pipe-stem' separator (|) combined with an asterisk, which allows none, one or several of the elements: (h1|p)*
The pipe-stem operator tells the parser "either or"; the asterisk outside the parentheses says 'repeat this pattern as often as you want or not at all'.
For the record here are the constraints that other combinations would place on the parser.
(p|h1): Have either a single 'p' element OR a single 'h1' element.
(p*|h1*): Have as many p elements as you want OR as many h1 elements as you want.
(h1,p)*: Have as many h1 followed by a p as you want.
I.e.
<h1>firstheading</h1>
<p>first paragraph</p>
<h1>secondheading</h1>
<p>second paragraph</p>
<h1>third heading</h1>
<p>third paragraph</p>
etc.
Summary of element content constraints
* A comma separator constrains element content to the exact order that they appear.
* A pipe stem separator means either ..or ..or
* No suffix means once and once only
* A 'hook' or query means once or not at all.
* A plus or addition sign means at least once and then as often as you want
* An asterisk or star means not at all or as many times as you want.
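The rules in this summary can be sketched as a set of alternative declarations for the head and body elements of html-lite. These are alternatives for comparison only: a real DTD may declare each element type once.

```xml
<!-- Alternative declarations for head -->
<!ELEMENT head (title)>    <!-- exactly one title -->
<!ELEMENT head (title?)>   <!-- no title, or one title -->
<!ELEMENT head (title+)>   <!-- one or more titles -->
<!ELEMENT head (title*)>   <!-- any number of titles, including none -->

<!-- Alternative declarations for body -->
<!ELEMENT body (h1, p)>    <!-- one h1 followed by one p -->
<!ELEMENT body (h1 | p)>   <!-- either one h1 or one p -->
<!ELEMENT body (h1 | p)*>  <!-- any mix of h1 and p, in any order -->
<!ELEMENT body (h1, p)*>   <!-- repeated pairs of h1 then p -->
```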
Parsed character data content
The way we have written our DTD, the title element, the h1 element, and the em element can only contain what is known as parsed character data, written (#PCDATA). In simple words this means plain text! In other words these elements cannot contain any markup, just text.
The following syntax is also correct
(#PCDATA)*
but the former is preferred.
Mixed content
When an element can take both parsed character data, and markup the content declaration must take a special form. This is called mixed content, and the general syntax is
(#PCDATA|[1st element type]| [2nd element type]..etc)*
We have one element that takes mixed content, the 'p' element, which can take both PCDATA and the em element.
If we want our elements to contain mixed content this is the only permissible way to mark up the content model.
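For the p element of html-lite, the mixed content declaration and a fragment that satisfies it can be sketched like this (the text inside the sample p element is made up):

```xml
<!ELEMENT p (#PCDATA|em)*>
<!ELEMENT em (#PCDATA)>

<!-- A p element that matches the declaration above -->
<p>Plain text, then <em>emphasized text</em>, then more plain text.</p>
```

Note that #PCDATA comes first in the group and that the asterisk follows the closing parenthesis.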
empty elements
You will have noticed that we don't have any empty elements in our html-lite document type. Let's rectify that now and add an 'img' element. We will want the img element to be able to appear any number of times in both the 'p' and the 'body' elements.
Try it out. Adding an empty element to html-lite.dtd
Make the following changes to the dtd
What's going on
We have added an element declaration for img. Notice the general syntax for an empty element declaration,
<!ELEMENT [element name] EMPTY>
in our case for the img element
<!ELEMENT img EMPTY>
We also added the img element to the permitted content of the body element
(h1|p|img)*
and to the p element
(#PCDATA|em|img)*
which means that body can now take any number of p, h1, or img elements, and p can take any PCDATA and any number of img and em elements.
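The changed and added lines of the DTD can be sketched as follows. The other declarations are assumed to stay as they were.

```xml
<!-- body and p now also allow img -->
<!ELEMENT body (h1|p|img)*>
<!ELEMENT p (#PCDATA|em|img)*>

<!-- img is declared empty, so it is written <img/> in a document -->
<!ELEMENT img EMPTY>
```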
The ANY content
We can also declare an element to have 'ANY' content, in which case it can take PCDATA plus any of the declared elements in any number or in any order. This really puts no constraints on content and is really only of use in the developmental stage of a project. The syntax is
<!ELEMENT [element name] ANY>
Now that we have looked at how to declare elements, and how to mandate their content, let's look at how to add attribute declarations to the DTD.
Attribute declarations
We would like to add an 'align' attribute to both our 'h1' and our 'p' elements. We would also like to add an 'href' and an 'alt' attribute to our 'img' element.
We want the 'href' and the 'alt' attribute to be able to take a text string, and we would like the align attribute to be able to take the following values 'left', 'right', and 'center'.
We would also like to be able to add an id attribute of the type 'ID' to our 'h1' and to 'p' elements.
Let's see how we go about this.
General syntax for attribute declarations
Attributes for an element can be declared anywhere in the DTD after the parent element declaration. However the usual place to do it is right after the element declaration. The general syntax for an attribute declaration is
<!ATTLIST [element name] [attribute name] [attribute type] [default declaration]>
There must be white space between each of the productions.
* The 'element name' is the name of the element that is going to take these attributes.
* The attribute name is the name of the attribute. The attribute name must be a 'name production'. The rules for naming attributes were discussed in the 'well formed ' section of this chapter.
* 'attribute type' can be one of three types, a string type, a tokenized type, or an enumerated type.
* The 'default declaration' is either a default value or one of these three keywords: #REQUIRED, #IMPLIED, or #FIXED (which is itself followed by a value).
* We can declare several attributes (provided they are on the same element!) using a single ATTLIST declaration.
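The attribute declarations described in this section might be written as in the sketch below. The types and defaults for align, id and alt follow the discussion in this chapter; giving href a default declaration of #REQUIRED is an assumption.

```xml
<!ATTLIST h1
  align (left|right|center) "left"
  id    ID                  #IMPLIED>

<!ATTLIST p
  align (left|right|center) #IMPLIED
  id    ID                  #IMPLIED>

<!ATTLIST img
  href CDATA #REQUIRED
  alt  CDATA #REQUIRED>
```

Each ATTLIST declaration here covers two attributes on the same element, which saves writing the element name twice.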
Add the following line somewhere in the body of html-lite2.xml.
HTML-lite example 1
This is a paragraph with some emphasized text.
Save it as html-lite3.xml and run it through the parser. You will get the following error message.
parsing:
C:\BOOK_XHT\CH8\EXAMPLES\html-lite3.xml
Parsing in the validating mode.
The above document does not parse. The following information is available about this error. Fix the error and try again.
The error is:
Required attribute 'alt' is missing.
The error occured in the following document:
file://C:\BOOK_XHT\CH8\EXAMPLES\html-lite3.xml
at line number:
10
The character position is:
25
The text of the line is
The absolute file position of the error is:
262
When we declared our alt attribute we said that it was required, and we omitted to add it. Change line 10 to read
and re-parse. You should now get no errors.
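The difference between the failing and the corrected line can be sketched like this. The file name picture.gif and the alt text are hypothetical and are not taken from the original listing.

```xml
<!-- Fails validation: the required alt attribute is missing -->
<img href="picture.gif"/>

<!-- Validates: both declared attributes are present -->
<img href="picture.gif" alt="A picture"/>
```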
What's going on.
Enumerated attribute(1)
When we declare the attribute align for the h1 element we are in fact declaring an enumerated attribute type. We give a choice of three values separated by a pipe-stem. As with element declarations this means either or.
In this case we have given a literal value for our default declaration, so if in fact no attribute is put on the h1 element, the parser will assume a value of "left". Note the following two points.
* The default value must be quoted.
* The default value must be one of the enumerated values!
Enumerated attribute(2)
With the p element we have used a different default declaration. We have used the XML keyword #IMPLIED.
This signals to the parser that the attribute can be put there at the discretion of the author. If the attribute is omitted then no value is assumed.
In fact to give different defaults to the same attribute is very bad document architecture. We have done it here for illustration purposes. In the real xhtml dtd 'align' always has a default declaration of #IMPLIED.
CDATA attribute
CDATA in this case just means a string of text. The rules for what may be in this string were explained earlier in this chapter. To sum up CDATA can consist of any character except & and < or the same kind of quotation mark that encloses the value.
In this case the default declaration has been given a value of #REQUIRED. If the validating parser does not see an attribute and its value (The value could just be an empty quote!) it will flag an error as we saw earlier.
Tokenized attributes.
For most practical purposes a tokenized attribute will be of the type ID. Remember that an attribute with the type ID must have a unique value throughout the document, and the value must match the "name" production.
In our html-lite DTD we have given id attributes to both our h1 and our p elements.
id ID #IMPLIED
Note the following points.
* It is traditional to give an element of the type ID the name 'id' but this is not compulsory.
* The value of the attribute of the type ID must be unique in the document.
* The value must match the 'name' production, i.e. start with a letter or an underscore and contain only alphanumeric characters and the characters '-', '_', ':' and '.'. Remember that the colon is reserved for W3C usage.
In fact it can also contain any Unicode extender or combining character. These are laid out in the XML spec, but are out of scope for a beginner's book.
* An element can only have one attribute of the type ID. (Obvious when you think about it).
* An ID attribute can only take a default value of #IMPLIED or #REQUIRED. (Also obvious when you think about it).
There are other TOKENIZED attribute types, but they are rarely used, and a discussion of them is out of scope for this book. You are referred to the XML Spec or to a specialized XML book if you want to know more about the other tokenized attribute types.
We will also look at an example of adding attributes of a value ID when we have a look at parameter entities below.
#FIXED default declaration
If an attribute's default declaration is #FIXED, the attribute is always assumed to take the value given in the declaration, and a document may not give it any other value. For example in the XHTML DTD the pre element always takes the 'preserve' value for xml:space.
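In simplified form, such a declaration can be sketched as follows. The real XHTML declaration for pre also brings in other attributes, which are left out here.

```xml
<!ATTLIST pre xml:space (preserve) #FIXED "preserve">

<!-- With the declaration above, these two elements are treated alike -->
<pre>some   spaced   text</pre>
<pre xml:space="preserve">some   spaced   text</pre>
```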
Now a conforming XML browser would actually open up my draw program every time it came across a vector image.
If you want to know more about notations we would refer you to a book on XML such as XML Applications.
Entities, Entity references and Entity declarations
An entity is a storage unit. A full treatment of entities is beyond the scope of this book, but we will briefly look at character entities, general entities, and parameter entities.
Let's start with a quick look at what is meant by the above terms.
Entity
The entity is the actual storage unit that contains material. In fact the complete document is an entity called a 'document entity'. Entities have names, and the name given to the entity must match the 'name production'.
Entity reference
An entity is referred to by an 'entity reference'. This may be a character entity reference, a general entity reference, or in the DTD a parameter entity reference. When a parser comes across an entity reference, it will expand it so that the reference is replaced by the content as set forth in the entity declaration.
Entity declaration
Entities other than character entities have to be declared, and they are declared in the DTD. They can be declared anywhere in the DTD, but it is usual to put general entity declarations at the end of the DTD, and parameter entity declarations at the top of the DTD. As we will see, parameter entities have to be declared before they are referenced.
Let's have a look at the different kind of entities
Character Entities
A character entity references a single Unicode character by using its Unicode number. This can be either a decimal number or a hexadecimal number.
The general syntax for this is:
&#[Decimal Unicode number];
or
&#x[Hexadecimal Unicode number];
This can be very useful for certain symbols. For example accented characters, the copyright symbol (Unicode decimal number 169, hexadecimal number a9), or the registered trademark symbol (Unicode number 174, hexadecimal number ae)
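Using the numbers just quoted, the two symbols could be written in either form, as in this sketch (the p elements are only there to hold the text):

```xml
<p>Copyright symbol, decimal: &#169; hexadecimal: &#xa9;</p>
<p>Registered trademark symbol, decimal: &#174; hexadecimal: &#xae;</p>
```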
UNICODE
The Latin character set can be represented by the numbers from 0 to 255 (ASCII itself covers 0 to 127, and ISO Latin-1 extends this to 255). Each such number fits in one byte. However if we want to represent the character sets of other languages then we need far more than the 256 characters this allows. XML uses UNICODE characters. In the 16-bit encoding each character is represented by 2 bytes and is known as a 'wide character'. Because of this UNICODE can represent 256 x 256 or 65536 characters in this form. Until we start trading with the Klingons (their written language is notoriously complex. That's why they are always in such an ugly mood and fighting.) this should be sufficient to represent the character sets of this world!
A good introduction to Unicode is given at:
http://www.unicode.org
Try it out. Using Character entities in an XHTML document.
Fire up your editor and type the following.
Examples of character entities.
Examples of character entities.
This is the registered trademark symbol, unicode decimal number '174'. ®
as containing a hexadecimal reference, and displays it accordingly.
CAUTION.
Older HTML browsers may not recognize hexadecimal entity references, nor may they recognize decimal numbers greater than 255. Therefore use these with caution in your XHTML pages.
General Entities
A general entity allows an entity reference in our document to refer to some text in our dtd. The entity is declared in the DTD using an entity declaration.
The entity declaration
The entity must be declared in the DTD. The declaration takes the following syntax
<!ENTITY [entity name] "[replacement text]">
Now whenever the entity name is referenced in the body of the document the parser will substitute the text in the entity declaration for the reference.
The entity name must be a 'name production'
The entity reference
An entity is referenced with the following syntax.
&[entity name];
Note there are no spaces between the ampersand, the entity name, and the semicolon. Let's try this out.
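A minimal sketch of a general entity being declared and then referenced is shown below. The entity name publisher and its replacement text are invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY publisher "Example Press">
]>
<greeting>Hello from &publisher;!</greeting>
```

When the parser expands the reference, the content of greeting reads "Hello from Example Press!".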
Try it out. Using an entity reference
Make the following changes to your basic html-lite document and save it as entity2.xml
]>
HTML-lite example 1
Note that we have deliberately given the id a bad name and duplicated it!
Run it through the parser. Here is the error message we will receive.
The error is:
A name was started with an invalid character.
The error occured in the following document:
file://C:\BOOK_XHT\CH8\EXAMPLES\html-lite4.xml
at line number:
9
The character position is:
11
The text of the line is
This is a paragraph with some emphasized text.
The absolute file position of the error is:
184
Fix that error by adding an underscore in front of the number and re-run. Now we get this error message.
The error is:
The ID '_1' is duplicated.
Now give both of them a unique name and they will validate.
What's going on
Declaring the entity
Parameter entities, like general entities, have to be declared before they are referenced.
note the following.
* The beginning of a parameter entity declaration is announced in just the same way as the beginning of a general entity declaration, by the keyword '<!ENTITY'
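As a sketch, a parameter entity that holds the id attribute definition could be declared and used as follows. The entity name id.attr is hypothetical, and this style of reference inside another declaration is only permitted in an external DTD, not in an internal one.

```xml
<!-- Declare the parameter entity first: note the % sign with white space on both sides -->
<!ENTITY % id.attr "id ID #IMPLIED">

<!-- Then reference it with %name; inside later declarations -->
<!ATTLIST h1 %id.attr;>
<!ATTLIST p  %id.attr;>
```

After expansion both h1 and p carry an optional id attribute of type ID, so a change to the entity is picked up by every declaration that references it.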
Validating XML Parser
XML parser
This parser uses the microsoft XML validating parser built into IE5.
You must be running IE5 or better to use this parser.
Type in the path name or browse for the file you wish to parse.