Introduction to XML

XML, which stands for Extensible Markup Language, is a text-based format that can be used to represent any kind of data. For example, the XML representation of a student record might look like this:
<student>
     <id>123456789</id>
     <name>Bill White</name>
     <address>300 West 721 North Provo, UT 84604</address>
     <phone>(801)375-1234</phone>
     <major>Computer Science</major>
     <gpa>3.56</gpa>
</student>
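A program can produce a record like this one from its own data. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree` module. The dictionary `record` and the helper `student_to_xml` are hypothetical names chosen only for illustration. ElementTree writes the result on a single line, without the indentation shown above.

```python
import xml.etree.ElementTree as ET

def student_to_xml(record):
    """Hypothetical helper: build a <student> element with one child element per field."""
    student = ET.Element("student")
    for field, value in record.items():
        # Each field becomes a nested element whose content is the value as text.
        ET.SubElement(student, field).text = str(value)
    return student

record = {
    "id": "123456789",
    "name": "Bill White",
    "address": "300 West 721 North Provo, UT 84604",
    "phone": "(801)375-1234",
    "major": "Computer Science",
    "gpa": "3.56",
}

xml_text = ET.tostring(student_to_xml(record), encoding="unicode")
print(xml_text)
```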

XML is a text-based format (as opposed to a binary format). This provides two major benefits:
  1. XML data is portable because it is not tied to any particular application program or operating system
  2. XML data can be easily processed using just about any programming language because most languages provide facilities for processing text data
These benefits make XML well suited for sharing data with others. If you have some data that you want to share with a friend, all you need to do is convert the data to XML format and send the resulting XML file to your friend. After receiving the XML file, your friend can process the XML data on any kind of computer using the programming language or application program of their choice.
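As one illustration of the second benefit, here is a minimal Python sketch of how your friend might read the student record using the standard `xml.etree.ElementTree` module, which is just one of many available XML parsers. Because everything in an XML file is text, the program must convert the GPA to a number itself.

```python
import xml.etree.ElementTree as ET

# The XML text your friend receives (the student record shown earlier).
received = """<student>
     <id>123456789</id>
     <name>Bill White</name>
     <address>300 West 721 North Provo, UT 84604</address>
     <phone>(801)375-1234</phone>
     <major>Computer Science</major>
     <gpa>3.56</gpa>
</student>"""

student = ET.fromstring(received)      # parse the text into a tree of elements
name = student.findtext("name")        # text content of the <name> child
gpa = float(student.findtext("gpa"))   # convert the text "3.56" to a number
print(name, gpa)
```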

XML Basics

In XML, data is represented using elements, where each element represents a data value. The example above contains 7 elements: student, id, name, address, phone, major, and gpa. Each element has a name (student, id, etc.), begins with a start tag, and ends with an end tag. A start tag looks like this: <elemname>, and an end tag looks like this: </elemname>. Start tags and end tags look the same except that an end tag has an extra / character after the opening <. Between an element's start tag and end tag is the element's content, which may be any of the following:
  1. text data
  2. nested elements
  3. a combination of 1 and 2
For example, the content of the student element above consists entirely of other elements that are nested within it (id, name, address, phone, major, gpa). In contrast, the content of the id element is a simple text string that represents the student's ID (123456789). Although we don't show it here, an element's content can also be a combination of nested elements and text data appearing in any order.
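The three kinds of content can be told apart by a program. The sketch below uses Python's `xml.etree.ElementTree`, which stores text that appears before an element's first child in `.text` and text that follows a child in that child's `.tail`. The function `content_kind` and the `<note>` element used to show mixed content are hypothetical. The sketch also assumes that whitespace-only text, such as indentation, should not count as content.

```python
import xml.etree.ElementTree as ET

def content_kind(elem):
    """Hypothetical helper: classify content as 'text', 'elements', 'mixed', or 'empty'."""
    has_children = len(elem) > 0
    # Text before the first child lives in elem.text; text after each child
    # lives in that child's .tail attribute.
    pieces = [elem.text] + [child.tail for child in elem]
    # Assumption: whitespace-only text (such as indentation) is not counted.
    has_text = any(p and p.strip() for p in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "text"
    return "empty"

student = ET.fromstring("<student><id>123456789</id><name>Bill White</name></student>")
note = ET.fromstring("<note>Ask <name>Bill White</name> about the lab.</note>")

print(content_kind(student))             # elements
print(content_kind(student.find("id")))  # text
print(content_kind(note))                # mixed
```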

Attributes

In addition to the content between its start tag and end tag, an element may also have attributes. An element's attributes are name/value pairs that appear within its start tag. For example, an alternative XML representation for a student record might be the following:
<student id="123456789" gpa="3.56" phone="(801)375-1234">
     <name>Bill White</name>
     <address>300 West 721 North Provo, UT 84604</address>
     <major>Computer Science</major>
</student>
In this case, the student's ID, GPA, and phone number are represented using attributes rather than nested elements. An element's start tag may contain any number of attributes, although a given attribute name may appear only once in the same start tag. Each attribute has the form name='value' or name="value" (you can use either single or double quotes).
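Here is a minimal Python sketch, using the standard `xml.etree.ElementTree` module, that shows how a program sees attributes. Quoting style does not matter after parsing, every attribute value arrives as a string, and a start tag that repeats an attribute name is not well-formed XML, so the parser rejects it.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted around attribute values.
student = ET.fromstring(
    """<student id="123456789" gpa='3.56' phone="(801)375-1234">"""
    "<name>Bill White</name></student>"
)

print(student.attrib)        # all attributes as a dictionary of strings
print(student.get("gpa"))    # 3.56 (a string, not a number)
print(student.get("email"))  # None, because there is no such attribute

# The same attribute name may not appear twice in one start tag.
try:
    ET.fromstring('<point x="0" x="1"></point>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print("duplicate attribute rejected:", duplicate_rejected)
```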

Frequently, there is a choice between using an attribute or a nested element to represent a particular value. Either choice is valid, and the decision is largely one of personal taste. As a rule of thumb, attributes are normally used to represent simple values, and nested elements are used to represent larger, more complex values. This follows from a basic restriction: an attribute's value is always a single text string, while an element's content can include further nested elements.
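Because both representations are valid, a program that reads student records must know which one a particular file uses. The following minimal Python sketch reads a value from either form. The helper `field` is hypothetical. In practice, a file format usually settles on one representation rather than accepting both.

```python
import xml.etree.ElementTree as ET

def field(elem, name):
    """Hypothetical helper: read a value stored either as an attribute
    or as the text of a nested element with the same name."""
    value = elem.get(name)            # try the attribute first
    if value is None:
        value = elem.findtext(name)   # fall back to a child element (None if absent)
    return value

as_elements = ET.fromstring("<student><id>123456789</id><gpa>3.56</gpa></student>")
as_attributes = ET.fromstring(
    '<student id="123456789" gpa="3.56"><name>Bill White</name></student>'
)

print(field(as_elements, "gpa"), field(as_attributes, "gpa"))  # 3.56 3.56
```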

Empty Elements

XML data often contains empty elements. An empty element is an element that has no content between its start tag and end tag. For example, the following empty element represents a point on the Cartesian plane:

<point x="0" y="0"></point>

In this case, the element contains no content because all of the information about the element is stored in the element's attributes. It is also possible that an empty element will have neither attributes nor content between its start tag and end tag. In this case, the mere presence of the element conveys all of the necessary information.
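When an empty element has no attributes, a program only needs to check whether the element is present. In the minimal Python sketch below, `<honors></honors>` is a hypothetical flag element whose presence marks an honors student. The check compares against `None` explicitly because ElementTree treats an element with no children as false in a boolean context, even when the element exists.

```python
import xml.etree.ElementTree as ET

# <honors></honors> is a hypothetical empty element used only as a flag.
records = [
    "<student><name>Bill White</name><honors></honors></student>",
    "<student><name>Greg Jones</name></student>",
]

def is_honors(student):
    # find() returns None when the child is missing; compare with None
    # explicitly instead of relying on the element's truth value.
    return student.find("honors") is not None

flags = {}
for text in records:
    student = ET.fromstring(text)
    flags[student.findtext("name")] = is_honors(student)

print(flags)
```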

XML provides an alternative syntax for representing empty elements, which looks like this:

<point x="0" y="0"/>

This syntax combines the start tag and end tag into a single tag (notice the / character that comes right before the closing > character). This results in a more compact representation for empty elements. Although it is more compact, this alternative syntax has precisely the same meaning as the more verbose syntax shown above (i.e., a start tag followed by an end tag).
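You can confirm that the two syntaxes mean the same thing by parsing both. The minimal Python sketch below uses `xml.etree.ElementTree`. After parsing, the two elements have the same tag, the same attributes, and no content, and they serialize identically. ElementTree writes empty elements using the compact form.

```python
import xml.etree.ElementTree as ET

verbose = ET.fromstring('<point x="0" y="0"></point>')
compact = ET.fromstring('<point x="0" y="0"/>')

# Same tag, same attributes, and no child elements in both cases.
for p in (verbose, compact):
    print(p.tag, p.attrib, len(p))

# Both are written back out in the same (compact) form.
print(ET.tostring(verbose) == ET.tostring(compact))  # True
```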

A More Complicated Example

Because elements can be nested, XML can be used to represent complex hierarchical data structures. For example, the following XML file represents a class roll:
<class department="Computer Science" number="235" section="1">
     <instructor>
          <faculty id="1000" phone="(801)422-1111">
               <name>Joe Smart</name>
               <address>1111 TMCB</address>
               <title>Professor</title>
               <department>Computer Science</department>
          </faculty>
     </instructor>
     <students>
          <student id="111111111" gpa="3.30" phone="(801)375-1111">
               <name>Susan Green</name>
               <address>100 West 111 North Provo, UT 84604</address>
               <major>Computer Engineering</major>
          </student>
          <student id="222222222" gpa="3.73" phone="(801)375-2222">
               <name>Greg Jones</name>
               <address>200 West 222 North Provo, UT 84604</address>
               <major>Computer Science</major>
          </student>
          <student id="333333333" gpa="2.90" phone="(801)375-3333">
               <name>Randy Cox</name>
               <address>300 West 333 North Provo, UT 84604</address>
               <major>Computer Science</major>
          </student>
     </students>
     <ta>
          <student id="123456789" gpa="3.56" phone="(801)375-1234">
               <name>Bill White</name>
               <address>300 West 721 North Provo, UT 84604</address>
               <major>Computer Science</major>
          </student>
     </ta>
</class>
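To show how a program might navigate this hierarchy, here is a minimal Python sketch using `xml.etree.ElementTree` and its limited XPath-style path syntax. The variable names are illustrative. Note the difference between the path `students/student`, which matches only the enrolled students, and `.//student`, which searches at any depth and therefore also matches the TA's `<student>` element.

```python
import xml.etree.ElementTree as ET

ROLL = """<class department="Computer Science" number="235" section="1">
  <instructor>
    <faculty id="1000" phone="(801)422-1111">
      <name>Joe Smart</name>
      <address>1111 TMCB</address>
      <title>Professor</title>
      <department>Computer Science</department>
    </faculty>
  </instructor>
  <students>
    <student id="111111111" gpa="3.30" phone="(801)375-1111">
      <name>Susan Green</name>
      <address>100 West 111 North Provo, UT 84604</address>
      <major>Computer Engineering</major>
    </student>
    <student id="222222222" gpa="3.73" phone="(801)375-2222">
      <name>Greg Jones</name>
      <address>200 West 222 North Provo, UT 84604</address>
      <major>Computer Science</major>
    </student>
    <student id="333333333" gpa="2.90" phone="(801)375-3333">
      <name>Randy Cox</name>
      <address>300 West 333 North Provo, UT 84604</address>
      <major>Computer Science</major>
    </student>
  </students>
  <ta>
    <student id="123456789" gpa="3.56" phone="(801)375-1234">
      <name>Bill White</name>
      <address>300 West 721 North Provo, UT 84604</address>
      <major>Computer Science</major>
    </student>
  </ta>
</class>"""

root = ET.fromstring(ROLL)

# Paths are evaluated relative to the root <class> element.
instructor = root.findtext("instructor/faculty/name")
enrolled = root.findall("students/student")  # only students nested in <students>
names = [s.findtext("name") for s in enrolled]
average_gpa = sum(float(s.get("gpa")) for s in enrolled) / len(enrolled)
ta = root.find("ta/student").findtext("name")

print(f"CS {root.get('number')}, section {root.get('section')}, taught by {instructor}")
print("Enrolled:", ", ".join(names))
print(f"Average GPA: {average_gpa:.2f}")
print("TA:", ta)

# './/student' searches at any depth, so it also finds the TA's <student>.
print(len(root.findall(".//student")))  # 4
```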