XML Demystified…
I don’t know how many times I have heard the statement, “I should learn XML so my web pages can be standards compliant.” Huh? XML? Oh, you mean XHTML? Yeah, you should learn that.
There is a whole group of technologies involved when you start talking about XML. There are also plenty of products out there that advertise their use of it. It has become cool to work with XML. In computer land, it is normal for an acronym or term to catch on and be used out of context by those who don’t know better, and even by those who should know better.
Case in point. Everyone wants to be on the internet, but most people think of the internet as what they see in their browser. They don’t understand that the internet is really the combination of all the interconnected computers and is not tied to any one application protocol. It has gotten so that when you say, “let’s get on the internet”, everyone (including techs) assumes you mean to open up a browser and use HTTP to download web pages. While HTTP traffic accounts for a significant part of internet traffic, it is by no means the only traffic flowing through the network. Internet traffic includes email, FTP, SNMP, file sharing, Telnet, SSH, instant messengers, online TV, movies and radio stations, and the list goes on and on.
When referring to XML, the same type of misunderstanding takes place, even among those who should know better. We will try to break down the separate components that get lumped together as “XML” and show you the differences.
First, XML is a thing all by itself. It isn’t really that exciting. It actually has a number of weaknesses. It is kind of wordy, which increases your data size (sometimes significantly). It is a text based specification and requires encoding for binary data (base 64 encoding is popular), which again increases data size (base 64 adds about one third to the size). And it requires a decent XML library to really use it in your software (unless you want to write the code to parse and manipulate XML yourself).
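To put a number on the base 64 overhead mentioned above, here is a minimal sketch in Python using the standard `base64` module. The 300-byte payload is made up for illustration; any binary data would behave the same way, since base 64 turns every 3 bytes into 4 text characters.

```python
import base64

# Hypothetical binary payload: 300 arbitrary bytes standing in for an image or file.
raw = bytes(range(256)) + bytes(44)

# Base 64 maps every 3 input bytes to 4 output characters.
encoded = base64.b64encode(raw)

print(len(raw))      # 300
print(len(encoded))  # 400, one third larger than the original
```

The encoded text is what you would place inside an XML element, and the receiving side decodes it back to the original bytes.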
Why would you use it then? The main reason is that it allows interoperability in a generic manner. It does this because XML parsing libraries have been written for, or ported to, just about every language and platform. You can use XML in just about any language by plugging in free libraries. This is an important concept. People have been using CSV files for years, but you don’t find nearly as much built-in support for that format from language vendors. Most languages already have an XML library ready to go, or you can get one as an add-on.
One of its weaknesses is also one of its strengths. It is wordy. If done right, it is often self describing. That makes it easy for developers to simply read your XML file and find ways to integrate with it.
With this little bit of background, let’s actually answer the question, “What is XML?”
XML is a generic way to store data in a self describing, orderly, hierarchical manner. More importantly, people got together and promoted a standard, and everyone else has done a good job of following it. This standard is what lets people write software that interacts in an easy and platform independent way. This in and of itself isn’t revolutionary. People have been writing software that interacts with other software for years (web servers and web browsers). The problem with other formats is that they were too specific in their features (why do you think we still use FTP when we have web pages?) or they made it too difficult for other developers to figure out what went where (try parsing an MS Word document, even with the complete file specifications). The amazing part is that the industry got together, agreed on something, and actually started supporting and using it. Now you can have Java web services supplying information to .NET driven applications.
XML itself is just a generic way to store data (store meaning transfer to other software or save as a file, or manipulate in any way you need). Below is a small sample.
<ROOT>
  <PERSON name="bob" age="84" height="5.0">
    <OPERATION date="5/5/05" type="heart transplant" />
  </PERSON>
  <PERSON name="joy" age="13" height="5.4">
    <OPERATION date="1/1/06" type="hip fracture" />
    <OPERATION date="1/3/06" type="mole removal" />
  </PERSON>
</ROOT>
The example shows how you can stack related information on people (or patients in this case). Each person gets his/her own tag. Notice that the tag opens with “PERSON” and closes with “/PERSON”. EVERY TAG has to open and close. This is a requirement in XML. NO EXCEPTIONS! Really. None. Not even the OPERATION tag. Notice that it ends with />. The / character at the end is a shortcut that closes the tag. It is important to note that you can’t use the short hand and long hand versions on the same tag at the same time: it is either a self-closed tag, or an opening tag with a separate closing tag, never both.
You would use the long hand version when you have more data to nest in your current item (such as one or more operations that belong to a particular person). If the current item has no child items, then you can use the short hand version (or you could still use the long hand version).
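As a quick check that the short hand and long hand forms really mean the same thing, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The two strings are just the OPERATION tag from the sample written both ways.

```python
import xml.etree.ElementTree as ET

# The same element written in short hand and in long hand.
short_form = ET.fromstring('<OPERATION date="5/5/05" type="heart transplant" />')
long_form = ET.fromstring('<OPERATION date="5/5/05" type="heart transplant"></OPERATION>')

# Once parsed, both have the same tag, the same attributes, and no children.
same = (
    short_form.tag == long_form.tag
    and short_form.attrib == long_form.attrib
    and len(short_form) == len(long_form) == 0
)
print(same)  # True
```

A parser hands your program the same node either way, so the choice between the two is purely about readability.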
Another thing about XML is it HAS to have a single ROOT element. The ROOT element can be called anything you want, but there has to be one. Why? I don’t know, look it up and get back to me. Guess it just likes a single starting point in its hierarchy.
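The rules so far (every tag closes, and there is exactly one root) are easy to see in action by handing a parser some input that breaks them. This is a rough sketch in Python with the standard `xml.etree.ElementTree` module; the helper name `parses_ok` and the three test strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses_ok(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

one_root = '<ROOT><PERSON name="bob" /><PERSON name="joy" /></ROOT>'
two_roots = '<PERSON name="bob" /><PERSON name="joy" />'  # no single root
unclosed = '<ROOT><PERSON name="bob"></ROOT>'             # PERSON never closes

for label, text in [("one_root", one_root), ("two_roots", two_roots), ("unclosed", unclosed)]:
    print(label, parses_ok(text))
```

Only the first string is accepted. The parser does not try to guess what you meant for the other two; it simply refuses the document.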
Now that you have seen a basic XML file, you will notice that it is really just a data dump that has been formatted in a generic way. Any XML parser can now read that in, and you can grab values at each node. It looks very much like HTML, but it IS NOT! There are a few rules that come with XML, and one is that all XML must be well formed. This is an important concept, and it relates to the requirement above that each tag is closed properly and in the correct order. That is what we call well formed XML.
When you hear about XHTML, one of the main differences between HTML and XHTML is that XHTML must be well formed too. In HTML it was common practice to put some tags in without a closing tag (P, BR, IMG, etc.). In XHTML, you HAVE to close these tags. You can use the short or long hand method where appropriate. Many other enhancements came along around the same time as the XHTML spec, and those really have nothing to do with XML itself. Enhancements to the DOM made working with the page and its content more uniform, but most HTML designers don’t even use the features that are actually XML related.
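Here is roughly what “read that in and grab values at each node” looks like in practice. This is a minimal sketch using Python’s standard `xml.etree.ElementTree` module on the sample document from above; the variable names are my own.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ROOT>
<PERSON name="bob" age="84" height="5.0">
<OPERATION date="5/5/05" type="heart transplant" />
</PERSON>
<PERSON name="joy" age="13" height="5.4">
<OPERATION date="1/1/06" type="hip fracture" />
<OPERATION date="1/3/06" type="mole removal" />
</PERSON>
</ROOT>"""

root = ET.fromstring(SAMPLE)

# Walk each PERSON node and collect the type of every nested OPERATION.
operations_by_person = {}
for person in root.findall("PERSON"):
    operations_by_person[person.get("name")] = [
        op.get("type") for op in person.findall("OPERATION")
    ]

print(operations_by_person)
# {'bob': ['heart transplant'], 'joy': ['hip fracture', 'mole removal']}
```

Notice that none of this code knows anything about angle brackets or quoting. The library does the parsing and hands back a tree of nodes with attributes.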
Now XML itself is just data formatted in a generic way. There are some other tools available that can take XML and do stuff with it. XSL is basically a style sheet for XML files. Looking at an XML file itself is boring and technical. You wouldn’t want to display results on your website using just XML. You would wrap it with rendering instructions in an XSL file that applies styles and HTML tags around the XML data and turns it into a presentable page. This doesn’t save you from doing HTML, but it can save you from having to write some programming code. XSL can sort, select individual or multiple records, and format the display of the data, all without writing a separate program. Will it take the place of languages like PHP? No way. The XML still needs to be generated somehow. It could be used in conjunction with current technologies though.
XML is used in many different places: for data storage, for interprocess communication, for impressing your friends. But where is XSL used? There is actually a suite of technologies that go along with XSL, including XPath (the part that lets you select nodes in an XML file with a path-like query syntax, while XSL itself handles things like sorting). Generally when you move away from the web, you aren’t worried about the display of the XML file as much as you are worried about the software on the other end being able to understand what you have sent. Here is an example where we used XML to make a project communicate with the server.
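To give a feel for what querying XML looks like, here is a small sketch in Python. One caveat: the standard `xml.etree.ElementTree` module only supports a limited subset of XPath and has no XSL support, so the sort below is done in plain Python rather than in a style sheet. The data is the patient sample from earlier.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ROOT>
<PERSON name="bob" age="84" height="5.0">
<OPERATION date="5/5/05" type="heart transplant" />
</PERSON>
<PERSON name="joy" age="13" height="5.4">
<OPERATION date="1/1/06" type="hip fracture" />
<OPERATION date="1/3/06" type="mole removal" />
</PERSON>
</ROOT>"""

root = ET.fromstring(SAMPLE)

# Path-style query: every OPERATION under the PERSON whose name is "joy".
joy_ops = [op.get("type") for op in root.findall("./PERSON[@name='joy']/OPERATION")]
print(joy_ops)  # ['hip fracture', 'mole removal']

# Sorting happens in Python here; a real XSL file would use its own sort instruction.
youngest_first = [
    p.get("name")
    for p in sorted(root.findall("PERSON"), key=lambda p: int(p.get("age")))
]
print(youngest_first)  # ['joy', 'bob']
```

The query string reads like a file path with a filter in square brackets, which is the general flavor of XPath.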
One client of mine has a server where data is stored and entered on a daily basis. This data is manipulated through a web page interface. This works great. At times, though, they need to go on the road and gather results in places where internet access is limited. We wrote an application that allows you to download the needed information from the server and take it with you. This information comes down in the form of an XML document generated by a PHP file. The PHP file adds the needed XML tags around the data and includes extra attributes that affect the display properties.
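The server side of that arrangement boils down to wrapping records in tags. The real project does this in PHP; the sketch below shows the same idea in Python with the standard `xml.etree.ElementTree` module. The element names (`RECORDS`, `RECORD`), the attribute names (`menu`, `hidden`), and the records themselves are all hypothetical, chosen only to illustrate data plus display hints.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows as they might come out of a database query.
records = [
    {"id": "1", "name": "bob", "menu": "edit", "hidden": "false"},
    {"id": "2", "name": "Tom & Jerry", "menu": "view", "hidden": "true"},
]

def build_document(rows):
    """Wrap each row in a RECORD element under a single RECORDS root."""
    root = ET.Element("RECORDS")
    for row in rows:
        ET.SubElement(root, "RECORD", attrib=row)
    return ET.tostring(root, encoding="unicode")

xml_text = build_document(records)
print(xml_text)
```

One design note: building the document through a library instead of gluing strings together means special characters such as the `&` in the second record get escaped for you, so the output stays well formed.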
The application itself opens the XML file using the Microsoft XML library, which parses the XML file for us. We can then ask for the nodes we want and change them as information is manipulated. The application also uses the extra information in the attributes, which can tell it which menu to show when the user right clicks a record, or to hide records in the GUI, or any number of other things. The nice thing about the XML output is that it is generic and extensible, which allows us to add attributes and elements as we go without breaking everything. We don’t have to worry as much about file versions (ever try to open a Word 2003 document in Word 97?). As updates are released for the application, we can modify the base XML file without the program blowing up.
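The reason new attributes and elements don’t break older code is that a reader which asks for nodes and attributes by name simply never looks at the ones it doesn’t know about. Here is a minimal Python sketch of that idea; the `RECORDS`/`RECORD` layout, the `NOTE` element, and the attribute names are hypothetical stand-ins, not the actual format from the project.

```python
import xml.etree.ElementTree as ET

def read_names(xml_text):
    """An 'old' reader that only knows about the name attribute."""
    root = ET.fromstring(xml_text)
    return [record.get("name") for record in root.findall("RECORD")]

# Version 1 of the file format.
v1 = '<RECORDS><RECORD name="bob" /></RECORDS>'

# Version 2 adds attributes and a nested element the old reader has never heard of.
v2 = ('<RECORDS><RECORD name="bob" menu="edit" hidden="false">'
      '<NOTE text="new in v2" /></RECORD></RECORDS>')

print(read_names(v1))  # ['bob']
print(read_names(v2))  # ['bob']
```

This only holds as long as new versions add things rather than rename or remove what older readers depend on.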
After hearing all this great stuff about XML, it is kind of an anti-climax to hear what XML really is. It is just a generic data container. The fact that everyone in the world seems to have adopted it is what makes it so great. I can write a service that uses XML to communicate, and it is easy for others to write software that interacts with my software without worrying about language or operating system.
One area where XML is getting a lot of use is in the Ajax libraries. There are plenty of descriptions of Ajax and what it does (including on this website), but not all of them explain how XML fits in. In our previous example, we used XML to transfer database information to the application in a generic way that can be parsed and understood. Ajax does the same thing. When you add Ajax to a website, you make it behave more like an application. In doing so, you still need to make requests to the web server. These requests are sent via HTTP, but the content still needs to be understood by both parties. HTTP is like the telephone in this case and XML is like the conversation. The XML that comes back from the server can be interpreted in the background and used to populate information on the web page in real time. The XML is, again, just a generic way to do this, and often the XML returned is customized for the particular application and situation. They really could have used any file format (CSV, etc.), but XML was chosen because of its generic interface and because of JavaScript’s built-in ability to parse and work with XML.
If you want all the other bells and whistles associated with XML, you should look further into XSL to see how easy it is to manipulate XML without being a programmer. We are actually using XSL to format a data driven printout. If I ever get done with this article, I am off to finish it.
In any attempt to get people to adopt your product, you have to remove the barriers to entry so that it is as easy as possible. XML libraries are available for nearly all languages and platforms and are now usually included in development tools. Browsers have built-in support for XML. People have been using XML for years now without even realizing it. The people behind the standard have done a good job of defining XML (at least for the programmers) and making it easy to use. The marketing has also been particularly good at making people aware of it. Even though many people don’t totally understand what it is, they know they want it. I have used it in several projects to date and have found that it can save me significant time, because the libraries are so readily available and they do what we would otherwise have to do by hand with our own file formats. That time savings alone is enough reason for me to adopt XML in my projects.
Ray Pulsipher
Owner
Computer Magic And Software Design