XML Monkey

I’m trying to come to terms with the reality that is XML. I may not like the format but that won’t change the fact that I have to interoperate with various XML data formats already in the wild. In other words, treat it like any random multimedia format. For example, suppose I want to write software to interpret the various comics that I’ve created with Taco Bell’s series of Comics Constructors CD-ROMs.


Amazon Raiders: XML Monkey, top panel


Amazon Raiders: XML Monkey, bottom panels

The comics are saved as XML files that look something like this:

[xml]








… …

[/xml]

How to even begin with this? Sometimes a good book can help. Yesterday, I found an old book from 1999 called “Just XML” by John E. Simpson. It weighs in at nearly 400 pages. I thought XML was supposed to be relatively straightforward to understand.

The book is supposed to be geared toward web programmers. I’m not a web programmer, but I do wish to know how to programmatically access this data. I have seen that Python has interfaces to libraries that parse XML. So I shoved xml-monkey.xml through the example code shown at the end of Python’s xml.parsers.expat documentation. This yields:

Start element: COMIC {}
Start element: PAGE0 {u'name': u'pgt1'}
Start element: SQ1 {u'scale': u'350', u'bg': u'bg07', 
  u'mirror': u'0', u'y': u'283', u'x': u'388', u'rotation': u'0'}
Start element: OBJECT {u'scale': u'100', u'name': u'ch01', 
  u'sq': u'1', u'depth': u'1', u'mirror': u'0', u'y': u'368', u'x':
  u'196', u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'ch10', 
  u'sq': u'1', u'depth': u'2', u'mirror': u'1', u'y': u'370', u'x': 
  u'338', u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'0', u'sq':
  u'1', u'depth': u'3', u'mirror': u'0', u'y': u'376', u'x': u'342',
  u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'ob02', 
  u'sq': u'1', u'depth': u'4', u'mirror': u'0', u'y': u'367', u'x': 
  u'469', u'rotation': u'0', u'libType': u'objects'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', 
  u'cont': u"We might as well face it-- XML isn't going away", 
  u'name': u'bu01', u'sq': u'1', u'txtColor': u'', u'depth': u'5', u'mirror': 
  u'1', u'y': u'265', u'x': u'216', u'libType': u'bubbles', u'rotation': u'0'}
...

So that’s something. I thought XML documents were required to start with a little more boilerplate such as <?xml version="1.0" encoding="UTF-8"?>. I see that there are a few levels to XML validity. The first is “well-formed”, in which the document adheres to basic XML syntactic rules. Then there’s actually being “valid”, which requires a document type definition (DTD) to validate against. That DTD, I do not have.
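Here is a minimal sketch of checking the first level, well-formedness, without any DTD. It assumes Python 3 (the output above came from Python 2, hence the u'' prefixes). The helper name `is_well_formed` and the two test fragments are made up for illustration and are not taken from the real comic file. Note that neither fragment carries an XML declaration; expat accepts that, which fits with the declaration being optional in XML 1.0.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return (True, None) if expat can parse xml_text, else (False, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# Hypothetical fragments, loosely modeled on the element names seen above.
print(is_well_formed('<COMIC><PAGE0 name="pgt1"/></COMIC>'))
print(is_well_formed('<COMIC><PAGE0 name="pgt1"></COMIC>'))  # PAGE0 never closed
```

This only says whether the syntax holds together. Checking validity against a DTD is a separate step that expat does not perform.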

But this is still a good start. I can see how I might start processing the data using Python. This is good since I am encountering more and more XML files that I’m interested in manipulating.
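As a rough idea of what that processing might look like, the sketch below reuses the same expat start-element callback but collects the attributes of each OBJECT element instead of printing them. It assumes Python 3. The function name `collect_objects` and the `SAMPLE` fragment are hypothetical: the fragment only imitates the attribute names visible in the output above and is not the actual contents of xml-monkey.xml.

```python
import xml.parsers.expat


def collect_objects(xml_text):
    """Gather the attribute dicts of every OBJECT element, in document order."""
    objects = []

    def start_element(name, attrs):
        # expat calls this once per opening tag with the tag's attributes.
        if name == "OBJECT":
            objects.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return objects


# Hypothetical fragment modeled on the attribute names in the output above;
# the real file has more attributes and more elements.
SAMPLE = """<COMIC><PAGE0 name="pgt1"><SQ1 bg="bg07">
<OBJECT name="ch01" sq="1" depth="1" libType="characters"/>
<OBJECT name="bu01" sq="1" depth="5" libType="bubbles"
        cont="We might as well face it-- XML isn't going away"/>
</SQ1></PAGE0></COMIC>"""

objects = collect_objects(SAMPLE)
# Pull out just the speech bubble text.
bubbles = [o["cont"] for o in objects if o.get("libType") == "bubbles"]
print(bubbles)
```

Keeping the attributes as plain dictionaries seems like the simplest starting point. Anything fancier, such as grouping objects by panel, would depend on structure in the file that I have not fully mapped out yet.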

9 thoughts on “XML Monkey”

  1. DrV

    I have in my possession a book entitled “XML Bible (Gold Edition)” from 2001 (it was given to me; I am not exactly an XML fan either) that weighs in at a hefty 1565 pages, if you count the index. It makes a great monitor stand – the binding is nearly three inches thick.

  2. Multimedia Mike Post author

    Then again, it’s probably not fair to judge the complexity of a given computer topic by the thickness of the tomes published on the topic. Some publishers can publish 1600 pages about anything, usually by reprinting the publicly-available API documentation for a language (see also: any Java book).

  3. SvdB

    Brevity isn’t everything. And you’ll warm up to XML when you get to XPath.
    And you don’t need a 400-page book. I’d start with a simple online tutorial.

    P.S. Could you add a “preview” button to the comment form?

  4. Tomer Gabel

    Sorry to say this, but anyone who writes up over 1000 pages on XML is an idiot (and/or full of horseshit).

    XML, in its raw, simple form, is nothing more than an information interchange format. Any junior programmer can learn to use XML (conceptually as well as programmatically) in a couple of hours, and you can take your time learning XML schema, XSLT, XQuery or even none of the above. The only thing you “really” need to learn is XPath, and it makes so much intuitive sense that you’re unlikely to ever need to open up the reference.

    The whole “XML is the bomb” vs “XML is crap” debate is getting really, really old, and makes about as much sense as the VHS vs Beta argument of old.

  5. Mans

    Calling xpath intuitive is the single most stupid thing I’ve heard all week. Sure, it allows some complex things to be expressed, but INTUITIVE? No way! The same goes for xslt, which relies heavily on xpath. Both xpath and xslt are prime examples of the wrong tool being used for the job.

    I do agree about someone writing 1000 pages on xml being an idiot; it does not deserve that much attention.

  6. Tomer Gabel

    @Mans: What exactly isn’t intuitive about XPath? “/root/somePath/@someAttribute” takes about three seconds to learn. Want predicates? Right: “/root/somePath[@someAttribute='someValue']”.

    If you don’t use complex functions (or user-defined functions), axes and XML namespaces you can pick it up in minutes. Using any of the above is almost always unnecessary anyway.
