I’m trying to come to terms with the reality that is XML. I may not like the format, but that won’t change the fact that I have to interoperate with various XML data formats already in the wild. In other words, treat it like any random multimedia format. For example, suppose I want to write software to interpret the various comics that I’ve created with Taco Bell’s series of Comics Constructors CD-ROMs.
The comics are saved as XML files that look something like this:
[xml]
…
…
[/xml]
How to even begin with this? Sometimes a good book can help. Yesterday, I found an old book from 1999 called “Just XML” by John E. Simpson. It weighs in at nearly 400 pages. I thought XML was supposed to be relatively straightforward to understand.
The book is supposed to be geared toward web programmers. I’m not a web programmer, but I do wish to know how to programmatically access this data. I have seen that Python has interfaces to libraries that parse XML. So I shoved xml-monkey.xml through the example code shown at the end of Python’s xml.parsers.expat documentation. This yields:
Start element: COMIC {}
Start element: PAGE0 {u'name': u'pgt1'}
Start element: SQ1 {u'scale': u'350', u'bg': u'bg07',
u'mirror': u'0', u'y': u'283', u'x': u'388', u'rotation': u'0'}
Start element: OBJECT {u'scale': u'100', u'name': u'ch01',
u'sq': u'1', u'depth': u'1', u'mirror': u'0', u'y': u'368', u'x':
u'196', u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'ch10',
u'sq': u'1', u'depth': u'2', u'mirror': u'1', u'y': u'370', u'x':
u'338', u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'0', u'sq':
u'1', u'depth': u'3', u'mirror': u'0', u'y': u'376', u'x': u'342',
u'rotation': u'0', u'libType': u'characters'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100', u'name': u'ob02',
u'sq': u'1', u'depth': u'4', u'mirror': u'0', u'y': u'367', u'x':
u'469', u'rotation': u'0', u'libType': u'objects'}
End element: OBJECT
Start element: OBJECT {u'scale': u'100',
u'cont': u"We might as well face it-- XML isn't going away",
u'name': u'bu01', u'sq': u'1', u'txtColor': u'', u'depth': u'5', u'mirror':
u'1', u'y': u'265', u'x': u'216', u'libType': u'bubbles', u'rotation': u'0'}
...
So that’s something. I thought XML documents were required to start with a little more boilerplate such as <?xml version="1.0" encoding="UTF-8"?>, but expat parsed this file without complaint. I see that there are a few levels of XML validity. The first is “well-formed”, in which the document adheres to the basic XML syntactic rules. Then there’s actually being “valid”, which requires a document type definition (DTD) to validate against. That DTD, I do not have.
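To make the well-formed versus valid distinction concrete, here is a minimal sketch of a well-formedness check using expat. The helper name is_well_formed and the two tiny documents are made up for illustration; they only borrow the COMIC and PAGE0 element names from the output above. This checks syntax only and says nothing about validity against a DTD.

```python
import xml.parsers.expat


def is_well_formed(data):
    """Return (True, None) if data is well-formed XML, else (False, error message).

    Hypothetical helper for illustration; it checks syntax only, not DTD validity.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# No XML declaration and no DTD, yet still well-formed.
ok_doc = '<COMIC><PAGE0 name="pgt1"></PAGE0></COMIC>'
# PAGE0 is never closed, so the closing COMIC tag is mismatched.
bad_doc = '<COMIC><PAGE0 name="pgt1"></COMIC>'

print(is_well_formed(ok_doc))
print(is_well_formed(bad_doc))
```

The first document passes even though it has no declaration and no DTD, which matches what I saw with the comic file.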
But this is still a good start. I can see how I might start processing the data using Python. This is good since I am encountering more and more XML files that I’m interested in manipulating.
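As a rough sketch of what that processing might look like, the code below collects every OBJECT element into a list of dictionaries. The SAMPLE string is not the real file; it is a small stand-in I reconstructed from the element and attribute names in the parser output above, and the function name collect_objects and the dictionary keys are my own invention.

```python
import xml.parsers.expat

# Stand-in document reconstructed from the parser output above (an assumption,
# not the actual contents of xml-monkey.xml).
SAMPLE = """<COMIC>
  <PAGE0 name="pgt1">
    <SQ1 bg="bg07" scale="350" x="388" y="283" rotation="0" mirror="0">
      <OBJECT libType="characters" name="ch01" sq="1" depth="1"
              x="196" y="368" scale="100" rotation="0" mirror="0"/>
      <OBJECT libType="bubbles" name="bu01" sq="1" depth="5" txtColor=""
              x="216" y="265" scale="100" rotation="0" mirror="1"
              cont="We might as well face it-- XML isn't going away"/>
    </SQ1>
  </PAGE0>
</COMIC>"""


def collect_objects(data):
    """Parse data with expat and return one dict per OBJECT element."""
    objects = []

    def start_element(name, attrs):
        # expat hands over attributes as a dict of strings.
        if name == "OBJECT":
            objects.append({
                "kind": attrs.get("libType"),
                "name": attrs.get("name"),
                "x": int(attrs["x"]),
                "y": int(attrs["y"]),
                "depth": int(attrs["depth"]),
                "text": attrs.get("cont"),  # only speech bubbles carry text
            })

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return objects


for obj in collect_objects(SAMPLE):
    print(obj["kind"], obj["name"], obj["x"], obj["y"], obj["text"])
```

Converting the coordinates to integers up front is a guess about how they would be used later; the file itself stores everything as strings.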