The lowest-level layer offered by itools.xml is an event-driven parser. The following example shows how to use it:
>>> from itools.xml import (XMLParser, START_ELEMENT,
...                         END_ELEMENT, TEXT)
>>>
>>> data = 'Hello <em>Baby</em>'
>>> for type, value, line in XMLParser(data):
...     if type == START_ELEMENT:
...         tag_uri, tag_name, attributes = value
...         print 'START TAG :', tag_name
...     elif type == END_ELEMENT:
...         tag_uri, tag_name = value
...         print 'END TAG :', tag_name
...     elif type == TEXT:
...         print 'TEXT :', value
...
TEXT : Hello
START TAG : em
TEXT : Baby
END TAG : em
This example just prints a message to the standard output each time the start of an element, the end of an element or a text node is found.
The parser produces a sequence of events, where every event is a tuple of three values: the event type, the value (whose form depends on the event type) and the line number. The implemented events are:
Event            Value
XML_DECL         (version, encoding, standalone)
DOCUMENT_TYPE    (name, doctype)
START_ELEMENT    (tag uri, tag name, attributes)
END_ELEMENT      (tag uri, tag name)
TEXT             value
COMMENT          value
PI               (name, value)
CDATA            value
All values (text nodes, comments, attribute values, etc.) are returned as byte strings, in the source encoding. In the DOCUMENT_TYPE event, doctype is an instance of DocType.
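Because values arrive as byte strings, a consumer usually decodes them itself. The following is a minimal sketch of that idea, not part of itools: the event-type constants are stand-ins (real code would import them from itools.xml), the event list is written by hand to mirror the example above, and the helper name collect_text, the UTF-8 source encoding and the None tag URI are all assumptions made for illustration.

```python
# Stand-in event-type constants; real code would import them from itools.xml.
START_ELEMENT, END_ELEMENT, TEXT = 'START_ELEMENT', 'END_ELEMENT', 'TEXT'

# Hand-written events mirroring 'Hello <em>Baby</em>'.
# Each event is (type, value, line); text values are byte strings.
events = [
    (TEXT, b'Hello ', 1),
    (START_ELEMENT, (None, 'em', {}), 1),
    (TEXT, b'Baby', 1),
    (END_ELEMENT, (None, 'em'), 1),
]


def collect_text(events, encoding='utf-8'):
    """Join the values of all TEXT events and decode the result.

    `encoding` is assumed to be the encoding of the source document.
    """
    chunks = [value for type_, value, line in events if type_ == TEXT]
    return b''.join(chunks).decode(encoding)


text = collect_text(events)
print(text)
```

Joining the byte strings before decoding avoids splitting a multi-byte character that happens to span two text events.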
The element attributes are returned as a dictionary where the key is a tuple of the namespace URI and the local name of the attribute, and the value is the value of the attribute.
For example, when processing the XML fragment:
<x xmlns="namespace1" xmlns:n2="namespace2" >
<test a="1" n2:b="2" />
</x>
For the tag "test", the parser will return the following START_ELEMENT value, with the attributes as its third item:
('namespace1', 'test',
{('namespace2', 'b'): '2', (None, 'a'): '1'})
The parser always resolves the element and attribute prefixes and returns the namespace URIs instead. The namespace declarations are returned as attributes.
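As a small illustration of how such an attribute dictionary can be queried, the sketch below starts from the value shown above for the "test" tag, written out by hand rather than produced by the parser. The variable names are chosen for illustration only.

```python
# START_ELEMENT value for <test a="1" n2:b="2" />, copied from the text above.
tag_uri, tag_name, attributes = (
    'namespace1', 'test',
    {('namespace2', 'b'): '2', (None, 'a'): '1'})

# Unprefixed attributes are keyed with None as the namespace URI.
a = attributes[(None, 'a')]

# Prefixed attributes are keyed by the resolved URI, not by the prefix.
b = attributes.get(('namespace2', 'b'))

# Looking up by prefix finds nothing, since prefixes are not part of the key.
missing = attributes.get(('n2', 'b'))

# Group attribute values by namespace URI, then by local name.
by_namespace = {}
for (uri, name), value in attributes.items():
    by_namespace.setdefault(uri, {})[name] = value
```

Note that the default namespace ("namespace1") applies to the element but not to the unprefixed attribute "a", which is why its key uses None.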