16.2 Namespaces

If the parser returns always byte strings for all text nodes and attribute values, it is up to the programmer to correctly interpret them; for example to transform the value of href attributes to URI references so we can work with them more comfortably (see ChapterĀ 10 for details on URI references).

To make this task easier itools offers support, out of the box, for several common XML namespaces. One of them is XHTML:

   >>> from itools.xml import XMLParser, START_ELEMENT
   >>> from itools.xml import get_namespace
   >>> import itools.html
   >>>
   >>> data = ('<a xmlns="http://www.w3.org/1999/xhtml"'
   ...         ' href="http://www.example.com"'
   ...         ' title="Example" />')
   >>>
   >>> for type, value, line in XMLParser(data):
   ...     if type == START_ELEMENT:
   ...         tag_uri, tag_name, attributes = value
   ...         namespace = get_namespace(tag_uri)
   ...         element = namespace.get_element(tag_name)
   ...         for attr_uri, attr_name in attributes:
   ...             type = element.get_attr_datatype(attr_uri, attr_name)
   ...             attr_value = attributes[(attr_uri, attr_name)]
   ...             attr_value = type.decode(attr_value)
   ...             print attr_name, type
   ...             print repr(attr_value)
   ...             print
   xmlns <class 'itools.datatypes.primitive.String'>
   'http://www.w3.org/1999/xhtml'

   href <class 'itools.datatypes.primitive.URI'>
   <itools.uri.generic.Reference object at 0x8462d9c>

   title <class 'itools.datatypes.primitive.Unicode'>
   u'Example'

The function get_namespace will return the namespace handler for the given URI. Then we can use the get_attr_datatype method to get the datatype (see ChapterĀ 3) that will allow us to deserialize the attribute value.

The package itools.html is the one that actually implements the namespace handler for XHTML.