10.2 Structure of a URI

The general structure of a URI is:

    uri = <scheme>:<scheme-specific-part>

In other words, there are different kinds of URIs or, more precisely, different schemes, and the actual structure of a URI depends on its scheme.
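
The split at the first colon can be sketched in a few lines of plain Python. This is only an illustration of the grammar above, not the way itools parses URIs; the helper name split_scheme is hypothetical, and the sketch is written in Python 3 syntax.

```python
def split_scheme(uri):
    """Split a URI at the first colon into (scheme, scheme-specific part).

    Hypothetical helper for illustration only; it is not part of itools.
    """
    scheme, sep, rest = uri.partition(':')
    if not sep:
        raise ValueError('no scheme found in %r' % uri)
    return scheme, rest


print(split_scheme('mailto:jdavid@itaapy.com'))
print(split_scheme('http://www.w3.org/TR/REC-xml/#sec-intro'))
```

Everything after the first colon is left untouched, since only the scheme tells us how to interpret it.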

The mailto scheme

An example of a scheme with its own particular structure is mailto:

    >>> r2 = get_reference('mailto:jdavid@itaapy.com')
    >>> print r2
    mailto:jdavid@itaapy.com
    >>> r2
    <itools.uri.Mailto object at 0x403f45ec>
    >>> print r2.scheme
    mailto
    >>> print r2.username
    jdavid
    >>> print r2.host
    itaapy.com

All URI objects have a scheme variable that identifies the scheme they belong to. The rest of the information that makes up a URI depends on that scheme, so it differs from one scheme to another.

For the mailto scheme, this information is held in the variables username and host.
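
To make the decomposition concrete, here is a minimal sketch of how a mailto URI could be split into those two parts by hand. The function split_mailto is a hypothetical name used for illustration (Python 3 syntax); it is not the itools implementation and ignores details such as multiple recipients or headers.

```python
def split_mailto(uri):
    """Return (username, host) for a simple mailto URI.

    Hypothetical helper for illustration only; it is not part of itools.
    """
    scheme, sep, address = uri.partition(':')
    if not sep or scheme.lower() != 'mailto':
        raise ValueError('not a mailto URI: %r' % uri)
    # The host is whatever follows the last "@" in the address.
    username, at, host = address.rpartition('@')
    if not at:
        raise ValueError('no "@" in address: %r' % address)
    return username, host


print(split_mailto('mailto:jdavid@itaapy.com'))
```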

10.2.1 Generic URIs

However, most URI schemes (like http) share the same general structure; they are called Generic URIs:

    <scheme>://<authority><absolute path>?<query>#<fragment>

As is easy to guess, a generic URI has one variable for every URI component: scheme, authority, path, query and fragment. The following code snippet illustrates this:

    >>> r1 = get_reference('http://www.w3.org/TR/REC-xml/#sec-intro')
    >>> r1
    <itools.uri.generic.Reference object at 0x403ebc4c>
    >>> print r1
    http://www.w3.org/TR/REC-xml/#sec-intro
    >>> print r1.scheme
    http
    >>> print r1.authority
    www.w3.org
    >>> print r1.path
    /TR/REC-xml/
    >>> print r1.query
    {}
    >>> print r1.fragment
    sec-intro
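
For comparison, the same five components can be obtained with the Python standard library. The sketch below uses urllib.parse.urlsplit from Python 3; it is shown only to illustrate the generic structure and makes no claim about how itools works internally. Note that the standard library calls the authority "netloc" and returns the query as a raw string.

```python
from urllib.parse import urlsplit

parts = urlsplit('http://www.w3.org/TR/REC-xml/#sec-intro')

print(parts.scheme)    # http
print(parts.netloc)    # www.w3.org  (the authority)
print(parts.path)      # /TR/REC-xml/
print(repr(parts.query))  # ''  (no query in this URI)
print(parts.fragment)  # sec-intro
```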

We will now look briefly at each of these components.

The Scheme

The scheme identifies the type of URI. Typically it defines the method or protocol used to reach the resource: HTTP, FTP, etc.

The Authority

The authority defines the server address (hostname and port) where the resource is located, and optionally the user information required to access the resource:

    authority = [<userinfo>@]<hostport>

Schemes like file don’t have an authority.
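
The parts of the authority can be illustrated with the standard library as well. In this sketch (Python 3, urllib.parse), the ftp URI, its user name and its port are made-up values chosen for the example; the attribute names belong to the standard library, not to itools.

```python
from urllib.parse import urlsplit

# Made-up example URI with user information and an explicit port.
parts = urlsplit('ftp://jdavid@ftp.example.com:2121/pub/')
print(parts.netloc)    # jdavid@ftp.example.com:2121
print(parts.username)  # jdavid
print(parts.hostname)  # ftp.example.com
print(parts.port)      # 2121

# A typical file URI leaves the authority empty.
local = urlsplit('file:///etc/hosts')
print(repr(local.netloc))  # ''
print(local.path)          # /etc/hosts
```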

Absolute path

Within the scope of the authority, the resources are organized in a tree structure, so the path identifies the resource within the tree. It consists of a sequence of segments:

    absolute path = /<relative path>
    relative path = <segment>[/<relative path>]
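
The grammar above amounts to splitting the path on slashes. A minimal sketch follows, using a hypothetical helper named path_segments (Python 3 syntax, not part of itools). Note that a trailing slash produces an empty last segment, which the grammar allows.

```python
def path_segments(path):
    """Return the list of segments of an absolute path.

    Hypothetical helper for illustration only; it is not part of itools.
    """
    if not path.startswith('/'):
        raise ValueError('not an absolute path: %r' % path)
    # Drop the leading slash, then split the relative path into segments.
    return path[1:].split('/')


print(path_segments('/TR/REC-xml/'))   # ['TR', 'REC-xml', '']
print(path_segments('/TR/REC-xml'))    # ['TR', 'REC-xml']
```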

Query

Although RFC 2396 does not define a structure for the query, we have chosen to interpret it as defined by the application/x-www-form-urlencoded mimetype [1], since it is most often used this way.
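
As an illustration of that encoding, the sketch below decodes and re-encodes a query string with the Python 3 standard library (urllib.parse). The query string itself is a made-up example. The standard library maps every name to a list of values; the exact shape of the mapping used by itools may differ.

```python
from urllib.parse import parse_qs, urlencode

# Made-up query: "+" stands for a space, and "lang" appears twice.
query = 'name=J.+David&lang=en&lang=fr'

decoded = parse_qs(query)
print(decoded)  # {'name': ['J. David'], 'lang': ['en', 'fr']}

# Encoding the mapping again yields the original string.
encoded = urlencode(decoded, doseq=True)
print(encoded)  # name=J.+David&lang=en&lang=fr
```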

Fragment

The fragment is an internal reference within the resource.

Footnotes

  1. http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4.1