14.6 Searching

The method search provided by catalog objects is the entry point to the search programming interface. Here is its prototype and definition:

There are two ways to define the query: either we build it explicitly and then pass it to the search method, or we use the named arguments that this method accepts.

Here is an example that shows both ways to perform the same query. Imagine we have a catalog of books that we index by author and title, and we want to find all the books written by somebody called Marx whose title contains the word “capital”.

We can either explicitly build the query:

    >>> from itools.xapian import PhraseQuery, AndQuery
    >>>
    >>> q1 = PhraseQuery('author', 'marx')
    >>> q2 = PhraseQuery('title', 'capital')
    >>> query = AndQuery(q1, q2)
    >>> results = catalog.search(query)

Or use the named arguments:

    >>> results = catalog.search(author='marx', title='capital')

The second method is more compact, but less powerful. A query made implicitly from named arguments will always be an “and” query of one or more “phrase” queries.

If we want to make an “or” or “range” query, we need to build it explicitly.
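The relation between the two forms can be pictured with a small, self-contained sketch. The classes below are hypothetical stand-ins written only for illustration; they are not the itools.xapian implementation. The helper build_query is likewise an invented name that mimics how named arguments could be folded into an “and” query of “phrase” queries.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; the real classes live in
# itools.xapian and are not reproduced here.
PhraseQuery = namedtuple('PhraseQuery', ['name', 'value'])
AndQuery = namedtuple('AndQuery', ['queries'])


def build_query(**kw):
    """Fold named arguments into an "and" query of "phrase" queries."""
    if not kw:
        raise ValueError('at least one named argument is expected')
    # Sort the names so the result does not depend on the argument order
    phrases = tuple(PhraseQuery(name, kw[name]) for name in sorted(kw))
    return AndQuery(phrases)


# The implicit form and the explicit form describe the same query
implicit = build_query(author='marx', title='capital')
explicit = AndQuery((PhraseQuery('author', 'marx'),
                     PhraseQuery('title', 'capital')))
```

In this toy model there is no way to express an “or” through named arguments, which is why the explicit form is needed for anything beyond a conjunction.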

14.6.1 Queries

Simple Queries

The two simplest queries are EqQuery and PhraseQuery:

Typically we will use phrase queries when searching in a text field, because in this context the phrase query is a generalisation of the equal query:

    # These two are the same
    >>> EqQuery('author', 'marx')
    >>> PhraseQuery('author', 'marx')
    # This is nonsense, because 'karl marx' is not one word but two
    >>> EqQuery('author', 'karl marx')

The equal query (EqQuery) is typically used for any other kind of field (keyword, boolean or integer), because a phrase query makes no sense in this context.

To perform an EqQuery or a PhraseQuery on a field, the field must have been declared as indexed.
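To make the difference between the two simple queries concrete, here is a minimal sketch that does not use the catalog at all. The functions tokenize, eq_match and phrase_match are hypothetical names, and the whitespace-based splitting is an assumption made for the example; the real text analysis may well differ.

```python
def tokenize(text):
    """Split a text value into lowercase words (a deliberately naive analyser)."""
    return text.lower().split()


def eq_match(field_value, word):
    """Toy "equal" match: the searched value is treated as one single term."""
    return word.lower() in tokenize(field_value)


def phrase_match(field_value, phrase):
    """Toy "phrase" match: all the words must appear consecutively, in order."""
    words = tokenize(field_value)
    wanted = tokenize(phrase)
    if not wanted:
        return False
    n = len(wanted)
    # Slide a window of n words over the field and compare it to the phrase
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


author = 'Karl Marx'
same_for_one_word = eq_match(author, 'marx') == phrase_match(author, 'marx')
two_words_as_phrase = phrase_match(author, 'karl marx')
two_words_as_term = eq_match(author, 'karl marx')
```

With a single word both matches agree, while the two-word value is found only by the phrase match, since no single term equals 'karl marx'.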

Range Queries

The simple queries seen above are for exact matches. If we want to match all values within a range we use the RangeQuery:

Let’s see an example with dates. If we index documents by their last modification time (mtime), we can search for all the documents that have been modified within the last week:

    >>> from datetime import date, timedelta
    >>> from itools.xapian import RangeQuery
    >>>
    >>> today = date.today()
    >>> last_week = today - timedelta(7)
    >>>
    >>> last_week = last_week.strftime('%Y-%m-%d')
    >>> query = RangeQuery('mtime', last_week, None)

Note that since we don’t have a field type for dates, we have to transform the date values to strings (the field type used would be KeywordField).

To perform a RangeQuery on a field, the field must have been declared as stored.
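The reason the string transformation works is that zero-padded '%Y-%m-%d' strings sort in the same order as the dates they represent. The sketch below illustrates this with plain string comparison; in_range is a hypothetical helper, and treating the bounds as inclusive is an assumption made for the example, not a statement about RangeQuery.

```python
from datetime import date, timedelta


def in_range(value, lower=None, upper=None):
    """Toy range test on strings; None means that the bound is open."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


today = date(2024, 3, 4)  # fixed date, so that the example is reproducible
last_week = (today - timedelta(days=7)).strftime('%Y-%m-%d')

# Modification times as they would be stored: zero-padded strings
mtimes = ['2024-02-20', '2024-02-26', '2024-03-01']
recent = [m for m in mtimes if in_range(m, last_week, None)]
```

The zero padding matters: without it, a string such as '2024-10-01' would compare as smaller than '2024-9-30', and the range would no longer follow the calendar.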

Boolean Queries

We support three boolean queries:

Boolean queries can be combined to build very complex queries.
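How such combinations behave can be sketched with plain sets of document ids. Everything below is a toy model: the index contents, and the helpers term, and_ and or_, are invented for the illustration and only cover the “and” and “or” cases mentioned earlier.

```python
# Toy inverted index: (field, word) -> set of document ids
index = {
    ('author', 'marx'): {1, 2},
    ('author', 'engels'): {2, 3},
    ('title', 'capital'): {1, 4},
    ('title', 'manifesto'): {2},
}


def term(field, word):
    """Return the ids of the documents holding this word in this field."""
    return index.get((field, word), set())


def and_(*operands):
    """Keep only the documents present in every operand ("and")."""
    return set.intersection(*operands)


def or_(*operands):
    """Keep the documents present in at least one operand ("or")."""
    return set.union(*operands)


# (author is marx OR author is engels) AND title contains capital
found = and_(or_(term('author', 'marx'), term('author', 'engels')),
             term('title', 'capital'))
```

Because each combinator returns the same kind of value it accepts, the calls can be nested to any depth, which is what makes complex queries possible.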

14.6.2 Results

Now that we have built a query and performed a search, how do we retrieve the documents found? Remember that the value returned by the search method is an instance of the SearchResults class. This object offers two methods:

Note that to sort by a field, it must be stored (see Section 14.4).

Let’s look at the initial example again:

    >>> results = catalog.search(body='python')
    >>> for document in results.get_documents():
    ...     print(document.url)
    ...
    http://www.python.org
    >>>

Note that the documents returned are not the original objects, but instances of the Document class defined by itools.xapian. These documents give access to the stored fields, so we can show some information to the user without having to load the original document.

And if we want to load the original document we use the external id (see Section 14.4.2):

    >>> results = catalog.search(body='python')
    >>> for document in results.get_documents():
    ...     handler = get_handler(document.url)
    ...     # Do something