To illustrate the usage of itools.xapian, we are going to index and search the Web! Well, a couple of pages, at least.
To create a new (and empty) catalog we use the function make_catalog:
>>> from itools.xapian import make_catalog
>>>
>>> catalog = make_catalog('catalog_test')
The parameter is the path where the catalog will be created. The value returned by make_catalog is a catalog object, which offers an API for indexing, unindexing and searching.
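Objects to be indexed must inherit from the base class CatalogAware and implement two methods: get_catalog_fields, which returns the list of fields to index, and get_catalog_values, which returns a dictionary mapping each field name to its value: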
>>> from itools.xapian import CatalogAware
>>> from itools.xapian import KeywordField, TextField
>>> from itools.html import HTMLFile
>>>
>>> class Document(CatalogAware, HTMLFile):
...     def get_catalog_fields(self):
...         return [KeywordField('url', is_stored=True),
...                 TextField('body')]
...     def get_catalog_values(self):
...         return {'url': str(self.uri), 'body': self.to_text()}
...
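To make the two-method contract more concrete, here is a minimal, library-independent sketch. Nothing in it comes from itools: ToyField, ToyPage, keyword_terms, text_terms and terms_for are hypothetical names. It also assumes that a keyword field is indexed as one exact term while a text field is split into words, which is the usual meaning of these field types but is not confirmed by the text above.

```python
import re


class ToyField(object):
    """Hypothetical stand-in for a field declaration (not an itools class)."""

    def __init__(self, name, to_terms, is_stored=False):
        self.name = name
        self.to_terms = to_terms    # function turning a value into index terms
        self.is_stored = is_stored  # whether the raw value is kept for retrieval


def keyword_terms(value):
    # Assumption: a keyword field indexes the whole value as a single term.
    return [value]


def text_terms(value):
    # Assumption: a text field is split into lowercase words.
    return re.findall(r'\w+', value.lower())


class ToyPage(object):
    """Hypothetical indexable object exposing the same two methods."""

    def __init__(self, url, body):
        self.url = url
        self.body = body

    def get_catalog_fields(self):
        return [ToyField('url', keyword_terms, is_stored=True),
                ToyField('body', text_terms)]

    def get_catalog_values(self):
        return {'url': self.url, 'body': self.body}


def terms_for(obj):
    # What an indexer could do: pair each declared field with its value.
    values = obj.get_catalog_values()
    return dict((field.name, field.to_terms(values[field.name]))
                for field in obj.get_catalog_fields())


page = ToyPage('http://example.com/', 'Hello indexed world')
terms = terms_for(page)
```

The point of the split is that the object only declares what it has (fields) and what it holds (values); how the values are turned into index terms is left to the catalog.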
Now we are going to index a couple of web pages:
# Load support for the HTTP protocol
>>> import itools.http
>>>
# Index a couple of web pages
>>> for url in ['http://www.python.org', 'http://git.or.cz/']:
...     document = Document(url)
...     catalog.index_document(document)
...
# Save changes
>>> catalog.save_changes()
Note that all changes are kept in memory and are not written to the file system until save_changes is called.
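The buffering behaviour can be pictured with a small stand-in. This is only a sketch under stated assumptions, not how itools.xapian is implemented: ToyCatalog, its pending dictionary, its load method and the JSON file are all hypothetical, chosen to show that nothing reaches the disk before the save call.

```python
import json
import os
import tempfile


class ToyCatalog(object):
    """Hypothetical stand-in: buffers changes in memory, writes them on save."""

    def __init__(self, path):
        self.path = path
        self.pending = {}  # changes made since the last save

    def index_document(self, key, values):
        # Only the in-memory buffer is touched here.
        self.pending[key] = values

    def load(self):
        # Read back whatever has actually been written to disk.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save_changes(self):
        # Merge the buffered changes into the file, then clear the buffer.
        stored = self.load()
        stored.update(self.pending)
        with open(self.path, 'w') as f:
            json.dump(stored, f)
        self.pending = {}


path = os.path.join(tempfile.mkdtemp(), 'toy_catalog.json')
toy = ToyCatalog(path)
toy.index_document('http://example.com/', {'body': 'hello'})
before = toy.load()   # still empty: nothing has been written yet
toy.save_changes()
after = toy.load()    # now the document is on disk
```

Batching writes like this means that a loop indexing many documents pays the cost of writing to disk once, at the end, rather than once per document.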
Time to search:
>>> results = catalog.search(body='python')
>>> for document in results.get_documents():
...     print document.url
...
http://www.python.org
>>>
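Only one of the two pages is returned because only one of them contains the word searched for. For readers curious about how a text search of this kind can work in principle, here is a toy inverted index. It is an illustration, not the Xapian engine: build_index, search, and the example pages and URLs are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    # Map every lowercase word to the set of URLs whose body contains it.
    index = defaultdict(set)
    for url, body in pages.items():
        for word in re.findall(r'\w+', body.lower()):
            index[word].add(url)
    return index


def search(index, word):
    # Look the word up and return the matching URLs in a stable order.
    return sorted(index.get(word.lower(), set()))


# Hypothetical page contents, standing in for the downloaded web pages.
pages = {
    'http://example.com/a': 'Python is a programming language',
    'http://example.com/b': 'Git is a version control system',
}
index = build_index(pages)
hits = search(index, 'python')
```

Here hits contains only the first URL, while a word present in both bodies, such as 'is', would match both pages.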