xml-xindice-users mailing list archives

From sandy pittendrigh <sa...@cns.montana.edu>
Subject Lucene as an alternate Query Mechanism
Date Thu, 06 Mar 2003 17:43:56 GMT
I have an off-the-cuff idea, and I wonder whether anybody else
has considered it: does it make any sense to think
about using Apache Lucene as an alternate, fuzzy-search
mechanism over collections of XML files, instead of, or
in addition to, XPath?

Lucene appears to provide a way of indexing words
and word proximities in otherwise free-form text
documents. You could, for instance, use a proximity query
like "jakarta apache"~10 to find all the documents that
contain the words jakarta and apache, appearing no
more than ten words apart from each other.
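
To make the proximity idea concrete, here is a rough sketch in plain
Python. It is not Lucene code and does not reproduce Lucene's actual
scoring or slop calculation; it only assumes that "within N words" means
the absolute distance between word positions. The names tokenize and
within_proximity, and the sample documents, are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within_proximity(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur at most max_distance words apart.

    Simplifying assumption: distance is the absolute difference between
    word positions, regardless of which term comes first.
    """
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

# Hypothetical sample documents.
docs = {
    "near": "The Jakarta project is part of the Apache Software Foundation.",
    "far": "Jakarta " + "filler " * 20 + "Apache",
    "missing": "Only Jakarta is mentioned here.",
}

# Keep only the documents where both words appear within ten words.
matches = [name for name, text in docs.items()
           if within_proximity(text, "jakarta", "apache", 10)]
print(matches)
```

Only the "near" document qualifies: "far" contains both words but too
far apart, and "missing" lacks one of them entirely.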

To the extent this query language is useful over
completely unstructured, free-form text, it seems likely
that it (the lucene query language) would be even more
powerful operating over more regularly structured text, like XML files.
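
One way to picture this, as a sketch only: treat each XML element name as
a searchable field, so a word can be looked up within a particular
element instead of anywhere in the file. The snippet below uses Python's
standard xml.etree.ElementTree parser and a simple in-memory dictionary.
It is not Lucene or Xindice code, and index_xml, search, and the sample
documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_xml(doc_id, xml_text, index):
    """Record each word of each element's own text under (tag, word)."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # element.text is None for elements that hold only child elements.
        for word in re.findall(r"[a-z0-9]+", (element.text or "").lower()):
            index[(element.tag, word)].add(doc_id)

def search(index, field, word):
    """Return the sorted ids of documents with `word` inside element `field`."""
    return sorted(index.get((field, word.lower()), set()))

# Hypothetical sample documents.
index = defaultdict(set)
index_xml("doc1",
          "<article><title>Lucene notes</title>"
          "<body>Indexing with Xindice</body></article>", index)
index_xml("doc2",
          "<article><title>Xindice notes</title>"
          "<body>Searching with Lucene</body></article>", index)

print(search(index, "title", "lucene"))
print(search(index, "body", "lucene"))
```

Both documents mention Lucene, but the element-level lookup tells them
apart: only doc1 has the word in its title, and only doc2 in its body.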

Lucene is more of a search-engine technology than a database
technology: answer sets are expected to have an attractive ratio
of relevant to irrelevant data, rather than
the rigid, 100% metadata-criteria matches possible with
XPath queries over XML data.

Does it make sense for projects like Xindice to have alternate,
plug-in-like ways to search and query the same datasets? Or should alternate
query technologies exist as separate software entities?

/* Sandy Pittendrigh  >--oO0>
 * sandy@cns.montana.edu
 * http://cns.montana.edu/~sandy */
