lucene-java-user mailing list archives

From "Jim Hargrave" <>
Subject Re: Are deleted words allowed in a sloppy phrase query?
Date Sat, 10 Jan 2004 00:03:29 GMT
Thanks Erik. Sorry about the personal post. GroupWise must not be posting as it should: I
see my message locally, but it must not have gone out to the mailing list.
From your description I may have no choice but to hack a custom version of Lucene. I do
think that a "string edit distance" version of PhraseQuery would be beneficial. If you break
your words into character n-grams, it would let you search languages that have no easy
stemming algorithms or word boundaries (like Thai, Cambodian, Laotian, etc.). There are some
n-gram-based IR systems out there that show this works pretty well, for English at least. Since
we are only interested in keyword matching, it does a fair job for the languages we have tried.
If anybody else has an idea that would allow me to modify PhraseQuery to do a full "String
edit distance" search I would appreciate it. 
Jim Hargrave
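
To make the idea above concrete, here is a minimal sketch of the two pieces it combines: splitting text into character n-grams, and computing a standard (Levenshtein) edit distance over token sequences instead of over characters. It is plain Java and does not touch Lucene at all; the class name NGramEditDistance and the methods charNGrams and editDistance are hypothetical names chosen for illustration, and wiring this into PhraseQuery or a scorer is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class NGramEditDistance {

    // Splits text into overlapping character n-grams.
    // Non-empty text no longer than n is returned as a single gram.
    // Works on UTF-16 code units, so no word boundaries are needed.
    static List<String> charNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        if (text.isEmpty()) {
            return grams;
        }
        if (text.length() <= n) {
            grams.add(text);
            return grams;
        }
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein distance between two token sequences: the minimum
    // number of token insertions, deletions and substitutions needed
    // to turn a into b. Uses two rolling rows of the DP table.
    static int editDistance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // build b's prefix from nothing: j insertions
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // reduce a's prefix to nothing: i deletions
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }
}
```

With this sketch, a phrase with one term dropped is at distance 1 from the original, e.g. editDistance over the tokens (a, b, c) and (a, c). The same function can be fed the output of charNGrams for two strings, which is one way to get a fuzzy match without stemming or word segmentation.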

>>> "Erik Hatcher" <> 01/08/04 01:43PM >>>
On Jan 7, 2004, at 3:54 PM, Jim Hargrave wrote:
> Looks like I will have to implement my own PhraseQuery that uses a 
> standard string edit distance measure. What is the easiest way to do 
> this? Should I override PhraseQuery - then override the 
> SloppyPhraseScorer? I have my own query parser so I can make any 
> adjustments needed when building a query.

Probably best to keep this on the lucene-user e-mail list, but it is 
non-trivial to implement a custom Query.   While PhraseQuery itself can 
be extended, there are several pieces it uses which are currently 
scoped at package visibility level only.

Even if you are using the built-in QueryParser, you can override the 
method that constructs the PhraseQuery.

> BTW: We have implemented a multilingual keyword-in-context
> application that provides exact, stemmed and fuzzy search for ANY
> language. Well, we will have fuzzy search when I finish these
> modifications. Lucene rules!


