lucene-java-user mailing list archives

From Tatu Saloranta
Subject Re: positional token info
Date Wed, 22 Oct 2003 00:35:02 GMT
On Tuesday 21 October 2003 17:31, Otis Gospodnetic wrote:
> > It does seem handy to avoid exact phrase matches on "phone boy" when a
> > stop word is removed though, so patching StopFilter to put in the
> > missing positions seems reasonable to me currently.  Any objections
> > to that?
> So "phone boy" would match documents containing "phone the boy"?  That

Hmmh. WWGD (What Would Google Do)? :-)

> doesn't sound right to me, as it assumes what the user is trying to do.
>  Wouldn't it be better to allow the user to decide what he wants?
> (i.e. "phone boy" returns documents with that _exact_ phrase.  "phone
> boy"~2 also returns documents containing "phone the boy").

As long as phrase queries work properly with proximity modifiers, one
alternative (from the application's standpoint) would be to:

(a) Tokenize stop words out, adding a skip value: either one per stop word,
    or one per non-empty sequence of stop words ("top of the world" might
    make sense to tokenize as "top - world", with "-" signifying a 'hole').
(b) With phrase queries, first do an exact match.
(c) If the number of matches is "too low" (whatever the definition of low is),
    use a phrase query match with a slop of 2 instead.
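
Steps (a) to (c) can be sketched in plain Java, without Lucene itself. This is
only a toy model under stated assumptions: the class StopwordHoleSketch, its
stop list, and all method names are made up for illustration; one position is
skipped per stop word; and "slop" is approximated as the spread between
position shifts, not Lucene's actual sloppy-phrase scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of stop-word holes plus exact-then-sloppy phrase search. Not Lucene code. */
class StopwordHoleSketch {
    // Hypothetical stop list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "of", "the");

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    /** (a) Drop stop words, but let each dropped word still consume a position. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            if (!STOP_WORDS.contains(raw)) tokens.add(new Token(raw, pos));
            pos++; // advance even for stop words, leaving a hole
        }
        return tokens;
    }

    /** True if every query term can be aligned with shifts that differ by at most slop. */
    static boolean matches(List<Token> doc, List<Token> query, int slop) {
        return !query.isEmpty()
                && align(doc, query, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    private static boolean align(List<Token> doc, List<Token> query,
                                 int i, int min, int max, int slop) {
        if (i == query.size()) return true;
        Token q = query.get(i);
        for (Token d : doc) {
            if (!d.term.equals(q.term)) continue;
            int shift = d.position - q.position; // how far this term is displaced
            int lo = Math.min(min, shift);
            int hi = Math.max(max, shift);
            if (hi - lo <= slop && align(doc, query, i + 1, lo, hi, slop)) return true;
        }
        return false;
    }

    static List<String> find(List<String> docs, String phrase, int slop) {
        List<Token> query = tokenize(phrase);
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (matches(tokenize(doc), query, slop)) hits.add(doc);
        }
        return hits;
    }

    /** (b) Try an exact match first; (c) retry with slop if there are too few hits. */
    static List<String> searchWithFallback(List<String> docs, String phrase,
                                           int minHits, int fallbackSlop) {
        List<String> exact = find(docs, phrase, 0);
        return exact.size() >= minHits ? exact : find(docs, phrase, fallbackSlop);
    }
}
```

In this model, "phone the boy" keeps "phone" at position 0 and "boy" at
position 2, so the query "phone boy" misses it at slop 0 and only picks it up
once the fallback slop is applied.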

The tricky part would be doing the same for combined queries, where it's
not easy to check matches for the individual query components.

Perhaps it'd be possible to create Yet Another Query object that would,
given a threshold, do one or two searches (as described above) to allow
for self-adjusting behaviour?
Or perhaps there should be a container query that executes an ordered
sequence of sub-queries until one returns a "good enough" set of matches, then
returns that set (or the last result(s), if there are no good matches). The
above-mentioned "sloppy if need be" phrase query would then just be a special
case?
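
The container idea can be sketched independently of any search library. The
names FallbackSearchSketch and firstGoodEnough below are hypothetical, and
"good enough" is assumed to mean simply "at least minHits results"; a real
Query implementation would have to plug into Lucene's own search machinery.

```java
import java.util.List;
import java.util.function.Supplier;

/** Toy container that runs ordered search attempts until one is good enough. */
class FallbackSearchSketch {
    /**
     * Runs each attempt in order and returns the first result list with at
     * least minHits entries. If none qualifies, returns the last result
     * (or an empty list when there are no attempts at all).
     */
    static <T> List<T> firstGoodEnough(List<Supplier<List<T>>> attempts, int minHits) {
        List<T> last = List.of();
        for (Supplier<List<T>> attempt : attempts) {
            last = attempt.get(); // later attempts are not run unless needed
            if (last.size() >= minHits) return last;
        }
        return last;
    }
}
```

The "sloppy if need be" phrase query is then the two-attempt case: the first
supplier runs the exact phrase match, the second runs the same phrase with a
slop of 2.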

-+ Tatu +-
