lucene-java-user mailing list archives

From Igor Shalyminov <ishalymi...@yandex-team.ru>
Subject Re: Multiple PositionIncrement attributes
Date Thu, 25 Apr 2013 12:16:34 GMT
Thanks Jack, that's a very good option indeed.

But this method lacks precision at sentence boundaries.
As an alternative, would it be possible to encode multiple values in a single int stored as the
position increment, and decode them the same way for SpanNearQueries? Or is that a totally
terrible idea? :)
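
A minimal sketch of what "multiple values in a single int" could look like, assuming a fixed-width packing of a sentence number and an in-sentence word offset. This is plain Java with no Lucene calls; the class name PackedPosition and the constant SENTENCE_GAP are hypothetical, chosen only for illustration.

```java
final class PackedPosition {
    // Hypothetical slot width; assumed to exceed the longest sentence.
    static final int SENTENCE_GAP = 1000;

    /** Packs a sentence number and a word offset into one int position. */
    static int encode(int sentence, int offset) {
        if (sentence < 0) {
            throw new IllegalArgumentException("sentence must be >= 0");
        }
        if (offset < 0 || offset >= SENTENCE_GAP) {
            throw new IllegalArgumentException("offset must be in [0, SENTENCE_GAP)");
        }
        // multiplyExact/addExact throw ArithmeticException instead of silently overflowing.
        return Math.addExact(Math.multiplyExact(sentence, SENTENCE_GAP), offset);
    }

    /** Recovers the sentence number from a packed position. */
    static int sentenceOf(int position) {
        return position / SENTENCE_GAP;
    }

    /** Recovers the in-sentence word offset from a packed position. */
    static int offsetOf(int position) {
        return position % SENTENCE_GAP;
    }
}
```

Here encode(2, 5) gives 2005, and sentenceOf(2005) and offsetOf(2005) give back 2 and 5. The caveat is that the difference between two packed positions is still a single number: a difference of 998 could mean "same sentence, 998 words apart" or "next sentence, offsets 2 and 0", so a distance check on the packed value alone cannot separate the two components.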

-- 
Igor

25.04.2013, 15:26, "Jack Krupansky" <jack@basetechnology.com>:
> You can use SpanNearQuery to seek matches within a specified distance.
>
> Lucene knows nothing about "sentences". But if you have an analyzer or
> custom code that artificially bumps the position to the next multiple of
> some number like 100 or 1000 when a sentence boundary pattern is
> encountered, you could use that number times n to match within n sentences,
> roughly, plus or minus a sentence or two - there is nothing to cause the
> nearness to be rounded or truncated exactly to one of those boundaries.
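
The position bump described above can be sketched in plain Java, without any Lucene classes. The class name SentenceGapPositions and its methods are hypothetical, and starting the "previous position" at -1 when deriving increments is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceGapPositions {
    /** Next multiple of gap strictly greater than the given position. */
    static int nextBoundary(int position, int gap) {
        return (position / gap + 1) * gap;
    }

    /** Absolute token positions, given the number of tokens in each sentence. */
    static List<Integer> positions(int[] sentenceLengths, int gap) {
        List<Integer> out = new ArrayList<>();
        int start = 0;
        for (int len : sentenceLengths) {
            if (len == 0) {
                continue; // an empty sentence does not consume a slot
            }
            for (int i = 0; i < len; i++) {
                out.add(start + i);
            }
            // Bump to the next multiple of gap after the sentence's last token.
            start = nextBoundary(start + len - 1, gap);
        }
        return out;
    }

    /** Position increments that would reproduce the given absolute positions. */
    static List<Integer> increments(List<Integer> positions) {
        List<Integer> out = new ArrayList<>();
        int previous = -1; // assumption: the first token gets increment 1
        for (int p : positions) {
            out.add(p - previous);
            previous = p;
        }
        return out;
    }
}
```

For sentences of 3 and 2 tokens with a gap of 1000, positions are 0, 1, 2, 1000, 1001 and the increments are 1, 1, 1, 998, 1. In this sketch a sentence that ends exactly at position 999 is followed by position 1000, so the separation between sentences is only as large as the unused tail of the slot.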
>
> Maybe you want two numbers: 1) sentence separation, say 1000, and 2) maximum
> sentence length, say 500. The SpanNearQuery would use n-1 times the sentence
> separation plus the maximum sentence length. Well, you have to adjust that
> for how you count sentences - is 1 the current sentence or is that 0?
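
The arithmetic for the two-number variant can be written out as a small helper. This is only a sketch: the class and method names are hypothetical, and it assumes n = 1 means "the current sentence", which is exactly the counting question left open above.

```java
final class SentenceSlop {
    /**
     * Slop for matching within n sentences:
     * (n - 1) * separation + maxSentenceLength.
     * Assumes n = 1 denotes the current sentence.
     */
    static int slopForSentences(int n, int separation, int maxSentenceLength) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Exact arithmetic throws ArithmeticException on int overflow.
        return Math.addExact(Math.multiplyExact(n - 1, separation), maxSentenceLength);
    }
}
```

With a separation of 1000 and a maximum sentence length of 500, slopForSentences(1, 1000, 500) is 500 and slopForSentences(3, 1000, 500) is 2500. The result would be the value passed as the slop of the SpanNearQuery.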
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Igor Shalyminov
> Sent: Thursday, April 25, 2013 6:54 AM
> To: java-user@lucene.apache.org
> Subject: Multiple PositionIncrement attributes
>
> Hi all!
>
> I use the PositionIncrement attribute for finding words at some distance from
> each other. And I have two problems with that:
> 1) I want to search for words within one sentence. A possible solution would be
> to set a PositionIncrement of +INF (like 30 :) ) on the sentence break tag.
> 2) I want to use in my search both word-distance and sentence-distance
> between words (e.g. find the word "Putin" within 3 sentences after the word
> "Obama" or find the words "cheese" and "bacon" in one sentence within 3
> words of each other).
>
> For the 2nd problem, is there a way of storing multiple position information
> sources in the index and using them for searching? Say, at least choosing
> one of those for a query.
>
> --
> Best Regards,
> Igor Shalyminov
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
