lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to get all matched terms in a PrefixQuery
Date Tue, 13 Sep 2016 18:55:53 GMT
You can't do this very easily, unfortunately.

The way PrefixQuery runs is to find (globally, across the index) all
terms that have that prefix.  If there are many of them, it goes
term by term marking the matching documents in a bitset, and then
iterates that bitset at the end.  So the information about which term
matched which document is long gone by the time hits are collected.
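
To make the bitset path concrete, here is a minimal sketch using only the
JDK. It is a toy model, not Lucene's code: the class name, the
term-to-doc-ids map, and the sample data are all assumptions made up for
illustration. It shows why only document ids survive once every matching
term has been folded into one bitset.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixBitsetSketch {

    // Toy inverted index (hypothetical layout): term -> ids of docs containing it.
    public static BitSet matchPrefix(SortedMap<String, int[]> postings, String prefix) {
        BitSet hits = new BitSet();
        // Terms are sorted, so every term with the prefix sits in one
        // contiguous run that starts at the first term >= prefix.
        for (Map.Entry<String, int[]> e : postings.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the run of prefixed terms
            }
            for (int doc : e.getValue()) {
                hits.set(doc); // the term itself is not recorded anywhere
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("bread", new int[] {0, 2});
        postings.put("brown", new int[] {2});
        postings.put("cat", new int[] {1});
        System.out.println(matchPrefix(postings, "br")); // prints {0, 2}
    }
}
```

The result says documents 0 and 2 matched, but nothing in it says whether
"bread" or "brown" was responsible.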

If there are few enough terms, it makes a BooleanQuery with N SHOULD
clauses, and in that limited case, since the child clauses are all
visiting the same document when it's collected, you might be able to
use the Scorer.getChildren API in a custom Collector to see (per doc
collected) which terms are "on" that one document.
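
As a rough illustration of that limited case, the sketch below keeps one
"clause" per expanded term and, for a given collected document, asks each
clause whether it matches. It uses only the JDK and is a toy model under
stated assumptions: the class, the map layout, and the sample data are
hypothetical, and in Lucene the per-clause answer would come from the
child scorers rather than from a lookup like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PerDocClauseSketch {

    // Toy inverted index (hypothetical): term -> ascending ids of docs containing it.
    // Each prefixed term plays the role of one SHOULD clause.
    public static List<String> matchingTerms(SortedMap<String, int[]> postings,
                                             String prefix, int doc) {
        List<String> on = new ArrayList<>();
        for (Map.Entry<String, int[]> e : postings.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // no more terms with this prefix
            }
            // Doc ids are sorted, so a binary search tells us whether
            // this clause is "on" the document being collected.
            if (Arrays.binarySearch(e.getValue(), doc) >= 0) {
                on.add(e.getKey());
            }
        }
        return on;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("bread", new int[] {0, 2});
        postings.put("brown", new int[] {2});
        postings.put("cat", new int[] {1});
        System.out.println(matchingTerms(postings, "br", 2)); // prints [bread, brown]
    }
}
```

Because the clauses stay separate, the term identity is still available
per document, which is exactly what the bitset path throws away.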

You could alternatively store term vectors (but these are slow and
costly) and load them for each document and iterate the matched prefix
terms by creating a PrefixTermsEnum.
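
The per-document iteration can be sketched with the JDK alone. This is a
toy model, not Lucene code: a sorted set of strings stands in for the
terms loaded from one document's term vector, and the class and method
names are made up for illustration. The idea is the same as a prefix
terms enum: seek to the first term at or after the prefix, then stop at
the first term that no longer starts with it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class DocPrefixTermsSketch {

    // docTerms stands in for the sorted terms of a single document
    // (hypothetical; in Lucene these would come from its term vector).
    public static List<String> termsWithPrefix(SortedSet<String> docTerms, String prefix) {
        List<String> matched = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix, like a seek.
        for (String term : docTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order means no later term can match
            }
            matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        SortedSet<String> docTerms =
            new TreeSet<>(Arrays.asList("apple", "bread", "brown", "cat"));
        System.out.println(termsWithPrefix(docTerms, "br")); // prints [bread, brown]
    }
}
```

This would be run once per hit, which is part of why the approach is
costly on large result sets.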

Mike McCandless

http://blog.mikemccandless.com


On Tue, Sep 13, 2016 at 11:25 AM, Rajnish kamboj
<rajnishk7.info@gmail.com> wrote:
> Hi
>
> How can I get all matched terms of a document in PrefixQuery?
>
> Term t2 = new Term("contents", "br");
> PrefixQuery query = new PrefixQuery(t2);
>
> Suppose I have a few documents with 1000 different terms.
> The search shows me the documents in which it finds the "br" words.
>
> Now, how can I get all the br words in the document?
>
>
>
> Thanks
> Raj
