lucene-java-user mailing list archives

From Rajnish kamboj <rajnishk7.i...@gmail.com>
Subject Re: How to get all matched terms in a PrefixQuery
Date Wed, 14 Sep 2016 03:10:46 GMT
Thanks Mike

I would rather go with the first approach, using the Scorer.getChildren API
(I will try it).
I had thought of the second approach, but you are right, it is costly.

Regards
Raj

On Wednesday 14 September 2016, Michael McCandless <
lucene@mikemccandless.com> wrote:

> You can't do this very easily, unfortunately.
>
> The way PrefixQuery runs is to find (globally, across the index) all
> terms that have that prefix.  If there are enough of them, it goes
> term by term marking the documents in a bitset, and then iterates that
> bitset in the end.  So the information of which term matched which
> document is long gone.
>
> If there are few enough terms, it makes a BooleanQuery with N SHOULD
> clauses, and in that limited case, since the child clauses are all
> visiting the same document when it's collected, you might be able to
> use the Scorer.getChildren API in a custom Collector to see (per doc
> collected) which terms are "on" that one document.
>
> You could alternatively store term vectors (but these are slow and
> costly) and load them for each document and iterate the matched prefix
> terms by creating a PrefixTermsEnum.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Sep 13, 2016 at 11:25 AM, Rajnish kamboj
> <rajnishk7.info@gmail.com> wrote:
> > Hi
> >
> > How can I get all matched terms of a document in PrefixQuery?
> >
> > Term t2 = new Term("contents", "br");
> > PrefixQuery query = new PrefixQuery(t2);
> >
> > Suppose I have a few documents with 1000 different terms.
> > The search shows me the documents in which it finds the "br" words.
> >
> > Now, how can I get all the "br" words in a document?
> >
> >
> >
> > Thanks
> > Raj
>
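
To make the two points in Mike's reply concrete, here is a minimal plain-Java sketch with no Lucene dependency. Everything in it is an assumption made for illustration: the class name PrefixTermsSketch, the toy POSTINGS map standing in for one indexed field, and the helper methods. The sorted set returned by termVector stands in for a stored term vector, and the tailSet walk in matchedTerms stands in for the PrefixTermsEnum he mentions. It is a model of the idea, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class PrefixTermsSketch {

    // Hypothetical toy inverted index: term -> ids of the documents containing it.
    static final TreeMap<String, int[]> POSTINGS = new TreeMap<>();
    static {
        POSTINGS.put("apple", new int[] {0});
        POSTINGS.put("bread", new int[] {0, 2});
        POSTINGS.put("brick", new int[] {1});
        POSTINGS.put("brown", new int[] {0, 1});
        POSTINGS.put("cat", new int[] {2});
    }

    // Bitset-style execution: OR the documents of every matching term into one set.
    // The result records which documents matched, not which term matched them.
    static BitSet matchingDocs(String prefix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> e : POSTINGS.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so we are past the prefix range
            }
            for (int doc : e.getValue()) {
                docs.set(doc);
            }
        }
        return docs;
    }

    // Stand-in for a stored term vector: the sorted terms of a single document.
    static SortedSet<String> termVector(int docId) {
        TreeSet<String> terms = new TreeSet<>();
        for (Map.Entry<String, int[]> e : POSTINGS.entrySet()) {
            for (int doc : e.getValue()) {
                if (doc == docId) {
                    terms.add(e.getKey());
                }
            }
        }
        return terms;
    }

    // Seek to the first term >= prefix, then read until the prefix stops matching.
    static List<String> matchedTerms(SortedSet<String> docTerms, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String term : docTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        BitSet hits = matchingDocs("br");
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            System.out.println("doc " + doc + " -> " + matchedTerms(termVector(doc), "br"));
        }
    }
}
```

Running main prints "doc 0 -> [bread, brown]", "doc 1 -> [brick, brown]" and "doc 2 -> [bread]". The per-document lookup is the costly part: it is repeated for every hit, which matches the warning above about term vectors being slow.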
