lucene-java-user mailing list archives

From Terry Smith <sheb...@gmail.com>
Subject Re: How to get all matched terms in a PrefixQuery
Date Wed, 14 Sep 2016 15:16:08 GMT
Rajnish,

Thought you should be aware of LUCENE-6229
<https://issues.apache.org/jira/browse/LUCENE-6229> which discusses the
possibility of removing the Scorer.getChildren API.

--Terry


On Tue, Sep 13, 2016 at 11:10 PM, Rajnish kamboj <rajnishk7.info@gmail.com>
wrote:

> Thanks Mike
>
> I would rather go with the first approach, using the Scorer.getChildren API
> (will try).
> I had thought of the second approach, but you are right, it is costly.
>
> Regards
> Raj
>
> On Wednesday 14 September 2016, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > You can't do this very easily, unfortunately.
> >
> > The way PrefixQuery runs is to find (globally, across the index) all
> > terms that have that prefix.  If there are enough of them, it goes
> > term by term marking the documents in a bitset, and then iterates that
> > bitset at the end.  So the information of which term matched which
> > document is long gone.
> >
> > If there are few enough terms, it makes a BooleanQuery with N SHOULD
> > clauses, and in that limited case, since the child clauses are all
> > visiting the same document when it's collected, you might be able to
> > use the Scorer.getChildren API in a custom Collector to see (per doc
> > collected) which terms are "on" that one document.
> >
> > You could alternatively store term vectors (but these are slow and
> > costly) and load them for each document and iterate the matched prefix
> > terms by creating a PrefixTermsEnum.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Tue, Sep 13, 2016 at 11:25 AM, Rajnish kamboj
> > <rajnishk7.info@gmail.com> wrote:
> > > Hi
> > >
> > > How can I get all the matched terms of a document for a PrefixQuery?
> > >
> > > Term t2 = new Term("contents", "br");
> > > PrefixQuery query = new PrefixQuery(t2);
> > >
> > > Suppose I have a few documents with 1000 different terms.
> > > The search shows me the documents in which it finds the "br" words.
> > >
> > > Now, how can I get all the "br" words in a document?
> > >
> > >
> > >
> > > Thanks
> > > Raj
> >
>
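As a rough illustration of the two execution paths Mike describes in the quoted message, here is a plain-Java sketch. It does not use Lucene at all: the toy inverted index, the class name PrefixRewriteSketch and both method names are made up for illustration, and the second method is only an analogy for keeping per-term information, not what BooleanQuery actually does internally.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixRewriteSketch {
    // Toy inverted index: term -> doc ids. Hypothetical data, not a Lucene structure.
    static final TreeMap<String, int[]> INDEX = new TreeMap<>();
    static {
        INDEX.put("apple", new int[] {0});
        INDEX.put("brave", new int[] {0, 2});
        INDEX.put("bread", new int[] {1, 2});
        INDEX.put("cat", new int[] {1});
    }

    // Bitset-style path: every matching term marks its documents in one shared
    // bitset, so the result no longer says which term matched which document.
    static BitSet matchAsBitSet(String prefix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> e : INDEX.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so the prefix range has ended
            }
            for (int doc : e.getValue()) {
                docs.set(doc);
            }
        }
        return docs;
    }

    // Per-term path: the same walk, but each document remembers the terms that
    // hit it. This costs more memory, which is the trade-off in the thread.
    static Map<Integer, List<String>> matchWithTerms(String prefix) {
        Map<Integer, List<String>> byDoc = new TreeMap<>();
        for (Map.Entry<String, int[]> e : INDEX.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            for (int doc : e.getValue()) {
                byDoc.computeIfAbsent(doc, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        return byDoc;
    }

    public static void main(String[] args) {
        System.out.println(matchAsBitSet("br"));  // {0, 1, 2}
        System.out.println(matchWithTerms("br")); // {0=[brave], 1=[bread], 2=[brave, bread]}
    }
}
```

The first result tells you only that documents 0, 1 and 2 matched; the second also keeps the "br" terms per document, which is the information the original question asks for.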

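The term-vector alternative from the quoted message boils down to: take one document's sorted term list, seek to the first term at or after the prefix, and read forward while the prefix still matches. The sketch below models only that idea in plain Java. The sorted String array stands in for a stored term vector, and DocPrefixTerms and termsWithPrefix are hypothetical names, not Lucene API. Java string order is assumed here, whereas Lucene sorts terms by their byte representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DocPrefixTerms {
    // sortedTerms: the distinct terms of ONE document, already sorted, roughly
    // what a stored term vector would expose for a field (assumption).
    static List<String> termsWithPrefix(String[] sortedTerms, String prefix) {
        // Seek to the first term >= prefix (binarySearch returns
        // -(insertionPoint) - 1 when the prefix itself is not a term).
        int pos = Arrays.binarySearch(sortedTerms, prefix);
        if (pos < 0) {
            pos = -pos - 1;
        }
        // All terms sharing the prefix are contiguous in sorted order.
        List<String> matched = new ArrayList<>();
        while (pos < sortedTerms.length && sortedTerms[pos].startsWith(prefix)) {
            matched.add(sortedTerms[pos]);
            pos++;
        }
        return matched;
    }

    public static void main(String[] args) {
        String[] docTerms = {"apple", "brave", "bread", "brown", "cat"};
        System.out.println(termsWithPrefix(docTerms, "br")); // [brave, bread, brown]
    }
}
```

In real Lucene code the per-document term list would have to be loaded for every hit, which is the cost Mike warns about.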