lucene-java-user mailing list archives

From Shouvik Bardhan
Subject Re: High frequency terms in results document....
Date Thu, 19 Feb 2015 13:44:08 GMT
Thanks for your input, Uchida. I will try that out. I wonder what the
magic sauce is in Luke's set of calls that lets it produce, say, the top 100
terms even from an index with 100 million docs (small docs, though, in my case).
It looks like it goes through every term, puts them in a priority queue, and
takes the top N.
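
For readers following the thread, here is a minimal sketch of the "walk every term, keep a priority queue, take the top N" idea described above. It is plain Java rather than Luke's or Lucene's actual code: the `termCounts` map is a hypothetical stand-in for enumerating an index's terms with their document frequencies, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from Luke or Lucene.
final class TopTermsSketch {

    /** Returns the n terms with the highest counts, highest count first. */
    static List<String> topN(Map<String, Long> termCounts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Comparator<Map.Entry<String, Long>> byCount =
            (a, b) -> Long.compare(a.getValue(), b.getValue());
        // Min-heap on count: the head is the weakest of the terms kept so far.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);

        for (Map.Entry<String, Long> entry : termCounts.entrySet()) {
            if (heap.size() < n) {
                heap.add(entry);
            } else if (entry.getValue() > heap.peek().getValue()) {
                // The new term beats the weakest kept term, so swap it in.
                heap.poll();
                heap.add(entry);
            }
        }

        // Drain the heap into a list sorted from highest to lowest count.
        List<Map.Entry<String, Long>> kept = new ArrayList<>(heap);
        kept.sort(byCount.reversed());
        for (Map.Entry<String, Long> entry : kept) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 100L);
        counts.put("lucene", 90L);
        counts.put("index", 75L);
        counts.put("luke", 10L);
        System.out.println(topN(counts, 2));
    }
}
```

The heap never holds more than N entries, so the work grows with the number of unique terms rather than the number of documents. That would be consistent with the feature staying fast on a very large index of small docs, assuming each term's document frequency can be read without visiting the documents themselves.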


On Thu, Feb 19, 2015 at 2:10 AM, Tomoko Uchida wrote:

> Hi,
> I'm afraid there is no easy or straightforward way to meet your requirement.
> I would try creating a temporary tiny index from the search results on the
> fly, in memory, and getting the top N terms from it with HighFreqTerms.
> (The logic is almost the same as Luke's top N terms feature.)
> I have not tried it and am not sure whether this approach is practical
> performance-wise; it is just an idea...
> Hope this helps.
> Tomoko
> 2015-02-16 1:58 GMT+09:00 Shouvik Bardhan:
> > Apologies if I have missed it in prior discussions, but I looked all
> > over. I looked at the Luke code, and it does find high frequency terms
> > on the entire index. I am trying to get the top N high frequency terms
> > in the documents returned from a search result. I came across something
> > called FilterIndexReader, but I don't think it is part of the 4.X
> > codebase. Any pointer is appreciated.
> >
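
As a rough illustration of the requirement in the quoted question (top N terms counted only over the documents a search returned), here is a plain-Java sketch. It is not the temporary-index approach with HighFreqTerms suggested in the thread; it is a simpler stand-in that assumes the terms of each hit are already available as a list of strings, for example from stored term vectors or by re-analyzing stored fields. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the per-hit term lists are assumed inputs.
final class ResultSetTermCounts {

    /** Counts, for each term, how many of the hit documents contain it. */
    static Map<String, Integer> countDocFreq(List<List<String>> hitTerms) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : hitTerms) {
            // De-duplicate within a document so each hit counts a term once.
            for (String term : new HashSet<>(terms)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to n terms, most frequent across the hits first. */
    static List<String> topTerms(List<List<String>> hitTerms, int n) {
        Map<String, Integer> counts = countDocFreq(hitTerms);
        List<String> terms = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically for stable output.
        terms.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.min(Math.max(n, 0), terms.size());
        return new ArrayList<>(terms.subList(0, limit));
    }

    public static void main(String[] args) {
        List<List<String>> hits = Arrays.asList(
            Arrays.asList("lucene", "index", "lucene"),
            Arrays.asList("lucene", "luke"),
            Arrays.asList("index", "lucene"));
        System.out.println(topTerms(hits, 2));
    }
}
```

This sketch sorts every distinct term, which is fine for a small result set. For large result sets, a bounded priority queue of size N would avoid the full sort.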
