lucene-java-user mailing list archives

From Benson Margulies <ben...@basistech.com>
Subject Re: Exploiting a whole lot of memory
Date Thu, 10 Oct 2013 18:19:13 GMT
On Wed, Oct 9, 2013 at 7:18 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies <benson@basistech.com>
> wrote:
> > On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> DirectPostingsFormat?
> >>
> >> It stores all terms + postings as simple java arrays, uncompressed.
> >>
> >
> > This definitely sped things up in my benchmark, but I'm greedy for more.
> > I just made a codec that returns it as the postings format; is that the
> > whole recipe? Does it make sense to extend it to any of the other
> > codec pieces?
>
> Yes, that's all you should need to do (you should have seen RAM usage
> go up too, to confirm :) ).
>
> Really this just addressed one "hotspot" (decoding terms/postings from
> the index); the query matching + scoring is also costly, and if you do
> "other stuff" (highlighting, spell correction) that can be costly too
> ... what kind of queries are you running / where are the hotspots in
> profiling?
>



The profile shows a lot of time in
org.apache.lucene.search.BooleanScorer$BooleanScorerCollector.collect(int).

We know that a typical query inspects about half of the documents in the
index.
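
For anyone following the thread, here is a rough, stdlib-only sketch of the
trade-off Mike describes above: postings kept as plain Java arrays versus
postings that must be decoded on every read. This is not Lucene code and not
how DirectPostingsFormat is implemented; the class and method names
(PostingsSketch, encodeDeltaVInt, decodeDeltaVInt) and the delta + variable-
length-int encoding are assumptions chosen only to illustrate the idea.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene.
public class PostingsSketch {

    // Compressed form: store the gap to the previous doc ID, writing each gap
    // with 7 payload bits per byte (high bit set when more bytes follow).
    public static byte[] encodeDeltaVInt(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocIds) {
            int delta = doc - prev;
            prev = doc;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    // Reading the compressed form costs a decode loop per posting.
    public static int[] decodeDeltaVInt(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            docs[i] = prev;
        }
        return docs;
    }

    public static void main(String[] args) {
        // "Direct" form: the postings are just an int[]; reading is an array access.
        int[] direct = {3, 7, 300, 70000};

        byte[] compressed = encodeDeltaVInt(direct);
        int[] decoded = decodeDeltaVInt(compressed, direct.length);

        System.out.println("compressed bytes: " + compressed.length
                + ", direct bytes: " + (direct.length * 4));
        System.out.println("round trip ok: " + Arrays.equals(direct, decoded));
    }
}
```

The direct array uses more RAM but skips the decode loop entirely, which is
the hotspot Mike refers to. It does nothing for the per-hit cost of
collect(int), which still runs once for every matching document.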



>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
