lucene-solr-user mailing list archives

From Eric Grobler <impalah...@googlemail.com>
Subject Re: Slow facet sorting - lex vs count
Date Wed, 25 Aug 2010 14:55:25 GMT
Hi Yonik,

Thanks for the technical explanation.
In general I will use lex sorting and then sort by count in the client when
there are not too many rows.
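
A minimal sketch of that client-side step, assuming the facet counts arrive in Solr's flat JSON layout of alternating term and count. The helper name and the sample counts are made up for illustration; only the field name and Berlin come from this thread.

```python
def sort_facets_by_count(flat_facets, limit=None):
    """Sort a flat [term, count, term, count, ...] list by descending count.

    The input is assumed to be lex-sorted already (facet.sort=lex), so the
    stable sort keeps terms with equal counts in lexicographic order.
    """
    pairs = list(zip(flat_facets[0::2], flat_facets[1::2]))
    pairs.sort(key=lambda pair: -pair[1])
    return pairs if limit is None else pairs[:limit]


# Hypothetical lex-sorted facet result for facet.field=city
city_facets = ["Berlin", 120, "Bonn", 7, "Hamburg", 45, "Munich", 45]
top_cities = sort_facets_by_count(city_facets, limit=3)
print(top_cities)
```

Sorting a few thousand pairs this way is cheap on the client, so the server only has to do the fast lex-ordered facet.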

Have a nice day.

Regards
ericz


On Wed, Aug 25, 2010 at 4:41 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:

> On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
> <impalaherd@googlemail.com> wrote:
> > I use Solr 1.41
> > There are 14000 cities in the index.
> > The type is just a simple string: <fieldType name="string"
> > class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> > The facet method is fc.
> >
> > You are right, I do not need 5000 cities; I was just surprised to see this
> > big difference. There are places where I do need to sort by count and return
> > about 500 items.
> >
> > If Solr was also slow in locating the highest count city it would be less
> > surprising.
> > In other words, if I set the limit to 1, then solr returns Berlin as the
> > city with the highest count within 3ms which seems to indicate that the
> > facet is internally sorted by count.
> > However, the speed regresses linearly, 30ms for 10, 300ms for 1000 etc.
>
> The priority queue collecting values will be larger of course, but in
> this specific instance I bet most of the time is being taken up in
> converting from term number to term value.  Here's a snippet of a
> comment from the implementation:
>  *   To further save memory, the terms (the actual string values) are not all stored in
>  *   memory, but a TermIndex is used to convert term numbers to term values only
>  *   for the terms needed after faceting has completed.  Only every 128th term value
>  *   is stored, along with its corresponding term number, and this is used as an
>  *   index to find the closest term and iterate until the desired number is hit (very
>  *   much like Lucene's own internal term index).
>
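
The lookup described in that comment can be sketched roughly as follows. This is a toy model and not Solr's actual TermIndex code: the class name, the in-memory list standing in for the on-disk term dictionary, and the step counter are all assumptions made for illustration. Only the interval of 128 comes from the quoted comment.

```python
INTERVAL = 128  # from the quoted comment: only every 128th term is kept in memory


class SparseTermIndex:
    """Toy model of an ord -> term lookup over a sparse in-memory index."""

    def __init__(self, sorted_terms, interval=INTERVAL):
        self._terms = sorted_terms  # stand-in for the on-disk term dictionary
        self._interval = interval
        # Keep only every `interval`-th term; slot i holds the term with ord i * interval.
        self._index_terms = sorted_terms[::interval]
        self.steps = 0  # how many terms were stepped over while scanning

    def lookup(self, term_ord):
        if not 0 <= term_ord < len(self._terms):
            raise IndexError(term_ord)
        slot = term_ord // self._interval  # closest indexed term at or before term_ord
        pos = slot * self._interval
        term = self._index_terms[slot]
        while pos < term_ord:  # iterate forward until the desired number is hit
            pos += 1
            term = self._terms[pos]
            self.steps += 1
        return term


terms = [f"city{i:05d}" for i in range(14000)]  # 14000 hypothetical city names
index = SparseTermIndex(terms)
print(index.lookup(127), index.steps)
```

In this model each lookup scans up to 127 terms, so resolving thousands of ords after faceting costs far more than resolving a handful, which fits the growth in response time with facet.limit reported above.
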
> This is something that Lucene has improved in trunk, and that solr can
> make improvements to also.
> Besides optimizations, we could also implement options to store all
> values and eliminate the need to read the index to do the ord->string
> conversions.
>
> -Yonik
> http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
>
>
> > Regards
> > Eric
> >
> > On Wed, Aug 25, 2010 at 3:28 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> >>
> >> On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler <impalaherd@googlemail.com> wrote:
> >> > There is a huge difference doing facet sorting on lex vs count
> >> > The strange thing is that count sorting is fast when setting a small
> >> > limit.
> >> > I realize I can do sorting in the client, but I am just curious why this
> >> > is.
> >>
> >> There are a lot of optimizations to make things fast for the common
> >> case - and setting a really high limit makes some of those
> >> ineffective.  Hopefully you don't really need to return the top 5000
> >> cities?
> >> What version of Solr is this? What faceting method is used? Is this a
> >> multi-valued field?  How many unique values are in the city field?
> >> How many docs in the index?
> >>
> >> -Yonik
> >> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> >>
> >>
> >> > FAST - 16ms
> >> > facet.field=city
> >> > f.city.facet.limit=5000
> >> > f.city.facet.sort=lex
> >> >
> >> > FAST - 20 ms
> >> > facet.field=city
> >> > f.city.facet.limit=50
> >> > f.city.facet.sort=count
> >> >
> >> > SLOW - over 1 second
> >> > facet.field=city
> >> > f.city.facet.limit=5000
> >> > f.city.facet.sort=count
> >> >
> >> > Regards
> >> > ericz
> >> >
> >
> >
>
