lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Slow facet sorting - lex vs count
Date Wed, 25 Aug 2010 14:41:16 GMT
On Wed, Aug 25, 2010 at 10:07 AM, Eric Grobler
<impalaherd@googlemail.com> wrote:
> I use Solr 1.4.1
> There are 14000 cities in the index.
> The type is just a simple string: <fieldType name="string"
> class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> The facet method is fc.
>
> You are right I do not need 5000 cities, I was just surprised to see this
> big difference, there are places where I do need to sort by count and return
> about 500 items.
>
> If Solr was also slow in locating the highest count city it would be less
> surprising.
> In other words, if I set the limit to 1, then solr returns Berlin as the
> city with the highest count within 3ms which seems to indicate that the
> facet is internally sorted by count.
> However, the speed regresses linearly, 30ms for 10, 300ms for 1000 etc.

The priority queue collecting values will be larger of course, but in
this specific instance I bet most of the time is being taken up in
converting from term number to term value.  Here's a snippet of a
comment from the implementation:
 *   To further save memory, the terms (the actual string values) are not all stored in
 *   memory, but a TermIndex is used to convert term numbers to term values only
 *   for the terms needed after faceting has completed.  Only every 128th term value
 *   is stored, along with it's corresponding term number, and this is used as an
 *   index to find the closest term and iterate until the desired number is hit (very
 *   much like Lucene's own internal term index).
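
To make the cost concrete, here is a rough sketch of the lookup that comment describes. It is not the actual Solr code: the class name SparseTermIndex, the in-memory list standing in for the on-disk term dictionary, and the steps counter are all assumptions made for illustration. Only the interval of 128 comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sparse term index; a sketch, not Solr's implementation. */
final class SparseTermIndex {
    // Interval taken from the quoted comment: only every 128th term is kept in memory.
    private static final int INTERVAL = 128;

    // Stands in for the on-disk term dictionary (assumption for this sketch).
    private final List<String> allTerms;
    // The in-memory index: terms at ordinals 0, 128, 256, ...
    private final List<String> indexedTerms = new ArrayList<>();
    // Number of terms stepped over so far, to show where the time goes.
    long steps = 0;

    SparseTermIndex(List<String> sortedTerms) {
        this.allTerms = sortedTerms;
        for (int ord = 0; ord < sortedTerms.size(); ord += INTERVAL) {
            indexedTerms.add(sortedTerms.get(ord));
        }
    }

    /** Converts a term number to its term value: jump to the closest indexed term, then iterate. */
    String lookup(int ord) {
        int slot = ord / INTERVAL;
        int current = slot * INTERVAL;
        String term = indexedTerms.get(slot);
        while (current < ord) {
            current++;
            term = allTerms.get(current); // in the real index this is a read, not an array access
            steps++;
        }
        return term;
    }
}
```

In this sketch each conversion can step over up to 127 terms, so resolving thousands of scattered term numbers adds up, while resolving a handful stays cheap.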

This is something that Lucene has improved in trunk, and that Solr can
make improvements to also.
Besides optimizations, we could also implement options to store all
values and eliminate the need to read the index to do the ord->string
conversions.
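
As a sketch of what the "store all values" option could look like, assuming the counts and the term values are both held in plain arrays indexed by term number: the class name TopFacets, the method topByCount, and both array parameters are hypothetical and not part of Solr. A bounded priority queue keeps the top entries by count, and only the winners are converted to strings, here by a direct array read.

```java
import java.util.PriorityQueue;

/** Illustrative top-N-by-count collection; a sketch, not Solr's implementation. */
final class TopFacets {
    /** Returns the term values with the highest counts, highest first, at most limit entries. */
    static String[] topByCount(int[] countsByOrd, String[] termsByOrd, int limit) {
        // Min-heap of term numbers ordered by count; the root is the weakest entry kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(countsByOrd[a], countsByOrd[b]));
        for (int ord = 0; ord < countsByOrd.length; ord++) {
            heap.add(ord);
            if (heap.size() > limit) {
                heap.poll(); // drop the lowest count so the heap never exceeds limit
            }
        }
        // Drain lowest-first and fill from the back, so the result is highest-first.
        String[] result = new String[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = termsByOrd[heap.poll()]; // ord->string is a plain array read here
        }
        return result;
    }
}
```

The trade-off is memory: every term value stays resident, which is exactly what the sparse index was designed to avoid.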

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


> Regards
> Eric
>
> On Wed, Aug 25, 2010 at 3:28 PM, Yonik Seeley <yonik@lucidimagination.com>
> wrote:
>>
>> On Wed, Aug 25, 2010 at 7:22 AM, Eric Grobler <impalaherd@googlemail.com>
>> wrote:
>> > There is a huge difference doing facet sorting on lex vs count
>> > The strange thing is that count sorting is fast when setting a small
>> > limit.
>> > I realize I can do sorting in the client, but I am just curious why this
>> > is.
>>
>> There are a lot of optimizations to make things fast for the common
>> case - and setting a really high limit makes some of those
>> ineffective.  Hopefully you don't really need to return the top 5000
>> cities?
>> What version of Solr is this? What faceting method is used? Is this a
>> multi-valued field?  How many unique values are in the city field?
>> How many docs in the index?
>>
>> -Yonik
>> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
>>
>>
>> > FAST - 16ms
>> > facet.field=city
>> > f.city.facet.limit=5000
>> > f.city.facet.sort=lex
>> >
>> > FAST - 20 ms
>> > facet.field=city
>> > f.city.facet.limit=50
>> > f.city.facet.sort=count
>> >
>> > SLOW - over 1 second
>> > facet.field=city
>> > f.city.facet.limit=5000
>> > f.city.facet.sort=count
>> >
>> > Regards
>> > ericz
>> >
>
>
