lucene-solr-user mailing list archives

From lei <simpl...@gmail.com>
Subject Re: Performance on faceting using docValues
Date Mon, 09 Mar 2015 21:24:57 GMT
The term histograms are shared at the link below. Sorry for the confusion.

https://docs.google.com/presentation/d/1tma4hkYjxJfBTnMbO6Pq_dUHqZ0wI_UTlgoVqXtW4ZA/pub?start=false&loop=false&delayms=3000&slide=id.p


> On Mon, Mar 9, 2015 at 10:56 AM, Anshum Gupta <anshum@anshumgupta.net> wrote:
>
>> Hi Lei,
>>
>> The mailing list doesn't allow attachments. Can you share these via a file
>> sharing platform?
>>
>> On Mon, Mar 9, 2015 at 12:48 AM, lei <simplely@gmail.com> wrote:
>>
>> > The Solr instance is single-shard. The index size is around 20 GB and the
>> > total document count is about 12 million. Below are the histograms for
>> > the three facet fields in my query. Thanks.
>> >
>> >
>> > On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen <te@statsbiblioteket.dk>
>> > wrote:
>> >
>> >> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>> >>
>> >> You present a very interesting observation. I have not noticed what you
>> >> describe, but on the other hand we have not done comparative speed
>> >> tests.
>> >>
>> >> > q=*:*&fq=country:"US"&fq=category:112
>> >>
>> >> First observation: Your query is '*:*', which is a "magic" query. Non-DV
>> >> faceting has optimizations both for this query (although that ought to
>> >> be disabled due to the fq) and for the "inverse" case where there are
>> >> more hits than non-hits. Perhaps you could test with a handful of
>> >> queries that have different result sizes?
>> >>
>> >> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>> >>
>> >> The combination of index order and a high limit might be an explanation:
>> >> When resolving the Strings of the facet result, non-DV will perform
>> >> ordinal-lookup, which is fast when done in monotonic rising order
>> >> (sort=index) and if the values are close (limit=2000). I do not know if
>> >> DV benefits the same way.
>> >>
>> >> On the other hand, your limit seems to apply only to material, so it
>> >> could be that the real number of unique values is low and you just set
>> >> the limit to 2000 to be sure you get everything?
>> >>
>> >> > &facet.field=manufacturer&facet.field=seller&facet.field=material
>> >> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
>> >> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
>> >> > &f.material.facet.mincount=1&sort=score+desc
>> >>
>> >> How large is your index in bytes, how many documents does it contain,
>> >> and is it single-shard or cloud? Could you paste the log lines
>> >> containing "UnInverted field", which describe the number of unique
>> >> values and the size of your facet fields?
>> >>
>> >> - Toke Eskildsen, State and University Library, Denmark
>> >>
>> >>
>>
>>
>> --
>> Anshum Gupta
>>
>
>
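
For anyone wanting to follow Toke's suggestion of testing a handful of queries with different result sizes, here is a minimal sketch that only builds the request URLs. The facet parameters are copied from the query quoted above. The base URL (host and core name "collection1") and the extra test query "steel" are assumptions made for illustration and do not come from this thread.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a core named "collection1".
# Adjust this to the real host and core.
BASE_URL = "http://localhost:8983/solr/collection1/select"

# Facet parameters copied from the query quoted in this thread.
FACET_PARAMS = [
    ("facet", "on"),
    ("facet.sort", "index"),
    ("facet.mincount", "1"),
    ("facet.limit", "2000"),
    ("facet.field", "manufacturer"),
    ("facet.field", "seller"),
    ("facet.field", "material"),
    ("f.manufacturer.facet.mincount", "1"),
    ("f.manufacturer.facet.sort", "count"),
    ("f.manufacturer.facet.limit", "100"),
    ("f.seller.facet.mincount", "1"),
    ("f.seller.facet.sort", "count"),
    ("f.seller.facet.limit", "100"),
    ("f.material.facet.mincount", "1"),
    ("sort", "score desc"),
]


def build_query_url(q, filters=()):
    """Return a select URL for query q with optional fq filters."""
    params = [("q", q)]
    params += [("fq", f) for f in filters]
    params += FACET_PARAMS
    return BASE_URL + "?" + urlencode(params)


# A few variants that should match different numbers of documents.
# Only the first one is the query from the thread; the rest are
# hypothetical examples.
TEST_QUERIES = [
    ("*:*", ['country:"US"', "category:112"]),  # original query
    ("*:*", ['country:"US"']),                  # fewer filters, more hits
    ("*:*", []),                                # every document
    ("steel", ['country:"US"']),                # hypothetical term query
]

if __name__ == "__main__":
    for q, filters in TEST_QUERIES:
        print(build_query_url(q, filters))
```

Each URL can be requested once with the facet fields indexed with docValues and once without, and the QTime value in the responseHeader of each response compared across the variants.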
