lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Does DocValues improve Grouping performance ?
Date Sat, 31 Jan 2015 19:47:13 GMT
Michael,

Please see my two questions inline below.

On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov
<msokolov@safaribooksonline.com> wrote:

> We were using grouping (no DocValues, though) and recently switched to
> using block-indexing and joins (see
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers).
> We got a nice speedup on average (perhaps 2x faster) and an even better
> improvement in the worst times; overall the performance is much more
> predictable and better, and I suspect (haven't checked) that we may be
> using less heap too.  The block indexing is cutting edge, a little
> complicated to get right, and I had to make some custom java code to get
> things just the way I wanted, but for best performance it does seem to be
> the way to go.
>
> Beware some gotchas:
>
> You have to reindex all the docs that participate in the parent-child
> relation so that each parent-child block gets indexed at once.  This might
> cause difficulties, but for us and I suspect most people, it's the natural
> thing to do anyway.
>
> You can only handle a single relation this way since you have to
> restructure your index to use it; grouping is more flexible.
>
Michael,
would you mind commenting on which particular relations you need to model?
BJQ is definitely much more restrictive than grouping, but it still has some
flexibility to cover the most frequent demands.
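
For readers who have not used the block join parsers discussed above, here
is a minimal sketch of the two pieces involved: a parent document indexed
together with its children as one block, and the local-params syntax of the
to-parent and to-child queries. The field names (id, type, title, text) and
the "type:parent" filter are hypothetical and chosen only for illustration,
and the `_childDocuments_` key assumes the JSON update format; verify both
against your Solr version before relying on them.

```python
import json

# All field names below (id, type, title, text) are hypothetical examples.

def make_block(parent_id, title, children):
    """Return one parent document with its children nested inside it,
    so that the whole parent-child block is sent in a single update."""
    return {
        "id": parent_id,
        "type": "parent",
        "title": title,
        "_childDocuments_": [
            {"id": f"{parent_id}-{i}", "type": "child", "text": text}
            for i, text in enumerate(children, start=1)
        ],
    }

def to_parent_query(child_query, parent_filter="type:parent"):
    """Wrap a query on child documents so it returns their parents."""
    return f'{{!parent which="{parent_filter}"}}{child_query}'

def to_child_query(parent_query, parent_filter="type:parent"):
    """Wrap a query on parent documents so it returns their children."""
    return f'{{!child of="{parent_filter}"}}{parent_query}'

block = make_block("book1", "Example Book", ["chapter one", "chapter two"])
payload = json.dumps([block])  # request body for a JSON update, one block

print(payload)
print(to_parent_query("text:chapter"))
print(to_child_query("title:example"))
```

Note that the filter given to `which` / `of` is expected to match all parent
documents in the index, not only the ones you want returned; narrowing the
result is done by the wrapped query or by additional filters.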


>
> Clients may not support the new block-indexing syntax (I think SolrJ has
> it, but the python client we were using did not).
>
> Converting an existing index requires special care; you basically have to
> delete all documents you are re-indexing.
>
> The Solr query parsers don't support scoring the joined-from documents
> (child docs in the to-parent query, parent docs in the to-child query).
> This might not matter to you, but it was important for our use case.
>
Would you mind leaving your vote on
https://issues.apache.org/jira/browse/SOLR-5662? It's not a big deal to
implement.


> So there are some kinks still, but if you can make it work for you, it
> does seem to perform better than grouping.
>
> -Mike
>
>
> On 1/30/2015 4:10 PM, Cario, Elaine wrote:
>
>> Hi Shamik,
>>
>> We use DocValues for grouping, and although I have nothing to compare it
>> to (we started with DocValues), we are also seeing similarly poor results:
>> easily 60% overhead compared to non-group queries.  Looking around for
>> some solution, no quick fix is presenting itself, unfortunately.
>> CollapsingQParserPlugin also is too limited for our needs.
>>
>> -----Original Message-----
>> From: Shamik Bandopadhyay [mailto:shamikb@gmail.com]
>> Sent: Thursday, January 15, 2015 6:02 PM
>> To: solr-user@lucene.apache.org
>> Subject: Does DocValues improve Grouping performance ?
>>
>> Hi,
>>
>>     Does the use of DocValues provide any performance improvement for
>> Grouping ?
>> I've looked into the blog post which mentions improving Grouping
>> performance through DocValues.
>>
>> https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/
>>
>> Right now, Group by queries (which I sadly can't avoid) have become a huge
>> bottleneck. They have an overhead of 60-70% compared to the same query sans
>> group by. Unfortunately, I'm not able to use CollapsingQParserPlugin as it
>> doesn't support anything similar to the "group.facet" feature.
>>
>> My understanding of DocValues is that it's intended for faceting and
>> sorting. Just wondering if anyone has tried DocValues for Grouping and seen
>> any improvements ?
>>
>> -Thanks,
>> Shamik
>>
>
>
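
For comparison, the two request shapes mentioned in the quoted messages can
be sketched as follows: a grouped query with "group.facet", and a
CollapsingQParserPlugin filter. This is only an illustration of the request
parameters; the field names (product_id, category) and the query text are
hypothetical, and it makes no claim about which variant is faster on a given
index.

```python
from urllib.parse import urlencode, parse_qs

# product_id and category are hypothetical field names. For the DocValues
# variant discussed in the thread, the grouping field would be declared with
# docValues="true" in the schema.

def grouped_query_params(q, group_field, facet_field=None, limit=1):
    """Parameters for a result-grouping request, optionally with group.facet."""
    params = [
        ("q", q),
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", str(limit)),
    ]
    if facet_field:
        params += [
            ("facet", "true"),
            ("facet.field", facet_field),
            ("group.facet", "true"),  # facet counts computed per group
        ]
    return params

def collapsed_query_params(q, collapse_field):
    """Parameters for the same query using the collapse filter instead."""
    return [("q", q), ("fq", f"{{!collapse field={collapse_field}}}")]

query_string = urlencode(
    grouped_query_params("text:solr", "product_id", facet_field="category")
)
print(query_string)
print(urlencode(collapsed_query_params("text:solr", "product_id")))
```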


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
