lucene-solr-user mailing list archives

From RAUNAK AGRAWAL <agrawal.rau...@gmail.com>
Subject Re: Solr Streaming Queries Performance Issues [v7.2.1]
Date Fri, 28 Sep 2018 21:42:35 GMT
Thank you Joel. Looking forward to the latest version of solr.

Thanks

On Fri, Sep 28, 2018 at 12:22 PM Joel Bernstein <joelsolr@gmail.com> wrote:

> The facet expression is currently not as expressive as the JSON facet API.
> So for very demanding use cases you can create a more highly tuned JSON facet
> API call.
>
> The good news is we are working on this. And also working on other expressions
> that can be wrapped around the facet expression to implement parallelism
> and scaling. We hope to have this ready for Solr 8, which is just around
> the corner.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Sep 28, 2018 at 2:52 PM RAUNAK AGRAWAL <agrawal.raunak@gmail.com>
> wrote:
>
> > Thanks a lot Toke. I will get back to you soon regarding the patch update
> > after having a discussion with the team.
> >
> > Thanks & Regards
> >
> >
> > On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen <toes@kb.dk> wrote:
> >
> > > RAUNAK AGRAWAL <agrawal.raunak@gmail.com> wrote:
> > >
> > > > curl http://localhost:8983/solr/collection_name/stream -d
> > > > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > > > desc",buckets="week",bucketSizeLimit=200,sum(sales),
> > > > sum(amount),sum(days))'
> > >
> > > Stats on numeric fields then.
> > >
> > > > Also in my collection, I have almost 10 Billion documents
> > > > with many deletions (close to 40%).
> > >
> > > Quite a lot of documents, and in this case deletions count, as the
> > > internal structures for the deleted documents still need to be iterated.
> > > In scale this looks somewhat like our 18 billion document setup, with the
> > > addendum that we use quite large segments (900GB).
> > >
> > > The performance regressions we encountered with Solr 7 led to
> > > https://issues.apache.org/jira/browse/LUCENE-8374 which helped a lot
> > > (performance testing has not finished). If you have or can easily create a
> > > test server where the shards are the same size as your production shards,
> > > I'd be happy to port the patch to Solr 7.2.1 to see if it helps. I am
> > > looking for independent verification, so it is no bother.
> > >
> > > > I was planning to run optimise to merge the segments but
> > > > spoke to admin team and lucidworks guys and they were
> > > > against it saying that it will make very large segment file.
> > >
> > > If your bottleneck is the same as ours, the large segment would mean worse
> > > performance (with Solr 7).
> > >
> > > > Is it true that optimise in solr should not be used, as it comes with
> > > > other issues?
> > >
> > > No simple answer there. If you have an index that you update very rarely,
> > > it can save memory and processing power. If you have a live index where you
> > > add and delete documents, it will probably be a bad idea. One strategy used
> > > with time series data is to have old and immutable data in dedicated
> > > collections, which can then be optimized.
> > >
> > > - Toke Eskildsen
> > >
> >
>
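
Joel's suggestion of a tuned JSON facet API call can be sketched roughly as follows. This is only an illustration of how the facet() expression quoted above might map onto the JSON Facet API, not something posted in the thread. The field names (week, sales, amount, days), the query, and the limit of 200 come from the quoted curl command; the helper names, the facet label "weeks", and the stat labels are made up for the example, and the mapping of bucketSorts="week desc" to "index desc" is an assumption about the intended ordering.

```python
import json
import urllib.request


def build_json_facet_request(query="id:953", bucket_field="week", limit=200):
    """Build a JSON Facet API body that roughly mirrors the facet() expression."""
    return {
        "query": query,
        "limit": 0,  # only the facet buckets are wanted, not the documents
        "facet": {
            "weeks": {  # hypothetical label for the bucket facet
                "type": "terms",
                "field": bucket_field,
                "limit": limit,
                "sort": "index desc",  # assumed equivalent of bucketSorts="week desc"
                "facet": {
                    "sum_sales": "sum(sales)",
                    "sum_amount": "sum(amount)",
                    "sum_days": "sum(days)",
                },
            }
        },
    }


def post_json_facet(base_url, collection, body):
    """POST the body to the collection's /query handler and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/{collection}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running node this would be called as post_json_facet("http://localhost:8983/solr", "collection_name", build_json_facet_request()). Whether it is faster than the streaming expression on a 10 billion document index would have to be measured.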

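Toke's time series strategy (old, immutable data in dedicated collections that can then be optimized) could look something like the sketch below. Everything specific here is an assumption made for illustration: one collection per calendar month, the "sales" prefix, the naming scheme, and the premise that no documents are added to or deleted from a month once it has ended. Only the optimize and maxSegments parameters of the update handler are standard Solr.

```python
from datetime import date
from urllib.parse import urlencode


def collection_for(day, prefix="sales"):
    """Name of the (hypothetical) monthly collection that holds documents for `day`."""
    return f"{prefix}_{day.year}_{day.month:02d}"


def is_immutable(collection_name, today, prefix="sales"):
    """True if the collection covers a month strictly before the current one.

    Plain string comparison is enough because year and month are zero-padded.
    Assumes no late-arriving updates to past months.
    """
    return collection_name < collection_for(today, prefix)


def optimize_url(base_url, collection, max_segments=1):
    """URL that asks Solr to optimize (force-merge) one collection."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/{collection}/update?{params}"


if __name__ == "__main__":
    today = date(2018, 9, 28)
    for name in ("sales_2018_07", "sales_2018_08", "sales_2018_09"):
        if is_immutable(name, today):
            print(optimize_url("http://localhost:8983/solr", name))
```

Only the closed months are selected; the collection for the current month keeps receiving adds and deletes and is left to the normal merge policy, which matches the advice not to optimize a live index.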