drill-user mailing list archives

From Saurabh Mahapatra <saurabhmahapatr...@gmail.com>
Subject Re: Benchmark numbers using Drill
Date Tue, 24 Oct 2017 07:22:30 GMT
Thanks Divya. Why don't you go ahead and create a JIRA (with the above info)
for this and assign it to Bridget Bevens?

I can create one but I would rather have someone from the community ask for
it.

Best,
Saurabh

On Mon, Oct 23, 2017 at 7:14 PM, Divya Gehlot <divya.htconex@gmail.com>
wrote:

> Yes, this is very good information that helps a lot of people like me who
> are using Drill in their production environments.
> Can't we share this information as a recommendation to Drill users on the
> Apache Drill KB?
>
> On 20 October 2017 at 01:58, Saurabh Mahapatra
> <saurabhmahapatra94@gmail.com> wrote:
>
> > I do not think you will get such information about benchmarks from
> > customers on production workloads. But from the customers I have worked
> > with who have taken Drill to production, here is some information that
> > may be of use to you:
> >
> > 1. The trend universally has been to use beefier machines for in-memory
> > query engines. We see 256GB RAM and 32 cores as the most frequent
> > configuration. On the network side, it is 2x10GbE.
> >
> > 2. The most commonly sized dedicated cluster for starting out with Drill
> > in production has been around 16-20 nodes with the above configuration.
> > I have several customers who have deployed this on 200+ nodes as well,
> > but in those scenarios it is a service among many.
> >
> > 3. The concurrency we see in the above settings is a function of the size
> > of the dataset and the complexity of the customer query. In general,
> > Little's law holds. The smaller the chunk of work to be processed, the
> > higher the throughput. Our understanding of this changes further
> > with the new releases of Drill where spill to disk features will make it
> > more of a pessimistic execution engine. The use of queues can also
> > change this understanding.
> >
> > 4. From my company side, we do have TPC-H and TPC-DS benchmarks that I
> > do share with customers. But such benchmarks are flawed because they
> > come from the world of traditional warehousing where the competition was
> > among general purpose query engines. For example, our tests show that at
> > higher and higher data scale, Drill beats Impala on these benchmarks.
> > The same is touted by the Hive LLAP folks as well. But such results do
> > not necessarily imply that it is the best tool choice for the production
> > environment. It is a reason why I am resistant to getting into the war
> > of the query engines in which every query engine beats the other under a
> > given set of primed conditions.
> >
> > 5. It is an absolute must that you understand the query patterns that the
> > system will have to withstand with the data characteristics specific to
> > your use case. I would only trust that. Big data systems are going to be
> > application specific and will require tuning. Which also means that you
> > have to revisit the kinds of analytics you would like your end users to
> > have. Which again raises the question: what kinds of analytics truly
> > generate value for the BI user?
> >
> > Best,
> > Saurabh
> >
> > On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <proj.saha@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Is there any public performance benchmark that users have achieved
> > > using Drill in production scenarios? It would be useful if someone
> > > could pass me any links to customer user stories.
> > >
> > > Regards
> > >
> >
>
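
As a rough illustration of the Little's law remark in point 3 of the quoted message: the law states that the average number of queries in flight equals the arrival rate multiplied by the average time each query spends in the system (L = lambda * W). The sketch below is only that arithmetic. The function names and all the numbers are hypothetical and are not measurements from any Drill cluster.

```python
def in_flight_queries(throughput_qps: float, mean_latency_s: float) -> float:
    """Little's law, L = lambda * W.

    throughput_qps: average rate at which queries complete (queries/second).
    mean_latency_s: average time a query spends in the system (seconds).
    Returns the average number of queries in flight at any moment.
    """
    return throughput_qps * mean_latency_s


def sustainable_throughput(concurrency_limit: float, mean_latency_s: float) -> float:
    """Little's law rearranged, lambda = L / W.

    For a fixed cap on concurrent queries, shorter queries (smaller chunks
    of work) give proportionally higher throughput.
    """
    if mean_latency_s <= 0:
        raise ValueError("mean_latency_s must be positive")
    return concurrency_limit / mean_latency_s


if __name__ == "__main__":
    # Hypothetical workload: 2 queries/second, 30 seconds each on average.
    print(in_flight_queries(2.0, 30.0))

    # Hypothetical cap of 20 concurrent queries: halving the mean latency
    # from 10 s to 5 s doubles the throughput the system can sustain.
    print(sustainable_throughput(20, 10.0))
    print(sustainable_throughput(20, 5.0))
```

This is why the size of the dataset and the complexity of the query matter so much for concurrency: both drive the mean latency term, and any queueing in front of the engine adds to it.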
