drill-user mailing list archives

From Divya Gehlot <divya.htco...@gmail.com>
Subject Re: Benchmark numbers using Drill
Date Tue, 24 Oct 2017 02:14:27 GMT
Yes, this is very good information that helps a lot of people like me who
use Drill in a production environment.
Can't we share this information as a recommendation to Drill users in the
Apache Drill KB?

On 20 October 2017 at 01:58, Saurabh Mahapatra <saurabhmahapatra94@gmail.com> wrote:

> I do not think you will get such information about benchmarks from
> customers on production workloads. But from the customers I have worked
> with who have taken Drill to production, here is some information that may
> be of use to you:
> 1. The trend universally has been to use beefier machines for in-memory
> query engines. We see 256GB RAM and 32 cores as the most frequent
> configuration. On the network side, it is 2x10GbE.
> 2. The most commonly sized dedicated cluster for starting out with Drill in
> production has been around 16-20 nodes with the above configuration. I have
> several customers who have deployed this on 200+ nodes as well but in those
> scenarios, it is a service among many.
> 3. The concurrency we see in the above settings is a function of the size
> of the dataset and the complexity of the customer query. In general,
> Little's law holds: the smaller the chunk of work to be processed, the
> higher the throughput. Our understanding of this changes further
> with the new releases of Drill, where spill-to-disk features will make it
> more of a pessimistic execution engine. The use of queues can also
> change this understanding.
> 4. On my company's side, we do have TPC-H and TPC-DS benchmarks that I do
> share with customers. But such benchmarks are flawed because they come from
> the world of traditional warehousing, where the competition was among
> general-purpose query engines. For example, our tests show that at
> increasingly large data scales, Drill beats Impala on these benchmarks. The
> same is touted by the Hive LLAP folks as well. But such results do not
> necessarily imply that an engine is the best tool choice for the production
> environment. That is why I am resistant to getting into the war of the query
> engines, in which every query engine beats the other under a given set of
> primed conditions.
> 5. It is an absolute must that you understand the query patterns that the
> system will have to withstand, with the data characteristics specific to
> your use case. I would only trust that. Big data systems are going to be
> application specific and will require tuning, which also means that you
> have to revisit the kinds of analytics you would like your end users to
> have. That again raises the question: what kinds of analytics truly
> generate value for the BI user?
> Best,
> Saurabh
> On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <proj.saha@gmail.com>
> wrote:
> > Hi,
> >
> > Is there any public performance benchmark that users have achieved using
> > Drill in production scenarios? It would be useful if someone could pass me
> > any links to customer user stories.
> >
> > Regards
> >
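
As a rough illustration of the Little's law remark in point 3 of the quoted reply: the law relates the average number of queries in flight (L) to throughput (lambda) and mean query latency (W) as L = lambda * W. The sketch below is only arithmetic under assumed numbers. The function names, the limit of 20 concurrent queries, and the latencies are all hypothetical and are not measurements from Drill or from anyone on this thread.

```python
def required_concurrency(throughput_qps: float, mean_latency_s: float) -> float:
    """Little's law, L = lambda * W: average number of queries in flight."""
    return throughput_qps * mean_latency_s


def max_throughput(concurrency_limit: float, mean_latency_s: float) -> float:
    """Little's law rearranged, lambda = L / W: sustainable queries per second."""
    if mean_latency_s <= 0:
        raise ValueError("mean_latency_s must be positive")
    return concurrency_limit / mean_latency_s


if __name__ == "__main__":
    # Hypothetical cluster that sustains 20 concurrent queries.
    limit = 20
    # Hypothetical mean latencies: heavy queries vs. small chunks of work.
    for latency_s in (4.0, 0.5):
        qps = max_throughput(limit, latency_s)
        print(f"mean latency {latency_s}s -> about {qps} queries/second")
```

With the concurrency limit held fixed, shortening the mean latency from 4 s to 0.5 s raises the sustainable throughput eightfold, which is the sense in which smaller chunks of work give higher throughput. Real workloads mix query sizes, so this is at best a first-order sizing aid.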
