drill-user mailing list archives

From PROJJWAL SAHA <proj.s...@gmail.com>
Subject Re: Benchmark numbers using Drill
Date Sat, 21 Oct 2017 03:37:52 GMT
Thanks for this very useful info.

On 19 Oct 2017 11:28 pm, "Saurabh Mahapatra" <saurabhmahapatra94@gmail.com>
wrote:

> I do not think you will get such information about benchmarks from
> customers on production workloads. But from the customers I have worked
> with who have taken Drill to production, here is some information that may
> be of use to you:
>
> 1. The trend universally has been to use beefier machines for in-memory
> query engines. We see 256GB RAM and 32 cores as the most frequent
> configuration. On the network side, it is 2x10GbE.
>
> 2. The most commonly sized dedicated cluster for starting out with Drill in
> production has been around 16-20 nodes with the above configuration. I have
> several customers who have deployed this on 200+ nodes as well but in those
> scenarios, it is a service among many.
>
> 3. The concurrency we see in the above settings is a function of the size
> of the dataset and the complexity of the customer query. In general,
> Little's law holds. The smaller the chunk of work to be processed, the
> higher the throughput. Our understanding of this changes further
> with the new releases of Drill, where spill-to-disk features will make it
> more of a pessimistic execution engine. The use of queues can also
> change this understanding.
>
> 4. On my company's side, we do have TPC-H and TPC-DS benchmarks that I do
> share with customers. But such benchmarks are flawed because they come from
> the world of traditional warehousing where the competition was among
> general purpose query engines. For example, our tests show that at higher
> and higher data scale, Drill beats Impala on these benchmarks. The same is
> touted by the Hive LLAP folks as well. But such results do not necessarily
> imply that a given engine is the best tool choice for the production
> environment. This is one reason why I am resistant to getting into the war
> of the query engines, in which every query engine beats the other under a
> given set of primed conditions.
>
> 5. It is an absolute must that you understand the query patterns that the
> system will have to withstand with the data characteristics specific to
> your use case. I would only trust that. Big data systems are going to be
> application specific and will require tuning. Which also means that you
> have to revisit the kinds of analytics you would like your end users to
> have. Which again raises the question: what kinds of analytics truly
> generate value for the BI user?
>
> Best,
> Saurabh
>
> On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <proj.saha@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is there any public performance benchmark that users have achieved using
> > Drill in production scenarios? It would be useful if someone can pass me
> > any links for customer user stories.
> >
> > Regards
> >
>

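For anyone reading point 3 in the archive: Little's law says the average number of queries in flight equals throughput times the average time a query spends in the system (L = lambda * W). Here is a minimal sketch of the arithmetic. The function name and all numbers are made up for illustration; they are not Drill measurements.

```python
def throughput_qps(concurrent_queries: float, mean_latency_s: float) -> float:
    """Little's law, L = lambda * W, rearranged as lambda = L / W.

    concurrent_queries: average number of queries in flight (L)
    mean_latency_s: average time a query spends in the system, in seconds (W)
    """
    if mean_latency_s <= 0:
        raise ValueError("mean latency must be positive")
    return concurrent_queries / mean_latency_s


# Hypothetical workloads at the same concurrency of 20 queries in flight.
small_chunks = throughput_qps(concurrent_queries=20, mean_latency_s=2.0)
large_chunks = throughput_qps(concurrent_queries=20, mean_latency_s=40.0)

print(f"small queries: {small_chunks} queries/s")
print(f"large queries: {large_chunks} queries/s")
```

At a fixed concurrency, the workload made of smaller, faster queries sustains a proportionally higher throughput, which is the relationship point 3 describes. Real clusters add queueing and memory effects that this arithmetic ignores.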