drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Apache Drill Support concurrent parallel Request
Date Wed, 08 Apr 2020 19:23:22 GMT
Another thing that users will see when they start trying to use Drill for
concurrent queries is that Drill assumes that it is OK to spend quite a bit
of time optimizing a query before running it. Taking 500 ms to optimize the
query can be a really bad trade-off if your query only takes 100ms to run.
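
As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name and the 60-second comparison query are hypothetical; only the 500 ms and 100 ms figures come from the example above.

```python
def planning_share(plan_ms: float, exec_ms: float) -> float:
    """Return the fraction of total query latency spent on planning."""
    total = plan_ms + exec_ms
    if total <= 0:
        raise ValueError("total latency must be positive")
    return plan_ms / total


# The case described above: 500 ms of optimization for a 100 ms query.
short_query = planning_share(500, 100)

# Hypothetical long-running query (60 s of execution) with the same
# planning cost, for comparison.
long_query = planning_share(500, 60_000)

print(f"short query: {short_query:.0%} of latency is planning")
print(f"long query:  {long_query:.1%} of latency is planning")
```

For the short query, planning dominates the total latency; for the long one, it is a small fraction, which is the case heavy optimization is designed for.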

It is possible to tune this very differently, but that exercise is
definitely not a task for a user (or even a less-than-advanced developer).
In the MapR connection between the OJAI API and MapR DB, for instance, the
clear assumption is that queries will be relatively simple and all that
really needs to be done is look for good join ordering and make sure that
secondary indexes are used reasonably well. This meant that retuning for
fast optimization was very worthwhile.

A similar thing was done by Alibaba in their time series query engine.
There, the primary data source is a variant of Open TSDB and query costs
are dominated by the primary facts (the time series itself). Tuning the
optimizer to not think too much is a good thing.

So, could you say more about your workload so that the Drill community can
say more about what Drill will (or won't) do for you?

On Wed, Apr 8, 2020 at 12:02 PM Paul Rogers <par0328@yahoo.com.invalid> wrote:

> Hi Ramasamy,
> Let's define some terms. By "parallel requests" do you mean multiple
> people submitting queries at the same time? If so, then Drill handles this
> just fine: Drill is designed to run multiple queries from multiple users
> concurrently.
> There is a caveat. Many people run Drill in embedded mode when they get
> started. Embedded mode is a single user, single-machine setup that is great
> for testing Drill, exploring small data sets and so on. However, to support
> multiple concurrent queries, the proper way to run Drill is as a service,
> preferably across multiple machines. Further, if you are running a cluster
> of two or more machines, you need some kind of distributed file system: S3,
> Hadoop, etc.
> Once you start running concurrent queries, memory becomes an important
> consideration, especially if your JSON files are large and you are doing
> memory-intensive operations such as sorting and joins. The Drill
> documentation explains the correct configuration steps.
> Thanks,
> - Paul
>     On Wednesday, April 8, 2020, 11:00:14 AM PDT, Ramasamy Javakar <
> ramasamy@ezeeinfosolutions.com> wrote:
> Hi, I built an analytics web application on Drill; the data set is in a
> JSON file. We are facing issues when handling multiple parallel requests.
> Does Apache Drill support concurrent requests? Please let me know.
> Thanks & Regards
> Ramasamy
> Product Manager
> EzeeInfo Cloud Solutions
> +91 95000 07269
