drill-user mailing list archives

From: Paul Rogers <par0...@gmail.com>
Subject: Re: exec.queue.enable in drill-embedded
Date: Mon, 29 Jun 2020 00:52:45 GMT

Hi Avner,

Query queueing is not available in embedded mode: it uses ZooKeeper (ZK) to
throttle the number of concurrent queries across a cluster, but embedded mode
does not have a cluster or use ZK. (If you are running more than a few
concurrent queries, embedded mode is likely the wrong deployment model anyway.)
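
To illustrate what throttling means here, below is a minimal Python sketch
of a client-side limit on concurrent queries. It is not Drill's queue: it
only limits queries issued from a single client process, and
MAX_CONCURRENT_QUERIES and run_query are hypothetical names, with a
placeholder where the real call to Drill would go.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limit; a client-side stand-in for a server-side queue.
MAX_CONCURRENT_QUERIES = 1

_slots = threading.BoundedSemaphore(MAX_CONCURRENT_QUERIES)
_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of queries observed running at once


def run_query(sql):
    """Run one query while holding a slot. The body is a placeholder."""
    global _running, peak_running
    with _slots:  # blocks until a slot is free
        with _lock:
            _running += 1
            peak_running = max(peak_running, _running)
        try:
            result = f"ran: {sql}"  # assumption: replace with the real call
        finally:
            with _lock:
                _running -= 1
    return result


# Eight queries submitted from four threads still run one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_query, [f"SELECT {i}" for i in range(8)]))
```

This only helps when all queries pass through one process. Several
independent remote clients would each need their own limit, or a shared
gateway in front of Drill.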

The problem here is the use of the REST API. It has horrible performance;
it buffers the entire result set in memory in a way that overwhelms the
heap. The REST API was designed to power the Web UI for small queries of
fewer than a few hundred rows. Drill was designed assuming "real" queries
would use the ODBC, JDBC or native APIs.

That said, there is an in-flight PR designed to fix the heap memory issue
for REST queries. However, even with that fix, your client must still be
capable of handling a very large JSON document since rows are not returned
in a "jsonlines" format or in batches. If you retrieve a million rows, they
will be in a single huge JSON document.
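
As a rough illustration of the difference for the client, here is a small
Python sketch that contrasts parsing one large JSON document with reading a
"jsonlines" stream. The response shape (an object with "columns" and "rows")
and the two function names are assumptions made for the example, and the data
is made up.

```python
import io
import json

# Assumed response shape: one JSON object holding every row.
single_document = json.dumps({
    "columns": ["id", "name"],
    "rows": [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}],
})


def rows_from_single_document(text):
    """Parse the whole document; all rows are in memory before any is used."""
    return json.loads(text)["rows"]


def rows_from_json_lines(stream):
    """Yield one row per line; only the current row has to be in memory."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


all_rows = rows_from_single_document(single_document)

# The same rows written one JSON object per line, then read back lazily.
json_lines = "\n".join(json.dumps(row) for row in all_rows)
streamed_rows = list(rows_from_json_lines(io.StringIO(json_lines)))
```

With the single-document form, client memory grows with the size of the
result set. With a line-oriented form, the client could process and discard
each row as it arrives.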

How many rows does the query return? If a few thousand or fewer, we can
perhaps finish up the REST fix to solve the issue. Otherwise, consider
switching to a more scalable API.

How many rows are read from S3? Doing what kind of processing? Simple WHERE
clause, or is there some ORDER BY, GROUP BY or joins that would cause
memory use? If just a scan and WHERE clause, then the memory you are using
should be plenty - once the REST problem is fixed.

Thanks,

- Paul


On Sun, Jun 28, 2020 at 3:17 PM Avner Levy <avner.levy@gmail.com> wrote:

> Hi,
> I'm using Drill 1.18 (master) docker and trying to configure its memory
> after getting out of heap memory errors:
> "RESOURCE ERROR: There is not enough heap memory to run this query using
> the web interface."
> The docker is serving remote clients through the REST API.
> The queries are simple selects over tiny parquet files that are stored in
> S3.
> It is running in a 16GB container, configured with a heap of 8GB, and 8GB
> direct memory.
> I tried to use:
>   exec.queue.enable=true
>   exec.queue.large=1
>   exec.queue.small=1
>
> and verified it was configured correctly, but I still see queries running
> concurrently.
> In addition, the "drill.queries.enqueued" counter remains zero.
> Is this mechanism supported in drill-embedded?
>
> In addition, it seems there is some memory leak, since after a while even
> with no query running for a while, running a single tiny query still gives
> the same error.
> Any insight would be highly appreciated :)
> Thanks,
>   Avner
>
