drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: ECS parquet files query timing out
Date Sat, 28 Mar 2020 03:57:56 GMT
Hi Navin,

You had mentioned your ECS solution in an earlier note. What are you using to access data
in your container? Is your ECS container running HDFS? Or, do you have some other API?

Do you have Drill running in a container on ECS, or is that where your data is located? It
would be helpful if you could describe your setup in a bit more detail so we can offer
suggestions about where to look for an issue.

By the way: the query profile is often a good place to start. You'll find profiles in the Drill
Web Console. Looking at each operator, you can see how much memory was used and how long things
took. Specifically, look at the time taken by the scan: is the slowness due to reading the
data, or is some other part of the query taking the time?
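A minimal sketch of that check, assuming the profile has been saved as JSON from the Web Console and loaded into a Python dict. The field names used here (`fragmentProfile`, `minorFragmentProfile`, `operatorProfile`, `processNanos`, `waitNanos`, `peakLocalMemoryAllocated`) are assumptions about the profile layout and should be verified against a real profile from your Drill version. The sample data is made up for illustration, and real profiles may encode the operator type as a numeric code rather than a name.

```python
from collections import defaultdict


def summarize_operators(profile):
    """Sum process/wait time (ms) and track peak memory per operator.

    `profile` is a dict parsed from a Drill query profile JSON document.
    Field names are assumed; missing fields are treated as zero.
    """
    totals = defaultdict(
        lambda: {"process_ms": 0.0, "wait_ms": 0.0, "peak_mem_bytes": 0}
    )
    for major in profile.get("fragmentProfile", []):
        major_id = major.get("majorFragmentId")
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                key = (major_id, op.get("operatorId"), op.get("operatorType"))
                entry = totals[key]
                entry["process_ms"] += op.get("processNanos", 0) / 1e6
                entry["wait_ms"] += op.get("waitNanos", 0) / 1e6
                entry["peak_mem_bytes"] = max(
                    entry["peak_mem_bytes"],
                    op.get("peakLocalMemoryAllocated", 0),
                )
    rows = [
        dict(major_fragment=k[0], operator_id=k[1], operator_type=k[2], **v)
        for k, v in totals.items()
    ]
    # Slowest operators first, counting both processing and waiting time.
    rows.sort(key=lambda r: r["process_ms"] + r["wait_ms"], reverse=True)
    return rows


# Hypothetical, hand-written profile fragment (not output from a real query).
sample_profile = {
    "fragmentProfile": [
        {"majorFragmentId": 0, "minorFragmentProfile": [
            {"operatorProfile": [
                {"operatorId": 0, "operatorType": "SCREEN",
                 "processNanos": 1_000_000, "waitNanos": 0,
                 "peakLocalMemoryAllocated": 1000},
            ]},
        ]},
        {"majorFragmentId": 1, "minorFragmentProfile": [
            {"operatorProfile": [
                {"operatorId": 1, "operatorType": "PARQUET_ROW_GROUP_SCAN",
                 "processNanos": 5_000_000, "waitNanos": 20_000_000,
                 "peakLocalMemoryAllocated": 4096},
            ]},
            {"operatorProfile": [
                {"operatorId": 1, "operatorType": "PARQUET_ROW_GROUP_SCAN",
                 "processNanos": 3_000_000, "waitNanos": 10_000_000,
                 "peakLocalMemoryAllocated": 8192},
            ]},
        ]},
    ]
}

rows = summarize_operators(sample_profile)
for row in rows:
    print(row)
```

If the scan operator dominates the list, and especially if its wait time is large, the slowness is more likely in reading from storage than elsewhere in the query. For a real profile, replace `sample_profile` with the result of `json.load` on the saved file.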

When you get the error, what is the stack trace? Is the error coming from some particular
HDFS client? In some particular operation?

- Paul


    On Friday, March 27, 2020, 6:59:42 AM PDT, Navin Bhawsar <navin.bhawsar@gmail.com> wrote:

We are facing a performance issue where an Apache Drill query on ECS times out
with the error below: "ConnectionPoolTimeoutException: Timeout waiting for
connection from pool"

However, the same query works fine on a single-node HDFS setup, with an execution
time of 2.1 s (planning = 0.483 s).

Parquet file size: < 1.5 GB
Total Parquet files scanned: 8 (19 total in the directory)
Apache Drill version: 1.17
JDK: 1.8.0_74
Total rows returned from query: 71000

There are 2 Drillbits running in distributed mode.
13 GB is allocated per Drillbit (the default).

Any ideas why ECS performance is so bad compared with HDFS for Drill?
Please advise if Drill provides options to optimize querying on ECS.

Please let me know if you need more details.

Thanks & Regards,