drill-user mailing list archives

From Raz Baluchi <raz.balu...@gmail.com>
Subject Parquet on S3 - timeouts
Date Thu, 01 Jun 2017 19:55:41 GMT
Now that I have Drill working with parquet files on dfs, the next step was
to move the parquet files to S3.

I get pretty good performance: I can query for events by date range
within 10 seconds (out of a total of ~800M events across 25 years).
However, there seems to be some threshold beyond which queries start
timing out.

SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
connection from pool
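
(For context: if the storage plugin goes through the Hadoop s3a connector,
which is an assumption since nothing above says which connector is in use,
the pool named in this error is most likely the S3A HTTP connection pool.
Its size is set by fs.s3a.connection.maximum in core-site.xml. A minimal
sketch of raising it follows; the value 100 is purely illustrative, not a
tested recommendation, and this may not be the actual cause of the timeouts.)

```xml
<!-- core-site.xml fragment; assumes the s3a connector is in use -->
<configuration>
  <property>
    <!-- Maximum number of simultaneous connections to S3 -->
    <name>fs.s3a.connection.maximum</name>
    <!-- Illustrative value only; tune for the workload -->
    <value>100</value>
  </property>
</configuration>
```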

My first question: is there a default timeout value for queries against
S3? Anything that takes longer than ~150 seconds seems to hit the timeout.

The second question has to do with the possible conditions that trigger the
prolonged query time. It seems that if I increase the number of filter
values beyond a certain point (it doesn't take much), the query times out.

For example the query:

select * from events where YEAR in (2012, 2013)

works fine. However,

select * from events where YEAR in (2012, 2013, 2014)

fails with a timeout.

To make it worse, I can't use the first query either until I restart
