Can you take a look at [1] and let us know if that helps resolve your issue?
[1]
https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
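
For context, the ConnectionPoolTimeoutException in your message appears to come from the S3A client's HTTP connection pool running out of free connections, not from a Drill query timeout. As far as I recall, the section in [1] suggests raising the pool size through the Hadoop S3A property fs.s3a.connection.maximum in core-site.xml. A rough sketch, assuming the file lives in Drill's conf directory; the value 100 is only illustrative, not a tuned recommendation:

```xml
<!-- core-site.xml (location in Drill's conf directory is an assumption) -->
<configuration>
  <property>
    <!-- Maximum number of simultaneous connections the S3A client may open -->
    <name>fs.s3a.connection.maximum</name>
    <!-- Illustrative value; adjust to your workload -->
    <value>100</value>
  </property>
</configuration>
```

The Drillbits will most likely need a restart before the change is picked up.
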
On Thu, Jun 1, 2017 at 12:55 PM, Raz Baluchi <raz.baluchi@gmail.com> wrote:
> Now that I have Drill working with parquet files on dfs, the next step was
> to move the parquet files to S3.
>
> I get pretty good performance - I can query for events by date range
> within 10 seconds (out of a total of ~800M events across 25 years).
> However, there seems to be some threshold beyond which queries start
> timing out.
>
> SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
> connection from pool
>
> My first question is, is there a default timeout value for queries against
> S3? Anything that takes longer than ~150 seconds seems to hit the timeout
> error.
>
> The second question has to do with the possible conditions that trigger the
> prolonged query time. It seems that if I increase the filters beyond a
> certain number - it doesn't take much - the query times out.
>
> For example the query:
>
> select * from events where YEAR in (2012, 2013) works fine - however,
> select * from events where YEAR in (2012, 2013, 2014) fails with a timeout.
>
> To make it worse, I can't use the first query either until I restart
> Drill...
>