drill-user mailing list archives

From Raz Baluchi <raz.balu...@gmail.com>
Subject Re: Parquet on S3 - timeouts
Date Thu, 01 Jun 2017 21:14:50 GMT
Setting

  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>

does fix the problem. There are no more timeouts, and responses are very
quick. There is no need to 'prime' the query...
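
For anyone finding this thread later, here is a minimal sketch of where that property would sit. It assumes, as the linked Drill S3 storage plugin page describes, that the setting goes in the core-site.xml in Drill's conf directory; the enclosing <configuration> element is the usual Hadoop configuration wrapper. The value 100 is simply what worked in this case, not a tuned recommendation. The "Timeout waiting for connection from pool" error suggests the S3A connection pool was exhausted, so raising the limit presumably lets more Parquet reads hold connections at once. Drill most likely needs a restart to pick up the change.

```xml
<?xml version="1.0"?>
<!-- core-site.xml (assumed location: Drill's conf directory) -->
<configuration>
  <!-- Upper bound on simultaneous connections in the S3A client's pool.
       100 is the value that resolved the timeouts in this thread. -->
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>
</configuration>
```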

On Thu, Jun 1, 2017 at 4:08 PM, Abhishek Girish <agirish@apache.org> wrote:

> Can you take a look at [1] and let us know if that helps resolve your
> issue?
>
> [1]
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
>
> On Thu, Jun 1, 2017 at 12:55 PM, Raz Baluchi <raz.baluchi@gmail.com>
> wrote:
>
> > Now that I have Drill working with parquet files on dfs, the next step was
> > to move the parquet files to S3.
> >
> > I get pretty good performance - I can query for events by date range
> > within 10 seconds (out of a total of ~800M events across 25 years).
> > However, there seems to be some threshold beyond which queries start
> > timing out.
> >
> > SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
> > connection from pool
> >
> > My first question: is there a default timeout value for queries against
> > S3? Anything that takes longer than ~150 seconds seems to hit the timeout
> > error.
> >
> > The second question has to do with the possible conditions that trigger the
> > prolonged query time. It seems that if I increase the filters beyond a
> > certain number - it doesn't take much - the query times out.
> >
> > For example, the query
> >
> >   select * from events where YEAR in (2012, 2013)
> >
> > works fine. However,
> >
> >   select * from events where YEAR in (2012, 2013, 2014)
> >
> > fails with a timeout.
> >
> > To make matters worse, I can't run the first query either until I restart
> > Drill...
> >
>
