drill-user mailing list archives

From Raz Baluchi <raz.balu...@gmail.com>
Subject Re: Parquet on S3 - timeouts
Date Thu, 01 Jun 2017 20:56:35 GMT
I noticed that if I precede the query with a select count(*) using the same
filters, I no longer experience timeouts. 'Priming' the query in this
way also makes the second query faster. This seems to be an acceptable
workaround, as it allows me to include essentially all partitions
in the filter and still get results pretty quickly. I am still curious why
this occurs.
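
A minimal sketch of the priming workaround, reusing the events table and
YEAR filter from my original message quoted below. The exact filter list
here is only illustrative, not the full query I ran:

```sql
-- Step 1: "priming" query. Same filters as the real query, but only a count.
SELECT COUNT(*) FROM events WHERE YEAR IN (2012, 2013, 2014);

-- Step 2: the real query with identical filters, run right after the count.
SELECT * FROM events WHERE YEAR IN (2012, 2013, 2014);
```

Both statements are run back to back in the same Drill session.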

On Thu, Jun 1, 2017 at 4:08 PM, Abhishek Girish <agirish@apache.org> wrote:

> Can you take a look at [1] and let us know if that helps resolve your
> issue?
>
> [1]
> https://drill.apache.org/docs/s3-storage-plugin/#quering-parquet-format-files-on-s3
>
> On Thu, Jun 1, 2017 at 12:55 PM, Raz Baluchi <raz.baluchi@gmail.com>
> wrote:
>
> > Now that I have Drill working with parquet files on dfs, the next step
> > was to move the parquet files to S3.
> >
> > I get pretty good performance - I can query for events by date range
> > within 10 seconds (out of a total of ~800M events across 25 years).
> > However, there seems to be some threshold beyond which queries start
> > timing out.
> >
> > SYSTEM ERROR: ConnectionPoolTimeoutException: Timeout waiting for
> > connection from pool
> >
> > My first question: is there a default timeout value for queries against
> > S3? Anything that takes longer than ~150 seconds seems to hit the
> > timeout error.
> >
> > The second question has to do with the possible conditions that trigger
> > the prolonged query time. It seems that if I increase the filters beyond
> > a certain number - it doesn't take much - the query times out.
> >
> > For example the query:
> >
> > select * from events where YEAR in (2012, 2013) works fine - however,
> > select * from events where YEAR in (2012, 2013, 2014) fails with a
> > timeout.
> >
> > To make it worse, I can't use the first query either until I restart
> > Drill...
> >
>
