drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Kincaid <kincaid.d...@gmail.com>
Subject Re: Drill query planning taking a LONG time
Date Mon, 13 Feb 2017 21:35:19 GMT
Yes, we did confirm that it's planning time issue by both looking at the
query profile and running the EXPLAIN.

We ran with DEBUG on (thanks for the easy to follow instructions). That
produces a lot of output and I'm not sure how to interpret most of it.
However we do see these lines that seem to be the important ones around
planning:

2017-02-13 21:04:07,285 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Logical Planning (no
pruning or join). (2655ms)

2017-02-13 21:04:08,342 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune
Planning (1057ms):

2017-02-13 21:04:12,449 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning
(4106ms):

2017-02-13 21:04:13,270 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0
(821ms):

*2017-02-13 21:14:36,554 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning
(623281ms):*

2017-02-13 21:14:38,281 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Physical Partition
Prune Planning (1211ms):

So that bolded one - VOLCANO:Physical Planning - is the one that seems to
be dominating the planning time. Does that mean anything to those who
understand the inner workings of Drill?

- Dave

On Mon, Feb 13, 2017 at 2:52 PM, Aman Sinha <asinha@mapr.com> wrote:

> Ok, so at least the reading of the metadata cache file for planning
> purposes is relatively quick – 210ms.
> I assume you have confirmed that it is a planning time issue by either
> looking at the query profiles ‘start time’ or by running an EXPLAIN command
> (which only does planning, no execution).
> Given that, I would suggest enabling the DEBUG logging in the logback.xml
> to see where that 12 min gap is coming from.
>
> Here’s a snippet of the logback.xml to enable debug logging at the top
> level package.  (additional information at https://drill.apache.org/docs/
> modify-logback-xml/)
>
>   <logger name="org.apache.drill" additivity="false">
>     <level value="debug" />
>     <appender-ref ref="SOCKET" />
>   </logger>
>
>
> On 2/13/17, 12:34 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:
>
>     Sorry about the S3 confusion in the query.
> s3.cisexport.transactionView is
>     just a view. The actual data resides on HDFS. At
> dfs.`parquet/transaction`.
>     Using that in the query itself gives the same result.
>
>     The only long gap we see in the log is between these two lines:
>
>     2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0
>     ms, numFiles: 1
>     2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:109]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>
>     Here is most of the raw log output (I bolded the two lines with the 12
>     minute gap):
>
>     017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.drill.exec.work.foreman.Foreman - Query text for query id
>     275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo,
>     COUNT(*) as totalcnt,
>     count(distinct(CASE
>                    WHEN
>                    (REPEATED_CONTAINS(fltb1.classLabels,
>     'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
>                    THEN fltb1.invoiceId
>                    END)) as ultracount,
>     count(distinct (CASE
>                     WHEN
>                     (REPEATED_CONTAINS(fltb1.classLabels,
>     'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>                     THEN fltb1.invoiceId
>                     END)) as labcount
>     from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
>     'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>           from s3.cisexport.transactionView) fltb1
>     group by fltb1.sapid, yearmo
>     LIMIT 20
>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0
>     ms, numFiles: 1
>     2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0
>     ms, numFiles: 1
>     2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from
>     cache file
>     2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.parquet.Metadata - No directories were modified.
> Took 1
>     ms to check modification time of 1 directories
>     2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
> INFO
>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
> took 0
>     ms, numFiles: 1
>     *2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:foreman]
>     INFO  o.a.d.exec.store.dfs.FileSelection -
> FileSelection.getStatuses() took
>     0 ms, numFiles: 1*
>     *2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:109]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms*
>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:109]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:109]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:109]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:11:123]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:11:123]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:11:123]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:11:123]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:127]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:127]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:127]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:127]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>     2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:112]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>     2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
> b8722138eeb8:frag:5:112]
>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>
>
>
>     On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <asinha@mapr.com> wrote:
>
>     > In your drillbit.log file, can you look for the entries for the
> foreman
>     > node to see where the time is being spent ?
>     > e.g entries of the following type:
>     >   [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO
>     > o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from
>     > cache file
>     >
>     > Each entry is timestamped, so if you see a long gap between two of
> them,
>     > that will give an idea about what’s going on.
>     > I am not familiar with how the S3 is set up.  My guess is there is
> some
>     > latency issues there that could be causing it but let’s first get
> the log
>     > output.
>     >
>     > -Aman
>     >
>     > On 2/13/17, 12:05 PM, "David Kincaid" <kincaid.dave@gmail.com>
> wrote:
>     >
>     >     We've just recently started working with Drill and I'm seeing
> something
>     >     that doesn't seem right and I'm not sure how to troubleshoot. We
> have
>     > 100
>     >     Parquet files which are each about 400MB each using Snappy
> compression.
>     >     While trying to query this data I am seeing extraordinary
> planning
>     > time for
>     >     certain queries. The planning time is about 12 minutes and the
> actual
>     >     execution of the query is less than 2 minutes.
>     >
>     >     A few details of our setup. We are running Drill on an AWS EMR
> cluster
>     > on
>     >     m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an
> 8GB Java
>     >     heap and 100GB Java direct memory. We have verified that the
> metadata
>     > cache
>     >     file is being created and used. We have tried a cluster of 10
> nodes
>     > and a
>     >     cluster of 2 nodes with no difference in planning time or
> execution
>     > time.
>     >
>     >     Does anyone have some pointers on troubleshooting excessive
> planning
>     > time?
>     >     It seems like we must have something misconfigured or are missing
>     >     something. We're very new to Drill and I think I've exhausted
> all my
>     >     troubleshooting ideas so far. Any tips anyone can provide?
>     >
>     >     Here is the main query I've been experimenting with, so you can
> get a
>     > feel
>     >     for the query complexity:
>     >
>     >     select fltb1.sapId, yearmo,
>     >     COUNT(*) as totalcnt,
>     >     count(distinct(CASE
>     >                    WHEN
>     >                    (REPEATED_CONTAINS(fltb1.classLabels,
>     >     'Thing:Service:MedicalService:Diagnostic:Radiology:
> Ultrasound.*'))
>     >                    THEN fltb1.invoiceId
>     >                    END)) as ultracount,
>     >     count(distinct (CASE
>     >                     WHEN
>     >                     (REPEATED_CONTAINS(fltb1.classLabels,
>     >     'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>     >                     THEN fltb1.invoiceId
>     >                     END)) as labcount
>     >     from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(
> transactionDate,
>     >     'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>     >           from s3.cisexport.transactionView) fltb1
>     >     group by fltb1.sapId, yearmo;
>     >
>     >     Thanks,
>     >
>     >     Dave
>     >
>     >
>     >
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message