Yes, the log confirmed that the planning, especially physical
planning, is the one that took most of the time.
If the definition of view s3.cisexport.transactionView is not very
complicated (involves large # of tables), then it's possible that some
planner rules have a bug. (In the past, we once saw couple of planner
rules would be fired in a loop).
Is it possible that you can share the DDL of the view? That may help
us re-produce the problem and take a look at the trace of Calcite,
which Drill uses as the query planner.
On Mon, Feb 13, 2017 at 1:35 PM, David Kincaid <kincaid.dave@gmail.com> wrote:
> Yes, we did confirm that it's planning time issue by both looking at the
> query profile and running the EXPLAIN.
>
> We ran with DEBUG on (thanks for the easy to follow instructions). That
> produces a lot of output and I'm not sure how to interpret most of it.
> However we do see these lines that seem to be the important ones around
> planning:
>
> 2017-02-13 21:04:07,285 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Logical Planning (no
> pruning or join). (2655ms)
>
> 2017-02-13 21:04:08,342 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune
> Planning (1057ms):
>
> 2017-02-13 21:04:12,449 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning
> (4106ms):
>
> 2017-02-13 21:04:13,270 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0
> (821ms):
>
> *2017-02-13 21:14:36,554 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning
> (623281ms):*
>
> 2017-02-13 21:14:38,281 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Physical Partition
> Prune Planning (1211ms):
>
> So that bolded one - VOLCANO:Physical Planning - is the one that seems to
> be dominating the planning time. Does that mean anything to those who
> understand the inner workings of Drill?
>
> - Dave
>
> On Mon, Feb 13, 2017 at 2:52 PM, Aman Sinha <asinha@mapr.com> wrote:
>
>> Ok, so at least the reading of the metadata cache file for planning
>> purposes is relatively quick – 210ms.
>> I assume you have confirmed that it is a planning time issue by either
>> looking at the query profiles ‘start time’ or by running an EXPLAIN command
>> (which only does planning, no execution).
>> Given that, I would suggest enabling the DEBUG logging in the logback.xml
>> to see where that 12 min gap is coming from.
>>
>> Here’s a snippet of the logback.xml to enable debug logging at the top
>> level package. (additional information at https://drill.apache.org/docs/
>> modify-logback-xml/)
>>
>> <logger name="org.apache.drill" additivity="false">
>> <level value="debug" />
>> <appender-ref ref="SOCKET" />
>> </logger>
>>
>>
>> On 2/13/17, 12:34 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:
>>
>> Sorry about the S3 confusion in the query.
>> s3.cisexport.transactionView is
>> just a view. The actual data resides on HDFS. At
>> dfs.`parquet/transaction`.
>> Using that in the query itself gives the same result.
>>
>> The only long gap we see in the log is between these two lines:
>>
>> 2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>> ms, numFiles: 1
>> 2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>
>> Here is most of the raw log output (I bolded the two lines with the 12
>> minute gap):
>>
>> 017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.drill.exec.work.foreman.Foreman - Query text for query id
>> 275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo,
>> COUNT(*) as totalcnt,
>> count(distinct(CASE
>> WHEN
>> (REPEATED_CONTAINS(fltb1.classLabels,
>> 'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
>> THEN fltb1.invoiceId
>> END)) as ultracount,
>> count(distinct (CASE
>> WHEN
>> (REPEATED_CONTAINS(fltb1.classLabels,
>> 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>> THEN fltb1.invoiceId
>> END)) as labcount
>> from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
>> 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>> from s3.cisexport.transactionView) fltb1
>> group by fltb1.sapid, yearmo
>> LIMIT 20
>> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>> ms, numFiles: 1
>> 2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>> ms, numFiles: 1
>> 2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from
>> cache file
>> 2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.parquet.Metadata - No directories were modified.
>> Took 1
>> ms to check modification time of 1 directories
>> 2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>> ms, numFiles: 1
>> *2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:foreman]
>> INFO o.a.d.exec.store.dfs.FileSelection -
>> FileSelection.getStatuses() took
>> 0 ms, numFiles: 1*
>> *2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms*
>> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>> 2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:112]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>> 2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:112]
>> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>> org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>
>>
>>
>> On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <asinha@mapr.com> wrote:
>>
>> > In your drillbit.log file, can you look for the entries for the
>> foreman
>> > node to see where the time is being spent ?
>> > e.g entries of the following type:
>> > [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO
>> > o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from
>> > cache file
>> >
>> > Each entry is timestamped, so if you see a long gap between two of
>> them,
>> > that will give an idea about what’s going on.
>> > I am not familiar with how the S3 is set up. My guess is there is
>> some
>> > latency issues there that could be causing it but let’s first get
>> the log
>> > output.
>> >
>> > -Aman
>> >
>> > On 2/13/17, 12:05 PM, "David Kincaid" <kincaid.dave@gmail.com>
>> wrote:
>> >
>> > We've just recently started working with Drill and I'm seeing
>> something
>> > that doesn't seem right and I'm not sure how to troubleshoot. We
>> have
>> > 100
>> > Parquet files which are each about 400MB each using Snappy
>> compression.
>> > While trying to query this data I am seeing extraordinary
>> planning
>> > time for
>> > certain queries. The planning time is about 12 minutes and the
>> actual
>> > execution of the query is less than 2 minutes.
>> >
>> > A few details of our setup. We are running Drill on an AWS EMR
>> cluster
>> > on
>> > m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an
>> 8GB Java
>> > heap and 100GB Java direct memory. We have verified that the
>> metadata
>> > cache
>> > file is being created and used. We have tried a cluster of 10
>> nodes
>> > and a
>> > cluster of 2 nodes with no difference in planning time or
>> execution
>> > time.
>> >
>> > Does anyone have some pointers on troubleshooting excessive
>> planning
>> > time?
>> > It seems like we must have something misconfigured or are missing
>> > something. We're very new to Drill and I think I've exhausted
>> all my
>> > troubleshooting ideas so far. Any tips anyone can provide?
>> >
>> > Here is the main query I've been experimenting with, so you can
>> get a
>> > feel
>> > for the query complexity:
>> >
>> > select fltb1.sapId, yearmo,
>> > COUNT(*) as totalcnt,
>> > count(distinct(CASE
>> > WHEN
>> > (REPEATED_CONTAINS(fltb1.classLabels,
>> > 'Thing:Service:MedicalService:Diagnostic:Radiology:
>> Ultrasound.*'))
>> > THEN fltb1.invoiceId
>> > END)) as ultracount,
>> > count(distinct (CASE
>> > WHEN
>> > (REPEATED_CONTAINS(fltb1.classLabels,
>> > 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>> > THEN fltb1.invoiceId
>> > END)) as labcount
>> > from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(
>> transactionDate,
>> > 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>> > from s3.cisexport.transactionView) fltb1
>> > group by fltb1.sapId, yearmo;
>> >
>> > Thanks,
>> >
>> > Dave
>> >
>> >
>> >
>>
>>
>>
|