Ok, so at least the reading of the metadata cache file for planning purposes is relatively
quick – 210ms.
I assume you have confirmed that it is a planning time issue by either looking at the query
profiles ‘start time’ or by running an EXPLAIN command (which only does planning, no execution).
Given that, I would suggest enabling the DEBUG logging in the logback.xml to see where that
12 min gap is coming from.
Here’s a snippet of the logback.xml to enable debug logging at the top level package. (additional
information at https://drill.apache.org/docs/modify-logback-xml/)
<logger name="org.apache.drill" additivity="false">
<level value="debug" />
<appender-ref ref="SOCKET" />
</logger>
On 2/13/17, 12:34 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:
Sorry about the S3 confusion in the query. s3.cisexport.transactionView is
just a view. The actual data resides on HDFS. At dfs.`parquet/transaction`.
Using that in the query itself gives the same result.
The only long gap we see in the log is between these two lines:
2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
ms, numFiles: 1
2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
Here is most of the raw log output (I bolded the two lines with the 12
minute gap):
017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.drill.exec.work.foreman.Foreman - Query text for query id
275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo,
COUNT(*) as totalcnt,
count(distinct(CASE
WHEN
(REPEATED_CONTAINS(fltb1.classLabels,
'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
THEN fltb1.invoiceId
END)) as ultracount,
count(distinct (CASE
WHEN
(REPEATED_CONTAINS(fltb1.classLabels,
'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
THEN fltb1.invoiceId
END)) as labcount
from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
from s3.cisexport.transactionView) fltb1
group by fltb1.sapid, yearmo
LIMIT 20
2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
ms, numFiles: 1
2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
ms, numFiles: 1
2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from
cache file
2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.parquet.Metadata - No directories were modified. Took 1
ms to check modification time of 1 directories
2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
ms, numFiles: 1
*2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
0 ms, numFiles: 1*
*2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms*
2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:112]
INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for
org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:112]
INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for
org.apache.drill.common.logical.FormatPluginConfig took 0ms
On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <asinha@mapr.com> wrote:
> In your drillbit.log file, can you look for the entries for the foreman
> node to see where the time is being spent ?
> e.g entries of the following type:
> [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO
> o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from
> cache file
>
> Each entry is timestamped, so if you see a long gap between two of them,
> that will give an idea about what’s going on.
> I am not familiar with how the S3 is set up. My guess is there is some
> latency issues there that could be causing it but let’s first get the log
> output.
>
> -Aman
>
> On 2/13/17, 12:05 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:
>
> We've just recently started working with Drill and I'm seeing something
> that doesn't seem right and I'm not sure how to troubleshoot. We have
> 100
> Parquet files which are each about 400MB each using Snappy compression.
> While trying to query this data I am seeing extraordinary planning
> time for
> certain queries. The planning time is about 12 minutes and the actual
> execution of the query is less than 2 minutes.
>
> A few details of our setup. We are running Drill on an AWS EMR cluster
> on
> m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an 8GB Java
> heap and 100GB Java direct memory. We have verified that the metadata
> cache
> file is being created and used. We have tried a cluster of 10 nodes
> and a
> cluster of 2 nodes with no difference in planning time or execution
> time.
>
> Does anyone have some pointers on troubleshooting excessive planning
> time?
> It seems like we must have something misconfigured or are missing
> something. We're very new to Drill and I think I've exhausted all my
> troubleshooting ideas so far. Any tips anyone can provide?
>
> Here is the main query I've been experimenting with, so you can get a
> feel
> for the query complexity:
>
> select fltb1.sapId, yearmo,
> COUNT(*) as totalcnt,
> count(distinct(CASE
> WHEN
> (REPEATED_CONTAINS(fltb1.classLabels,
> 'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
> THEN fltb1.invoiceId
> END)) as ultracount,
> count(distinct (CASE
> WHEN
> (REPEATED_CONTAINS(fltb1.classLabels,
> 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
> THEN fltb1.invoiceId
> END)) as labcount
> from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
> 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
> from s3.cisexport.transactionView) fltb1
> group by fltb1.sapId, yearmo;
>
> Thanks,
>
> Dave
>
>
>
|