drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aman Sinha <asi...@mapr.com>
Subject Re: Drill query planning taking a LONG time
Date Mon, 13 Feb 2017 20:52:52 GMT
Ok, so at least the reading of the metadata cache file for planning purposes is relatively
quick – 210ms.   
I assume you have confirmed that it is a planning time issue by either looking at the query
profiles ‘start time’ or by running an EXPLAIN command (which only does planning, no execution).

Given that, I would suggest enabling the DEBUG logging in the logback.xml  to see where that
12 min gap is coming from.  

Here’s a snippet of the logback.xml to enable debug logging at the top level package.  (additional
information at https://drill.apache.org/docs/modify-logback-xml/)

  <logger name="org.apache.drill" additivity="false">
    <level value="debug" />
    <appender-ref ref="SOCKET" />
  </logger> 


On 2/13/17, 12:34 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:

    Sorry about the S3 confusion in the query. s3.cisexport.transactionView is
    just a view. The actual data resides on HDFS. At dfs.`parquet/transaction`.
    Using that in the query itself gives the same result.
    
    The only long gap we see in the log is between these two lines:
    
    2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
    ms, numFiles: 1
    2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
    
    Here is most of the raw log output (I bolded the two lines with the 12
    minute gap):
    
    017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.drill.exec.work.foreman.Foreman - Query text for query id
    275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo,
    COUNT(*) as totalcnt,
    count(distinct(CASE
                   WHEN
                   (REPEATED_CONTAINS(fltb1.classLabels,
    'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
                   THEN fltb1.invoiceId
                   END)) as ultracount,
    count(distinct (CASE
                    WHEN
                    (REPEATED_CONTAINS(fltb1.classLabels,
    'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
                    THEN fltb1.invoiceId
                    END)) as labcount
    from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
    'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
          from s3.cisexport.transactionView) fltb1
    group by fltb1.sapid, yearmo
    LIMIT 20
    2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
    2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
    ms, numFiles: 1
    2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
    ms, numFiles: 1
    2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from
    cache file
    2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.parquet.Metadata - No directories were modified. Took 1
    ms to check modification time of 1 directories
    2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] INFO
     o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0
    ms, numFiles: 1
    *2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
    INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took
    0 ms, numFiles: 1*
    *2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms*
    2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:109]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
    2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:11:123]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
    2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:127]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:112]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
    org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
    2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-b8722138eeb8:frag:5:112]
    INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
    org.apache.drill.common.logical.FormatPluginConfig took 0ms
    
    
    
    On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <asinha@mapr.com> wrote:
    
    > In your drillbit.log file, can you look for the entries for the foreman
    > node to see where the time is being spent ?
    > e.g entries of the following type:
    >   [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO
    > o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from
    > cache file
    >
    > Each entry is timestamped, so if you see a long gap between two of them,
    > that will give an idea about what’s going on.
    > I am not familiar with how the S3 is set up.  My guess is there is some
    > latency issues there that could be causing it but let’s first get the log
    > output.
    >
    > -Aman
    >
    > On 2/13/17, 12:05 PM, "David Kincaid" <kincaid.dave@gmail.com> wrote:
    >
    >     We've just recently started working with Drill and I'm seeing something
    >     that doesn't seem right and I'm not sure how to troubleshoot. We have
    > 100
    >     Parquet files which are each about 400MB each using Snappy compression.
    >     While trying to query this data I am seeing extraordinary planning
    > time for
    >     certain queries. The planning time is about 12 minutes and the actual
    >     execution of the query is less than 2 minutes.
    >
    >     A few details of our setup. We are running Drill on an AWS EMR cluster
    > on
    >     m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an 8GB Java
    >     heap and 100GB Java direct memory. We have verified that the metadata
    > cache
    >     file is being created and used. We have tried a cluster of 10 nodes
    > and a
    >     cluster of 2 nodes with no difference in planning time or execution
    > time.
    >
    >     Does anyone have some pointers on troubleshooting excessive planning
    > time?
    >     It seems like we must have something misconfigured or are missing
    >     something. We're very new to Drill and I think I've exhausted all my
    >     troubleshooting ideas so far. Any tips anyone can provide?
    >
    >     Here is the main query I've been experimenting with, so you can get a
    > feel
    >     for the query complexity:
    >
    >     select fltb1.sapId, yearmo,
    >     COUNT(*) as totalcnt,
    >     count(distinct(CASE
    >                    WHEN
    >                    (REPEATED_CONTAINS(fltb1.classLabels,
    >     'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
    >                    THEN fltb1.invoiceId
    >                    END)) as ultracount,
    >     count(distinct (CASE
    >                     WHEN
    >                     (REPEATED_CONTAINS(fltb1.classLabels,
    >     'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
    >                     THEN fltb1.invoiceId
    >                     END)) as labcount
    >     from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
    >     'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
    >           from s3.cisexport.transactionView) fltb1
    >     group by fltb1.sapId, yearmo;
    >
    >     Thanks,
    >
    >     Dave
    >
    >
    >
    

Mime
View raw message