drill-user mailing list archives

From David Kincaid <kincaid.d...@gmail.com>
Subject Re: Drill query planning taking a LONG time
Date Fri, 17 Feb 2017 21:50:49 GMT
My apologies for not following up sooner. Earlier this week our DevOps
engineer was looking into this problem as well and discovered the root
cause of our issue. We developed a custom storage provider that utilizes S3
as the pstore. We thought this was just storing configuration information
(esp. storage plugin config), but we discovered that it was spending a lot
of time reading files in a /temp/drill subdirectory of our S3 bucket. We
removed the custom plugin and things are running much better now.

I have one of our engineers working on this now to see where we went wrong.
My question for the list now is: do you know what exactly it is doing? We
really want to be able to store the storage plugin config on S3 so that it
is persisted between restarts of the EMR cluster that we are running Drill
on. If you have any suggestions or advice, they would be much appreciated.
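
For context, the direction we are experimenting with in the meantime is
pointing Drill's pstore at a persistent path via drill-override.conf. This
is only a sketch of that direction; the provider class name and the path
below are assumptions on our part, not a config we have verified against
our Drill version:

```
drill.exec: {
  sys.store.provider: {
    # Assumed provider class and path - verify against your Drill release.
    class: "org.apache.drill.exec.store.sys.store.provider.LocalPersistentStoreProvider",
    local: {
      path: "/var/drill/pstore",
      write: true
    }
  }
}
```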

I really appreciate all the time and patience you all showed helping us
troubleshoot this issue. I'm glad in the end that it really was something
on our end and not something more mysterious happening in Drill itself.

Thanks,

Dave

On Wed, Feb 15, 2017 at 12:37 PM, Jinfeng Ni <jni@apache.org> wrote:

> Can you help try one more thing if you can?
>
> Run jstack on the foreman Drillbit process while the query is in the
> planning phase. Capture the jstack output every second or two for some
> time, appending it all into one log file. Then look at the stack trace
> for the foreman thread, which is named in the form
> "275b623b-bb15-8bd8-fd29-f9a571a7534e:foreman" (the first part is the
> query ID). If the foreman thread is stuck in one method call, that call
> will show up in the log repeatedly. That may give us a better idea of
> the cause of the problem.
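
The capture loop described above can be sketched in shell as follows (the
helper name, the pid, and the paths here are illustrative only; JSTACK_CMD
lets you point at a specific jstack binary):

```shell
# capture_stacks: append the output of jstack (or $JSTACK_CMD) to a log
# file <count> times, sleeping <interval> seconds between captures.
capture_stacks() {
  pid="$1"; count="$2"; interval="$3"; log="$4"
  i=0
  while [ "$i" -lt "$count" ]; do
    ${JSTACK_CMD:-jstack} "$pid" >> "$log"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example (pid and paths are illustrative):
#   capture_stacks 12345 60 1 /tmp/foreman_jstack.log
#   grep -A 20 ':foreman' /tmp/foreman_jstack.log
```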
>
> Based on the tests you tried, the combination of the query and parquet
> files probably hit a bug in the code that we are not currently aware
> of. Without the parquet files to reproduce it, it's hard to debug the
> issue and find a possible fix.
>
>
>
> On Wed, Feb 15, 2017 at 8:35 AM, David Kincaid <kincaid.dave@gmail.com>
> wrote:
> > I ran that EXPLAIN that you suggested against the entire 100-file
> > table and it takes about 3 seconds. I will try to get a defect
> > written up in the next few days.
> >
> > - Dave
> >
> > On Tue, Feb 14, 2017 at 9:06 PM, Jinfeng Ni <jni@apache.org> wrote:
> >
> >> From the two tests you did, I'm inclined to think there might be
> >> something special about your parquet files. How do you generate
> >> these parquet files? Do they contain normal data types
> >> (int/float/varchar), or complex types (array/map)?
> >>
> >> In our environment, we also have hundreds of parquet files, each
> >> hundreds of MBs in size. A typical query (several tables joined)
> >> would take a couple of seconds in planning.
> >>
> >> Here is one more test, if you can help run it.
> >>
> >> EXPLAIN PLAN FOR
> >> SELECT someCol1, someCol2
> >> FROM dfs.`parquet/transaction/OneSingleFile.parquet`;
> >>
> >> The above query is simple enough that the planner should not spend
> >> a long time enumerating different choices. If it still takes a long
> >> time in query planning, the more likely cause is in the parquet
> >> files you used.
> >>
> >>
> >>
> >> On Tue, Feb 14, 2017 at 1:06 PM, David Kincaid <kincaid.dave@gmail.com>
> >> wrote:
> >> > I will write up a defect. The first test you suggested below -
> >> > running the query on just one of our Parquet files - produces the
> >> > same result (10-12 minute planning time). However, the second
> >> > test - using cp.`tpch/nation.parquet` - results in a planning
> >> > time of only about a minute. So I'm not sure how to interpret
> >> > that. What does that mean to you all?
> >> >
> >> > - Dave
> >> >
> >> > On Tue, Feb 14, 2017 at 12:37 PM, Jinfeng Ni <jni@apache.org> wrote:
> >> >
> >> >> Normally, slow query planning can be caused by:
> >> >>
> >> >> 1. A planner rule hitting a bug when processing certain operators
> >> >> in the query, for instance a join operator or a distinct
> >> >> aggregate. The query I tried on a small file seems to rule out
> >> >> this possibility.
> >> >> 2. The parquet metadata access time. According to the log, this
> >> >> does not seem to be the issue.
> >> >> 3. Something we are not aware of.
> >> >>
> >> >> To help get some clue, can you help do the following:
> >> >>
> >> >> 1. Run the query over one single parquet file, instead of 100
> >> >> parquet files. You can do that using
> >> >> dfs.`parquet/transaction/OneSingleFile.parquet`. I'm wondering if
> >> >> the planning time is proportional to the number of parquet files.
> >> >>
> >> >> 2. What if you try your query by replacing
> >> >> dfs.`parquet/transaction/OneSingleFile.parquet` with
> >> >> cp.`tpch/nation.parquet`, which is a small tpch parquet file (you
> >> >> need to re-enable the storage plugin 'cp')? Running EXPLAIN
> >> >> should be fine. This will tell us whether the problem is caused
> >> >> by the parquet source or by the query itself.
> >> >>
> >> >> Yes, please create a defect in Drill JIRA.
> >> >>
> >> >> On Tue, Feb 14, 2017 at 5:02 AM, David Kincaid
> >> >> <kincaid.dave@gmail.com> wrote:
> >> >> > Thank you for the feedback. It seems there is nothing more I
> >> >> > can do on my end. What are my next steps? Shall I create a
> >> >> > defect in the Drill Jira?
> >> >> >
> >> >> > - Dave
> >> >> >
> >> >> > On Mon, Feb 13, 2017 at 5:13 PM, Jinfeng Ni <jni@apache.org>
> >> >> > wrote:
> >> >> >
> >> >> >> The size of the parquet files matters in terms of metadata
> >> >> >> access time, which is just 212 ms according to your log file.
> >> >> >> My understanding is that it does not matter much to the
> >> >> >> overall planning time. That's why it probably makes sense to
> >> >> >> try such a small toy example.
> >> >> >>
> >> >> >> Normally the planning time for such a simple query should be
> >> >> >> much shorter than 12 minutes. This indicates it could be
> >> >> >> caused by a code bug, or by something else that we are
> >> >> >> currently unaware of.
> >> >> >>
> >> >> >> On Mon, Feb 13, 2017 at 2:47 PM, David Kincaid
> >> >> >> <kincaid.dave@gmail.com> wrote:
> >> >> >> > The example in DRILL-5183 is just a very small toy example
> >> >> >> > to demonstrate the bug with how Drill reads Parquet array
> >> >> >> > fields. It doesn't have anything to do with this planning
> >> >> >> > issue (at least I don't think it does). Sorry if I confused
> >> >> >> > things with that reference.
> >> >> >> >
> >> >> >> > I just tried running our query directly against the table
> >> >> >> > at dfs.`parquet/transaction` and get the same result (12
> >> >> >> > minutes of planning time). I disabled the cp and s3 storage
> >> >> >> > plugins that were enabled so that only the dfs storage
> >> >> >> > plugin is enabled, and the result is the same.
> >> >> >> >
> >> >> >> > Is it expected for Drill to take this long in the planning
> >> >> >> > phase for a query? Is there anything else I can try, or
> >> >> >> > information I could provide, to help identify the bug (it
> >> >> >> > seems like a bug to me)? I really appreciate you guys
> >> >> >> > helping out so quickly this afternoon.
> >> >> >> >
> >> >> >> > - Dave
> >> >> >> >
> >> >> >> > On Mon, Feb 13, 2017 at 4:13 PM, Jinfeng Ni <jni@apache.org>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> >> I downloaded books.parquet from DRILL-5183 and created a
> >> >> >> >> view on top of this single parquet file. Then I ran EXPLAIN
> >> >> >> >> for the query, and it completes within 1.2 seconds on the
> >> >> >> >> Drill 1.8.0 release. (The number of parquet files would
> >> >> >> >> impact the time to fetch metadata. Since it's not the
> >> >> >> >> bottleneck in this case, it should not cause a big
> >> >> >> >> difference.)
> >> >> >> >>
> >> >> >> >> Do you see the long planning time for this query only, or
> >> >> >> >> does it happen for other queries as well? Besides the
> >> >> >> >> possibility of planner rule bugs, we once saw another
> >> >> >> >> possible cause of long planning time. In your storage
> >> >> >> >> plugin configuration, if you enable some other storage
> >> >> >> >> plugin (for instance hbase, or hive, etc.) which is slow to
> >> >> >> >> access, then those irrelevant storage plugins might impact
> >> >> >> >> your query as well. You may temporarily disable those
> >> >> >> >> storage plugins and see if that's the cause of the problem.
> >> >> >> >>
> >> >> >> >> 0: jdbc:drill:zk=local> explain plan for
> >> >> >> >> . . . . . . . . . . . > select fltb1.sapId, yearmo,
> >> >> >> >> . . . . . . . . . . . > COUNT(*) as totalcnt,
> >> >> >> >> . . . . . . . . . . . > count(distinct(CASE
> >> >> >> >> . . . . . . . . . . . >   WHEN (REPEATED_CONTAINS(fltb1.classLabels,
> >> >> >> >> 'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
> >> >> >> >> . . . . . . . . . . . >   THEN fltb1.invoiceId
> >> >> >> >> . . . . . . . . . . . >   END)) as ultracount,
> >> >> >> >> . . . . . . . . . . . > count(distinct (CASE
> >> >> >> >> . . . . . . . . . . . >   WHEN (REPEATED_CONTAINS(fltb1.classLabels,
> >> >> >> >> 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
> >> >> >> >> . . . . . . . . . . . >   THEN fltb1.invoiceId
> >> >> >> >> . . . . . . . . . . . >   END)) as labcount
> >> >> >> >> . . . . . . . . . . . > from (
> >> >> >> >> . . . . . . . . . . . >   select sapid, invoiceId,
> >> >> >> >> . . . . . . . . . . . >     TO_CHAR(TO_TIMESTAMP(transactionDate,
> >> >> >> >> 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo,
> >> >> >> >> . . . . . . . . . . . >     classLabels
> >> >> >> >> . . . . . . . . . . . >   from dfs.tmp.transactionView) fltb1
> >> >> >> >> . . . . . . . . . . . > group by fltb1.sapId, yearmo;
> >> >> >> >> +------+------+
> >> >> >> >> | text | json |
> >> >> >> >> +------+------+
> >> >> >> >> | 00-00    Screen
> >> >> >> >> 00-01      Project(sapId=[$0], yearmo=[$1], totalcnt=[$2],
> >> >> >> >> ultracount=[$3], labcount=[$4])
> >> >> >> >> ....................................
> >> >> >> >> 00-09                SelectionVectorRemover
> >> >> >> >> 00-12                  Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> >> >> >> >> 00-15                    HashAgg(group=[{0, 1}], totalcnt=[COUNT()])
> >> >> >> >> ................................
> >> >> >> >> 00-22                          Scan(groupscan=[ParquetGroupScan
> >> >> >> >> [entries=[ReadEntryWithPath [path=file:/tmp/parquet/transaction]],
> >> >> >> >> selectionRoot=file:/tmp/parquet/transaction, numFiles=1,
> >> >> >> >> usedMetadataFile=false, columns=[`sapId`, `invoiceId`,
> >> >> >> >> `transactionDate`, `classLabels`.`array`]]])
> >> >> >> >>
> >> >> >> >> 1 row selected (1.195 seconds)
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Mon, Feb 13, 2017 at 1:51 PM, David Kincaid
> >> >> >> >> <kincaid.dave@gmail.com> wrote:
> >> >> >> >> > Here is the entire transactionView.view.drill file. As
> >> >> >> >> > you can see, the view itself is very simple and is just
> >> >> >> >> > wrapping a syntactic problem with the array field. That's
> >> >> >> >> > an issue I reported in Jira under DRILL-5183
> >> >> >> >> > (https://issues.apache.org/jira/browse/DRILL-5183).
> >> >> >> >> >
> >> >> >> >> > {
> >> >> >> >> >   "name" : "transactionView",
> >> >> >> >> >   "sql" : "SELECT `transactionRowKey`, `sapId`, `practiceName`,
> >> >> >> >> > `practiceCity`, `practiceState`, `practicePostalCode`, `animalId`,
> >> >> >> >> > `dateOfBirth`, `species`, `breed`, `gender`, `status`, `ownerId`,
> >> >> >> >> > `itemType`, `classification`, `subclass`, `practiceDescription`,
> >> >> >> >> > `clientDescription`, `invoiceId`, `unitOfMeasure`, `vendorName`, `vaccine`,
> >> >> >> >> > `rabies`, `vaccineType`, `price`, `quantity`, `transactionDate`,
> >> >> >> >> > `visitReason`, `speciesCode`, `genderCode`, `t`.`classLabels`['array'] AS
> >> >> >> >> > `classLabels`\nFROM `dfs`.`/parquet/transaction` AS `t`",
> >> >> >> >> >   "fields" : [ {
> >> >> >> >> >     "name" : "transactionRowKey",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "sapId",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "practiceName",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "practiceCity",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "practiceState",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "practicePostalCode",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "animalId",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "dateOfBirth",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "species",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "breed",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "gender",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "status",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "ownerId",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "itemType",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "classification",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "subclass",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "practiceDescription",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "clientDescription",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "invoiceId",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "unitOfMeasure",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "vendorName",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "vaccine",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "rabies",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "vaccineType",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "price",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "quantity",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "transactionDate",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "visitReason",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "speciesCode",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "genderCode",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   }, {
> >> >> >> >> >     "name" : "classLabels",
> >> >> >> >> >     "type" : "ANY",
> >> >> >> >> >     "isNullable" : true
> >> >> >> >> >   } ],
> >> >> >> >> >   "workspaceSchemaPath" : [ ]
> >> >> >> >> > }
> >> >> >> >> >
> >> >> >> >> > On Mon, Feb 13, 2017 at 3:47 PM, Jinfeng Ni
> >> >> >> >> > <jni@apache.org> wrote:
> >> >> >> >> >
> >> >> >> >> >> Yes, the log confirmed that the planning, especially
> >> >> >> >> >> physical planning, is what took most of the time.
> >> >> >> >> >>
> >> >> >> >> >> If the definition of the view s3.cisexport.transactionView
> >> >> >> >> >> is not very complicated (i.e., does not involve a large
> >> >> >> >> >> number of tables), then it's possible that some planner
> >> >> >> >> >> rules have a bug. (In the past, we once saw a couple of
> >> >> >> >> >> planner rules fire in a loop.)
> >> >> >> >> >>
> >> >> >> >> >> Is it possible for you to share the DDL of the view?
> >> >> >> >> >> That may help us reproduce the problem and take a look
> >> >> >> >> >> at the trace of Calcite, which Drill uses as its query
> >> >> >> >> >> planner.
> >> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>
