drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jinfeng Ni <...@apache.org>
Subject Re: Drill query planning taking a LONG time
Date Mon, 13 Feb 2017 22:13:19 GMT
I downloaded books.parquet from DRILL-5183, and created a view on top
of this single parquet file. Then, run EXPLAIN for the query, and it
completes within 1.2 seconds on Drill 1.8.0 release. (The # of parquet
files would impact the time to fetch metadata. Since it's not the
bottleneck in this case, it should not cause a big difference).

Do you see the long planning time issue for this query only, or it
happens for other queries as well? Besides the possibility of planning
rule bugs, we once saw another possible cause of long planning issue.
In your storage plugin configuration, if you enable some other storage
plugin (for instance, hbase, or hive etc) which are slow to access,
then those un-relevant storage plugin might impact your query as well.
You may temporarily disable those storage plugins, and see if it's the
cause of the problem.

0: jdbc:drill:zk=local> explain plan for
. . . . . . . . . . . > select fltb1.sapId, yearmo,
. . . . . . . . . . . > COUNT(*) as totalcnt,
. . . . . . . . . . . > count(distinct(CASE
. . . . . . . . . . . >                WHEN
. . . . . . . . . . . >                (REPEATED_CONTAINS(fltb1.classLabels,
. . . . . . . . . . . >
'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
. . . . . . . . . . . >                THEN fltb1.invoiceId
. . . . . . . . . . . >                END)) as ultracount,
. . . . . . . . . . . > count(distinct (CASE
. . . . . . . . . . . >                 WHEN
. . . . . . . . . . . >                 (REPEATED_CONTAINS(fltb1.classLabels,
. . . . . . . . . . . >
'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
. . . . . . . . . . . >                 THEN fltb1.invoiceId
. . . . . . . . . . . >                 END)) as labcount
. . . . . . . . . . . > from (
. . . . . . . . . . . >   select sapid, invoiceId,
. . . . . . . . . . . >         TO_CHAR(TO_TIMESTAMP(transactionDate,
'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo,
. . . . . . . . . . . >         classLabels
. . . . . . . . . . . >       from dfs.tmp.transactionView) fltb1
. . . . . . . . . . . > group by fltb1.sapId, yearmo;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(sapId=[$0], yearmo=[$1], totalcnt=[$2],
ultracount=[$3], labcount=[$4])
....................................
00-09                SelectionVectorRemover
00-12                  Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15                    HashAgg(group=[{0, 1}], totalcnt=[COUNT()])
................................
00-22                          Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=file:/tmp/parquet/transaction]],
selectionRoot=file:/tmp/parquet/transaction, numFiles=1,
usedMetadataFile=false, columns=[`sapId`, `invoiceId`,
`transactionDate`, `classLabels`.`array`]]])

1 row selected (1.195 seconds)


On Mon, Feb 13, 2017 at 1:51 PM, David Kincaid <kincaid.dave@gmail.com> wrote:
> Here is the entire transactionView.view.drill file. As you can see the view
> itself is very simple and is just wrapping a syntactic problem with the
> array field. That's an issue I reported in Jira under DRILL-5183 (
> https://issues.apache.org/jira/browse/DRILL-5183)
>
> {
>   "name" : "transactionView",
>   "sql" : "SELECT `transactionRowKey`, `sapId`, `practiceName`,
> `practiceCity`, `practiceState`, `practicePostalCode`, `animalId`,
> `dateOfBirth`, `species`, `breed`, `gender`, `status`, `ownerId`,
> `itemType`, `classification`, `subclass`, `practiceDescription`,
> `clientDescription`, `invoiceId`, `unitOfMeasure`, `vendorName`, `vaccine`,
> `rabies`, `vaccineType`, `price`, `quantity`, `transactionDate`,
> `visitReason`, `speciesCode`, `genderCode`, `t`.`classLabels`['array'] AS
> `classLabels`\nFROM `dfs`.`/parquet/transaction` AS `t`",
>   "fields" : [ {
>     "name" : "transactionRowKey",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "sapId",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "practiceName",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "practiceCity",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "practiceState",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "practicePostalCode",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "animalId",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "dateOfBirth",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "species",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "breed",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "gender",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "status",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "ownerId",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "itemType",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "classification",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "subclass",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "practiceDescription",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "clientDescription",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "invoiceId",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "unitOfMeasure",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "vendorName",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "vaccine",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "rabies",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "vaccineType",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "price",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "quantity",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "transactionDate",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "visitReason",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "speciesCode",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "genderCode",
>     "type" : "ANY",
>     "isNullable" : true
>   }, {
>     "name" : "classLabels",
>     "type" : "ANY",
>     "isNullable" : true
>   } ],
>   "workspaceSchemaPath" : [ ]
> }
>
> On Mon, Feb 13, 2017 at 3:47 PM, Jinfeng Ni <jni@apache.org> wrote:
>
>> Yes, the log confirmed that the planning, especially physical
>> planning, is the one that took most of the time.
>>
>> If the definition of view s3.cisexport.transactionView is not very
>> complicated (involves large # of tables), then it's possible that some
>> planner rules have a bug. (In the past, we once saw couple of planner
>> rules would be fired in a loop).
>>
>> Is it possible that you can share the DDL of the view?  That may help
>> us re-produce the problem and take a look at the trace of Calcite,
>> which Drill uses as the query planner.
>>
>>
>>
>>
>>

Mime
View raw message