Julian Hyde created DRILL-61:
--------------------------------
Summary: Logical plan operator "collapsesegment" produces wrong results
Key: DRILL-61
URL: https://issues.apache.org/jira/browse/DRILL-61
Project: Apache Drill
Issue Type: Bug
Reporter: Julian Hyde
Logical plan operator "collapsingaggregate" produces wrong results. The input contains a null
deptId value, which may be responsible.
Query:
{
"head" : {
"type" : "apache_drill_logical_plan",
"version" : 1,
"generator" : {
"type" : "manual",
"info" : "na"
}
},
"storage" : [ {
"type" : "queue",
"name" : "queue"
}, {
"type" : "classpath",
"name" : "donuts-json"
} ],
"query" : [ {
"op" : "scan",
"@id" : 1,
"memo" : "initial_scan",
"storageengine" : "donuts-json",
"selection" : {
"path" : "/employees.json",
"type" : "JSON"
},
"ref" : "_MAP"
}, {
"op" : "project",
"input" : 1,
"@id" : 2,
"projections" : [ {
"ref" : "output.deptId",
"expr" : "_MAP.deptId"
} ]
}, {
"op" : "segment",
"input" : 2,
"@id" : 3,
"ref" : "segment",
"exprs" : [ "deptId" ]
}, {
"input" : 3,
"@id" : 4,
"op" : "collapsingaggregate",
"within" : "segment",
"carryovers" : [ "deptId" ],
"aggregations" : [
{ "ref" : "typeCount", "expr" : "count(1)" }
]
}, {
"op" : "store",
"input" : 4,
"@id" : 5,
"memo" : "output sink",
"target" : {
"number" : 0
},
"partition" : null,
"storageEngine" : "queue"
} ]
}
gives result
{ "typeCount" : 2, "deptId" : 34 }
{ "typeCount" : 2, "deptId" : null }
{ "typeCount" : 1, "deptId" : 31 }
{ "typeCount" : 1, "deptId" : 31 }
I think the correct result would be
{ "typeCount" : 2, "deptId" : 33 }
{ "typeCount" : 2, "deptId" : 34 }
{ "typeCount" : 1, "deptId" : null }
{ "typeCount" : 1, "deptId" : 31 }
Note that the "segment" operator is working correctly. A similar query with "collapsingaggregate"
removed:
{
"head" : {
"type" : "apache_drill_logical_plan",
"version" : 1,
"generator" : {
"type" : "manual",
"info" : "na"
}
},
"storage" : [ {
"type" : "queue",
"name" : "queue"
}, {
"type" : "classpath",
"name" : "donuts-json"
} ],
"query" : [ {
"op" : "scan",
"@id" : 1,
"memo" : "initial_scan",
"storageengine" : "donuts-json",
"selection" : {
"path" : "/employees.json",
"type" : "JSON"
},
"ref" : "_MAP"
}, {
"op" : "project",
"input" : 1,
"@id" : 2,
"projections" : [ {
"ref" : "output.deptId",
"expr" : "_MAP.deptId"
} ]
}, {
"op" : "segment",
"input" : 2,
"@id" : 3,
"ref" : "segment",
"exprs" : [ "deptId" ]
}, {
"op" : "store",
"input" : 3,
"@id" : 5,
"memo" : "output sink",
"target" : {
"number" : 0
},
"partition" : null,
"storageEngine" : "queue"
} ]
}
gives
{ "segment" : 1, "deptId" : 33 }
{ "segment" : 1, "deptId" : 33 }
{ "segment" : 2, "deptId" : 34 }
{ "segment" : 2, "deptId" : 34 }
{ "segment" : 3, "deptId" : null }
{ "segment" : 4, "deptId" : 31 }
It is reasonable to assume that these are the records flowing into the "collapsingaggregate"
ROP in the first query.
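
As a cross-check, the expected result can be reproduced outside Drill with a minimal Python
sketch. This is not Drill's implementation: the function name "collapse" and its parameters are
hypothetical, and it assumes the aggregate should emit one record per run of equal "segment"
values, carrying over the first deptId of the run and counting the rows.

```python
from itertools import groupby

# Records copied from the output of the "segment"-only query.
segment_output = [
    {"segment": 1, "deptId": 33},
    {"segment": 1, "deptId": 33},
    {"segment": 2, "deptId": 34},
    {"segment": 2, "deptId": 34},
    {"segment": 3, "deptId": None},
    {"segment": 4, "deptId": 31},
]


def collapse(records, within="segment", carryover="deptId"):
    """Emit one record per run of equal `within` values, with a row count.

    Hypothetical stand-in for count(1) with a single carryover column.
    """
    result = []
    for _, rows in groupby(records, key=lambda r: r[within]):
        rows = list(rows)
        # A null (None) carryover value is kept as-is; it must not leak
        # into, or be replaced by, the value of a neighbouring segment.
        result.append({"typeCount": len(rows), carryover: rows[0][carryover]})
    return result


for row in collapse(segment_output):
    print(row)
```

Under these assumptions the sketch yields counts of 2, 2, 1, 1 for deptId 33, 34, null, 31,
which matches the result given above as correct.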