drill-dev mailing list archives

From: "Julian Hyde (JIRA)" <j...@apache.org>
Subject: [jira] [Created] (DRILL-61) Logical plan operator "collapsesegment" produces wrong results
Date: Sat, 18 May 2013 22:33:15 GMT

Julian Hyde created DRILL-61:
--------------------------------

             Summary: Logical plan operator "collapsesegment" produces wrong results
                 Key: DRILL-61
                 URL: https://issues.apache.org/jira/browse/DRILL-61
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Julian Hyde


The logical plan operator "collapsingaggregate" produces wrong results. The input contains
a null deptId value, which may be responsible.

Query:

{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  }, {
    "op" : "project",
    "input" : 1,
    "@id" : 2,
    "projections" : [ {
      "ref" : "output.deptId",
      "expr" : "_MAP.deptId"
    } ]
  }, {
    "input" : 2,
    "@id" : 3,
    "op" : "segment",
    "ref" : "segment",
    "exprs" : [ "deptId" ]
  }, {
    "input" : 3,
    "@id" : 4,
    "op" : "collapsingaggregate",
    "within" : "segment",
    "carryovers" : [ "deptId" ],
    "aggregations" : [ {
      "ref" : "typeCount",
      "expr" : "count(1)"
    } ]
  }, {
    "op" : "store",
    "input" : 4,
    "@id" : 5,
    "memo" : "output sink",
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}

gives the result

{ "typeCount" : 2, "deptId" : 34 }
{ "typeCount" : 2, "deptId" : null }
{ "typeCount" : 1, "deptId" : 31 }
{ "typeCount" : 1, "deptId" : 31 }

I think the correct result would be

{ "typeCount" : 2, "deptId" : 33 }
{ "typeCount" : 2, "deptId" : 34 }
{ "typeCount" : 1, "deptId" : null }
{ "typeCount" : 1, "deptId" : 31 }

Note that the "segment" operator is working correctly. A similar query with "collapsingaggregate"
removed:

{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  }, {
    "op" : "project",
    "input" : 1,
    "@id" : 2,
    "projections" : [ {
      "ref" : "output.deptId",
      "expr" : "_MAP.deptId"
    } ]
  }, {
    "input" : 2,
    "@id" : 3,
    "op" : "segment",
    "ref" : "segment",
    "exprs" : [ "deptId" ]
  }, {
    "op" : "store",
    "input" : 3,
    "@id" : 5,
    "memo" : "output sink",
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}

gives

{  "segment" : 1,  "deptId" : 33 }
{  "segment" : 1,  "deptId" : 33 }
{  "segment" : 2,  "deptId" : 34 }
{  "segment" : 2,  "deptId" : 34 }
{  "segment" : 3,  "deptId" : null }
{  "segment" : 4,  "deptId" : 31 }

It is reasonable to assume that these are the records flowing into the "collapsingaggregate"
ROP in the first query.
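
As a cross-check of the expected result, here is a minimal Python sketch that models what the aggregate should do with those six records: emit one row per segment, carry over deptId, and count the rows. It is not Drill code. The function name "collapse" and its parameters are hypothetical, and it assumes the records arrive already grouped by segment, as they do in the output above.

```python
from itertools import groupby

# Records emitted by the "segment" operator in the second query.
segment_output = [
    {"segment": 1, "deptId": 33},
    {"segment": 1, "deptId": 33},
    {"segment": 2, "deptId": 34},
    {"segment": 2, "deptId": 34},
    {"segment": 3, "deptId": None},
    {"segment": 4, "deptId": 31},
]


def collapse(records, within="segment", carryover="deptId"):
    """Hypothetical model of a collapsing aggregate with count(1).

    Assumes records of the same segment are adjacent, because
    itertools.groupby only merges consecutive items with equal keys.
    """
    rows = []
    for _, group in groupby(records, key=lambda r: r[within]):
        group = list(group)
        # Carry over the value from the first record of the segment
        # and count every record in it, including a null deptId.
        rows.append({"typeCount": len(group), carryover: group[0][carryover]})
    return rows


expected = collapse(segment_output)
for row in expected:
    print(row)
```

Under these assumptions the sketch yields counts of 2, 2, 1 and 1 for deptId 33, 34, null and 31, matching the result proposed above as correct.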

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
