drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Julian Hyde (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-64) In reference implementation, full outer join returns too many rows
Date Mon, 20 May 2013 23:11:16 GMT
Julian Hyde created DRILL-64:
--------------------------------

             Summary: In reference implementation, full outer join returns too many rows
                 Key: DRILL-64
                 URL: https://issues.apache.org/jira/browse/DRILL-64
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Julian Hyde
            Priority: Minor


Full outer join returns too many rows. The query

{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  }, {
    "op" : "project",
    "input" : 1,
    "@id" : 2,
    "projections" : [ {
      "ref" : "output.deptId",
      "expr" : "_MAP.deptId"
    } ]
  }, {
    "op" : "project",
    "input" : 2,
    "@id" : 10,
    "projections" : [ {
      "ref" : "output.deptId1",
      "expr" : "deptId"
    } ]
  }, {
    "op" : "scan",
    "@id" : 3,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/departments.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  }, {
    "op" : "project",
    "input" : 3,
    "@id" : 4,
    "projections" : [ {
      "ref" : "output.deptId",
      "expr" : "_MAP.deptId"
    } ]
  },  {
    "op": "join",
    "left": 10,
    "right": 4,
    "@id" : 5,
    "type": "outer",
    "conditions": [
      {"relationship": "==", "left": "deptId1", "right": "deptId"}
    ]
  }, {
    "op" : "store",
    "input" : 5,
    "@id" : 6,
    "memo" : "output sink",
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}

returns 

{  "deptId" : 31,  "deptId1" : 31} 
{  "deptId" : 33,  "deptId1" : 33} 
{  "deptId" : 33,  "deptId1" : 33} 
{  "deptId" : 34,  "deptId1" : 34} 
{  "deptId" : 34,  "deptId1" : 34} 
{  "deptId1" : null} 
{  "deptId" : 31} 
{  "deptId" : 33} 
{  "deptId" : 34} 
{  "deptId" : 35} 

but I think it should return

{  "deptId" : 31,  "deptId1" : 31} 
{  "deptId" : 33,  "deptId1" : 33} 
{  "deptId" : 33,  "deptId1" : 33} 
{  "deptId" : 34,  "deptId1" : 34} 
{  "deptId" : 34,  "deptId1" : 34} 
{  "deptId1" : null} 
{  "deptId" : 35} 

because only 35 on the left is unmatched on the right; {31, 33, 34} have at least one match.

Also, it's cosmetic, but the "outer" join type should be called "full" because "left" is also
a kind of outer join, at least in SQL parlance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message