I'm starting to work on GROUP BY support, and I'm trying to figure out the syntax of the "collapsingaggregate"
operator based on the spec. I looked for example logical plans but couldn't find any. Could
someone give a couple of examples?
Here's a plan that almost works:
{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  }, {
    "op" : "collapsingaggregate",
    "@id" : 2,
    "input" : 1,
    "carryovers" : [],
    "aggregations" : [ {
      "ref" : "output.c",
      "expr" : "count"
    } ]
  }, {
    "op" : "store",
    "@id" : 3,
    "memo" : "output sink",
    "input" : 2,
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}
Some specific questions:
1. Is it valid for "carryovers" to be omitted or empty? (I get an NPE if I omit it.)
2. What's the syntax for an aggregation expression? (I tried "COUNT()", "COUNT(*)", and "COUNT".)
3. The spec seems to imply that I could write something as rich as "5 + SUM(salary) / COUNT()"
for an aggregation expression. Is that true?
4. "within" is a segment. Does that mean a holder field, like the "toppings" field in donuts.json?
5. Is it valid for "aggregations" to be omitted or empty?
6. In the plan spec [ https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit ],
does "*" mean that a field is optional and a dagger mean system-generated? It doesn't say explicitly.
7. If I want to implement "SELECT deptno, COUNT(*) FROM emp GROUP BY deptno", I presume I
will generate a plan that consists of scan-segment-collapsingaggregate?
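To make question 7 concrete, here is a rough Python sketch (not Drill code) of the semantics I would expect from scan-segment-collapsingaggregate. The sample records and the helper names segment and collapsing_count are made up for illustration, and the behavior reflects only my reading of the spec, which may be wrong.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample records standing in for the output of a scan over "emp".
emp = [
    {"deptno": 10, "ename": "A"},
    {"deptno": 20, "ename": "B"},
    {"deptno": 10, "ename": "C"},
]


def segment(records, key):
    """Hypothetical 'segment' step: bring records with equal key values together."""
    ordered = sorted(records, key=itemgetter(key))
    for value, group in groupby(ordered, key=itemgetter(key)):
        yield value, list(group)


def collapsing_count(segments, key):
    """Hypothetical 'collapsingaggregate' step: emit one record per segment,
    carrying over the grouping key and adding a count of the segment's records."""
    for value, group in segments:
        yield {key: value, "c": len(group)}


result = list(collapsing_count(segment(emp, "deptno"), "deptno"))
print(result)  # [{'deptno': 10, 'c': 2}, {'deptno': 20, 'c': 1}]
```

If that is the intended behavior, I assume "deptno" would go in "carryovers" and the count in "aggregations", but that is exactly the part I'd like confirmed.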
If the example queries answer those questions, no need to answer them explicitly.
Julian