drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christopher Matta <cma...@mapr.com>
Subject Trying to wrap my head around using WHERE with flatten
Date Fri, 27 Mar 2015 17:49:21 GMT
I’m using the yelp academic data set <https://www.yelp.com/academic_dataset>
for my testing purposes.

Say I want to list all businesses in the “Pets” category, this query
doesn’t work:

0: jdbc:drill:zk=sen11:5181,sen12:5181> select `name`,
flatten(`categories`) from
maprfs.cmatta.`yelp/yelp_academic_dataset_business.json` WHERE
flatten(`categories`) = 'Pets' limit 10;
Query failed: Query stopped., Failure while trying to materialize
incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function
implementation: [flatten(VARCHAR-REPEATED)].  Full expression:
--UNKNOWN EXPRESSION--.. [ daf9f62a-f39b-4745-a0e3-0160875f5cb7 on
se-node13.se.lab:31010 ]

Error: exception while executing query: Failure while executing query.
(state=,code=0)

But this query with a sub-select does work:

0: jdbc:drill:zk=sen11:5181,sen12:5181> select flat.`name`,
flat.`categories` from (select `name`, flatten(`categories`) as
categories from
maprfs.cmatta.`yelp/yelp_academic_dataset_business.json`) as flat
WHERE lower(flat.`categories`) = 'pets' limit 10;
+------------+------------+
|    name    | categories |
+------------+------------+
| Loving Hands Pet Care | Pets       |
| Amec Mid-City Animal Hospital | Pets       |
| PetSmart   | Pets       |
| A Dog's Life Photography | Pets       |
| Goober Pet Direct | Pets       |
| Pet Planet | Pets       |
| All-Star Animal Hospital | Pets       |
| Team Canine, Inc | Pets       |
| Foothills Pet Resort | Pets       |
| Petco      | Pets       |
+------------+------------+
10 rows selected (0.141 seconds)

Now that I can pick out which businesses are in the “Pets” category, I may
want to get the average star rating and order them by the number of
reviews. However, using the above technique in an aggregation query doesn’t
seem to work:

0: jdbc:drill:zk=sen11:5181,sen12:5181> select
. . . . . . . . . . . . . . . . . . . >        a.`name`,
. . . . . . . . . . . . . . . . . . . >        count(b.`text`),
. . . . . . . . . . . . . . . . . . . >        avg(b.`stars`)
. . . . . . . . . . . . . . . . . . . >    FROM (
. . . . . . . . . . . . . . . . . . . >      SELECT `business_id`,
`name`, flatten(`categories`) as `category`
. . . . . . . . . . . . . . . . . . . >      FROM
maprfs.`cmatta`.`yelp/yelp_academic_dataset_business.json`) a
. . . . . . . . . . . . . . . . . . . >    JOIN
maprfs.`cmatta`.`yelp/yelp_academic_dataset_review.json` b
. . . . . . . . . . . . . . . . . . . >        ON a.`business_id` =
b.`business_id`
. . . . . . . . . . . . . . . . . . . >    WHERE a.`category` = 'Pets'
. . . . . . . . . . . . . . . . . . . >    GROUP BY a.`name`
. . . . . . . . . . . . . . . . . . . >    ORDER BY count(*) DESC
. . . . . . . . . . . . . . . . . . . >    LIMIT 10;
Query failed: Query failed: Failure while running fragment., Failure
while trying to materialize incoming schema.  Errors:

Error in expression at index 2.  Error: Missing function
implementation: [flatten(VARCHAR-REPEATED)].  Full expression: null..
[ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ]
[ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ]

Error: exception while executing query: Failure while executing query.
(state=,code=0)

What does *Error: Missing function implementation:
[flatten(VARCHAR-REPEATED)]* in this context? Am I going about this the
wrong way?

Chris Matta
cmatta@mapr.com
215-701-3146
​

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message