drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andries Engelbrecht <aengelbre...@maprtech.com>
Subject Re: Trying to wrap my head around using WHERE with flatten
Date Fri, 27 Mar 2015 18:01:51 GMT
I would recommend to not use a count(*) but rather pick a column to use for count.

The missing function implementation is interesting though, would be good to understand exactly
what that means.

—Andries


On Mar 27, 2015, at 10:49 AM, Christopher Matta <cmatta@mapr.com> wrote:

> I’m using the yelp academic data set <https://www.yelp.com/academic_dataset>
> for my testing purposes.
> 
> Say I want to list all businesses in the “Pets” category, this query
> doesn’t work:
> 
> 0: jdbc:drill:zk=sen11:5181,sen12:5181> select `name`,
> flatten(`categories`) from
> maprfs.cmatta.`yelp/yelp_academic_dataset_business.json` WHERE
> flatten(`categories`) = 'Pets' limit 10;
> Query failed: Query stopped., Failure while trying to materialize
> incoming schema.  Errors:
> 
> Error in expression at index -1.  Error: Missing function
> implementation: [flatten(VARCHAR-REPEATED)].  Full expression:
> --UNKNOWN EXPRESSION--.. [ daf9f62a-f39b-4745-a0e3-0160875f5cb7 on
> se-node13.se.lab:31010 ]
> 
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 
> But this query with a sub-select does work:
> 
> 0: jdbc:drill:zk=sen11:5181,sen12:5181> select flat.`name`,
> flat.`categories` from (select `name`, flatten(`categories`) as
> categories from
> maprfs.cmatta.`yelp/yelp_academic_dataset_business.json`) as flat
> WHERE lower(flat.`categories`) = 'pets' limit 10;
> +------------+------------+
> |    name    | categories |
> +------------+------------+
> | Loving Hands Pet Care | Pets       |
> | Amec Mid-City Animal Hospital | Pets       |
> | PetSmart   | Pets       |
> | A Dog's Life Photography | Pets       |
> | Goober Pet Direct | Pets       |
> | Pet Planet | Pets       |
> | All-Star Animal Hospital | Pets       |
> | Team Canine, Inc | Pets       |
> | Foothills Pet Resort | Pets       |
> | Petco      | Pets       |
> +------------+------------+
> 10 rows selected (0.141 seconds)
> 
> Now that I can pick out which businesses are in the “Pets” category, I may
> want to get the average star rating and order them by the number of
> reviews. However, using the above technique in an aggregation query doesn’t
> seem to work:
> 
> 0: jdbc:drill:zk=sen11:5181,sen12:5181> select
> . . . . . . . . . . . . . . . . . . . >        a.`name`,
> . . . . . . . . . . . . . . . . . . . >        count(b.`text`),
> . . . . . . . . . . . . . . . . . . . >        avg(b.`stars`)
> . . . . . . . . . . . . . . . . . . . >    FROM (
> . . . . . . . . . . . . . . . . . . . >      SELECT `business_id`,
> `name`, flatten(`categories`) as `category`
> . . . . . . . . . . . . . . . . . . . >      FROM
> maprfs.`cmatta`.`yelp/yelp_academic_dataset_business.json`) a
> . . . . . . . . . . . . . . . . . . . >    JOIN
> maprfs.`cmatta`.`yelp/yelp_academic_dataset_review.json` b
> . . . . . . . . . . . . . . . . . . . >        ON a.`business_id` =
> b.`business_id`
> . . . . . . . . . . . . . . . . . . . >    WHERE a.`category` = 'Pets'
> . . . . . . . . . . . . . . . . . . . >    GROUP BY a.`name`
> . . . . . . . . . . . . . . . . . . . >    ORDER BY count(*) DESC
> . . . . . . . . . . . . . . . . . . . >    LIMIT 10;
> Query failed: Query failed: Failure while running fragment., Failure
> while trying to materialize incoming schema.  Errors:
> 
> Error in expression at index 2.  Error: Missing function
> implementation: [flatten(VARCHAR-REPEATED)].  Full expression: null..
> [ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ]
> [ c28a76ef-ef20-426a-a47e-908e9b68b875 on se-node13.se.lab:31010 ]
> 
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 
> What does *Error: Missing function implementation:
> [flatten(VARCHAR-REPEATED)]* in this context? Am I going about this the
> wrong way?
> 
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
> ​


Mime
View raw message