drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neeraja Rentachintala <nrentachint...@maprtech.com>
Subject Re: Review Request 25996: DRILL-98 : Mongo Storage Plugin Support
Date Wed, 01 Oct 2014 00:29:29 GMT
sounds good. let me file a JIRA.

On Tue, Sep 30, 2014 at 5:26 PM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Hi Neeraja,
>
> Since from last 1 week, there is no code changes. We will look into this
> issue. If possible, can you please raise jira for this?
>
> As of now we are working on following issues.
> 1) Joins on (mongo vs mongo) and (mongo vs other storages)
> 2) select *, (some other fields) (Jinfeng's review comment)
>
> Next 10 days, we both are on vacation, so once we come back, we will
> resolve these issues.
>
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Wed, Oct 1, 2014 at 5:03 AM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
>
>> Btw I got the dataset JSON from
>> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/
>>
>> On Tue, Sep 30, 2014 at 4:30 PM, Neeraja Rentachintala <
>> nrentachintala@maprtech.com> wrote:
>>
>>> Hi Anil, Kamesh
>>>
>>> I was trying MongoDB plugin with the following queries and all of them
>>> succeeded. This is using the zip codes dataset.
>>> However intermittently (now consistently) , these are failing. I built
>>> Kamesh branch and also tried from the latest 0.6 master (which merged the
>>> plugin).
>>> Do you have any insights into this. I can file JIRA if you like.
>>>
>>> select state,sum(pop) from zipcodes group by state having sum(pop) >
>>> 10000000 order by sum(pop) desc;
>>>
>>> select state,city,avg(pop) from zipcodes group by state, city;
>>>
>>> select city, sum(pop) from zipcodes where state is not null group by
>>> city order by sum(pop) desc limit 1;
>>>
>>> select city, sum(pop) from zipcodes where state is not null group by
>>> city order by sum(pop) asc limit 1;
>>>
>>> ---------------
>>>
>>>
>>> Errors
>>>
>>> -------------------
>>>
>>> select sum(pop) from zipcodes where city=‘CHICAGO’;
>>>
>>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group
>>> by state, city order by sum(pop) limit 5;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [1366240a-125d-4646-962a-1734f84b03b3]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes where
>>> state is not null and city is not null group by state, city order by
>>> sum(pop) limit 5;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [53a1c241-82f0-4413-aa09-39f851d7209a]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group
>>> by state,city order by sum(pop) asc limit 1;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [911f48f7-019e-42f2-b0af-1e255416ce76]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> 0: jdbc:drill:zk=local> select state,city,avg(pop) from zipcodes group
>>> by state, city;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [54c5e480-972a-49b9-860e-60d4a7429366]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> 0: jdbc:drill:zk=local> select city, sum(pop) from zipcodes group by
>>> city order by sum(pop) asc limit 1;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [b5374f4a-3ee6-47ac-9dc1-fcebed555518]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> 0: jdbc:drill:zk=local> select state,sum(pop) from zipcodes group by
>>> state having sum(pop) > 10000000;
>>>
>>> Query failed: Failure while running fragment. Invalid value for boolean:
>>> 15338 [5875169d-dcb8-419f-991e-d38d41dddcbb]
>>>
>>>
>>> Error: exception while executing query: Failure while trying to get next
>>> result batch. (state=,code=0)
>>>
>>> On Sat, Sep 27, 2014 at 1:07 AM, Kamesh <kamesh.hadoop@gmail.com> wrote:
>>>
>>>> Thanks Jinfeng & Neeraja for looking into this.
>>>> We will look into the above mentioned issues.
>>>>
>>>>
>>>>
>>>> On Sat, Sep 27, 2014 at 8:28 AM, Neeraja Rentachintala <
>>>> nrentachintala@maprtech.com> wrote:
>>>>
>>>>> I have played with the plugin as well today and overall its very good.
>>>>>
>>>>> I tried the queries
>>>>> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/
>>>>> on the zip code dataset and all the aggregate queries worked.
>>>>>
>>>>>
>>>>> -----------
>>>>>
>>>>> select sum(pop) from zipcodes where city='SEATTLE’;
>>>>>
>>>>> select state, city, sum(pop) from zipcodes group by state,city order
>>>>> by sum(pop) asc limit 1;
>>>>>
>>>>> select state,city,avg(pop) from zipcodes group by state, city;
>>>>>
>>>>> select city, sum(pop) from zipcodes group by city order by sum(pop)
>>>>> asc limit 1;
>>>>>
>>>>> select state,sum(pop) from zipcodes group by state having sum(pop) >
>>>>> 10000000;
>>>>>
>>>>>
>>>>> ----------
>>>>>
>>>>>
>>>>> I however noticed issues with querying repeating elements (used USDA
>>>>> nutrition dataset), especially more than one level nested as well as
JOINs
>>>>> (example queries are below)
>>>>>
>>>>> ------------------
>>>>>
>>>>> 0: jdbc:drill:zk=local> SELECT t1.first_name FROM
>>>>> mongo.employee.`empinfo` t1 JOIN  mongo.employee.`empinfo` t2 ON
>>>>> t1.`employee_id` = t2.`employee_id`;
>>>>>
>>>>> Query failed: Failure while setting up Foreman. Internal error: Error
>>>>> while applying rule DrillPushProjIntoScan, args
>>>>> [rel#12606:ProjectRel.NONE.ANY([]).[](child=rel#12598:Subset#0.ENUMERABLE.ANY([]).[],employee_id=$1,first_name=$2),
>>>>> rel#12594:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[mongo,
>>>>> employee, empinfo])] [08f4eedd-f5c9-4ebf-8d5b-d9249b79ca32]
>>>>>
>>>>>
>>>>> 0: jdbc:drill:zk=local> select t.nutrients from mongo.usda.nutrition
t
>>>>> limit 1;
>>>>>
>>>>> Query failed: Screen received stop request sent. You tried to write a
>>>>> BigInt type when you are using a ValueWriter of type
>>>>> NullableFloat8WriterImpl. [dc44e277-1b1d-4f00-b60e-9f06b883e7c5]
>>>>>
>>>>>
>>>>> Error: exception while executing query: Failure while trying to get
>>>>> next result batch. (state=,code=0)
>>>>>
>>>>> 0: jdbc:drill:zk=local> select t.nutrients[0].units from
>>>>> mongo.usda.nutrition t limit 1;
>>>>>
>>>>> Query failed: Screen received stop request sent. You tried to write a
>>>>> BigInt type when you are using a ValueWriter of type
>>>>> NullableFloat8WriterImpl. [a285c85e-4607-48fc-97af-41b5726459e2]
>>>>>
>>>>>
>>>>> Error: exception while executing query: Failure while trying to get
>>>>> next result batch. (state=,code=0)
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2014 at 6:07 PM, Jinfeng Ni <jni@maprtech.com>
wrote:
>>>>>
>>>>>>
>>>>>> -----------------------------------------------------------
>>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>>> https://reviews.apache.org/r/25996/#review54756
>>>>>> -----------------------------------------------------------
>>>>>>
>>>>>> Ship it!
>>>>>>
>>>>>>
>>>>>> I did not do a detail code review; let that task to Steven. I mainly
>>>>>> played with this Mongo plugin. Overall it looks good.
>>>>>>
>>>>>> Basically, I start a mongodb instance, import the data, and run
>>>>>> several single table queryies, and all of them work perfectly.
>>>>>>
>>>>>> Some issues I saw when playing around :
>>>>>>
>>>>>> 1. The result of select * seems not the expect answer : it would
>>>>>> return a map containing all the columns:
>>>>>>
>>>>>> SELECT * FROM mongo.employee.`empinfo` limit 2;
>>>>>> +------------+
>>>>>> |     *      |
>>>>>> +------------+
>>>>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" ,
>>>>>> "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" :
16 ,
>>>>>> "position" : "Store T" , "isFTE" : true} |
>>>>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" ,
>>>>>> "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" :
16 ,
>>>>>> "position" : "Store T" , "isFTE" : true} |
>>>>>> +------------+
>>>>>> 2 rows selected (0.084 seconds)
>>>>>>
>>>>>> In contrast, here is the result when Drill queries a .json file:
>>>>>>
>>>>>> select * from cp.`employee.json` limit 2;
>>>>>>
>>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>>> | employee_id | full_name  | first_name | last_name  | position_id
|
>>>>>> position_title |  store_id  | department_id | birth_date | hire_date
 |
>>>>>>  salary   | supervisor_id | education_level | marital_status |  
gender   |
>>>>>> management_role |
>>>>>>
>>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>>> | 1           | Sheri Nowmer | Sheri      | Nowmer     | 1
>>>>>>  | President      | 0          | 1             | 1961-08-26 | 1994-12-01
>>>>>> 00:00:00.0 | 80000.0    | 0             | Graduate Degree | S
>>>>>> | F          | Senior Management |
>>>>>> | 2           | Derrick Whelply | Derrick    | Whelply    | 2
>>>>>>    | VP Country Manager | 0          | 1             | 1915-07-03
|
>>>>>> 1994-12-01 00:00:00.0 | 40000.0    | 1             | Graduate Degree
| M
>>>>>>           | M          | Senior Management |
>>>>>>
>>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>>> 2 rows selected (0.39 seconds)
>>>>>>
>>>>>>
>>>>>> 2. Join two mongodb tables would fail.
>>>>>>
>>>>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo`
t1,
>>>>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id`
limit
>>>>>> 1;
>>>>>> Query failed: Failure while setting up Foreman. Internal error: while
>>>>>> converting `t1`.`employee_id` = `t2`.`employee_id`
>>>>>> [39eb6c88-fd21-4514-8903-48d99210b88d]
>>>>>>
>>>>>> 3. Join a mongodb table with a table with other storage engine would
>>>>>> fail with CanNotPlanException:
>>>>>>
>>>>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo`
t1,
>>>>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id`
limit
>>>>>> 1;
>>>>>> Query failed: Failure while setting up Foreman. Internal error: while
>>>>>> converting `t1`.`employee_id` = `t2`.`employee_id`
>>>>>> [39eb6c88-fd21-4514-8903-48d99210b88d]
>>>>>>
>>>>>> Error: exception while executing query: Failure while trying to get
>>>>>> next result batch. (state=,code=0)
>>>>>> 0: jdbc:drill:zk=local> SELECT t1.first_name, t1.last_name FROM
>>>>>> mongo.employee.`empinfo` as t1, cp.`employee.json` t2 where t1.employee_id
>>>>>> = t2.employee_id limit 10;
>>>>>> Query failed: Failure while parsing sql. Node
>>>>>> [rel#2496:Subset#5.LOGICAL.ANY([]).[]] could not be implemented;
planner
>>>>>> state:
>>>>>>
>>>>>> Root: rel#2496:Subset#5.LOGICAL.ANY([]).[]
>>>>>> Original rel:
>>>>>> ......
>>>>>>
>>>>>> 4. Select *, regular_column from mongodb would return the
>>>>>> regular_column as null.
>>>>>>
>>>>>> 0: jdbc:drill:zk=local> SELECT first_name FROM
>>>>>> mongo.employee.`empinfo` limit 2;
>>>>>> +------------+
>>>>>> | first_name |
>>>>>> +------------+
>>>>>> | Steve      |
>>>>>> | Mary       |
>>>>>> +------------+
>>>>>> 2 rows selected (0.084 seconds)
>>>>>> 0: jdbc:drill:zk=local> SELECT *, first_name FROM
>>>>>> mongo.employee.`empinfo` limit 2;
>>>>>> +------------+------------+
>>>>>> |     *      | first_name |
>>>>>> +------------+------------+
>>>>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" ,
>>>>>> "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" :
16 ,
>>>>>> "position" : "Store T" , "isFTE" : true} | null       |
>>>>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" ,
>>>>>> "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" :
16 ,
>>>>>> "position" : "Store T" , "isFTE" : true} | null       |
>>>>>> +------------+------------+
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think it would be fine to fix those issues in the next release.
>>>>>>
>>>>>>
>>>>>> PS: could you please re-build a patch after rebasing on the recent
>>>>>> master branch?
>>>>>>
>>>>>> - Jinfeng Ni
>>>>>>
>>>>>>
>>>>>> On Sept. 24, 2014, 11:06 a.m., Anil Kumar B wrote:
>>>>>> >
>>>>>> > -----------------------------------------------------------
>>>>>> > This is an automatically generated e-mail. To reply, visit:
>>>>>> > https://reviews.apache.org/r/25996/
>>>>>> > -----------------------------------------------------------
>>>>>> >
>>>>>> > (Updated Sept. 24, 2014, 11:06 a.m.)
>>>>>> >
>>>>>> >
>>>>>> > Review request for drill, Aditya Kishore, Jacques Nadeau, and
>>>>>> Kamesh B.
>>>>>> >
>>>>>> >
>>>>>> > Repository: drill-git
>>>>>> >
>>>>>> >
>>>>>> > Description
>>>>>> > -------
>>>>>> >
>>>>>> > Mongo storage plugin support: The features which we implemented
as
>>>>>> part of this is as follows.
>>>>>> > 1) Support for sharded(chunk wise), shared-replicated(chunk
wise),
>>>>>> replicated, stand-alone
>>>>>> > 2) Predicate pushdown
>>>>>> > 3) Mongo PStore
>>>>>> >
>>>>>> > MongoRecordReader uses JsonReaderWithState in the case of non-star
>>>>>> queries.
>>>>>> >
>>>>>> >
>>>>>> > Diffs
>>>>>> > -----
>>>>>> >
>>>>>> >   contrib/pom.xml 728038a
>>>>>> >   contrib/storage-mongo/pom.xml PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/DrillMongoConstants.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCnxnManager.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCompareFunctionProcessor.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoFilterBuilder.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoGroupScan.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoRecordReader.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanBatchCreator.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanSpec.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePlugin.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePluginConfig.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoSubScan.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoUtils.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/ChunkInfo.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/MongoCompareOp.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStore.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStoreProvider.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoDatabaseSchema.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json
>>>>>> PRE-CREATION
>>>>>> >   contrib/storage-mongo/src/main/resources/drill-module.conf
>>>>>> PRE-CREATION
>>>>>> >
>>>>>>  contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoChunkAssignment.java
>>>>>> PRE-CREATION
>>>>>> >   distribution/pom.xml cd5df0d
>>>>>> >   distribution/src/assemble/bin.xml 86e3802
>>>>>> >
>>>>>>  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
>>>>>> 933bfbe
>>>>>> >
>>>>>>  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
>>>>>> 4fa61e1
>>>>>> >
>>>>>>  exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
>>>>>> 4e12b8b
>>>>>> >
>>>>>>  exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReaderWithState.java
>>>>>> ef995f8
>>>>>> >
>>>>>> > Diff: https://reviews.apache.org/r/25996/diff/
>>>>>> >
>>>>>> >
>>>>>> > Testing
>>>>>> > -------
>>>>>> >
>>>>>> > 1) Tested various set of queries on sharded, replicated and
>>>>>> stand-alone modes.
>>>>>> >
>>>>>> > 2) Test Environment details: We created mongo cluster with 2
shards
>>>>>> with a collections consists of 35 chunks(18 chunks are one shard
and
>>>>>> remaining chunks on on other shard). Below are the few queries which
we
>>>>>> tested in all the environments.
>>>>>> >
>>>>>> >     a) SELECT * FROM mongo.employee.`empinfo` limit 10;
>>>>>> >
>>>>>> >       b) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>>> limit 10;
>>>>>> >
>>>>>> >       c) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>>> where employee_id = 1111;
>>>>>> >
>>>>>> >       d) SELECT * FROM mongo.employee.`empinfo` where full_name
=
>>>>>> 'Phil Munoz';
>>>>>> >
>>>>>> >       e) SELECT first_name, last_name, position_id FROM
>>>>>> mongo.employee.`empinfo` where employee_id = 1111  OR position_id
= 16;
>>>>>> >
>>>>>> >       g) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>>> where isFTE = true;
>>>>>> >
>>>>>> >       h) SELECT first_name, last_name, position_id FROM
>>>>>> mongo.employee.`empinfo` where employee_id = 1107  AND position_id
= 17 AND
>>>>>> last_name = 'Yonce';
>>>>>> >
>>>>>> >
>>>>>> > 3) PStore functionality not fully tested.
>>>>>> >
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Anil Kumar B
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Kamesh.
>>>>
>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message