drill-dev mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: Review Request 25996: DRILL-98 : Mongo Storage Plugin Support
Date Wed, 01 Oct 2014 00:26:59 GMT
Hi Neeraja,

There have been no code changes in the last week. We will look into this
issue. If possible, could you please raise a JIRA for it?

As of now we are working on the following issues:
1) Joins on (mongo vs mongo) and (mongo vs other storages)
2) select *, (some other fields) (Jinfeng's review comment)

We are both on vacation for the next 10 days, so we will resolve these
issues once we are back.


Thanks & Regards,
B Anil Kumar.

On Wed, Oct 1, 2014 at 5:03 AM, Neeraja Rentachintala <
nrentachintala@maprtech.com> wrote:

> Btw I got the dataset JSON from
> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/
>
> On Tue, Sep 30, 2014 at 4:30 PM, Neeraja Rentachintala <
> nrentachintala@maprtech.com> wrote:
>
>> Hi Anil, Kamesh
>>
>> I was trying the MongoDB plugin with the following queries and all of them
>> succeeded. This is using the zip codes dataset.
>> However, they have started failing, intermittently at first and now
>> consistently. I built Kamesh's branch and also tried the latest 0.6 master
>> (which merged the plugin).
>> Do you have any insights into this? I can file a JIRA if you like.
>>
>> select state,sum(pop) from zipcodes group by state having sum(pop) >
>> 10000000 order by sum(pop) desc;
>>
>> select state,city,avg(pop) from zipcodes group by state, city;
>>
>> select city, sum(pop) from zipcodes where state is not null group by city
>> order by sum(pop) desc limit 1;
>>
>> select city, sum(pop) from zipcodes where state is not null group by city
>> order by sum(pop) asc limit 1;
>>
>> ---------------
>>
>>
>> Errors
>>
>> -------------------
>>
>> select sum(pop) from zipcodes where city='CHICAGO';
>>
>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group
>> by state, city order by sum(pop) limit 5;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [1366240a-125d-4646-962a-1734f84b03b3]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes where
>> state is not null and city is not null group by state, city order by
>> sum(pop) limit 5;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [53a1c241-82f0-4413-aa09-39f851d7209a]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group
>> by state,city order by sum(pop) asc limit 1;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [911f48f7-019e-42f2-b0af-1e255416ce76]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> 0: jdbc:drill:zk=local> select state,city,avg(pop) from zipcodes group by
>> state, city;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [54c5e480-972a-49b9-860e-60d4a7429366]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> 0: jdbc:drill:zk=local> select city, sum(pop) from zipcodes group by city
>> order by sum(pop) asc limit 1;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [b5374f4a-3ee6-47ac-9dc1-fcebed555518]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> 0: jdbc:drill:zk=local> select state,sum(pop) from zipcodes group by
>> state having sum(pop) > 10000000;
>>
>> Query failed: Failure while running fragment. Invalid value for boolean:
>> 15338 [5875169d-dcb8-419f-991e-d38d41dddcbb]
>>
>>
>> Error: exception while executing query: Failure while trying to get next
>> result batch. (state=,code=0)
>>
>> On Sat, Sep 27, 2014 at 1:07 AM, Kamesh <kamesh.hadoop@gmail.com> wrote:
>>
>>> Thanks Jinfeng & Neeraja for looking into this.
>>> We will look into the above mentioned issues.
>>>
>>>
>>>
>>> On Sat, Sep 27, 2014 at 8:28 AM, Neeraja Rentachintala <
>>> nrentachintala@maprtech.com> wrote:
>>>
>>>> I have played with the plugin as well today and overall it's very good.
>>>>
>>>> I tried the queries
>>>> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/
>>>> on the zip code dataset and all the aggregate queries worked.
>>>>
>>>>
>>>> -----------
>>>>
>>>> select sum(pop) from zipcodes where city='SEATTLE';
>>>>
>>>> select state, city, sum(pop) from zipcodes group by state,city order by
>>>> sum(pop) asc limit 1;
>>>>
>>>> select state,city,avg(pop) from zipcodes group by state, city;
>>>>
>>>> select city, sum(pop) from zipcodes group by city order by sum(pop) asc
>>>> limit 1;
>>>>
>>>> select state,sum(pop) from zipcodes group by state having sum(pop) >
>>>> 10000000;
>>>>
>>>>
>>>> ----------
>>>>
>>>>
>>>> However, I noticed issues with querying repeated elements (using the USDA
>>>> nutrition dataset), especially those nested more than one level deep, as
>>>> well as with JOINs (example queries are below)
>>>>
>>>> ------------------
>>>>
>>>> 0: jdbc:drill:zk=local> SELECT t1.first_name FROM
>>>> mongo.employee.`empinfo` t1 JOIN  mongo.employee.`empinfo` t2 ON
>>>> t1.`employee_id` = t2.`employee_id`;
>>>>
>>>> Query failed: Failure while setting up Foreman. Internal error: Error
>>>> while applying rule DrillPushProjIntoScan, args
>>>> [rel#12606:ProjectRel.NONE.ANY([]).[](child=rel#12598:Subset#0.ENUMERABLE.ANY([]).[],employee_id=$1,first_name=$2),
>>>> rel#12594:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[mongo,
>>>> employee, empinfo])] [08f4eedd-f5c9-4ebf-8d5b-d9249b79ca32]
>>>>
>>>>
>>>> 0: jdbc:drill:zk=local> select t.nutrients from mongo.usda.nutrition t
>>>> limit 1;
>>>>
>>>> Query failed: Screen received stop request sent. You tried to write a
>>>> BigInt type when you are using a ValueWriter of type
>>>> NullableFloat8WriterImpl. [dc44e277-1b1d-4f00-b60e-9f06b883e7c5]
>>>>
>>>>
>>>> Error: exception while executing query: Failure while trying to get
>>>> next result batch. (state=,code=0)
>>>>
>>>> 0: jdbc:drill:zk=local> select t.nutrients[0].units from
>>>> mongo.usda.nutrition t limit 1;
>>>>
>>>> Query failed: Screen received stop request sent. You tried to write a
>>>> BigInt type when you are using a ValueWriter of type
>>>> NullableFloat8WriterImpl. [a285c85e-4607-48fc-97af-41b5726459e2]
>>>>
>>>>
>>>> Error: exception while executing query: Failure while trying to get
>>>> next result batch. (state=,code=0)
>>>>
>>>>
>>>>
>>>> On Fri, Sep 26, 2014 at 6:07 PM, Jinfeng Ni <jni@maprtech.com> wrote:
>>>>
>>>>>
>>>>> -----------------------------------------------------------
>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>> https://reviews.apache.org/r/25996/#review54756
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Ship it!
>>>>>
>>>>>
>>>>> I did not do a detailed code review; I leave that task to Steven. I
>>>>> mainly played with this Mongo plugin. Overall it looks good.
>>>>>
>>>>> Basically, I started a mongodb instance, imported the data, and ran
>>>>> several single-table queries, and all of them worked perfectly.
>>>>>
>>>>> Some issues I saw when playing around :
>>>>>
>>>>> 1. The result of select * does not seem to be the expected answer: it
>>>>> returns a single map containing all the columns:
>>>>>
>>>>> SELECT * FROM mongo.employee.`empinfo` limit 2;
>>>>> +------------+
>>>>> |     *      |
>>>>> +------------+
>>>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} |
>>>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} |
>>>>> +------------+
>>>>> 2 rows selected (0.084 seconds)
>>>>>
>>>>> In contrast, here is the result when Drill queries a .json file:
>>>>>
>>>>> select * from cp.`employee.json` limit 2;
>>>>>
>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>> | employee_id | full_name  | first_name | last_name  | position_id |
>>>>> position_title |  store_id  | department_id | birth_date | hire_date
 |
>>>>>  salary   | supervisor_id | education_level | marital_status |   gender
  |
>>>>> management_role |
>>>>>
>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>> | 1           | Sheri Nowmer | Sheri      | Nowmer     | 1          
|
>>>>> President      | 0          | 1             | 1961-08-26 | 1994-12-01
>>>>> 00:00:00.0 | 80000.0    | 0             | Graduate Degree | S
>>>>> | F          | Senior Management |
>>>>> | 2           | Derrick Whelply | Derrick    | Whelply    | 2
>>>>>  | VP Country Manager | 0          | 1             | 1915-07-03 |
>>>>> 1994-12-01 00:00:00.0 | 40000.0    | 1             | Graduate Degree
| M
>>>>>           | M          | Senior Management |
>>>>>
>>>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
>>>>> 2 rows selected (0.39 seconds)
>>>>>
>>>>>
>>>>> 2. Joining two mongodb tables fails.
>>>>>
>>>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1,
>>>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id`
>>>>> limit 1;
>>>>> Query failed: Failure while setting up Foreman. Internal error: while
>>>>> converting `t1`.`employee_id` = `t2`.`employee_id`
>>>>> [39eb6c88-fd21-4514-8903-48d99210b88d]
>>>>>
>>>>> 3. Joining a mongodb table with a table from another storage engine
>>>>> fails with CanNotPlanException:
>>>>>
>>>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1,
>>>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id`
>>>>> limit 1;
>>>>> Query failed: Failure while setting up Foreman. Internal error: while
>>>>> converting `t1`.`employee_id` = `t2`.`employee_id`
>>>>> [39eb6c88-fd21-4514-8903-48d99210b88d]
>>>>>
>>>>> Error: exception while executing query: Failure while trying to get
>>>>> next result batch. (state=,code=0)
>>>>> 0: jdbc:drill:zk=local> SELECT t1.first_name, t1.last_name FROM
>>>>> mongo.employee.`empinfo` as t1, cp.`employee.json` t2 where t1.employee_id
>>>>> = t2.employee_id limit 10;
>>>>> Query failed: Failure while parsing sql. Node
>>>>> [rel#2496:Subset#5.LOGICAL.ANY([]).[]] could not be implemented; planner
>>>>> state:
>>>>>
>>>>> Root: rel#2496:Subset#5.LOGICAL.ANY([]).[]
>>>>> Original rel:
>>>>> ......
>>>>>
>>>>> 4. Select *, regular_column from mongodb returns the regular_column
>>>>> as null.
>>>>>
>>>>> 0: jdbc:drill:zk=local> SELECT first_name FROM
>>>>> mongo.employee.`empinfo` limit 2;
>>>>> +------------+
>>>>> | first_name |
>>>>> +------------+
>>>>> | Steve      |
>>>>> | Mary       |
>>>>> +------------+
>>>>> 2 rows selected (0.084 seconds)
>>>>> 0: jdbc:drill:zk=local> SELECT *, first_name FROM
>>>>> mongo.employee.`empinfo` limit 2;
>>>>> +------------+------------+
>>>>> |     *      | first_name |
>>>>> +------------+------------+
>>>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} | null       |
>>>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} | null       |
>>>>> +------------+------------+
>>>>>
>>>>>
>>>>>
>>>>> I think it would be fine to fix those issues in the next release.
>>>>>
>>>>>
>>>>> PS: could you please re-build a patch after rebasing on the recent
>>>>> master branch?
>>>>>
>>>>> - Jinfeng Ni
>>>>>
>>>>>
>>>>> On Sept. 24, 2014, 11:06 a.m., Anil Kumar B wrote:
>>>>> >
>>>>> > -----------------------------------------------------------
>>>>> > This is an automatically generated e-mail. To reply, visit:
>>>>> > https://reviews.apache.org/r/25996/
>>>>> > -----------------------------------------------------------
>>>>> >
>>>>> > (Updated Sept. 24, 2014, 11:06 a.m.)
>>>>> >
>>>>> >
>>>>> > Review request for drill, Aditya Kishore, Jacques Nadeau, and Kamesh B.
>>>>> >
>>>>> >
>>>>> > Repository: drill-git
>>>>> >
>>>>> >
>>>>> > Description
>>>>> > -------
>>>>> >
>>>>> > Mongo storage plugin support: The features we implemented as part of
>>>>> > this are as follows.
>>>>> > 1) Support for sharded (chunk-wise), sharded-replicated (chunk-wise),
>>>>> > replicated, and stand-alone deployments
>>>>> > 2) Predicate pushdown
>>>>> > 3) Mongo PStore
>>>>> >
>>>>> > MongoRecordReader uses JsonReaderWithState in the case of non-star
>>>>> > queries.
>>>>> >
>>>>> >
>>>>> > Diffs
>>>>> > -----
>>>>> >
>>>>> >   contrib/pom.xml 728038a
>>>>> >   contrib/storage-mongo/pom.xml PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/DrillMongoConstants.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCnxnManager.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCompareFunctionProcessor.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoFilterBuilder.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoGroupScan.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoRecordReader.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanBatchCreator.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanSpec.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePlugin.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePluginConfig.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoSubScan.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoUtils.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/ChunkInfo.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/MongoCompareOp.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStore.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStoreProvider.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoDatabaseSchema.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json PRE-CREATION
>>>>> >   contrib/storage-mongo/src/main/resources/drill-module.conf PRE-CREATION
>>>>> >   contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoChunkAssignment.java PRE-CREATION
>>>>> >   distribution/pom.xml cd5df0d
>>>>> >   distribution/src/assemble/bin.xml 86e3802
>>>>> >   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 933bfbe
>>>>> >   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 4fa61e1
>>>>> >   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java 4e12b8b
>>>>> >   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReaderWithState.java ef995f8
>>>>> >
>>>>> > Diff: https://reviews.apache.org/r/25996/diff/
>>>>> >
>>>>> >
>>>>> > Testing
>>>>> > -------
>>>>> >
>>>>> > 1) Tested a varied set of queries in sharded, replicated, and
>>>>> > stand-alone modes.
>>>>> >
>>>>> > 2) Test environment details: We created a mongo cluster with 2 shards
>>>>> > and a collection consisting of 35 chunks (18 chunks on one shard and the
>>>>> > remaining chunks on the other shard). Below are a few of the queries
>>>>> > that we tested in all the environments.
>>>>> >
>>>>> >     a) SELECT * FROM mongo.employee.`empinfo` limit 10;
>>>>> >
>>>>> >     b) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>> >        limit 10;
>>>>> >
>>>>> >     c) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>> >        where employee_id = 1111;
>>>>> >
>>>>> >     d) SELECT * FROM mongo.employee.`empinfo` where full_name =
>>>>> >        'Phil Munoz';
>>>>> >
>>>>> >     e) SELECT first_name, last_name, position_id FROM
>>>>> >        mongo.employee.`empinfo` where employee_id = 1111  OR position_id = 16;
>>>>> >
>>>>> >     g) SELECT first_name, last_name FROM mongo.employee.`empinfo`
>>>>> >        where isFTE = true;
>>>>> >
>>>>> >     h) SELECT first_name, last_name, position_id FROM
>>>>> >        mongo.employee.`empinfo` where employee_id = 1107  AND position_id =
>>>>> >        17 AND last_name = 'Yonce';
>>>>> >
>>>>> >
>>>>> > 3) PStore functionality not fully tested.
>>>>> >
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Anil Kumar B
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Kamesh.
>>>
>>
>>
>
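
A note on the "You tried to write a BigInt type when you are using a
ValueWriter of type NullableFloat8WriterImpl" failures quoted above: one
plausible cause (an assumption, not confirmed anywhere in this thread) is
that the same field holds an integer in some documents and a float in
others. A minimal sketch for checking an exported sample of documents
follows. The function name, the sample documents, and the field names
"value" and "units" inside "nutrients" are hypothetical and only
illustrate the idea.

```python
from collections import defaultdict


def numeric_type_conflicts(docs):
    """Return {field_path: {"int", "float"}} for paths that hold both
    integer and floating-point values across the given documents."""
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # All elements of a repeated field share one path.
            for child in value:
                walk(child, path + "[]")
        elif isinstance(value, bool):
            # bool is a subclass of int in Python, so test it first.
            seen[path].add("bool")
        elif isinstance(value, int):
            seen[path].add("int")
        elif isinstance(value, float):
            seen[path].add("float")

    for doc in docs:
        walk(doc, "")
    return {p: t for p, t in seen.items() if {"int", "float"} <= t}


# Hypothetical sample documents; real ones would come from an export of
# the collection being queried.
sample_docs = [
    {"nutrients": [{"value": 12, "units": "g"}]},
    {"nutrients": [{"value": 0.5, "units": "g"}]},
]
conflicts = numeric_type_conflicts(sample_docs)
print(conflicts)
```

Any path reported here mixes integer and float values and would be a
candidate to look at first when investigating the writer error.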
