drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From qiang li <tiredqi...@gmail.com>
Subject Re: query from hbase issue
Date Tue, 07 Jun 2016 08:38:33 GMT
Hi ,

I am use drill to query hbase. But its very slow , I think one of the
reason is when have multiple qualifiers the query plan scan all the data
instead of the columns specified in the SQL.
I debuged the process, and find out that the main logic is in calcite and
hard to update it.

Can I get the physical plan and update it at client side ? I do some
research , the class DrillWorker can get plan , but I have to init the
context etc.? Does there have easy way I can get the physical plan ? So I
can update it based on my query condition.


2016-05-24 13:30 GMT+08:00 qiang li <tiredqiang@gmail.com>:

> I upgrade to latest cdh version which is HBase 1.2.0-cdh5.7.0 and test it
> again, The result are correct now. Thanks for the help.
>
> Even though, I think the query plan still can be optimized.
>
> Here is what I think can improve:
> a. specify the columns when need
> b. remove the rowFilter when scan startRow and stopRow can meet it.
>     for example : select row_key from hbase.browser_action a where
> a.row_key >'0' and  a.row_key < '1' , I think sql like this will query
> faster without rowfilter.
>
>
>
> 2016-05-24 9:29 GMT+08:00 qiang li <tiredqiang@gmail.com>:
>
>> Yes, Its seems like the same issue. I will upgrade it and test again. But
>> do you think we can update the physical plan too.  If we only want to query
>> one qualifier,  then the columns of the plan should only contains the
>> qualifier instead of "*". Maybe this plan will be query fast. Am I right?
>>
>> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <knguyen@maprtech.com>:
>>
>>> Hi Qiang,
>>>
>>> Looks like you might be encountering this issue:
>>> https://issues.apache.org/jira/browse/DRILL-4271
>>>
>>> Thanks
>>>
>>> On Sun, May 22, 2016 at 8:38 PM, qiang li <tiredqiang@gmail.com> wrote:
>>>
>>> > I test it step by step again. And finally I find out that the issue
>>> > happened only if the qualifier number is more than 3.
>>> >
>>> > It's werid, but this is the result I test.
>>> >
>>> > I tested about 10 thousands row of data. The length of the event is
>>> 6,the
>>> > code I used to test is like below:
>>> >
>>> > String[] earr = action.getEvent().geteArr();
>>> > for(int i=0;i<6;i++){
>>> >     put.addColumn(family, Bytes.toBytes("e"+i),
>>> Bytes.toBytes(earr[i]));
>>> > }
>>> >
>>> > Then I test step by step like below :
>>> >
>>> > put.addColumn(family, Bytes.toBytes("e0"),  Bytes.toBytes("e0"));
>>> > put.addColumn(family, Bytes.toBytes("e1"),  Bytes.toBytes("e1"));
>>> > put.addColumn(family, Bytes.toBytes("e2"),  Bytes.toBytes("e2"));
>>> > put.addColumn(family, Bytes.toBytes("e3"),  Bytes.toBytes("e3"));
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>>> >  group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 3856  |
>>> > +-----+-------+
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 9094  |
>>> > +-----+-------+
>>> >
>>> >
>>> > if I put the qualifier "e3" into the table(that is more than 3
>>> qulifier),
>>> > then the issue reproduced.
>>> >
>>> >
>>> > 2016-05-23 9:23 GMT+08:00 qiang li <tiredqiang@gmail.com>:
>>> >
>>> > > Sorry late.
>>> > >
>>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>>> > >
>>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <knguyen@maprtech.com>:
>>> > >
>>> > >> Qiang, Can you please let us know the hbase version and hadoop
>>> > >> distribution
>>> > >> version that you are using.
>>> > >>
>>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <
>>> knguyen@maprtech.com>
>>> > >> wrote:
>>> > >>
>>> > >> > Can you please let us know the hbase version and hadoop
>>> distribution
>>> > >> > version that you are using.
>>> > >> >
>>> > >> >
>>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqiang@gmail.com>
>>> > wrote:
>>> > >> >
>>> > >> >> Khurram , I send the mail again, the last mail forget
to cc to
>>> > >> >> user@drill.apache.org
>>> > >> >>
>>> > >> >> The main process is the same, but my rowkey is more complicate,
>>> > >> >> Here is the detail I tested.
>>> > >> >> rowkey is like this : [salt 1byte string] + [day 8byte
string] +
>>> > >> [event] +
>>> > >> >> [uid long] + [ts long]
>>> > >> >> also I have other qualifiers, only qualifier v:v is integer,
the
>>> > others
>>> > >> >> are
>>> > >> >> string.
>>> > >> >>
>>> > >> >> example:
>>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT
=> 1}
>>> > >> >> ROW
>>> > >> COLUMN+CELL
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e0,
>>> > >> >> timestamp=1461839343076, value=pay
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e1,
>>> > >> >> timestamp=1461839343076, value=bijia
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e2,
>>> > >> >> timestamp=1461839343076, value=browser
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e3,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e4,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:e5,
>>> > >> >> timestamp=1461839343076, value=*
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:h,
>>> > >> >> timestamp=1459771200000, value=20
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:m,
>>> > >> >> timestamp=1459771200000, value=0
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE
>>> > >> column=v:v,
>>> > >> >> timestamp=1459771200000, value=\x00\x00\x00\x17
>>> > >> >>
>>> > >> >>
>>> > >> >>  1$\xD2\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> 1 row(s) in 0.0410 seconds
>>> > >> >>
>>> > >> >>
>>> > >> >> Here is the example I how the issue look like:
>>> > >> >>
>>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS
=>
>>> ['v:e0'],
>>> > >> >> STARTROW=> '0'}
>>> > >> >> ........
>>> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029448, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  920160410visit.bijia.test\xFF\x96-\xE4\x0B\x9D\xAB]\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029217, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\
>>> > >> column=v:e0,
>>> > >> >> timestamp=1463723029295, value=visit
>>> > >> >>
>>> > >> >>
>>> > >> >>  x01T\x00\x0A\xFA\x00
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> 9994 row(s) in 123.8650 seconds
>>> > >> >>
>>> > >> >> the drill result:
>>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`,
'UTF8')
>>> as k,
>>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where
a.row_key
>>> >
>>> > '0'
>>> > >> >>  group by a.`v`.`e0`;
>>> > >> >> +--------+-------+
>>> > >> >> |   k    |   p   |
>>> > >> >> +--------+-------+
>>> > >> >> | visit  | 1216  |
>>> > >> >> +--------+-------+
>>> > >> >>
>>> > >> >>
>>> > >> >> I find out that if the row size larger than 10000  will
have the
>>> > issue.
>>> > >> >> The
>>> > >> >> result is right if less that 1000 rows. But not always
that way.
>>> > >> >> What I can make sure is if I updated the columns in the
physical
>>> plan
>>> > >> and
>>> > >> >> query by web UI , the result will be correct.
>>> > >> >>
>>> > >> >>
>>> > >> >> Thanks
>>> > >> >>
>>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com
>>> >:
>>> > >> >>
>>> > >> >> > Qiang, can you please take a look at DRILL-4686 and
confirm if
>>> the
>>> > >> data
>>> > >> >> > set used in my repro is the same as the one you have
used. If
>>> the
>>> > >> data
>>> > >> >> set
>>> > >> >> > is different please let us know the type of data
that you have
>>> used
>>> > >> in
>>> > >> >> your
>>> > >> >> > table.
>>> > >> >> >
>>> > >> >> > Aman - I will try to repro the problem on Drill 1.6.0
and share
>>> > >> results.
>>> > >> >> >
>>> > >> >> > Thanks,
>>> > >> >> > Khurram
>>> > >> >> >
>>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <
>>> amansinha@apache.org
>>> > >
>>> > >> >> wrote:
>>> > >> >> >
>>> > >> >> >> Khurram,  DRILL-4686 seems like a different issue...it
is
>>> > reporting
>>> > >> an
>>> > >> >> >> error whereas the original problem from qiang
was an incorrect
>>> > >> result.
>>> > >> >> >> Can
>>> > >> >> >> you use the same version (1.6) that he was using.
 Also, is
>>> the
>>> > data
>>> > >> >> set
>>> > >> >> >> similar ? If you are unable to repro the exact
same issue,
>>> > perhaps
>>> > >> >> qiang
>>> > >> >> >> should file a JIRA with a smaller repro if possible.
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz
<
>>> > >> kfaraaz@maprtech.com>
>>> > >> >> >> wrote:
>>> > >> >> >>
>>> > >> >> >> > Hello Qiang,
>>> > >> >> >> >
>>> > >> >> >> > DRILL-4686 is reported to track this problem.
>>> > >> >> >> >
>>> > >> >> >> > Thanks,
>>> > >> >> >> > Khurram
>>> > >> >> >> >
>>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li
<
>>> tiredqiang@gmail.com
>>> > >
>>> > >> >> wrote:
>>> > >> >> >> >
>>> > >> >> >> >> Ok, Thanks very much.
>>> > >> >> >> >>
>>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz
<
>>> > kfaraaz@maprtech.com
>>> > >> >:
>>> > >> >> >> >>
>>> > >> >> >> >>> Hello Qiang,
>>> > >> >> >> >>>
>>> > >> >> >> >>> Someone from our Drill team (in
San Jose) will get back
>>> to you
>>> > >> >> soon. I
>>> > >> >> >> >>> work from the India lab and I am
in a different time zone
>>> as
>>> > >> >> compared
>>> > >> >> >> to
>>> > >> >> >> >>> San Jose office, some one from MapR
San Jose will get
>>> back to
>>> > >> you
>>> > >> >> as
>>> > >> >> >> soon
>>> > >> >> >> >>> as possible.
>>> > >> >> >> >>>
>>> > >> >> >> >>> Thanks,
>>> > >> >> >> >>> Khurram
>>> > >> >> >> >>>
>>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM,
qiang li <
>>> > tiredqiang@gmail.com
>>> > >> >
>>> > >> >> >> wrote:
>>> > >> >> >> >>>
>>> > >> >> >> >>>> Hi Khurram, Thanks very much
to reproduce it, so what's
>>> the
>>> > >> >> >> >>>> conclusion?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Any idea how to sovle it?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram
Faraaz <
>>> > >> kfaraaz@maprtech.com>:
>>> > >> >> >> >>>>
>>> > >> >> >> >>>>> So I tried to create the
table using HBase API (with no
>>> data
>>> > >> >> >> inserted
>>> > >> >> >> >>>>> into table) and I got the
query plan for drill 1.7.0
>>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT  commit
ID :  09b26277
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp>
describe browser_action2;
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE
 | IS_NULLABLE  |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | row_key      | ANY   
    | NO           |
>>> > >> >> >> >>>>> | v            | MAP   
    | NO           |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Table creation Java program
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> public class PutIntDataToHBase
{
>>> > >> >> >> >>>>>     public static void main(String
args[]) throws
>>> > IOException
>>> > >> {
>>> > >> >> >> >>>>>         Configuration conf
=
>>> HBaseConfiguration.create();
>>> > >> >> >> >>>>>
>>> > >>  conf.set("hbase.zookeeper.property.clientPort","5181");
>>> > >> >> >> >>>>>         HBaseAdmin admin
= new HBaseAdmin(conf);
>>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2"))
{
>>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>>> > >> >> >> >>>>>         }
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS
=
>>> > >> >> >> >>>>>
>>> > {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>> > >> >> >> >>>>>         HTableDescriptor
tableDesc = new
>>> > >> >> >> >>>>>
>>> > >> >>  HTableDescriptor(TableName.valueOf("browser_action2"));
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>         tableDesc.addFamily(new
HColumnDescriptor("v"));
>>> > >> >> >> >>>>>         admin.createTable(tableDesc,SPLIT_KEYS);
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>     }
>>> > >> >> >> >>>>> }
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Query plan for the query
that was reported as returning
>>> > wrong
>>> > >> >> >> results.
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp>
explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`,
'UTF8') as k,
>>> count(a.`v`.`e0`) p
>>> > >> from
>>> > >> >> >> >>>>> hbase.browser_action2 a
where a.row_key > '0'  group by
>>> > >> >> a.`v`.`e0`;
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0],
p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)],
p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}],
p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0],
p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                  
 Project($f0=[$0], p=[$1],
>>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                  
   HashAgg(group=[{0}],
>>> > p=[COUNT($0)])
>>> > >> >> >> >>>>> 03-03                  
     Project($f0=[ITEM($1,
>>> 'e0')])
>>> > >> >> >> >>>>> 03-04
>>> > Scan(groupscan=[HBaseGroupScan
>>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec
[tableName=browser_action2,
>>> > >> >> >> startRow=0\x00,
>>> > >> >> >> >>>>> stopRow=, filter=null],
columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> and the query plan for the
other problem query
>>> mentioned in
>>> > >> the
>>> > >> >> >> first
>>> > >> >> >> >>>>> email.
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp>
explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key,
1 , 9), 'UTF8')  as
>>> k,
>>> > >> >> >> >>>>> . . . . . . . . . . . .
. . > count(a.row_key) p from
>>> > >> >> >> >>>>> hbase.browser_action2 a
group by
>>> > >> >> >> >>>>> . . . . . . . . . . . .
. . > BYTE_SUBSTR(a.row_key, 1
>>> , 9);
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0],
p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)],
p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}],
p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0],
p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                  
 Project($f0=[$0], p=[$1],
>>> > >> >> >> >>>>> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                  
   HashAgg(group=[{0}],
>>> > p=[COUNT($1)])
>>> > >> >> >> >>>>> 03-03
>>> Project($f0=[BYTE_SUBSTR($0, 1,
>>> > >> 9)],
>>> > >> >> >> >>>>> row_key=[$0])
>>> > >> >> >> >>>>> 03-04
>>> > Scan(groupscan=[HBaseGroupScan
>>> > >> >> >> >>>>> [HBaseScanSpec=HBaseScanSpec
[tableName=browser_action2,
>>> > >> >> >> startRow=null,
>>> > >> >> >> >>>>> stopRow=null, filter=null],
columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Thanks,
>>> > >> >> >> >>>>> Khurram
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> On Wed, May 18, 2016 at
7:01 AM, qiang li <
>>> > >> tiredqiang@gmail.com>
>>> > >> >> >> >>>>> wrote:
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>> Yes.
>>> > >> >> >> >>>>>> I use hbase API to create
it.
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> The main code is:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS
= { {'0'}, {'1'}, {'2'}, {'3'},
>>> {'4'},
>>> > >> >> {'5'},
>>> > >> >> >> {'6'}, {'7'},{'8'}, {'9'} };
>>> > >> >> >> >>>>>> TableName tableName
=
>>> TableName.valueOf("browser_action2");
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> HTableDescriptor tableDesc
= new
>>> > HTableDescriptor(tableName);
>>> > >> >> >> >>>>>> HColumnDescriptor columnDesc
= new
>>> HColumnDescriptor("v");
>>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>> > >> >> >> >>>>>>
>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> admin.createTable(tableDesc,
SPLIT_KEYS);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00
Zelaine Fong <
>>> zfong@maprtech.com
>>> > >:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>> Can you provide
the CREATE TABLE statement you used to
>>> > >> >> reproduce
>>> > >> >> >> this
>>> > >> >> >> >>>>>>> problem so we can
try to reproduce it on our end.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> Thanks.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> -- Zelaine
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> On Tue, May 17,
2016 at 4:50 AM, qiang li <
>>> > >> >> tiredqiang@gmail.com>
>>> > >> >> >> >>>>>>> wrote:
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> > Hi ,
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I recently
meet a issue that can not query the
>>> correct
>>> > >> data
>>> > >> >> from
>>> > >> >> >> >>>>>>> hbase with
>>> > >> >> >> >>>>>>> > sql by drill,
can anybody help me.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I test with
the drill 1.6.
>>> > >> >> >> >>>>>>> > My hbase scheme:
>>> > >> >> >> >>>>>>> > rowkey: salt+day+event+uid
+ ts , eg:
>>> > 120160411visituidts
>>> > >> >> >> >>>>>>> > cf : v
>>> > >> >> >> >>>>>>> > qualifier:
v, e0, e1
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > The wrong result
only happened when I use group by
>>> > clause.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > This sql will
not return correct result:
>>> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`,
'UTF8') as k,
>>> > >> >> count(a.`v`.`e0`)
>>> > >> >> >> p
>>> > >> >> >> >>>>>>> from
>>> > >> >> >> >>>>>>> > hbase.browser_action2
a where a.row_key > '0'
>>> group by
>>> > >> >> >> a.`v`.`e0`;
>>> > >> >> >> >>>>>>> > Part of explain
of this sql is:
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5>
explain plan for select
>>> > >> >> >> >>>>>>> CONVERT_FROM(a.`v`.`e0`,
>>> > >> >> >> >>>>>>> > 'UTF8') as
k, count(a.`v`.`e0`) p from
>>> > >> hbase.browser_action2
>>> > >> >> a
>>> > >> >> >> >>>>>>> where
>>> > >> >> >> >>>>>>> > a.row_key >
'0'  group by a.`v`.`e0`;
>>> > >> >> >> >>>>>>> > +------+------+
>>> > >> >> >> >>>>>>> > | text | json
|
>>> > >> >> >> >>>>>>> > +------+------+
>>> > >> >> >> >>>>>>> > | 00-00   
Screen
>>> > >> >> >> >>>>>>> > 00-01     
Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>>>> > 00-02     
  UnionExchange
>>> > >> >> >> >>>>>>> > 01-01     
    Project(k=[CONVERT_FROMUTF8($0)],
>>> p=[$1])
>>> > >> >> >> >>>>>>> > 01-02     
      HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>>>> > 01-03     
        Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>>>> > 01-04
>>> HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>>>> > 02-01     
            UnorderedMuxExchange
>>> > >> >> >> >>>>>>> > 03-01     
              Project($f0=[$0], p=[$1],
>>> > >> >> >> >>>>>>> > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>>>> > 03-02     
                HashAgg(group=[{0}],
>>> > >> >> p=[COUNT($0)])
>>> > >> >> >> >>>>>>> > 03-03     
                  Project($f0=[ITEM($1,
>>> > 'e0')])
>>> > >> >> >> >>>>>>> > 03-04
>>> > >> Scan(groupscan=[HBaseGroupScan
>>> > >> >> >> >>>>>>> > [HBaseScanSpec=HBaseScanSpec
>>> [tableName=browser_action2,
>>> > >> >> >> >>>>>>> startRow=0\x00,
>>> > >> >> >> >>>>>>> > stopRow=, filter=null],
columns=[`*`]]])
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > The data return
very quickly , the result of this
>>> sql
>>> > is :
>>> > >> >> >> >>>>>>> > +------+--------+
>>> > >> >> >> >>>>>>> > |  k   |  
p    |
>>> > >> >> >> >>>>>>> > +------+--------+
>>> > >> >> >> >>>>>>> > | pay  | 12180
 |
>>> > >> >> >> >>>>>>> > +------+--------
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > But I have
millons of data in the table.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I tried to
change the physical plan.  if I change
>>> the
>>> > json
>>> > >> >> >> explain
>>> > >> >> >> >>>>>>> > *"columns"
>>> > >> >> >> >>>>>>> > : [ "`*`" ]*
 to *"columns" : [ "`v`.`e0`" ] *, it
>>> will
>>> > >> >> return
>>> > >> >> >> the
>>> > >> >> >> >>>>>>> correct
>>> > >> >> >> >>>>>>> > result.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > It seems the
physical plan is not correct.
>>> > >> >> >> >>>>>>> > I also try
to debug the sql parser to find out the
>>> > reason,
>>> > >> >> but
>>> > >> >> >> its
>>> > >> >> >> >>>>>>> too
>>> > >> >> >> >>>>>>> > complicate.
Can anyone help me.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > Also this sql
have the same issue.
>>> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key,
1 , 9),
>>> > 'UTF8')
>>> > >> >> as
>>> > >> >> >> k,
>>> > >> >> >> >>>>>>> > count(a.row_key)
p from hbase.browser_action2 a
>>> group by
>>> > >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key,
1 , 9);
>>> > >> >> >> >>>>>>> > I change the
json explain *"columns" : [ "`*`" ]*
>>> to
>>> > >> >> >> *"columns" :
>>> > >> >> >> >>>>>>> [
>>> > >> >> >> >>>>>>> > "`row_key`"
] *, it will return the correct result.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>
>>> > >> >> >> >>>
>>> > >> >> >> >>
>>> > >> >> >> >
>>> > >> >> >>
>>> > >> >> >
>>> > >> >> >
>>> > >> >>
>>> > >> >
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message