Hi,

I am using Drill to query HBase, but it is very slow. I think one of the
reasons is that when there are multiple qualifiers, the query plan scans all
the data instead of only the columns specified in the SQL.

I debugged the process and found that the main logic is in Calcite and is
hard to change.

Can I get the physical plan and update it on the client side? I did some
research; the class DrillWorker can get the plan, but I would have to
initialize the context, etc. Is there an easier way to get the physical plan,
so I can update it based on my query condition?
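What I mean by updating the plan at the client side is roughly the sketch below. The `PlanPatcher` class and its method are my own invention (not a Drill API): take the JSON plan text from an explain, rewrite its columns entry to the one qualifier I need, and submit the patched plan back (through the web UI, which already worked for me). It is only plain string manipulation for illustration:

```java
// Sketch only: rewrite the "columns" entry of an EXPLAIN JSON plan so the
// HBase scan projects a single qualifier instead of `*`. No Drill classes
// involved; the patched JSON would then be submitted as a physical plan
// (for example via the web UI's query page).
public class PlanPatcher {

    // Replace the wildcard column list with the given column expression.
    // The generated plan contains the literal text: "columns" : [ "`*`" ]
    public static String restrictColumns(String planJson, String column) {
        return planJson.replace("\"columns\" : [ \"`*`\" ]",
                                "\"columns\" : [ \"" + column + "\" ]");
    }

    public static void main(String[] args) {
        String plan = "{ \"tableName\" : \"browser_action2\", "
                    + "\"columns\" : [ \"`*`\" ] }";
        // Restrict the scan to v:e0 only.
        System.out.println(restrictColumns(plan, "`v`.`e0`"));
    }
}
```

Doing this by hand per query is obviously fragile; I would prefer an API that gives me the plan directly.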
2016-05-24 13:30 GMT+08:00 qiang li <tiredqiang@gmail.com>:
> I upgraded to the latest CDH version, which is HBase 1.2.0-cdh5.7.0, and
> tested it again. The results are correct now. Thanks for the help.
>
> Even so, I think the query plan can still be optimized.
>
> Here is what I think can be improved:
> a. specify only the needed columns in the scan
> b. remove the RowFilter when the scan's startRow and stopRow already
>    cover it. For example: select row_key from hbase.browser_action a where
>    a.row_key > '0' and a.row_key < '1'. I think SQL like this would run
>    faster without the RowFilter.
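To illustrate point (b): any row the region server can return from a scan bounded by startRow and stopRow already satisfies the row-key predicate, so a RowFilter re-checking the same bounds only adds per-row comparison cost. A minimal self-contained check (plain Java with no HBase dependency; for the ASCII keys used here, signed byte comparison matches HBase's row-key ordering):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScanRangeCheck {

    // True when key lies strictly inside (start, stop) lexicographically,
    // i.e. the scan range itself already enforces row_key > start AND
    // row_key < stop, making an extra RowFilter on the same bounds redundant.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compare(key, start) > 0 && Arrays.compare(key, stop) < 0;
    }

    public static void main(String[] args) {
        byte[] start = "0".getBytes(StandardCharsets.UTF_8);
        byte[] stop  = "1".getBytes(StandardCharsets.UTF_8);
        byte[] key   = "020160404pay.bijia.browser".getBytes(StandardCharsets.UTF_8);
        // Every key a ('0','1')-bounded scan returns is already in range.
        System.out.println(inRange(key, start, stop)); // prints true
    }
}
```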
>
>
>
> 2016-05-24 9:29 GMT+08:00 qiang li <tiredqiang@gmail.com>:
>
>> Yes, it seems like the same issue. I will upgrade and test again. But do
>> you think we can update the physical plan too? If we only want to query
>> one qualifier, then the columns of the plan should contain only that
>> qualifier instead of "*". Maybe such a plan would query faster. Am I right?
>>
>> 2016-05-23 23:38 GMT+08:00 Krystal Nguyen <knguyen@maprtech.com>:
>>
>>> Hi Qiang,
>>>
>>> Looks like you might be encountering this issue:
>>> https://issues.apache.org/jira/browse/DRILL-4271
>>>
>>> Thanks
>>>
>>> On Sun, May 22, 2016 at 8:38 PM, qiang li <tiredqiang@gmail.com> wrote:
>>>
>>> > I tested it step by step again, and finally found out that the issue
>>> > happens only when the number of qualifiers is more than 3.
>>> >
>>> > It's weird, but this is the result I got.
>>> >
>>> > I tested about 10 thousand rows of data. The length of the event is 6;
>>> > the code I used to test is like below:
>>> >
>>> > String[] earr = action.getEvent().geteArr();
>>> > for (int i = 0; i < 6; i++) {
>>> >     put.addColumn(family, Bytes.toBytes("e" + i), Bytes.toBytes(earr[i]));
>>> > }
>>> >
>>> > Then I tested step by step like below:
>>> >
>>> > put.addColumn(family, Bytes.toBytes("e0"), Bytes.toBytes("e0"));
>>> > put.addColumn(family, Bytes.toBytes("e1"), Bytes.toBytes("e1"));
>>> > put.addColumn(family, Bytes.toBytes("e2"), Bytes.toBytes("e2"));
>>> > put.addColumn(family, Bytes.toBytes("e3"), Bytes.toBytes("e3"));
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a where a.row_key > '0'
>>> > group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 3856  |
>>> > +-----+-------+
>>> >
>>> > 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > count(a.`v`.`e0`) p from hbase.browser_action a group by a.`v`.`e0`;
>>> > +-----+-------+
>>> > |  k  |   p   |
>>> > +-----+-------+
>>> > | e0  | 9094  |
>>> > +-----+-------+
>>> >
>>> >
>>> > If I put the qualifier "e3" into the table (that is, more than 3
>>> > qualifiers), then the issue is reproduced.
>>> >
>>> >
>>> > 2016-05-23 9:23 GMT+08:00 qiang li <tiredqiang@gmail.com>:
>>> >
>>> > > Sorry for the late reply.
>>> > >
>>> > > Yes, Hadoop 2.6.0-cdh5.4.5 and HBase 1.0.0-cdh5.4.5.
>>> > >
>>> > > 2016-05-20 23:06 GMT+08:00 Krystal Nguyen <knguyen@maprtech.com>:
>>> > >
>>> > >> Qiang, can you please let us know the HBase version and Hadoop
>>> > >> distribution version that you are using.
>>> > >>
>>> > >> On Fri, May 20, 2016 at 8:03 AM, Krystal Nguyen <knguyen@maprtech.com>
>>> > >> wrote:
>>> > >>
>>> > >> > Can you please let us know the HBase version and Hadoop
>>> > >> > distribution version that you are using.
>>> > >> >
>>> > >> >
>>> > >> > On Fri, May 20, 2016 at 1:35 AM, qiang li <tiredqiang@gmail.com>
>>> > >> > wrote:
>>> > >> >
>>> > >> >> Khurram, I sent the mail again; the last mail forgot to cc
>>> > >> >> user@drill.apache.org
>>> > >> >>
>>> > >> >> The main process is the same, but my rowkey is more complicated.
>>> > >> >> Here is the detail I tested.
>>> > >> >> The rowkey is like this: [salt 1-byte string] + [day 8-byte
>>> > >> >> string] + [event] + [uid long] + [ts long].
>>> > >> >> I also have other qualifiers; only qualifier v:v is an integer,
>>> > >> >> the others are strings.
>>> > >> >>
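A sketch of how I assemble such a composite rowkey (illustration only; the helper below is my own, using a 1-byte salt, an 8-byte day string, a variable-length event string, and two big-endian longs for uid and ts):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeyBuilder {

    // salt (1 byte) + day (8-byte string) + event + uid (8-byte long) + ts (8-byte long)
    static byte[] build(char salt, String day, String event, long uid, long ts) {
        byte[] dayBytes   = day.getBytes(StandardCharsets.UTF_8);   // e.g. "20160404"
        byte[] eventBytes = event.getBytes(StandardCharsets.UTF_8); // e.g. "pay.bijia.browser"
        ByteBuffer buf = ByteBuffer.allocate(1 + dayBytes.length + eventBytes.length + 8 + 8);
        buf.put((byte) salt)
           .put(dayBytes)
           .put(eventBytes)
           .putLong(uid)   // big-endian, so keys sort like the shell output shows
           .putLong(ts);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = build('0', "20160404", "pay.bijia.browser", 123456789L, 1461839343076L);
        System.out.println(key.length); // 1 + 8 + 17 + 8 + 8 = 42
    }
}
```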
>>> > >> >> example:
>>> > >> >> hbase(main):004:0> scan 'browser_action2', { LIMIT => 1 }
>>> > >> >> ROW                                                                     COLUMN+CELL
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e0, timestamp=1461839343076, value=pay
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e1, timestamp=1461839343076, value=bijia
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e2, timestamp=1461839343076, value=browser
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e3, timestamp=1461839343076, value=*
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e4, timestamp=1461839343076, value=*
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:e5, timestamp=1461839343076, value=*
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:h, timestamp=1459771200000, value=20
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:m, timestamp=1459771200000, value=0
>>> > >> >>  020160404pay.bijia.browser\x00\x00qK>I\xD0w\x00\x00\x01S\xE1$\xD2\x00  column=v:v, timestamp=1459771200000, value=\x00\x00\x00\x17
>>> > >> >> 1 row(s) in 0.0410 seconds
>>> > >> >>
>>> > >> >>
>>> > >> >> Here is an example of how the issue looks:
>>> > >> >>
>>> > >> >> hbase(main):69904:0> scan 'browser_action2', {COLUMNS => ['v:e0'], STARTROW => '0'}
>>> > >> >> ........
>>> > >> >>  920160410visit.bijia.test\xFFr\xC0o\x0B\x14\x0A\x16\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029448, value=visit
>>> > >> >>  920160410visit.bijia.test\xFF\x96\xE4\x0B\x9D\xAB]\x00\x00\x01T\x00\x0A\xFA\x00      column=v:e0, timestamp=1463723029217, value=visit
>>> > >> >>  920160410visit.bijia.test\xFF\xE3\x80\xFAac\xA6\xCF\x00\x00\x01T\x00\x0A\xFA\x00     column=v:e0, timestamp=1463723029295, value=visit
>>> > >> >> 9994 row(s) in 123.8650 seconds
>>> > >> >>
>>> > >> >> The Drill result:
>>> > >> >> 0: jdbc:drill:zk=rfdc5> select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > >> >> count(a.`v`.`e0`) p from hbase.browser_action2 a where a.row_key > '0'
>>> > >> >> group by a.`v`.`e0`;
>>> > >> >> +--------+-------+
>>> > >> >> |   k    |   p   |
>>> > >> >> +--------+-------+
>>> > >> >> | visit  | 1216  |
>>> > >> >> +--------+-------+
>>> > >> >>
>>> > >> >> I find that the issue appears when the row count is larger than
>>> > >> >> 10000. The result is right with fewer than 1000 rows, but not
>>> > >> >> always. What I can confirm is that if I update the columns in the
>>> > >> >> physical plan and query via the web UI, the result is correct.
>>> > >> >>
>>> > >> >>
>>> > >> >> Thanks
>>> > >> >>
>>> > >> >> 2016-05-20 13:58 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com>:
>>> > >> >>
>>> > >> >> > Qiang, can you please take a look at DRILL-4686 and confirm
>>> > >> >> > whether the data set used in my repro is the same as the one you
>>> > >> >> > have used. If the data set is different, please let us know the
>>> > >> >> > type of data that you have used in your table.
>>> > >> >> >
>>> > >> >> > Aman, I will try to repro the problem on Drill 1.6.0 and share
>>> > >> >> > results.
>>> > >> >> >
>>> > >> >> > Thanks,
>>> > >> >> > Khurram
>>> > >> >> >
>>> > >> >> > On Thu, May 19, 2016 at 11:23 PM, Aman Sinha <amansinha@apache.org>
>>> > >> >> > wrote:
>>> > >> >> >
>>> > >> >> >> Khurram, DRILL-4686 seems like a different issue... it is
>>> > >> >> >> reporting an error, whereas the original problem from qiang was
>>> > >> >> >> an incorrect result. Can you use the same version (1.6) that he
>>> > >> >> >> was using? Also, is the data set similar? If you are unable to
>>> > >> >> >> repro the exact same issue, perhaps qiang should file a JIRA
>>> > >> >> >> with a smaller repro if possible.
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> On Thu, May 19, 2016 at 8:35 AM, Khurram Faraaz <kfaraaz@maprtech.com>
>>> > >> >> >> wrote:
>>> > >> >> >>
>>> > >> >> >> > Hello Qiang,
>>> > >> >> >> >
>>> > >> >> >> > DRILL-4686 is reported to track this problem.
>>> > >> >> >> >
>>> > >> >> >> > Thanks,
>>> > >> >> >> > Khurram
>>> > >> >> >> >
>>> > >> >> >> > On Wed, May 18, 2016 at 3:16 PM, qiang li <tiredqiang@gmail.com>
>>> > >> >> >> > wrote:
>>> > >> >> >> >
>>> > >> >> >> >> OK, thanks very much.
>>> > >> >> >> >>
>>> > >> >> >> >> 2016-05-18 17:44 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com>:
>>> > >> >> >> >>
>>> > >> >> >> >>> Hello Qiang,
>>> > >> >> >> >>>
>>> > >> >> >> >>> Someone from our Drill team (in San Jose) will get back to
>>> > >> >> >> >>> you soon. I work from the India lab and am in a different
>>> > >> >> >> >>> time zone compared to the San Jose office; someone from
>>> > >> >> >> >>> MapR San Jose will get back to you as soon as possible.
>>> > >> >> >> >>>
>>> > >> >> >> >>> Thanks,
>>> > >> >> >> >>> Khurram
>>> > >> >> >> >>>
>>> > >> >> >> >>> On Wed, May 18, 2016 at 3:09 PM, qiang li <tiredqiang@gmail.com>
>>> > >> >> >> >>> wrote:
>>> > >> >> >> >>>
>>> > >> >> >> >>>> Hi Khurram, thanks very much for reproducing it. So what's
>>> > >> >> >> >>>> the conclusion?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Any idea how to solve it?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> 2016-05-18 17:02 GMT+08:00 Khurram Faraaz <kfaraaz@maprtech.com>:
>>> > >> >> >> >>>>
>>> > >> >> >> >>>>> So I tried to create the table using the HBase API (with
>>> > >> >> >> >>>>> no data inserted into the table) and I got the query plan
>>> > >> >> >> >>>>> for Drill 1.7.0.
>>> > >> >> >> >>>>> Drill 1.7.0-SNAPSHOT commit ID: 09b26277
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> describe browser_action2;
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> | row_key      | ANY        | NO           |
>>> > >> >> >> >>>>> | v            | MAP        | NO           |
>>> > >> >> >> >>>>> +--------------+------------+--------------+
>>> > >> >> >> >>>>> 2 rows selected (1.665 seconds)
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Table creation Java program
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> public class PutIntDataToHBase {
>>> > >> >> >> >>>>>     public static void main(String args[]) throws IOException {
>>> > >> >> >> >>>>>         Configuration conf = HBaseConfiguration.create();
>>> > >> >> >> >>>>>         conf.set("hbase.zookeeper.property.clientPort", "5181");
>>> > >> >> >> >>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>> > >> >> >> >>>>>         if (admin.tableExists("browser_action2")) {
>>> > >> >> >> >>>>>             admin.disableTable("browser_action2");
>>> > >> >> >> >>>>>             admin.deleteTable("browser_action2");
>>> > >> >> >> >>>>>         }
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>         byte[][] SPLIT_KEYS =
>>> > >> >> >> >>>>>             {{'0'},{'1'},{'2'},{'3'},{'4'},{'5'},{'6'},{'7'},{'8'},{'9'}};
>>> > >> >> >> >>>>>         HTableDescriptor tableDesc =
>>> > >> >> >> >>>>>             new HTableDescriptor(TableName.valueOf("browser_action2"));
>>> > >> >> >> >>>>>         tableDesc.addFamily(new HColumnDescriptor("v"));
>>> > >> >> >> >>>>>         admin.createTable(tableDesc, SPLIT_KEYS);
>>> > >> >> >> >>>>>     }
>>> > >> >> >> >>>>> }
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Query plan for the query that was reported as returning
>>> > >> >> >> >>>>> wrong results:
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p from
>>> > >> >> >> >>>>> hbase.browser_action2 a where a.row_key > '0' group by a.`v`.`e0`;
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> > >> >> >> >>>>> 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> > >> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> and the query plan for the other problem query mentioned
>>> > >> >> >> >>>>> in the first email:
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>> 0: jdbc:drill:schema=dfs.tmp> explain plan for select
>>> > >> >> >> >>>>> CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
>>> > >> >> >> >>>>> count(a.row_key) p from hbase.browser_action2 a group by
>>> > >> >> >> >>>>> BYTE_SUBSTR(a.row_key, 1 , 9);
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> | text | json |
>>> > >> >> >> >>>>> +------+------+
>>> > >> >> >> >>>>> 00-00    Screen
>>> > >> >> >> >>>>> 00-01      Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>> 00-02        UnionExchange
>>> > >> >> >> >>>>> 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > >> >> >> >>>>> 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>> 01-03              Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>> 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>> 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>> 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>> 03-02                      HashAgg(group=[{0}], p=[COUNT($1)])
>>> > >> >> >> >>>>> 03-03                        Project($f0=[BYTE_SUBSTR($0, 1, 9)], row_key=[$0])
>>> > >> >> >> >>>>> 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=null, stopRow=null, filter=null], columns=[`*`]]])
>>> > >> >> >> >>>>> {noformat}
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> Thanks,
>>> > >> >> >> >>>>> Khurram
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>> On Wed, May 18, 2016 at 7:01 AM, qiang li <tiredqiang@gmail.com>
>>> > >> >> >> >>>>> wrote:
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>>> Yes.
>>> > >> >> >> >>>>>> I used the HBase API to create it.
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> The main code is:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> byte[][] SPLIT_KEYS = { {'0'}, {'1'}, {'2'}, {'3'}, {'4'},
>>> > >> >> >> >>>>>>     {'5'}, {'6'}, {'7'}, {'8'}, {'9'} };
>>> > >> >> >> >>>>>> TableName tableName = TableName.valueOf("browser_action2");
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>>> > >> >> >> >>>>>> HColumnDescriptor columnDesc = new HColumnDescriptor("v");
>>> > >> >> >> >>>>>> tableDesc.addFamily(columnDesc);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> columnDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>>> > >> >> >> >>>>>> columnDesc.setDataBlockEncoding(DataBlockEncoding.DIFF);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> admin.createTable(tableDesc, SPLIT_KEYS);
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>> 2016-05-18 1:48 GMT+08:00 Zelaine Fong <zfong@maprtech.com>:
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>> Can you provide the CREATE TABLE statement you used to
>>> > >> >> >> >>>>>>> reproduce this problem, so we can try to reproduce it
>>> > >> >> >> >>>>>>> on our end.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> Thanks.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>>  Zelaine
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> On Tue, May 17, 2016 at 4:50 AM, qiang li <tiredqiang@gmail.com>
>>> > >> >> >> >>>>>>> wrote:
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>> > Hi,
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I recently met an issue where I cannot query the
>>> > >> >> >> >>>>>>> > correct data from HBase with SQL through Drill; can
>>> > >> >> >> >>>>>>> > anybody help me?
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I tested with Drill 1.6.
>>> > >> >> >> >>>>>>> > My HBase schema:
>>> > >> >> >> >>>>>>> > rowkey: salt + day + event + uid + ts, e.g. 120160411visituidts
>>> > >> >> >> >>>>>>> > cf: v
>>> > >> >> >> >>>>>>> > qualifiers: v, e0, e1
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > The wrong result only happens when I use a group by clause.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > This SQL will not return the correct result:
>>> > >> >> >> >>>>>>> > select CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k,
>>> > >> >> >> >>>>>>> > count(a.`v`.`e0`) p from hbase.browser_action2 a
>>> > >> >> >> >>>>>>> > where a.row_key > '0' group by a.`v`.`e0`;
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > Part of the explain of this SQL is:
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > 0: jdbc:drill:zk=rfdc5> explain plan for select
>>> > >> >> >> >>>>>>> > CONVERT_FROM(a.`v`.`e0`, 'UTF8') as k, count(a.`v`.`e0`) p
>>> > >> >> >> >>>>>>> > from hbase.browser_action2 a where a.row_key > '0'
>>> > >> >> >> >>>>>>> > group by a.`v`.`e0`;
>>> > >> >> >> >>>>>>> > +------+------+
>>> > >> >> >> >>>>>>> > | text | json |
>>> > >> >> >> >>>>>>> > +------+------+
>>> > >> >> >> >>>>>>> > 00-00    Screen
>>> > >> >> >> >>>>>>> > 00-01      Project(k=[$0], p=[$1])
>>> > >> >> >> >>>>>>> > 00-02        UnionExchange
>>> > >> >> >> >>>>>>> > 01-01          Project(k=[CONVERT_FROMUTF8($0)], p=[$1])
>>> > >> >> >> >>>>>>> > 01-02            HashAgg(group=[{0}], p=[$SUM0($1)])
>>> > >> >> >> >>>>>>> > 01-03              Project($f0=[$0], p=[$1])
>>> > >> >> >> >>>>>>> > 01-04                HashToRandomExchange(dist0=[[$0]])
>>> > >> >> >> >>>>>>> > 02-01                  UnorderedMuxExchange
>>> > >> >> >> >>>>>>> > 03-01                    Project($f0=[$0], p=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
>>> > >> >> >> >>>>>>> > 03-02                      HashAgg(group=[{0}], p=[COUNT($0)])
>>> > >> >> >> >>>>>>> > 03-03                        Project($f0=[ITEM($1, 'e0')])
>>> > >> >> >> >>>>>>> > 03-04                          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=browser_action2, startRow=0\x00, stopRow=, filter=null], columns=[`*`]]])
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > The data returns very quickly; the result of this SQL is:
>>> > >> >> >> >>>>>>> > +------+--------+
>>> > >> >> >> >>>>>>> > |  k   |   p    |
>>> > >> >> >> >>>>>>> > +------+--------+
>>> > >> >> >> >>>>>>> > | pay  | 12180  |
>>> > >> >> >> >>>>>>> > +------+--------+
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > But I have millions of rows in the table.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > I tried to change the physical plan. If I change the
>>> > >> >> >> >>>>>>> > JSON explain "columns" : [ "`*`" ] to
>>> > >> >> >> >>>>>>> > "columns" : [ "`v`.`e0`" ], it will return the correct
>>> > >> >> >> >>>>>>> > result.
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > It seems the physical plan is not correct.
>>> > >> >> >> >>>>>>> > I also tried to debug the SQL parser to find out the
>>> > >> >> >> >>>>>>> > reason, but it is too complicated. Can anyone help me?
>>> > >> >> >> >>>>>>> >
>>> > >> >> >> >>>>>>> > Also this SQL has the same issue:
>>> > >> >> >> >>>>>>> > select CONVERT_FROM(BYTE_SUBSTR(a.row_key, 1 , 9), 'UTF8') as k,
>>> > >> >> >> >>>>>>> > count(a.row_key) p from hbase.browser_action2 a group by
>>> > >> >> >> >>>>>>> > BYTE_SUBSTR(a.row_key, 1 , 9);
>>> > >> >> >> >>>>>>> > I changed the JSON explain "columns" : [ "`*`" ] to
>>> > >> >> >> >>>>>>> > "columns" : [ "`row_key`" ], and it returns the correct
>>> > >> >> >> >>>>>>> > result.
>>> > >> >> >> >>>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>>
>>> > >> >> >> >>>>>
>>> > >> >> >> >>>>
>>> > >> >> >> >>>
>>> > >> >> >> >>
>>> > >> >> >> >
>>> > >> >> >>
>>> > >> >> >
>>> > >> >> >
>>> > >> >>
>>> > >> >
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>
