hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anoop John <anoop.hb...@gmail.com>
Subject Re: Very poor read performance with composite keys in hbase
Date Thu, 02 May 2013 04:01:31 GMT
Navis
        Thanks for the issue link. Currently the read queries will start MR
jobs as usual for reading from HBase. Correct?  Is there any plan for
supporting noMR?

-Anoop-


On Thu, May 2, 2013 at 7:09 AM, Navis류승우 <navis.ryu@nexr.com> wrote:

> Currently, hive storage handler reads rows one by one.
>
> https://issues.apache.org/jira/browse/HIVE-3603 is for setting cache
> size, which is not yet fixed.
>
> 2013/5/1 kulkarni.swarnim@gmail.com <kulkarni.swarnim@gmail.com>:
> > That depends on how dynamic your data is. If it is pretty static, you can
> > also consider using something like Create Table As Select (CTAS) to
> create a
> > snapshot of your data to HDFS and then run queries on top of that data.
> >
> > So your query might become something like:
> >
> > create table my_table as select * from event where key.name=’Signup’ and
> > key.dateCreated=’2013-03-06 16:39:55.353’ and
> > key.uid=’7af4c330-5988-4255-9250-924ce5864e3bf’;
> >
> > Since your data is now in HDFS, this should give you a considerable
> > performance boost.
> >
> >
> > On Tue, Apr 30, 2013 at 3:00 PM, Rupinder Singh <rsingh@care.com> wrote:
> >>
> >> Swarnim,
> >>
> >>
> >>
> >> Thanks. So this means custom map reduce is the viable option when
> working
> >> with hbase tables having composite keys, since it allows to set the
> start
> >> and stop keys. Hive+Hbase combination is out.
> >>
> >>
> >>
> >> Regards
> >>
> >> Rupinder
> >>
> >>
> >>
> >> From: kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
> >> Sent: Wednesday, May 01, 2013 12:17 AM
> >>
> >>
> >> To: user@hive.apache.org
> >> Cc: user@hbase.apache.org
> >> Subject: Re: Very poor read performance with composite keys in hbase
> >>
> >>
> >>
> >> Rupinder,
> >>
> >>
> >>
> >> Hive supports a filter pushdown[1] which means that the predicates in
> the
> >> where clause are pushed down to the storage handler level where either
> they
> >> get handled by the storage handler or delegated to hive if they cannot
> >> handle them. As of now, the HBaseStorageHandler only supports primitive
> >> types. So when you use strings as keys, behind the scenes they get
> converted
> >> to start and stop keys and restrict the hbase scan. This does not
> happen for
> >> structs. Hence you see a full table scan causing bad performance.
> >>
> >>
> >>
> >> [1] https://cwiki.apache.org/Hive/filterpushdowndev.html
> >>
> >>
> >>
> >> On Tue, Apr 30, 2013 at 1:04 PM, Sanjay Subramanian
> >> <Sanjay.Subramanian@wizecommerce.com> wrote:
> >>
> >> My experience with hive + hbase has been about 8x slower on an average.
> So
> >> I went ahead with hive only option.
> >>
> >> Sent from my iPhone
> >>
> >>
> >> On Apr 30, 2013, at 11:19 PM, "Rupinder Singh" <rsingh@care.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> I have an hbase cluster where I have a table with a composite key. I map
> >> this table to a Hive external table using which I insert/select data
> >> into/from this table:
> >>
> >> CREATE EXTERNAL TABLE event(key
> >> struct<name:string,dateCreated:string,uid:string>, {more columns here})
> >>
> >> ROW FORMAT DELIMITED
> >>
> >> COLLECTION ITEMS TERMINATED BY '~'
> >>
> >> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> >>
> >> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
> >>
> >> TBLPROPERTIES ("hbase.table.name" = "event");
> >>
> >>
> >>
> >> The table has about 10 million rows. When I do a select * using all 3
> >> components of the key, essentially selecting just 1 row, the response
> time
> >> is almost 700 sec, which seems pretty bad.
> >>
> >>
> >>
> >> For comparison purpose, I created another table with a simple string
> key,
> >> and the rest of the columns etc same. The key is a string UUID. Table
> has
> >> same number of column families and same number of rows.
> >>
> >> CREATE EXTERNAL TABLE test_event(key string, blah blah…..
> >>
> >> TBLPROPERTIES ("hbase.table.name" = "test_event");
> >>
> >>
> >>
> >> When I select a single row from this table by doing select * where
> >> key=’something’, the response time is 35 sec.
> >>
> >>
> >>
> >> This seems to indicate that in case of composite keys, there is a full
> >> table scan happening.  This seems weird.
> >>
> >>
> >>
> >> What am I missing here? Is there something special I need to do to get
> >> good read performance if I am using composite keys ?
> >>
> >> Insert performance in both cases is comparable and is as per
> expectation.
> >>
> >>
> >>
> >> Any help is appreciated.
> >>
> >> Here is the env spec:
> >>
> >>
> >>
> >> Amazon EMR
> >>
> >> Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz
> each.
> >> Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
> >>
> >> Hive Cluster – 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master
> >> 3.75 GB RAM, 1 CPU of 1.8 GHz
> >>
> >>
> >>
> >> Thanks
> >>
> >> Rupinder
> >>
> >>
> >>
> >>
> >>
> >> This email is intended for the person(s) to whom it is addressed and may
> >> contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized
> >> use, distribution, copying, or disclosure by any person other than the
> >> addressee(s) is strictly prohibited. If you have received this email in
> >> error, please notify the sender immediately by return email and delete
> the
> >> message and any attachments from your system.
> >>
> >>
> >>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> ======================
> >> This email message and any attachments are for the exclusive use of the
> >> intended recipient(s) and may contain confidential and privileged
> >> information. Any unauthorized review, use, disclosure or distribution is
> >> prohibited. If you are not the intended recipient, please contact the
> sender
> >> by reply email and destroy all copies of the original message along
> with any
> >> attachments, from your computer system. If you are the intended
> recipient,
> >> please be advised that the content of this message is subject to access,
> >> review and disclosure by the sender's Email System Administrator.
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Swarnim
> >
> >
> >
> >
> > --
> > Swarnim
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message