hbase-user mailing list archives

From Connolly Juhani <juh...@ninja.co.jp>
Subject Re: Very slow Scan performance using Filters
Date Thu, 12 May 2011 06:12:27 GMT
Because the row keys start with a timestamp, they are all sequential at
insert time, so every new insert goes into the same region. When you check
the last 30 days you are also reading from that same region, i.e. the one
that is already busy writing the edit log for all those entries. You might
want to consider an alternative row naming scheme that distributes reads
and writes across more regions.
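
One common way to get that distribution is to prefix the time-ordered key
with a small, stable bucket number ("salting"). The following is only a
minimal sketch of the key construction, not HBase client code. The bucket
count, the key layout and the function name are assumptions made for
illustration; none of them come from this thread.

```python
import hashlib

# Assumed bucket count, chosen only for illustration.
NUM_BUCKETS = 16


def salted_row_key(timestamp_ms: int, customer_id: str, item_id: str) -> str:
    """Build a row key whose leading bucket spreads writes over the key space.

    The bucket is derived from the customer and item, not from the time, so
    all rows for one item share a bucket and stay ordered by timestamp.
    """
    digest = hashlib.md5(f"{customer_id}/{item_id}".encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Zero-padded fields keep lexicographic order equal to numeric order.
    return f"{bucket:02d}/{timestamp_ms:013d}/{customer_id}/{item_id}"
```

The trade-off is on the read side: a lookup for a known customer and item
touches a single bucket, but a pure time-range read has to run one scan per
bucket and combine the results.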
However, since you are naming rows by timestamp, you should be able to
restrict the scan with a start row and a stop row. You are doing this,
right? If you're not, you are scanning every row in the table when you only
need the rows between the start and end dates.
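
As a rough sketch of how those bounds could be computed for a key that leads
with a reverse timestamp: the code below assumes the reverse timestamp is
(2^63 - 1) minus the time in milliseconds, zero-padded to 19 digits, and that
the scan's start row is inclusive and its stop row exclusive. The encoding
and the helper names are assumptions; the actual schema in the quoted
message may encode the timestamp differently.

```python
LONG_MAX = 2**63 - 1          # assumed base for the reverse timestamp
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day


def reverse_ts(timestamp_ms: int) -> str:
    """Encode a timestamp so that newer rows sort first (fixed width)."""
    return f"{LONG_MAX - timestamp_ms:019d}"


def row_key(timestamp_ms: int, customer_id: str, item_id: str) -> str:
    """Hypothetical key layout: reverse timestamp, then customer, then item."""
    return f"{reverse_ts(timestamp_ms)}/{customer_id}/{item_id}"


def scan_bounds(now_ms: int, days: int = 30) -> tuple[str, str]:
    """Return (start_row, stop_row) covering the last `days` days.

    start_row is inclusive and is the newest moment; stop_row is exclusive,
    so it is set one step past the oldest moment we still want.
    """
    oldest_ms = now_ms - days * DAY_MS
    start_row = reverse_ts(now_ms)
    stop_row = f"{LONG_MAX - oldest_ms + 1:019d}"
    return start_row, stop_row
```

Passing these two values as the scan's start and stop rows limits the read
to the rows in that window, so the filter only has to examine 30 days of
data and not the whole table.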

Someone may need to correct me, but from what I remember of the
implementation, scans are entirely sequential: region a gets scanned, then
b, then c. You could speed this up by scanning multiple regions in parallel
processes and merging the results.
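
The scatter-and-merge idea can be sketched as follows. This is not HBase
code: a sorted in-memory list stands in for the table, a slice of it stands
in for one region scan, and threads are used in place of separate processes
to keep the example short. The function names and the boundary list are
hypothetical.

```python
import heapq
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def scan_range(sorted_keys, start, stop):
    """Stand-in for one region scan: return the keys in [start, stop)."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def parallel_scan(sorted_keys, boundaries, max_workers=4):
    """Scan each [boundaries[i], boundaries[i+1]) range concurrently.

    Each partial result is already sorted, so heapq.merge combines them
    into one sorted result without re-sorting everything.
    """
    ranges = list(zip(boundaries, boundaries[1:]))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(lambda r: scan_range(sorted_keys, r[0], r[1]),
                              ranges))
    return list(heapq.merge(*parts))
```

In a real client the boundaries would be the region start keys that overlap
the requested range, and each worker would open its own scanner on one of
those sub-ranges.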

On 12 May 2011 14:36, Himanish Kushary <himanish@gmail.com> wrote:

> Hi,
>
> We have a table split across multiple regions (approx. 50-60 regions for a
> 64 MB split size) with the rowid schema
> [ReverseTimestamp/itemtimestamp/customerid/itemid]. This stores the
> activities for an item for a customer. We have lots of data for lots of
> items for a customer in this table.
>
> When we try to look up activities for an item for the last 30 days from
> this table, we are using a Scan with RowFilter and RegexComparator. The
> scan takes a lot of time (almost 15-20 secs) to get us the activities for
> an item.
>
> We are hooked up to HBase tables directly from a web application, so this
> response time of around 20 secs is unacceptable. We also noticed that
> whenever we do any scan kind of operation it is never in acceptable ranges
> for a web application.
>
> Are we doing something wrong? If HBase scans are so slow then it would be
> real hard to hook it up directly with any web application.
>
> Could somebody please suggest how to improve this, or some other options
> (design, architectural) to remedy this kind of issue when dealing with a
> lot of data?
>
> Note: We have tried setCaching and SingleColumnValueFilter, to no
> significant effect.
>
> ---------------------------
> Thanks & Regards
> Himanish
>
