hbase-user mailing list archives

From Daniel Einspanjer <deinspan...@mozilla.com>
Subject Re: hbase table as a queue.
Date Tue, 19 Jul 2011 16:26:32 GMT
Cool.  Filed a task for us to work on that.

On 7/19/11 12:05 PM, Stack wrote:
> Set region size very large (In trunk you can actually disable splitting).
> St.Ack
> On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer
> <deinspanjer@mozilla.com>  wrote:
>> We use a queue table like this too and ran into the same problem.  How did
>> you configure it such that it never splits?
>> -Daniel
>> On 7/16/11 4:24 PM, Stack wrote:
>>> I learned Friday that our fellas on the frontend are using an hbase
>>> table to do simple queuing.  They insert stuff to be processed by
>>> distributed processes and when processes are done with the work,
>>> they'll remove the processed element from the hbase table.   They are
>>> queuing, processing, and removing millions of items a day.  Elements
>>> were added on the end of the queue (FIFO).
>>> The issue to avoid was that over time, especially if it had been a while
>>> between major compactions, the latency was going up.  Turns out, the table
>>> had been splitting when the queue backed up.   Then a scan for new stuff to
>>> process had to first traverse regions that had nought in them (the key
>>> was time-based and the tail of the table had moved on past these first
>>> regions).  This traversal, especially if no major compaction so lots
>>> of deletes to process, was taking time to get to the first row.
>>> To fix, we rid the table of its empty regions and made it so the table
>>> would no longer split, so there is only ever one region in it.  This should make
>>> it so we don't end up with empty regions to skip through before we get
>>> to the first element in the table (need the major compaction running
>>> on a somewhat regular basis to temper latencies).  Will report back to
>>> the list if we find otherwise.
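The latency effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase internals: each region is modelled as a sorted list of (key, deleted) cells, a delete leaves a marker until a major compaction, and a scan from the start of the table must open every region and step over every delete-marked cell before it reaches the first live row. The names scan_cost, major_compact and drop_empty_regions are hypothetical.

```python
def scan_cost(regions):
    """Return (regions_opened, cells_read, first_live_key) for a scan from the table start."""
    regions_opened = 0
    cells_read = 0
    for region in regions:
        regions_opened += 1
        for key, deleted in region:
            cells_read += 1
            if not deleted:
                return regions_opened, cells_read, key
    return regions_opened, cells_read, None


def major_compact(regions):
    """Toy major compaction: drop delete-marked cells but keep the region boundaries."""
    return [[cell for cell in region if not cell[1]] for region in regions]


def drop_empty_regions(regions):
    """Toy version of ridding the table of its empty regions."""
    return [region for region in regions if region]


# A queue whose head has moved on: two fully processed regions, one live region.
backlog = [
    [(f"t{i:04d}", True) for i in range(0, 1000)],
    [(f"t{i:04d}", True) for i in range(1000, 2000)],
    [("t2000", False), ("t2001", False)],
]

print(scan_cost(backlog))                                     # (3, 2001, 't2000')
print(scan_cost(major_compact(backlog)))                      # (3, 1, 't2000')
print(scan_cost(drop_empty_regions(major_compact(backlog))))  # (1, 1, 't2000')
```

In this model the major compaction removes the cells to skip, while removing the empty regions removes the extra regions to open, which is roughly why both steps are mentioned in the fix.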
>>> Do not use locks.  Doesn't scale.  Maybe update a cell when task is
>>> taken out for processing.  If too much time elapses since last update,
>>> maybe give it out again?
>>> St.Ack
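The "update a cell when the task is taken, hand it out again if too much time passes" idea might be sketched roughly as below. This is a minimal in-memory model, not the HBase client API: QueueTable, claim_next, check_and_set_claim and lease_seconds are hypothetical names, and the conditional write is assumed to be atomic in the real store (on an actual table, something like HBase's checkAndPut would presumably play that role).

```python
import time


class QueueTable:
    """In-memory stand-in for a queue table (hypothetical; not the HBase API)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        # row key -> {"payload": ..., "claimed_at": timestamp or None}
        self._rows = {}

    def put(self, key, payload):
        self._rows[key] = {"payload": payload, "claimed_at": None}

    def check_and_set_claim(self, key, expected, new):
        """Set claimed_at to `new` only if it still equals `expected`.

        Assumption: a real store performs this check and write atomically.
        """
        row = self._rows.get(key)
        if row is None or row["claimed_at"] != expected:
            return False
        row["claimed_at"] = new
        return True

    def claim_next(self, now=None):
        """Claim the oldest row that is unclaimed or whose claim has gone stale."""
        now = time.time() if now is None else now
        for key, row in sorted(self._rows.items()):
            claimed_at = row["claimed_at"]
            stale = claimed_at is not None and now - claimed_at > self.lease_seconds
            if claimed_at is None or stale:
                # If another worker got there first, the conditional write
                # fails and we simply move on to the next row.
                if self.check_and_set_claim(key, claimed_at, now):
                    return key
        return None

    def done(self, key):
        """Remove a processed row from the queue."""
        self._rows.pop(key, None)


q = QueueTable(lease_seconds=30)
q.put("0001", "first item")
q.put("0002", "second item")
a = q.claim_next(now=100.0)   # worker A takes "0001"
b = q.claim_next(now=101.0)   # worker B takes "0002"
q.done(a)                     # A finishes; B never reports back
c = q.claim_next(now=140.0)   # B's claim is older than 30s, so "0002" is handed out again
```

No row lock is held while the work is done; a worker that dies just leaves a stale timestamp behind, and the row is given out again once the lease expires. The trade-off is that a slow worker and its replacement may both process the same row, so the processing should tolerate being repeated.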
>>> On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <magnito@gmail.com> wrote:
>>>> Hello, we are thinking about using Hbase table as a simple queue which
>>>> will dispatch the work for a mapreduce job, as well as real time
>>>> fetching of data to present to end user.  In simple terms, suppose you
>>>> had a data source table and a queue table.  The queue table has a
>>>> smaller set of Rows that point to Values which in turn point to
>>>> Perma-set table, which has large collection of Rows.  (so Queue{Row,
>>>> Value} -> Perma-Set {Row, Value}).  Or Q-Value -> P-Row.   Our Goal is
>>>> to look up which Rows to retrieve from the Perma-Set table by looking
>>>> through the Queue.  Once the lookup into the Queue is done, the Row
>>>> from the Queue must be deleted to avoid the same Perma-Set lookup
>>>> being done twice.  We expect many concurrent lookups to happen, so
>>>> I assume the first thing a client that does the work needs to do is
>>>> acquire a lock on the Queue Row, process the work, then remove the
>>>> Queue Row.
>>>> Has anyone done something similar before?  Any gotchas we should be aware
>>>> of?
>>>> Thanks.
>>>> -Jack
