hbase-user mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: hbase table as a queue.
Date Tue, 19 Jul 2011 18:27:55 GMT
All excellent points here in terms of tuning!  For the higher-level question
about using a table as a queue, I just wanted to add in a link to the Lily
guys' rowlog library, since it does exactly that:


On Tue, Jul 19, 2011 at 9:26 AM, Daniel Einspanjer wrote:

> Cool.  filed a task for us to work on that.
> https://bugzilla.mozilla.org/show_bug.cgi?id=672527
> On 7/19/11 12:05 PM, Stack wrote:
>> Set region size very large (In trunk you can actually disable splitting).
>> St.Ack
>> On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer
>> <deinspanjer@mozilla.com>  wrote:
>>> We use a queue table like this too and ran into the same problem.  How did
>>> you configure it such that it never splits?
>>> -Daniel
>>> On 7/16/11 4:24 PM, Stack wrote:
>>>> I learned friday that our fellas on the frontend are using an hbase
>>>> table to do simple queuing.  They insert stuff to be processed by
>>>> distributed processes and when processes are done with the work,
>>>> they'll remove the processed element from the hbase table.   They are
>>>> queuing, processing, and removing millions of items a day.  Elements
>>>> were added on the end of the queue (FIFO).
>>>> The issue to avoid was that over time, especially if it had been a while
>>>> between major compactions, the latency was going up.  Turns out, the table
>>>> had been splitting when the queue backed up.  Then a scan for new stuff to
>>>> process had to first traverse regions that had nought in them (the key
>>>> was time-based and the tail of the table had moved on past these first
>>>> regions).  This traversal, especially if there had been no major compaction
>>>> and so lots of deletes to process, was taking time to get to the first row.
>>>> To fix, we rid the table of its empty regions and made it so the table
>>>> would no longer split, so there is only ever one region in it.  This should make
>>>> it so we don't end up with empty regions to skip through before we get
>>>> to the first element in the table (need the major compaction running
>>>> on a somewhat regular basis to temper latencies).  Will report back to
>>>> the list if we find otherwise.
>>>> Do not use locks.  Doesn't scale.  Maybe update a cell when task is
>>>> taken out for processing.  If too much time elapses since last update,
>>>> maybe give it out again?
>>>> St.Ack
>>>> On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <magnito@gmail.com> wrote:
>>>>> Hello, we are thinking about using Hbase table as a simple queue which
>>>>> will dispatch the work for a mapreduce job, as well as real time
>>>>> fetching of data to present to end user.  In simple terms, suppose you
>>>>> had a data source table and a queue table.  The queue table has a
>>>>> smaller set of Rows that point to Values which in turn point to
>>>>> Perma-set table, which has large collection of Rows.  (so Queue{Row,
>>>>> Value} -> Perma-Set {Row, Value}).  Or Q-Value -> P-Row.  Goal is
>>>>> to look up which Rows to retrieve from the Perma-Set table by looking
>>>>> through the Queue.  Once the lookup into the Queue is done, the Row
>>>>> from the Queue must be deleted to avoid the same Perma-Set lookup
>>>>> being done twice.  We expect many concurrent lookups to happen, so
>>>>> I assume the first thing we need to do is to have the client that does
>>>>> the work acquire a lock on the Queue Row, process the work, then
>>>>> remove the Queue Row.
>>>>> Has anyone done something similar before?  Any gotchas we should be
>>>>> aware of?
>>>>> Thanks.
>>>>> -Jack
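
Stack's suggestion in the thread (skip row locks, update a cell when a task is taken, and hand the task out again if too much time has passed since that update) can be sketched roughly as below. This is an in-memory model in plain Java, not HBase client code: the class name LeaseQueue, the "claimed-at" value, and the lease length are all made up for illustration. A sorted map stands in for the single-region queue table, and its atomic replace() stands in for a compare-and-set style call such as HBase's checkAndPut.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for a queue table; hypothetical, not HBase client code. */
final class LeaseQueue {
    /** Marker for a row that no worker has claimed yet. */
    private static final long UNCLAIMED = -1L;

    /** Row key -> value of the hypothetical "claimed-at" cell, kept in key order. */
    private final ConcurrentSkipListMap<String, Long> claimedAt = new ConcurrentSkipListMap<>();
    private final long leaseMillis;

    LeaseQueue(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Enqueue under a time-ordered row key so a scan sees the oldest item first. */
    void put(String rowKey) {
        claimedAt.putIfAbsent(rowKey, UNCLAIMED);
    }

    /**
     * Scan from the head and claim the first row that is unclaimed or whose
     * lease has expired. nowMillis is assumed to be non-negative.
     */
    Optional<String> claim(long nowMillis) {
        for (Map.Entry<String, Long> e : claimedAt.entrySet()) {
            long seen = e.getValue();
            boolean free = seen == UNCLAIMED || nowMillis - seen >= leaseMillis;
            // Atomic compare-and-set: succeeds only if the cell still holds the
            // value we read, so two workers cannot both claim the same row.
            if (free && claimedAt.replace(e.getKey(), seen, nowMillis)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /** Worker finished: delete the row so it is never handed out again. */
    void done(String rowKey) {
        claimedAt.remove(rowKey);
    }
}
```

A worker would loop on claim(System.currentTimeMillis()), process the returned row, then call done(). If a worker dies mid-task, its row becomes claimable again once the lease runs out, so processing has to tolerate an occasional repeat.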
