hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Occasional regionserver crashes following socket errors writing to HDFS
Date Thu, 10 May 2012 18:59:13 GMT

I really think you need to think more about the problem.

Think about what a reduce phase does, and then think about what happens inside HBase.

Then think about which runs faster... a job with two mappers writing the intermediate and final results to HBase, or an M/R job with a reducer that writes its output to HBase.

If you really, truly think about the problem, you will start to understand why I say you really don't want to use a reducer when you're working with HBase.
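A toy model of the "what happens inside HBase" point: HBase keeps each memstore ordered by row key, so the order in which a job emits its Puts does not change the order in which the data lands in the store. The `ToyMemstore` class and both job functions below are hypothetical illustrations, not HBase or Hadoop APIs. The sketch says nothing about cost (shuffle, spill, network); it only shows that sorting before the write buys no ordering the store would not produce anyway.

```python
import bisect
import random


class ToyMemstore:
    """Hypothetical stand-in for an HBase memstore: keeps cells ordered by
    row key. Not HBase code; it only models 'the store sorts on write'."""

    def __init__(self):
        self._keys = []    # row keys, kept sorted on every insert
        self._values = {}  # row key -> latest value

    def put(self, row_key, value):
        if row_key not in self._values:
            bisect.insort(self._keys, row_key)
        self._values[row_key] = value  # last write wins in this toy model

    def flush(self):
        """Return cells in row-key order, as a flushed store file would be."""
        cells = [(k, self._values[k]) for k in self._keys]
        self._keys.clear()
        self._values.clear()
        return cells


def map_only_write(records):
    # Map-only job: records go straight to the table in arrival order.
    store = ToyMemstore()
    for key, value in records:
        store.put(key, value)
    return store.flush()


def sort_then_write(records):
    # Job with a reduce phase: the shuffle sorts by key before the writes.
    # sorted() is stable, so duplicate keys keep their relative order.
    store = ToyMemstore()
    for key, value in sorted(records, key=lambda kv: kv[0]):
        store.put(key, value)
    return store.flush()


if __name__ == "__main__":
    records = [(f"row{i:04d}", i) for i in range(1000)]
    random.shuffle(records)
    # Same flushed contents either way: the pre-sort bought no ordering.
    assert map_only_write(records) == sort_then_write(records)
```

What the toy model cannot show is the price of the reduce phase itself, which is the "which runs faster" question above.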

On May 10, 2012, at 1:41 PM, Dave Revell wrote:

> Some examples of when you'd want a reducer:
> http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
> On Thu, May 10, 2012 at 11:30 AM, Michael Segel <michael_segel@hotmail.com> wrote:
>> Dave, do you really want to go there?
>> OP has a couple of issues and he was going down a rabbit hole.
>> (You can choose whether that's a reference to 'The Matrix', Jefferson Starship,
>> 'Alice in Wonderland'... or all of the above.)
>> So to put him on the correct path, I recommended the following, in no
>> particular order...
>> 1) Increase his region size for this table only.
>> 2) Look at decreasing the number of regions managed by each RS (which is
>> why you increase the region size).
>> 3) Raise dfs.balance.bandwidthPerSec. (How often does HBase move regions,
>> and how exactly does it move them?)
>> 4) Look at enabling MSLABs and tuning GC. This cuts down on
>> garbage-collection overhead.
>> 5) Refactor his job....
>> Oops.
>> OK, I didn't put that in the list.
>> But that was the last thing I wrote as a separate statement.
>> Clearly you didn't take my advice and think about the problem....
>> To prove a point.... you wrote:
>> 'Many mapreduce algorithms require a reduce phase (e.g. sorting)'
>> OK. So tell me: why would you want to sort your input into HBase, and is
>> that really a good thing?
>> Oops!... :-)
>> On May 10, 2012, at 12:31 PM, Dave Revell wrote:
>>> This "you don't need a reducer" conversation is distracting from the real
>>> problem and is false.
>>> Many mapreduce algorithms require a reduce phase (e.g. sorting). The fact
>>> that the output is written to HBase or somewhere else is irrelevant.
>>> -Dave
>>> On Thu, May 10, 2012 at 6:26 AM, Michael Segel <michael_segel@hotmail.com> wrote:
>>> [SNIP]
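Items 1 to 3 in the quoted list come down to simple arithmetic. Below is a minimal Python sketch, assuming that regions split evenly at the configured maximum size (the per-table MAX_FILESIZE attribute, or the cluster-wide hbase.hregion.max.filesize), that the balancer spreads regions evenly across RegionServers, and that dfs.balance.bandwidthPerSec caps HDFS balancer traffic per DataNode in bytes per second. The function names and every size used are hypothetical examples, not figures from the OP's cluster.

```python
import math


def regions_per_server(table_bytes, max_region_bytes, region_servers):
    """Rough regions per RegionServer, assuming every region splits at
    max_region_bytes and the regions are spread evenly across servers."""
    regions = math.ceil(table_bytes / max_region_bytes)
    return math.ceil(regions / region_servers)


def balancer_hours(bytes_to_move, bandwidth_bytes_per_sec):
    """Best-case hours for one DataNode to move bytes_to_move through a
    balancer throttle of bandwidth_bytes_per_sec (ignores all overhead)."""
    return bytes_to_move / bandwidth_bytes_per_sec / 3600


if __name__ == "__main__":
    GiB = 1024 ** 3
    MiB = 1024 ** 2
    table = 2048 * GiB  # hypothetical 2 TiB table on 10 RegionServers
    print(regions_per_server(table, 256 * MiB, 10))  # small regions
    print(regions_per_server(table, 4 * GiB, 10))    # larger regions
    # 100 GiB through a 1 MiB/s throttle (assumed starting value) vs 10 MiB/s
    print(balancer_hours(100 * GiB, 1 * MiB))
    print(balancer_hours(100 * GiB, 10 * MiB))
```

With these hypothetical numbers, 256 MiB regions give about 820 regions per server while 4 GiB regions give about 52, and moving 100 GiB at 1 MiB/s takes roughly 28 hours versus under 3 hours at 10 MiB/s.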
