hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Occasional regionserver crashes following socket errors writing to HDFS
Date Fri, 11 May 2012 01:28:01 GMT
Stack,

That section was written by Doug after he and I had the same debate many moons ago.
While I can't say with absolute certainty that you shouldn't use a reducer, what I can say is that
in every situation where I have seen a M/R job that writes to HBase, you end up not wanting
to use a reducer. If you want a clear and concise statement, you can say that the rule of thumb
is that you don't want to use a reducer, and that cases where you would first need to use a
reducer are the rare exception.

The reason I ask people to think about this topic is that unless you have a really good foundation
in databases, not relying on a reducer is a bit counterintuitive. (Which is why I said that
you really need to clear your mind and focus on this issue.)
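
To make the rule of thumb concrete, here is a toy sketch in plain Python. It is not HBase or Hadoop code: the dict named `table` is a hypothetical stand-in for a table keyed by row that supports increments, and the function names are made up for illustration. It only shows that a keyed store can absorb the grouping work a reducer would otherwise do. It says nothing about which approach is faster on a real cluster.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit (key, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def run_with_reducer(lines):
    """Classic M/R: shuffle values by key, then reduce, then write the result."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)   # shuffle: group values by key
    table = {}
    for key in sorted(groups):          # reduce: one aggregate per key
        table[key] = sum(groups[key])
    return table


def run_map_only(lines):
    """Map-only: each mapper increments the keyed store directly."""
    table = defaultdict(int)            # stand-in for a table with increments
    for line in lines:
        for key, value in map_words(line):
            table[key] += value         # the store does the grouping by key
    return dict(table)


LINES = ["a b a", "b c", "a"]
```

Both functions produce the same per-key totals for `LINES`; the difference is only where the grouping by key happens.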

-Mike

PS. If you care to read the thread, I didn't become condescending until a certain individual
piped up about how refactoring the M/R was a 'distraction' from the issue at hand.
Not to mention his flip response with the Google paper.

On May 10, 2012, at 4:57 PM, Stack wrote:

> On Thu, May 10, 2012 at 11:59 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> Sigh.
>> 
>> Dave,
>> I really think you need to think more about the problem.
>> 
>> Think about what a reduce does and then think about what happens inside of HBase.
>> 
>> Then think about which runs faster... a job with two mappers writing the intermediate
>> and final results in HBase,
>> or a M/R job that writes its output to HBase.
>> 
>> If you really truly think about the problem, you will start to understand why I say
>> you really don't want to use a reducer when you're working with HBase.
>> 
> 
> We have a bit of doc saying that you usually might want to forgo the reduce
> phase, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink.
> Do we need to add to it?  That said, you can't make a hard and fast
> rule that the reduce is to be avoided absolutely.  There will be cases
> where it makes sense (MR sort orthogonal to HBase's or a fat
> aggregating reduce, etc.)
> 
> St.Ack
> P.S. Hey Michael.  Go easy on the 'sighs'.  The participants in this
> thread have a clue.  I can testify to that.  Also, I know you don't
> mean it, but on occasion, both in this thread and in others I've seen
> you on, your tone can come across as condescending (and there is
> nothing like condescension for raising the hackles).  We all have our
> styles, but you might want to review with this in mind before you hit
> send the next time.  Just a suggestion.
> 

