hbase-user mailing list archives

From Alex Baranov <alex.barano...@gmail.com>
Subject Re: setNumReduceTasks(1)
Date Fri, 29 Jan 2010 19:43:43 GMT
How big is N?  How big is the output of the map job?

Alex.

On Fri, Jan 29, 2010 at 7:36 PM, Something Something <mailinglists19@gmail.com> wrote:

> I am sorry, but I forgot to add one important piece of information.
>
> I don't want to write any random N rows to the table.  I want to write the
> *top* N rows - meaning - I want to write the "key" values of the Reducer in
> descending order.  Does this make sense?  Sorry for the confusion.
>
> On Wed, Jan 27, 2010 at 11:09 PM, Mridul Muralidharan <mridulm@yahoo-inc.com> wrote:
>
> >
> > A possible solution is to emit only N rows from each mapper and then use
> > 1 reduce task [*], if the value of N is not very high.
> > So you end up with at most m * N rows on the reducer instead of the full
> > input set, and so the limit can be applied more easily.
> >
> >
> > If you are ok with some variance in the number of rows inserted (and if
> > the value of N is very high), you can do more interesting things like N/m
> > rows per mapper and multiple reducers (r), with the assumption that each
> > reducer will see at least N/r rows, so you can limit to N/r per reducer.
> > Of course, there is a possible error that gets introduced here ...
> >
> >
> > Regards,
> > Mridul
> >
> > [*] Assuming you just want a simple limit, nothing else.
> > Also note, each mapper might want to emit N rows instead of 'tweaks' like
> > N/m rows, since it is possible that multiple mappers might have fewer than
> > N/m rows to emit to begin with!
> >
> >
> >
> > Something Something wrote:
> >
> >> If I set the # of reduce tasks to 1 using setNumReduceTasks(1), would the
> >> Reducer class be instantiated only on one machine.. always?  I mean if I
> >> have a cluster of say 1 master, 10 workers & 3 zookeepers, is the Reducer
> >> class guaranteed to be instantiated only on 1 machine?
> >>
> >> If the answer is yes, then I will use a static variable as a counter to
> >> see how many rows have been added to my HBase table so far.  In my use
> >> case, I want to write only N rows to a table.  Is there a better way to
> >> do this?  Please let me know.  Thanks.
> >>
> >
> >
>
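
For reference, the idea quoted above (each mapper emits only its own top N keys, and a single reducer then keeps the top N of at most m * N keys) can be sketched roughly as follows. This is only an illustration in plain Java with no Hadoop or HBase classes. The class and method names (TopNSketch, topN, topNAcrossSplits) are made up for the example, and it assumes keys are simple long values where "top" means largest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch; all names here are hypothetical, not Hadoop API.
class TopNSketch {

    // Keep the n largest keys of the input using a bounded min-heap,
    // and return them in descending order.
    static List<Long> topN(Iterable<Long> keys, int n) {
        PriorityQueue<Long> heap = new PriorityQueue<>(); // smallest kept key on top
        for (Long key : keys) {
            heap.offer(key);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest, so at most n keys remain
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    // Simulate m mappers that each emit their local top n, followed by
    // one reducer that sees at most m * n keys and keeps the global top n.
    static List<Long> topNAcrossSplits(List<List<Long>> splits, int n) {
        List<Long> reducerInput = new ArrayList<>();
        for (List<Long> split : splits) {
            reducerInput.addAll(topN(split, n)); // "map" side
        }
        return topN(reducerInput, n); // single "reduce" side
    }

    public static void main(String[] args) {
        List<List<Long>> splits = new ArrayList<>();
        splits.add(Arrays.asList(5L, 1L, 9L));
        splits.add(Arrays.asList(7L, 8L));
        splits.add(Arrays.asList(2L));
        System.out.println(topNAcrossSplits(splits, 3));
    }
}
```

The result is exact for a top-N query because any key in the global top N must also be in the top N of its own split. A split with fewer than N keys simply emits everything it has, which matches the footnote about mappers that have fewer rows than their share.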
