hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Possible Aggregator Problem
Date Wed, 17 Apr 2013 11:18:52 GMT
Hi Steven,

the AverageAggregator is used to determine the average of the absolute
differences between the old and the new PageRank of every vertex.
This behavior is documented in the javadoc of the classes in question, and it
is enough to track whether the PageRank values have converged yet.
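As a rough illustration of the semantics described above (the average of |old - new| over all vertices, compared against a threshold), here is a minimal plain-Java sketch. It does not use Hama's classes; the class name `AbsDiffAverage`, its methods, and any threshold passed to `converged` are hypothetical.

```java
/**
 * Hypothetical stand-in for the behavior described above: it averages the
 * absolute differences between each vertex's old and new PageRank value.
 * This is plain Java, not Hama's AverageAggregator.
 */
final class AbsDiffAverage {
  private double sumAbsDiff = 0.0; // running sum of |old - new|
  private long count = 0;          // number of vertices aggregated so far

  /** Called once per vertex with its previous and current value. */
  void aggregate(double oldValue, double newValue) {
    sumAbsDiff += Math.abs(oldValue - newValue);
    count++;
  }

  /** Average absolute difference; 0.0 if nothing was aggregated. */
  double average() {
    return count == 0 ? 0.0 : sumAbsDiff / count;
  }

  /** Convergence test against a caller-chosen (assumed) threshold. */
  boolean converged(double threshold) {
    return count > 0 && average() < threshold;
  }

  public static void main(String[] args) {
    double[] oldRanks = {0.25, 0.25, 0.25, 0.25};
    double[] newRanks = {0.10, 0.40, 0.20, 0.30};
    AbsDiffAverage agg = new AbsDiffAverage();
    for (int i = 0; i < oldRanks.length; i++) {
      agg.aggregate(oldRanks[i], newRanks[i]);
    }
    System.out.println("average |old - new| = " + agg.average());
  }
}
```

For these four vertices the average difference is 0.1, while the mean of the new values is 0.25; that gap is exactly the distinction at issue in this thread.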

What you describe is a perfectly valid way to track the PageRank difference
throughout all supersteps. But this is not how the AverageAggregator should
behave (IMHO), so you will have to write your own.


2013/4/17 Steven van Beelen <smcvbeelen@gmail.com>

> In my case, the values are the DoubleWritable values that each vertex holds
> and that the aggregators aggregate over.
> My tests showed that, when the aggregator was set to AverageAggregator, the
> average of all the vertex values from the previous compute step was
> returned. AverageAggregator should actually return the average difference
> over the old/new value pairs of every vertex instead of the mean.
> The average difference is then used to check whether convergence has been
> reached, which is of course relevant for all tasks.
>
> As a result, the convergence point that the aggregator is meant to detect
> is never reached, so the algorithm simply runs the maximum number of
> iterations set (30 in the PageRank example) in every case.
> I experienced the same with my own PageRank implementation.
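A toy simulation can make this symptom concrete. The sketch below is plain Java, not Hama code; the class name `ConvergenceDemo`, the "move halfway toward a fixed target" update rule, and the 0.001 threshold are hypothetical choices, and only the cap of 30 iterations comes from the thread. It compares stopping on the average |old - new| with stopping on the plain mean of the values.

```java
/**
 * Hypothetical simulation of the reported symptom: a loop capped at 30
 * iterations that stops early only if the aggregated number drops below a
 * threshold. Each "vertex" moves halfway toward a fixed target per iteration;
 * this toy update is chosen only because it converges. It is not PageRank.
 */
final class ConvergenceDemo {
  static final int MAX_ITERATIONS = 30;  // as in the PageRank example
  static final double THRESHOLD = 0.001; // assumed convergence threshold

  /**
   * Runs the toy update and returns the number of iterations executed.
   * useAbsDiff = true: stop on the average |old - new| (intended behavior).
   * useAbsDiff = false: stop on the mean of the new values (reported behavior).
   */
  static int run(boolean useAbsDiff) {
    double[] values = {1.0, 0.0, 0.5, 0.25};
    double[] targets = {0.2, 0.3, 0.4, 0.1};
    int iteration = 0;
    while (iteration < MAX_ITERATIONS) {
      iteration++;
      double sumAbsDiff = 0.0;
      double sumValues = 0.0;
      for (int i = 0; i < values.length; i++) {
        double oldValue = values[i];
        double newValue = 0.5 * oldValue + 0.5 * targets[i];
        values[i] = newValue;
        sumAbsDiff += Math.abs(oldValue - newValue);
        sumValues += newValue;
      }
      double aggregated = (useAbsDiff ? sumAbsDiff : sumValues) / values.length;
      if (aggregated < THRESHOLD) {
        break; // converged
      }
    }
    return iteration;
  }

  public static void main(String[] args) {
    System.out.println("average |old - new|: " + run(true) + " iterations");
    System.out.println("mean of values:      " + run(false) + " iterations");
  }
}
```

With the average absolute difference the loop stops after 9 iterations. With the mean of the values, which settles near 0.25 rather than near zero, it always runs to the cap of 30.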
>
> I think it has something to do with the finalizeAggregation step.
> Besides that, both the 'aggregate(VERTEX vertex, M value)' and
> 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called every
> time, whereas one would think only the second (with old/new values) would
> suffice.
> Because of this, the 'absoluteDifference' field in the
> 'AbsDiffAggregator' class is overwritten by the first aggregate method.
> Additionally, when I wrote my own aggregator class in the same fashion as
> AbsDiffAggregator and AverageAggregator but left out
> 'aggregate(VERTEX vertex, M value)', the output was 0.0000 every time.
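To make the reported interaction concrete, here is a hypothetical plain-Java model (not Hama's AbsDiffAggregator) of an aggregator whose two `aggregate` overloads feed the same accumulator. Assuming the runtime calls both overloads for every vertex, as reported above, the raw values leak into the "difference" total; keeping separate accumulators avoids that. All class names here are made up for illustration.

```java
/**
 * Hypothetical model of the reported issue; names mirror the thread, but
 * this is not Hama's AbsDiffAggregator. Both overloads feed one accumulator.
 */
final class SharedAccumulator {
  double absoluteDifference = 0.0;

  // One-argument overload: adds the raw vertex value.
  void aggregate(double value) {
    absoluteDifference += value;
  }

  // Two-argument overload: adds |old - new|.
  void aggregate(double oldValue, double newValue) {
    absoluteDifference += Math.abs(oldValue - newValue);
  }
}

/** Variant that keeps raw values and differences in separate accumulators. */
final class SeparateAccumulators {
  double valueSum = 0.0;
  double absoluteDifference = 0.0;

  void aggregate(double value) {
    valueSum += value;
  }

  void aggregate(double oldValue, double newValue) {
    absoluteDifference += Math.abs(oldValue - newValue);
  }
}

final class OverloadDemo {
  public static void main(String[] args) {
    double[] oldValues = {0.5, 0.5};
    double[] newValues = {0.4, 0.6};
    SharedAccumulator shared = new SharedAccumulator();
    SeparateAccumulators separate = new SeparateAccumulators();
    for (int i = 0; i < oldValues.length; i++) {
      // Assumption: the framework invokes both overloads for every vertex.
      shared.aggregate(newValues[i]);
      shared.aggregate(oldValues[i], newValues[i]);
      separate.aggregate(newValues[i]);
      separate.aggregate(oldValues[i], newValues[i]);
    }
    System.out.println("shared:   " + shared.absoluteDifference);   // values + differences
    System.out.println("separate: " + separate.absoluteDifference); // differences only
  }
}
```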
>
> I hope I made myself clear.
> Regards
>
>
> On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
>
> > Thanks for your report.
> >
> > What's the meaning of 'all the values'? Please give me more details
> > about your problem.
> >
> > I haven't looked closely at the 'dangling links & aggregators' part of
> > the PageRank example, but I don't think there's a bug. Aggregators are
> > just used for global communication. For example, finding the max
> > value[1] can be done in only one iteration using MaxValueAggregator.
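The one-iteration claim can be pictured as a two-level reduction: each peer reduces its own vertices to a local maximum, and a master step reduces those partial results to the global maximum, with no vertex-to-vertex message passing. The sketch below is plain Java under that assumption; it is not Hama's MaxValueAggregator, and the class name `MaxAggregationSketch` and the partition layout are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of global max aggregation in one superstep: a local
 * reduction per peer, then a master reduction over the partial results.
 * This is plain Java, not Hama's MaxValueAggregator.
 */
final class MaxAggregationSketch {
  /** Local step: maximum over the vertex values held by one peer. */
  static double localMax(double[] vertexValues) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : vertexValues) {
      max = Math.max(max, v);
    }
    return max;
  }

  /** Master step: maximum over the peers' partial results. */
  static double globalMax(List<double[]> partitions) {
    double max = Double.NEGATIVE_INFINITY;
    for (double[] partition : partitions) {
      max = Math.max(max, localMax(partition));
    }
    return max;
  }

  public static void main(String[] args) {
    // Two hypothetical peers, each holding a few vertex values.
    List<double[]> partitions = Arrays.asList(
        new double[] {3.0, 6.0, 2.0},
        new double[] {1.0, 5.0});
    System.out.println("global max = " + globalMax(partitions));
  }
}
```

By contrast, the message-passing approach pictured in [1] needs several supersteps for the maximum to propagate through the graph.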
> >
> > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
> >
> > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <smcvbeelen@gmail.com>
> > wrote:
> > > Hello,
> > >
> > > I'm creating my own PageRank in Hama for testing, and I think I found a
> > > problem with the AverageAggregator. I'm not sure whether it is me or the
> > > AverageAggregator class in general, but I believe it just returns the
> > > mean of all the values instead of the intended average difference
> > > between the old and new values.
> > >
> > > For testing, I created my own AbsDiffAggregator and AverageAggregator
> > > classes, using FloatWritable instead of DoubleWritable. The same problem
> > > still occurred: I got the mean of all the values in the graph instead
> > > of an average difference.
> > >
> > > Could someone tell me if I'm doing something wrong or what I should
> > > provide to better explain my problem?
> > >
> > > Regards,
> > > Steven van Beelen, Vrije Universiteit of Amsterdam
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
