hama-user mailing list archives

From Steven van Beelen <smcvbee...@gmail.com>
Subject Re: Possible Aggregator Problem
Date Tue, 23 Apr 2013 12:41:39 GMT
Could anyone tell me whether I'm correct about the possible problem I
reported and followed up on in my previous two emails?


On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:

> Additionally, I found this in the mail archives:
>
> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
> This covers exactly my point. Is calling two different aggregate functions
> in a row still considered a bug?
>
>
> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>
>> Hi Thomas,
>>
>> Then I guess I did not explain myself clearly.
>> What you describe is indeed how I expect the AverageAggregator to work,
>> but when I use the AverageAggregator in my own PageRank implementation it
>> does not return the average of all absolute differences, but just the
>> mean of all vertex values.
>>
>> The (very) small example graph I use has only five vertices, where the sum
>> of all vertex values is always 1.0.
>> When I use the AverageAggregator, calling the getLastAggregatedValue
>> method always returns 0.2 (that is, 1.0 / 5).
>> It shouldn't do that, right?
>>
>>
>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <thomas.jungblut@gmail.com> wrote:
>>
>>> Hi Steven,
>>>
>>> the AverageAggregator is used to determine the average of all absolute
>>> differences between old pagerank and new pagerank for every vertex.
>>> This behavior is documented in the javadoc of the given classes and
>>> suffices to track whether the pagerank values have converged yet.
>>>
>>> What you describe is a perfectly valid way to track the pagerank
>>> difference throughout all supersteps. But this is not how (imho) the
>>> AverageAggregator should behave, so you have to write your own.
>>>
>>>
>>> 2013/4/17 Steven van Beelen <smcvbeelen@gmail.com>
>>>
>>> > The values in my case are the DoubleWritable values that each vertex has
>>> > and that the aggregators aggregate on.
>>> > My tests showed that, when the aggregator was set to AverageAggregator,
>>> > the average of all the vertex values from the past compute step was
>>> > returned.
>>> > Actually, AverageAggregator should return the average difference of all
>>> > the old-new value pairs of every vertex instead of the mean.
>>> > The average difference is then used to check whether convergence is
>>> > reached, which is relevant for all tasks of course.
>>> >
>>> > Hence, the convergence point, for which the Aggregator is used, will not
>>> > be reached.
>>> > As a result, the algorithm will just run the maximum number of iterations
>>> > set (30 iterations in the PageRank example) in every case.
>>> > I experienced the same with my own PageRank implementation.
>>> >
>>> > I think it has something to do with the finalizeAggregation step taken.
>>> > Besides that, both the 'aggregate(VERTEX vertex, M value)' and
>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>>> > every time, where one would think only the second (with old/new values)
>>> > would suffice.
>>> > Because of this, the global variable 'absoluteDifference' in the
>>> > 'AbsDiffAggregator' class is overwritten/overruled by the first aggregate.
>>> > Additionally, when I made my own aggregation class in the same fashion as
>>> > AbsDiffAggregator and AverageAggregator, but left out the
>>> > 'aggregate(VERTEX vertex, M value)' method, my output turned out to be
>>> > 0.0000 every time.
>>> >
>>> > I hope I made myself clear.
>>> > Regards
>>> >
>>> >
>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>> >
>>> > > Thanks for your report.
>>> > >
>>> > > What's the meaning of 'all the values'? Please give me more details
>>> > > about your problem.
>>> > >
>>> > > I didn't look closely at the 'dangling links & aggregators' part of the
>>> > > PageRank example, but I think there's no bug. Aggregators are just used
>>> > > for global communication. For example, finding the max value[1] can be
>>> > > done in only one iteration using MaxValueAggregator.
>>> > >
>>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>>> > >
>>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>>> > > > Hello,
>>> > > >
>>> > > > I'm creating my own PageRank in Hama for testing and I think I found a
>>> > > > problem with the AverageAggregator. I'm not sure if it is me or the
>>> > > > AverageAggregator class in general, but I believe it just returns the
>>> > > > mean of all the values instead of the average difference between the
>>> > > > old and new value as intended.
>>> > > >
>>> > > > For testing, I created my own AbsDiffAggregator and AverageAggregator
>>> > > > classes, using FloatWritable instead of DoubleWritable. The same
>>> > > > problem still occurred: I got a mean of all the values in the graph
>>> > > > instead of an average difference.
>>> > > >
>>> > > > Could someone tell me if I'm doing something wrong or what I should
>>> > > > provide to better explain my problem?
>>> > > >
>>> > > > Regards,
>>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Best Regards, Edward J. Yoon
>>> > > @eddieyoon
>>> > >
>>> >
>>>
>>
>>
>
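
The two quantities discussed in this thread can be told apart with a small
stand-alone sketch. It does not use Hama at all: the class AggregatorSketch,
its methods, and the SharedAccumulator toy class are hypothetical names
invented for illustration, and the old values of 0.2 per vertex are made-up
test data. The toy class only models the failure mode described above (both
aggregate callbacks feeding one accumulator); it is an assumption about how
such a symptom could arise, not a copy of Hama's AbsDiffAggregator.

```java
import java.util.Arrays;

// Hypothetical, stand-alone sketch: none of these names are Hama classes.
final class AggregatorSketch {

    // Mean of the current vertex values (the behaviour reported in the thread).
    // Assumes a non-empty array.
    static double meanOfValues(double[] values) {
        return Arrays.stream(values).sum() / values.length;
    }

    // Mean of |old - new| per vertex (the behaviour described as intended).
    // Assumes non-empty arrays of equal length.
    static double meanAbsoluteDifference(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < oldValues.length; i++) {
            sum += Math.abs(oldValues[i] - newValues[i]);
        }
        return sum / oldValues.length;
    }

    // Toy model (an assumption, not Hama's code): the single-value callback and
    // the old/new callback both add to the same accumulator.
    static final class SharedAccumulator {
        private double accumulated;
        private int vertices;

        // Single-value callback: adds the raw vertex value.
        void aggregate(double value) {
            accumulated += value;
        }

        // Old/new callback: adds the absolute difference and counts the vertex.
        void aggregate(double oldValue, double newValue) {
            accumulated += Math.abs(oldValue - newValue);
            vertices++;
        }

        double average() {
            return vertices == 0 ? 0.0 : accumulated / vertices;
        }
    }

    public static void main(String[] args) {
        double[] oldValues = {0.2, 0.2, 0.2, 0.2, 0.2};   // made-up previous step
        double[] newValues = {0.1, 0.2, 0.3, 0.25, 0.15}; // five vertices, sum 1.0

        System.out.printf("mean of values:      %.4f%n", meanOfValues(newValues));
        System.out.printf("mean abs difference: %.4f%n",
                meanAbsoluteDifference(oldValues, newValues));

        // Fully converged step: old == new for every vertex, both callbacks fire.
        SharedAccumulator acc = new SharedAccumulator();
        for (double v : newValues) {
            acc.aggregate(v, v);
            acc.aggregate(v);
        }
        System.out.printf("shared accumulator:  %.4f%n", acc.average());
    }
}
```

In this toy model the shared accumulator never drops below the mean of the
vertex values, even when every old/new difference is zero, so a convergence
test of the form "average < small threshold" would never pass. That matches
the symptom reported above, where the job always runs the maximum number of
iterations.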
