hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Possible Aggregator Problem
Date Wed, 24 Apr 2013 07:53:16 GMT
Thanks for your report. It could be a bug. I'll have a look at it now.

On Wed, Apr 24, 2013 at 4:48 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
> I'm running version 0.6.1.
> Looking at the results I found through testing,
>
>   public void aggregateVertex(M lastValue, Vertex<V, E, M> v)
>
> doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
> 'aggregate(v, lastValue, v.getValue())'
> are called correctly and work on the same values.
>
> However, when finalizing through 'finalizeAggregation()' in the
> 'public void doMasterAggregation(MapWritable updatedCnt)' method,
>
> the value aggregated by 'aggregate(v, lastValue, v.getValue())'
> is lost. At least, that is what happens for me.
>
> Could it be that I'm implementing the aggregate methods incorrectly?
>
> In the end however, I can not find a direct bug in TRUNK[1], although
> it is not clear to me what/which part of the code was changed through
> the ticket on JIRA.
>
>
>
>
> On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> I found the ticket on JIRA -
>> https://issues.apache.org/jira/browse/HAMA-659
>>
>> And it seems already fixed.
>>
>> Which version of Hama are you running? And can you find a bug in TRUNK[1]?
>>
>> 1.
>> http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
>>
>> On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <smcvbeelen@gmail.com>
>> wrote:
>> > Could anyone tell me if I'm correct concerning the possible problem I
>> > posted and replied on in the previous two emails?
>> >
>> >
>> > On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >
>> >> Additionally, I found this in the mail archives:
>> >>
>> >>
>> >> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
>> >> This covers my point exactly. Is calling two different aggregate
>> >> functions in a row still considered a bug?
>> >>
>> >>
>> >> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>
>> >>> Hi Thomas,
>> >>>
>> >>> Then I guess I did not explain myself clearly.
>> >>> What you describe is indeed how I expect the AverageAggregator to work,
>> >>> but when I use the AverageAggregator in my own PageRank implementation it
>> >>> does not return the average of all absolute differences, just the
>> >>> average of all values (their sum divided by the number of vertices).
>> >>>
>> >>> The (very) small example graph I use has only five vertices, where the
>> >>> sum of all vertex values is always 1.0.
>> >>> When I use the AverageAggregator it always returns 0.2 when calling
>> >>> the getLastAggregatedValue method.
>> >>> It shouldn't do that, right?
>> >>>
>> >>>
>> >>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <
>> >>> thomas.jungblut@gmail.com> wrote:
>> >>>
>> >>>> Hi Steven,
>> >>>>
>> >>>> the AverageAggregator is used to determine the average of all absolute
>> >>>> differences between old pagerank and new pagerank for every vertex.
>> >>>> This behaviour is documented in the javadoc of the given classes
>> >>>> and suffices to track whether the pagerank values have converged yet.
>> >>>>
>> >>>> What you describe is a perfectly valid way to track the pagerank
>> >>>> difference throughout all supersteps. But this is not how (imho) the
>> >>>> AverageAggregator should behave, so you have to write your own.
>> >>>>
>> >>>>
>> >>>> 2013/4/17 Steven van Beelen <smcvbeelen@gmail.com>
>> >>>>
>> >>>> > The values in my case are the DoubleWritable values that each vertex
>> >>>> > has and that the aggregators aggregate on.
>> >>>> > My tests showed that, when the aggregator was set to AverageAggregator,
>> >>>> > the average of all the vertex values from the past compute step was
>> >>>> > returned.
>> >>>> > Actually, AverageAggregator should return the average difference of
>> >>>> > all the old-new value pairs of every vertex instead of the mean.
>> >>>> > The average difference is then used to check whether convergence is
>> >>>> > reached, which is relevant for all tasks of course.
>> >>>> >
>> >>>> > Hence, the convergence point, for which the Aggregator is used, will
>> >>>> > not be reached.
>> >>>> > As a result, the algorithm just runs the maximum number of iterations
>> >>>> > set (30 iterations in the PageRank example) in every case.
>> >>>> > I experienced the same with my own PageRank implementation.
>> >>>> >
>> >>>> > I think it has something to do with the finalizeAggregation step.
>> >>>> > In addition, both the 'aggregate(VERTEX vertex, M value)' and
>> >>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>> >>>> > every time, where one would think the second (with old/new values)
>> >>>> > alone would suffice.
>> >>>> > Because of this, the global variable 'absoluteDifference' in the
>> >>>> > 'AbsDiffAggregator' class is overwritten/overruled by the first
>> >>>> > aggregate.
>> >>>> > Additionally, when I made my own Aggregation class in the same
>> >>>> > fashion as AbsDiffAggregator and AverageAggregator, but left out the
>> >>>> > 'aggregate(VERTEX vertex, M value)', my output turned out to be 0.0000
>> >>>> > every time.
>> >>>> >
>> >>>> > I hope I made myself clear.
>> >>>> > Regards
>> >>>> >
>> >>>> >
>> >>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>>> >
>> >>>> > > Thanks for your report.
>> >>>> > >
>> >>>> > > What's the meaning of 'all the values'? Please give me more details
>> >>>> > > about your problem.
>> >>>> > >
>> >>>> > > I didn't look at the 'dangling links & aggregators' part of the PageRank
>> >>>> > > example closely, but I think there's no bug. Aggregators are just used
>> >>>> > > for global communication. For example, finding the max value[1] can be
>> >>>> > > done in only one iteration using MaxValueAggregator.
>> >>>> > >
>> >>>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> >>>> > >
>> >>>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>>> > > > Hello,
>> >>>> > > >
>> >>>> > > > I'm creating my own PageRank in Hama for testing and I think I found a
>> >>>> > > > problem with the AverageAggregator. I'm not sure if it is me or the
>> >>>> > > > AverageAggregator class in general, but I believe it just returns the
>> >>>> > > > mean of all the values instead of the average difference between the
>> >>>> > > > old and new value as intended.
>> >>>> > > >
>> >>>> > > > For testing, I created my own AbsDiffAggregator and AverageAggregator
>> >>>> > > > classes, using FloatWritable instead of DoubleWritable. The same problem
>> >>>> > > > still occurred: I got a mean of all the values in the graph instead of
>> >>>> > > > an average difference.
>> >>>> > > >
>> >>>> > > > Could someone tell me if I'm doing something wrong or what I should
>> >>>> > > > provide to better explain my problem?
>> >>>> > > >
>> >>>> > > > Regards,
>> >>>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>> >>>> > >
>> >>>> > >
>> >>>> > >
>> >>>> > > --
>> >>>> > > Best Regards, Edward J. Yoon
>> >>>> > > @eddieyoon
>> >>>> > >
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
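
The two quantities the thread keeps contrasting can be put side by side in a small standalone sketch. It does not use or reproduce Hama's aggregator classes; the class name `AverageVersusDifference`, both helper methods, and the sample rank values are assumptions made up for illustration. It only shows why a plain mean over five values that sum to 1.0 comes out at about 0.2 no matter how far the ranks moved, whereas the mean absolute difference between old and new values does reflect convergence.

```java
public class AverageVersusDifference {

    /** Plain mean of the current vertex values (what the reporter observed). */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Mean of |old - new| per vertex (what the reporter expected). */
    static double meanAbsoluteDifference(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < oldValues.length; i++) {
            sum += Math.abs(oldValues[i] - newValues[i]);
        }
        return sum / oldValues.length;
    }

    public static void main(String[] args) {
        // Hypothetical ranks for a five-vertex graph; each array sums to 1.0.
        double[] oldRanks = {0.20, 0.20, 0.20, 0.20, 0.20};
        double[] newRanks = {0.25, 0.15, 0.30, 0.10, 0.20};

        // Roughly 1.0 / 5, independent of how much the ranks changed.
        System.out.println("mean of values:           " + mean(newRanks));
        // Roughly 0.30 / 5; this shrinks toward 0 as the ranks converge.
        System.out.println("mean absolute difference: "
                + meanAbsoluteDifference(oldRanks, newRanks));
    }
}
```

A convergence check that compares the first number against a small tolerance would never fire, which matches the report that the job always runs the maximum number of iterations.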
