hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Possible Aggregator Problem
Date Wed, 24 Apr 2013 12:23:31 GMT
Since the 'aggregator' is being used for counting the number of
updated vertices as well, I think there's no bug.

Can you provide your scenario as a unit test?
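
A minimal, Hama-free sketch of how such a scenario could be framed as a test. Everything in it is hypothetical: the class ToyAverageAggregator, its methods, the run helper and the vertex values are illustration only and do not reproduce the real AggregationRunner or AverageAggregator. It only shows why calling a single-value aggregate next to the old/new one would yield the mean of the values rather than the average absolute difference.

```java
public class AggregatorScenario {

    /** Hypothetical toy aggregator, NOT Hama's AverageAggregator. */
    static class ToyAverageAggregator {
        private double sum = 0.0;

        /** Single-value variant: accumulates the raw vertex value. */
        void aggregate(double value) {
            sum += value;
        }

        /** Old/new variant: accumulates the absolute difference. */
        void aggregate(double oldValue, double newValue) {
            sum += Math.abs(oldValue - newValue);
        }

        /** Divides the accumulated sum by the number of vertices. */
        double finalizeAggregation(int vertexCount) {
            return sum / vertexCount;
        }
    }

    /** Runs one pass over all vertices; callBoth toggles the single-value call. */
    static double run(double[] oldValues, double[] newValues, boolean callBoth) {
        ToyAverageAggregator agg = new ToyAverageAggregator();
        for (int i = 0; i < newValues.length; i++) {
            if (callBoth) {
                agg.aggregate(newValues[i]);
            }
            agg.aggregate(oldValues[i], newValues[i]);
        }
        return agg.finalizeAggregation(newValues.length);
    }

    public static void main(String[] args) {
        // Five vertices whose values sum to 1.0 and no longer change (converged).
        double[] values = {0.125, 0.125, 0.25, 0.25, 0.25};
        System.out.println("old/new only:  " + run(values, values, false));
        System.out.println("both variants: " + run(values, values, true));
    }
}
```

With these assumed inputs the old/new-only pass yields 0.0 (converged), while the pass that also calls the single-value variant yields 0.2, the mean of five values summing to 1.0. A real unit test would have to exercise the actual Hama classes instead of this stand-in.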

On Wed, Apr 24, 2013 at 7:47 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
> I'm sorry to say so, but the problem still arises. Additionally, I found
> that 'aggregate(v, v.getValue())'
> is called twice as often as 'aggregate(v, lastValue, v.getValue())'.
> I cannot find the reason for this in the AggregationRunner or GraphJobRunner.
> In a case where five vertices exist, aggregate(v, v.getValue()) is
> called five times, directly followed by the finalizeAggregation() call.
> After that, five pairs of 'aggregate(v, v.getValue())' and
> 'aggregate(v, lastValue, v.getValue())' calls are made, as follows logically
> from public void aggregateVertex(M lastValue, Vertex<V, E, M> v) in the
> AggregationRunner class.
>
> In addition, I could give you my code; maybe some flaw in there
> causes this problem?
>
>
> On Wed, Apr 24, 2013 at 10:43 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> Steven,
>>
>> Could you please try your application again with
>> http://people.apache.org/~edwardyoon/dist/test/ and let me know
>> whether it works as you expected?
>>
>> On Wed, Apr 24, 2013 at 4:53 PM, Edward J. Yoon <edwardyoon@apache.org>
>> wrote:
>> > Thanks for your report. It could be a bug. I'll have a look at it now.
>> >
>> > On Wed, Apr 24, 2013 at 4:48 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >> I'm running version 0.6.1.
>> >> Looking at the results I found through testing,
>> >>
>> >>   public void aggregateVertex(M lastValue, Vertex<V, E, M> v)
>> >>
>> >> doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
>> >> 'aggregate(v, lastValue, v.getValue())'
>> >> are called correctly and work on the same values.
>> >>
>> >> However, when finalizing through 'finalizeAggregation()' in the
>> >> 'public void doMasterAggregation(MapWritable updatedCnt)' method,
>> >>
>> >> the value aggregated by 'aggregate(v, lastValue, v.getValue())'
>> >> is lost. That is what happens for me.
>> >>
>> >> Could it be that I'm implementing the aggregate methods incorrectly?
>> >>
>> >> In the end, however, I cannot find a direct bug in TRUNK[1], although
>> >> it is not clear to me which part of the code was changed through
>> >> the ticket on JIRA.
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>
>> >>> I found the ticket on JIRA -
>> >>> https://issues.apache.org/jira/browse/HAMA-659
>> >>>
>> >>> And it seems to be fixed already.
>> >>>
>> >>> Which version of Hama are you using? And can you find a bug in TRUNK[1]?
>> >>>
>> >>> 1. http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
>> >>>
>> >>> On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>> > Could anyone tell me if I'm correct concerning the possible problem I
>> >>> > posted and replied to in the previous two emails?
>> >>> >
>> >>> >
>> >>> > On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>> >
>> >>> >> Additionally, I found this in the mail archives:
>> >>> >>
>> >>> >> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xSa22-wNqQRKapadZD+QBag@mail.gmail.com%3E
>> >>> >>
>> >>> >> This covers my point exactly. Is calling two different aggregate
>> >>> >> functions in a row still considered a bug?
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>> >>
>> >>> >>> Hi Thomas,
>> >>> >>>
>> >>> >>> Then I guess I did not explain myself clearly.
>> >>> >>> What you describe is indeed how I expect the AverageAggregator to work,
>> >>> >>> but if I use the AverageAggregator in my own PageRank implementation it
>> >>> >>> does not return the average of all absolute differences but just the
>> >>> >>> mean of all values.
>> >>> >>>
>> >>> >>> The (very) small example graph I use has only five vertices, where the
>> >>> >>> sum of all vertex values is always 1.0.
>> >>> >>> When I use the AverageAggregator, calling the getLastAggregatedValue
>> >>> >>> method always returns 0.2.
>> >>> >>> It shouldn't do that, right?
>> >>> >>>
>> >>> >>>
>> >>> >>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <
>> >>> >>> thomas.jungblut@gmail.com> wrote:
>> >>> >>>
>> >>> >>>> Hi Steven,
>> >>> >>>>
>> >>> >>>> The AverageAggregator is used to determine the average of all absolute
>> >>> >>>> differences between the old PageRank and the new PageRank for every vertex.
>> >>> >>>> This behavior is documented in the Javadoc of the given classes and
>> >>> >>>> suffices to track whether the PageRank values have converged yet.
>> >>> >>>>
>> >>> >>>> What you describe is a perfectly valid way to track the PageRank
>> >>> >>>> difference throughout all supersteps. But this is not how (imho) the
>> >>> >>>> AverageAggregator should behave, so you have to write your own.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> 2013/4/17 Steven van Beelen <smcvbeelen@gmail.com>
>> >>> >>>>
>> >>> >>>> > The values in my case are the DoubleWritable values each vertex has,
>> >>> >>>> > which the aggregators aggregate on.
>> >>> >>>> > My tests showed that, when the aggregator was set to AverageAggregator,
>> >>> >>>> > the average of all the vertex values from the past compute step was
>> >>> >>>> > returned.
>> >>> >>>> > Actually, AverageAggregator should return the average difference of all
>> >>> >>>> > the old-new value pairs of every vertex instead of the mean.
>> >>> >>>> > The average difference is then used to check whether convergence is
>> >>> >>>> > reached, which is relevant for all tasks of course.
>> >>> >>>> >
>> >>> >>>> > Hence, the convergence point, for which the Aggregator is used, will
>> >>> >>>> > not be reached.
>> >>> >>>> > As a result, the algorithm just runs the maximum number of iterations
>> >>> >>>> > set (30 iterations in the PageRank example) in every case.
>> >>> >>>> > I experienced the same with my own PageRank implementation.
>> >>> >>>> >
>> >>> >>>> > I think it has something to do with the finalizeAggregation step taken.
>> >>> >>>> > Next to that, both the 'aggregate(VERTEX vertex, M value)' and
>> >>> >>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>> >>> >>>> > every time, where one would think only the second (with old/new values)
>> >>> >>>> > would suffice.
>> >>> >>>> > Because of this, the global variable 'absoluteDifference' in the
>> >>> >>>> > 'AbsDiffAggregator' class is overwritten/overruled by the first
>> >>> >>>> > aggregate.
>> >>> >>>> > Additionally, when I made my own Aggregation class in the same
>> >>> >>>> > fashion as AbsDiffAggregator and AverageAggregator, but left out the
>> >>> >>>> > 'aggregate(VERTEX vertex, M value)', my output turned out to be 0.0000
>> >>> >>>> > every time.
>> >>> >>>> >
>> >>> >>>> > I hope I made myself clear.
>> >>> >>>> > Regards
>> >>> >>>> >
>> >>> >>>> >
>> >>> >>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>> >>>> >
>> >>> >>>> > > Thanks for your report.
>> >>> >>>> > >
>> >>> >>>> > > What's the meaning of 'all the values'? Please give me more details
>> >>> >>>> > > about your problem.
>> >>> >>>> > >
>> >>> >>>> > > I didn't look closely at the 'dangling links & aggregators' part of
>> >>> >>>> > > the PageRank example, but I think there's no bug. Aggregators are just
>> >>> >>>> > > used for global communication. For example, finding the max value[1]
>> >>> >>>> > > can be done in only one iteration using MaxValueAggregator.
>> >>> >>>> > >
>> >>> >>>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> >>> >>>> > >
>> >>> >>>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> >>> >>>> > > > Hello,
>> >>> >>>> > > >
>> >>> >>>> > > > I'm creating my own PageRank in Hama for testing and I think I found a
>> >>> >>>> > > > problem with the AverageAggregator. I'm not sure if it is me or the
>> >>> >>>> > > > AverageAggregator class in general, but I believe it just returns the
>> >>> >>>> > > > mean of all the values instead of the average difference between the
>> >>> >>>> > > > old and new value as intended.
>> >>> >>>> > > >
>> >>> >>>> > > > For testing, I created my own AbsDiffAggregator and AverageAggregator
>> >>> >>>> > > > classes, using FloatWritable instead of DoubleWritable. The same
>> >>> >>>> > > > problem still occurred: I got a mean of all the values in the graph
>> >>> >>>> > > > instead of an average difference.
>> >>> >>>> > > >
>> >>> >>>> > > > Could someone tell me if I'm doing something wrong or what I should
>> >>> >>>> > > > provide to better explain my problem?
>> >>> >>>> > > >
>> >>> >>>> > > > Regards,
>> >>> >>>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>> >>> >>>> > >
>> >>> >>>> > >
>> >>> >>>> > >
>> >>> >>>> > > --
>> >>> >>>> > > Best Regards, Edward J. Yoon
>> >>> >>>> > > @eddieyoon
>> >>> >>>> > >
>> >>> >>>> >
>> >>> >>>>
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



--
Best Regards, Edward J. Yoon
@eddieyoon
