spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: Query on Merge Message (Graph: pregel operator)
Date Thu, 19 Jun 2014 23:52:01 GMT
Many merge operations can be broken up to work incrementally. For example,
if the merge operation is to sum *n* rank updates, then you can set mergeMsg
= (a, b) => a + b and this function will be applied to all *n* updates in
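arbitrary order to yield a final sum. Addition, multiplication, min, and max
work in this manner because they are associative and commutative. Mean can be
computed the same way if each message carries a (sum, count) pair that is
merged by addition, with the final division done in vprog.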
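To make the incremental pattern concrete, here is a rough plain-Scala sketch. It does not call GraphX. It only mimics how a mergeMsg function gets applied pairwise to a vertex's inbound messages in some arbitrary order. The object name IncrementalMerge, the helper combine, and the SumCount alias are all hypothetical. The (sum, count) trick for mean is one common way to make it associative, not something GraphX provides.

```scala
// Plain-Scala sketch (no Spark dependency): simulates applying an
// associative, commutative mergeMsg to a vertex's inbound messages.
object IncrementalMerge {
  // Hypothetical helper: shuffle the messages, then reduce them pairwise
  // with mergeMsg. Because the operation is associative and commutative,
  // the result does not depend on the order.
  def combine[A](msgs: Seq[A], mergeMsg: (A, A) => A): A =
    scala.util.Random.shuffle(msgs).reduce(mergeMsg)

  // Mean is not associative on its own, so each message carries
  // (sum, count). These pairs merge by addition, and the division
  // would happen later (e.g. in vprog).
  type SumCount = (Double, Long)
  val mergeMean: (SumCount, SumCount) => SumCount =
    (a, b) => (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    val updates = Seq(0.5, 1.5, 2.0, 4.0)
    val sum = combine[Double](updates, _ + _)
    val min = combine[Double](updates, (a, b) => math.min(a, b))
    val max = combine[Double](updates, (a, b) => math.max(a, b))
    // In sendMsg, each update u would be sent as (u, 1L).
    val (s, n) = combine[SumCount](updates.map(u => (u, 1L)), mergeMean)
    println(s"sum=$sum min=$min max=$max mean=${s / n}")
  }
}
```

In a real Pregel call the same functions would be passed as mergeMsg. GraphX can then combine partial results before they reach the vertex, so no vertex ever holds more than one message value.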

If you absolutely must operate on all *n* messages at once, for example to
find the median, then a workaround is to emit Array(m) instead of m in the
sendMsg function, and then to set mergeMsg = (a, b) => a ++ b. This will
accumulate all inbound messages into an array which you can access in vprog.
However, it will be much slower for graphs with high-degree vertices,
because the accumulated arrays can grow very large.
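The array workaround can be sketched the same way, again in plain Scala without GraphX. The names ArrayMerge, wrap, and median are hypothetical. wrap plays the role of the sendMsg side (Array(m) instead of m), mergeMsg concatenates, and median stands in for the work vprog would do once it has every message.

```scala
// Plain-Scala sketch (no Spark dependency) of the Array(m) / a ++ b workaround.
object ArrayMerge {
  // sendMsg side: wrap each message m as Array(m) instead of sending m.
  def wrap(m: Double): Array[Double] = Array(m)

  // mergeMsg: concatenate, so no inbound message is discarded. The merged
  // array grows with the number of messages, i.e. with vertex degree.
  val mergeMsg: (Array[Double], Array[Double]) => Array[Double] =
    (a, b) => a ++ b

  // vprog side: with all n messages available, a median can be computed.
  def median(msgs: Array[Double]): Double = {
    require(msgs.nonEmpty, "median of an empty message array")
    val sorted = msgs.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  def main(args: Array[String]): Unit = {
    val inbound = Seq(7.0, 1.0, 4.0, 10.0)
    val merged = inbound.map(wrap).reduce(mergeMsg)
    println(s"median=${median(merged)}")
  }
}
```

Unlike the incremental case, each concatenation copies the arrays. A vertex with many in-edges therefore pays memory and time proportional to its degree, which is the slowdown described above.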

Ankur <http://www.ankurdave.com/>
