spark-dev mailing list archives

From James <alcaid1...@gmail.com>
Subject Re: Why a program would receive null from send message of mapReduceTriplets
Date Fri, 13 Feb 2015 13:35:07 GMT
I have a question:

*How do the attributes in a graph's triplets get updated after
mapVertices()?*

My code:

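```
// Assumed: stream-lib's HyperLogLog (HyperLogLog(log2m), offer(Object))
import com.clearspring.analytics.stream.cardinality.HyperLogLog

// Initialize the graph: give each vertex a counter that contains only
// its own vertex id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Find a triplet whose source attribute is null
val nullTriplet = anfGraph.triplets.filter(edge => edge.srcAttr == null).first

// Look up that same source vertex directly in the vertex RDD
anfGraph.vertices.filter(_._1 == nullTriplet.srcId).first
// Here the vertex has a non-null attribute

// val messages = anfGraph.aggregateMessages(msgFun, mergeMessage)  // <- NullPointerException
```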

I found that some vertex attributes in some triplets are null, but not
all of them.
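One way to see how widespread the problem is, and to keep the job running while investigating, is to check every triplet rather than only the first and to make the send step tolerate a null attribute. The sketch below is a minimal illustration under stated assumptions: `Triplet` is a hypothetical case class standing in for GraphX's `EdgeTriplet` so the code runs without Spark, and `NullTripletCheck` is likewise a made-up name. In GraphX the same `Option(...)` guard could be applied to `ctx.srcAttr` inside the function passed to `aggregateMessages`. Note that this only hides the symptom; it does not explain where the nulls come from.

```scala
// Hypothetical stand-in for GraphX's EdgeTriplet, so the sketch runs without Spark.
final case class Triplet[VD](srcId: Long, dstId: Long, srcAttr: VD, dstAttr: VD)

object NullTripletCheck {
  /** Every triplet whose source or destination attribute is null
    * (not just the first one, which is all `.first` returns). */
  def nullEndpoints[VD](triplets: Seq[Triplet[VD]]): Seq[Triplet[VD]] =
    triplets.filter(t => t.srcAttr == null || t.dstAttr == null)

  /** Null-tolerant send step: emits (dstId, srcAttr) only when srcAttr is
    * non-null, instead of letting a NullPointerException surface later. */
  def sendToDst[VD](t: Triplet[VD]): Option[(Long, VD)] =
    Option(t.srcAttr).map(attr => (t.dstId, attr))
}
```

Logging the ids returned by `nullEndpoints` (for example, whether the null side is always the source or always on a particular partition) may help narrow down whether the attributes are missing from the vertex data or only from the triplet view.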


Alcaid


2015-02-13 14:50 GMT+08:00 Reynold Xin <rxin@databricks.com>:

> Then maybe you actually had a null in your vertex attribute?
>
>
> On Thu, Feb 12, 2015 at 10:47 PM, James <alcaid1801@gmail.com> wrote:
>
>> I replaced mapReduceTriplets() with aggregateMessages(), but it still
>> fails.
>>
>>
>> 2015-02-13 6:52 GMT+08:00 Reynold Xin <rxin@databricks.com>:
>>
>>> Can you use the new aggregateNeighbors method? I suspect the null is
>>> coming from "automatic join elimination", which inspects the bytecode to
>>> see whether you need the src or dst vertex data. Occasionally it fails to
>>> detect this. In the new aggregateNeighbors API, the caller has to specify
>>> it explicitly, which makes it more robust.
>>>
>>>
>>> On Thu, Feb 12, 2015 at 6:26 AM, James <alcaid1801@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> When I run the code on a much bigger graph, I get a
>>>> NullPointerException.
>>>>
>>>> I found that this is because the sendMessage() function receives a
>>>> triplet whose edge.srcAttr or edge.dstAttr is null. I wonder why this
>>>> happens, as I am sure every vertex has an attribute.
>>>>
>>>> Any replies are appreciated.
>>>>
>>>> Alcaid
>>>>
>>>>
>>>> 2015-02-11 19:30 GMT+08:00 James <alcaid1801@gmail.com>:
>>>>
>>>> > Hello,
>>>> >
>>>> > Recently I have been trying to estimate the average distance of a big
>>>> > graph using Spark with the help of
>>>> > [HyperANF](http://dl.acm.org/citation.cfm?id=1963493).
>>>> >
>>>> > It works like the Connected Components algorithm, except that the
>>>> > attribute of each vertex is a HyperLogLog counter that, at the k-th
>>>> > iteration, estimates the number of vertices the vertex can reach
>>>> > within k hops.
>>>> >
>>>> > I have successfully run the code on a graph with 20M vertices, but I
>>>> > still need help:
>>>> >
>>>> >
>>>> > *I think the code could be more efficient, especially the "Send
>>>> > message" function, but I am not sure what will happen if a vertex
>>>> > receives no message in an iteration.*
>>>> >
>>>> > Here is my code: https://github.com/alcaid1801/Erdos
>>>> >
>>>> > Any replies are appreciated.
>>>> >
>>>>
>>>
>>>
>>
>
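The HyperANF iteration described in the first message of the thread, and the question of what happens to a vertex that receives no message in a round, can be illustrated with a small plain-Scala sketch. It is a simplification, not the code from the linked repository: exact `Set`s replace the HyperLogLog counters (so memory grows with the ball sizes), the graph is an in-memory edge list, and the message direction is an assumption: for an edge `(src, dst)` the source receives the destination's set, so balls grow along out-edges. `ExactAnf`, `aggregate` and `neighbourhoodFunction` are hypothetical names. `aggregate` mimics the documented behaviour of GraphX's `aggregateMessages`, whose result contains only vertices that received at least one message; a vertex missing from that result simply keeps its previous set, which in GraphX would typically be done with `outerJoinVertices` and a default for the `None` case.

```scala
object ExactAnf {
  type VertexId = Long

  /** One round of message passing: each edge (src -> dst) sends dst's set to
    * src. Like aggregateMessages, vertices that receive no message are
    * simply absent from the result. Assumes every dst is in `sets`. */
  def aggregate(
      sets: Map[VertexId, Set[VertexId]],
      edges: Seq[(VertexId, VertexId)]): Map[VertexId, Set[VertexId]] =
    edges
      .map { case (src, dst) => src -> sets(dst) }
      .groupBy(_._1)
      .map { case (v, msgs) => v -> msgs.map(_._2).reduce(_ union _) }

  /** Iterates until no ball changes and returns N(0), N(1), ..., where
    * N(t) is the sum over all vertices v of |B(v, t)|. */
  def neighbourhoodFunction(
      vertices: Seq[VertexId],
      edges: Seq[(VertexId, VertexId)]): Vector[Long] = {
    var sets = vertices.map(v => v -> Set(v)).toMap
    var n = Vector(sets.values.map(_.size.toLong).sum)
    var changed = true
    while (changed) {
      val msgs = aggregate(sets, edges)
      // A vertex with no message keeps its old set (the "outer join" step).
      val next = sets.map { case (v, s) => v -> msgs.get(v).fold(s)(s union _) }
      changed = next != sets
      if (changed) n = n :+ next.values.map(_.size.toLong).sum
      sets = next
    }
    n
  }

  /** Average distance over reachable ordered pairs (u, v) with u != v:
    * N(t) - N(t - 1) pairs lie at distance exactly t. NaN if there are none. */
  def averageDistance(n: Vector[Long]): Double = {
    val weighted = (1 until n.size).map(t => t * (n(t) - n(t - 1)))
    weighted.sum.toDouble / (n.last - n.head)
  }
}
```

For the directed path 1 → 2 → 3 this gives N = (3, 5, 6) and an average distance of 4/3: the pairs (1, 2) and (2, 3) are at distance 1 and (1, 3) is at distance 2. Vertex 3 never receives a message, and its ball stays {3} because the previous value is carried over rather than dropped.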
