spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: inconsistent edge counts in GraphX
Date Tue, 18 Nov 2014 08:57:45 GMT
At 2014-11-11 01:51:43 +0000, "Buttler, David" <buttler1@llnl.gov> wrote:
> I am building a graph from a large CSV file. Each record contains a couple of nodes
> and about 10 edges. When I try to load a large portion of the graph, using multiple partitions,
> I get inconsistent results in the number of edges between different runs. However, if I use
> a single partition, or a small portion of the CSV file (say 1000 rows), then I get a consistent
> number of edges. Is there anything I should be aware of as to why this could be happening
> in GraphX?

Is it possible there's some nondeterminism in the way you're reading the file? It would be
helpful if you could post the code you're using to load the graph.

Ankur


