spark-user mailing list archives

From ankurdave <ankurd...@gmail.com>
Subject Re: Benchmarking Graphx
Date Tue, 20 May 2014 01:53:20 GMT
On May 17, 2014 at 2:59pm, Hari wrote:
> a) Is there a way to get the total time taken for the execution from start to finish?
Assuming you're running the benchmark as a standalone program, such as by
invoking the Analytics driver
<https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/Analytics.scala>,
you could wrap the driver invocation using time:
/usr/bin/time -p ./bin/spark-submit ...
If you're using spark-shell, you could call System.currentTimeMillis before
and after the computation and take the difference.
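For spark-shell, a small helper along these lines would do that. It is only a sketch and not part of Spark; the names Timing, timed, and time are made up for illustration. Keep in mind that RDD and GraphX transformations are lazy, so the block you time should end in an action such as count(), otherwise you only measure building the lineage.

```scala
object Timing {
  /** Evaluates `block` exactly once and returns its result with the elapsed wall-clock time in ms. */
  def timed[T](block: => T): (T, Long) = {
    val start = System.currentTimeMillis()
    val result = block // the by-name argument is forced here, once
    (result, System.currentTimeMillis() - start)
  }

  /** Like `timed`, but prints the elapsed time with a label and returns only the result. */
  def time[T](label: String)(block: => T): T = {
    val (result, elapsedMs) = timed(block)
    println(s"$label: $elapsedMs ms")
    result
  }
}
```

For example, with a graph already loaded in spark-shell (graph here stands for your own Graph), Timing.time("PageRank") { graph.pageRank(0.001).vertices.count() } would print the time for PageRank up to and including the count.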
> b) log4j properties need to be modified to turn off logging, but it's not clear how to.
Create conf/log4j.properties
<http://spark.apache.org/docs/0.9.1/configuration.html#configuring-logging>
by copying conf/log4j.properties.template and changing its first setting,
the log4j.rootCategory line, to
log4j.rootCategory=WARN, console
> c) how can this be extended to a cluster?
It should work to just invoke the driver on the cluster using spark-submit.
If you aren't using the Analytics driver, make sure to set the same Spark
properties <http://spark.apache.org/docs/0.9.1/configuration.html#spark-properties>
as it does (spark.serializer, spark.kryo.registrator, and spark.locality.wait).
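If you write your own driver, setting those properties through SparkConf might look roughly like the sketch below. The object and method names are hypothetical. GraphKryoRegistrator is the Kryo registrator GraphX shipped at the time; the spark.locality.wait value of 100000 ms is an assumption on my part, so check what the Analytics driver in your Spark version actually sets.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GraphBenchmarkConf {
  /** Hypothetical helper: a SparkConf carrying the GraphX-related properties listed above. */
  def graphxConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      // Use Kryo instead of Java serialization for shuffled and cached data
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Register GraphX's internal classes with Kryo
      .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")
      // How long (ms) to wait for a data-local slot before running a task elsewhere;
      // this value is an assumption, compare with the Analytics driver for your version
      .set("spark.locality.wait", "100000")

  /** Creates a SparkContext; no master is set here, it is expected from spark-submit's --master. */
  def createContext(appName: String): SparkContext = new SparkContext(graphxConf(appName))
}
```

Leaving the master out of the conf lets the same jar run locally or on the cluster, depending only on what you pass to spark-submit.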
> d) also how to quantify memory overhead if I added more functionality to the execution?
You can see how much memory each cached RDD is taking up by looking at the
Storage tab of the web UI
<http://spark.apache.org/docs/0.9.1/monitoring.html#web-interfaces>.
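If you'd rather log those numbers from the driver, Spark 1.0 and later expose, as far as I know, roughly the same per-RDD storage data through SparkContext.getRDDStorageInfo. It is a developer API, so it may be missing or differ in your version. The helper below is a hypothetical sketch; comparing totalCachedMemoryBytes before and after adding your functionality gives a rough idea of the extra cached memory, though not of transient memory used during computation.

```scala
import org.apache.spark.SparkContext

object CacheReport {
  private val MB = 1024.0 * 1024.0

  /** One line per cached RDD: id, name, and size in memory and on disk. */
  def cachedRddSizes(sc: SparkContext): Seq[String] =
    sc.getRDDStorageInfo.toSeq.map { info =>
      f"RDD ${info.id} (${info.name}): ${info.memSize / MB}%.1f MB in memory, ${info.diskSize / MB}%.1f MB on disk"
    }

  /** Total bytes currently held in memory by all cached RDDs. */
  def totalCachedMemoryBytes(sc: SparkContext): Long =
    sc.getRDDStorageInfo.map(_.memSize).sum
}
```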
> e) any scripts? reports generated?
We don't have well-supported benchmark scripts for GraphX yet. Dan Crankshaw
has some personal-use scripts <https://github.com/dcrankshaw/graphx-utils>
for setting up GraphX and competing graph systems on a cluster and running
some benchmarks. You could look at those for some ideas.
There are benchmarks from earlier this year in the GraphX arXiv paper
<http://arxiv.org/abs/1402.2394>. These are on the soc-LiveJournal
<http://snap.stanford.edu/data/soc-LiveJournal1.html> and twitter-2010
<http://law.di.unimi.it/webdata/twitter-2010/> datasets.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Benchmarking-Graphx-tp5965p6061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.