spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From on <schueler_1...@web.de>
Subject Performance tuning for standalone on one host
Date Mon, 25 Jul 2016 17:21:46 GMT
Dear all,

I am running spark on one host ("local[2]") doing calculations like this
on a socket stream.
mainStream = socketStream.filter(lambda msg:
msg['header'].startswith('test')).map(lambda x: (x['host'], x) )
s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x) )
s2 = mainStream.updateStateByKey(updateSecond,
initialRDD=initialMachineStates).map(lambda x: (2, x) )
out.join(bla2).foreachRDD(no_out)

I evaluated each calculations allone has a processing time about 400ms
but processing time of the code above is over 3 sec on average.

I know there are a lot of parameters unknown but does anybody has hints
how to tune this code / system? I already changed a lot of parameters,
such as #executors, #cores and so on.

Thanks in advance and best regards,
on

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Mime
View raw message