spark-user mailing list archives

From Mich Talebzadeh <>
Subject Re: Performance tuning for standalone on one host
Date Mon, 25 Jul 2016 18:01:14 GMT

From your reference I can see that you are running in local mode with two
cores. That is not standalone mode.

Can you please clarify whether you started the master and slave processes?
Those are what standalone mode uses.
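
To make the distinction concrete, here is a minimal sketch that labels a master URL string by its deployment mode. It assumes the usual conventions: "local", "local[N]" or "local[*]" run everything inside a single JVM, while "spark://host:port" points at a separately started standalone master. The helper name classify_master and the example host and port are hypothetical and only serve as illustration.

```python
def classify_master(master):
    """Return a rough label for a Spark master URL string.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # "local", "local[2]", "local[*]": driver and executor threads share one
    # JVM, and no master or worker daemons are involved.
    if master == "local" or (master.startswith("local[") and master.endswith("]")):
        return "local"
    # "spark://host:port": the application connects to a standalone master
    # process that was started separately, together with its workers.
    if master.startswith("spark://"):
        return "standalone"
    # Anything else (for example "yarn") is outside the scope of this sketch.
    return "other"


print(classify_master("local[2]"))
print(classify_master("spark://localhost:7077"))
```

With the "local[2]" setting from the original message, the first call reports local mode. Only a "spark://..." master URL would mean the job is submitted to a standalone cluster.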



Dr Mich Talebzadeh


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 25 July 2016 at 18:21, on <> wrote:

> Dear all,
> I am running Spark on one host ("local[2]"), doing calculations like this
> on a socket stream:
> mainStream = socketStream.filter(
>     lambda msg: msg['header'].startswith('test')).map(lambda x: (x['host'], x))
> s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
> s2 = mainStream.updateStateByKey(
>     updateSecond, initialRDD=initialMachineStates).map(lambda x: (2, x))
> out.join(bla2).foreachRDD(no_out)
> I evaluated each calculation alone, and each has a processing time of about
> 400 ms, but the processing time of the code above is over 3 s on average.
> I know there are a lot of unknown parameters, but does anybody have hints
> on how to tune this code / system? I have already changed a lot of
> parameters, such as #executors, #cores and so on.
> Thanks in advance and best regards,
> on