spark-user mailing list archives

From on <schueler_1...@web.de>
Subject Re: Performance tuning for local mode on one host
Date Tue, 26 Jul 2016 05:35:00 GMT

There are 4 cores on my system.

Running spark with setMaster("local[2]") results in:
PID USER  PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
  7 root  20  0 4748836 563400 29064 S 24.6  7.0 1:16.54 /usr/jdk1.8.0_101/bin/java -cp /conf/:/usr/spark-2.0.0-preview-bin-hadoop2.6/jars/* -Xmx1g org.apache.spark.de+
114 root  20  0  114208  31956  7028 S 15.7  0.4 0:16.35 python -m pyspark.daemon
117 root  20  0  114404  32116  7028 S 15.7  0.4 0:17.28 python -m pyspark.daemon
 41 root  20  0  443548  60920 10416 S  0.0  0.8 0:10.84 python /test.py
111 root  20  0  101272  31740  9356 S  0.0  0.4 0:00.29 python -m pyspark.daemon

with a processing time of over 3 seconds when running the code below. There
must be a lot of overhead somewhere, as the code does nearly nothing, i.e., no
expensive calculations, on a socket stream delivering one message per second.

How can I reduce this overhead?
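
One way to narrow this down is to time the Python update functions outside Spark on synthetic messages and compare that with the batch processing time. The following is only a minimal sketch under assumptions: update_first is a hypothetical stand-in (the real updateFirst is not shown in this thread), and the message fields are modeled on the 'header' and 'host' keys used in the code below.

```python
import time


def update_first(new_values, last_state):
    # Hypothetical stand-in for updateFirst: keep the most recent message,
    # or the previous state if the batch brought nothing new for this key.
    if new_values:
        return new_values[-1]
    return last_state


def mean_call_seconds(func, batches, repeat=1000):
    # Average wall-clock time of one call to func over the given batches.
    start = time.perf_counter()
    for _ in range(repeat):
        state = None
        for batch in batches:
            state = func(batch, state)
    elapsed = time.perf_counter() - start
    return elapsed / (repeat * len(batches))


# Synthetic input: one message per batch, as in a one-message-per-second stream.
batches = [[{"header": "test-%d" % i, "host": "host-a"}] for i in range(10)]
per_call = mean_call_seconds(update_first, batches)
print("mean time per update call: %.9f s" % per_call)
```

If the time per call is far below the observed batch processing time, the remainder is spent outside the function itself, for example in task scheduling, in serialization between the JVM and the Python workers, or in checkpointing.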


On 25.07.2016 20:19, on wrote:
> OK, sorry, I am running in local mode.
> Just a very small setup...
>
> (changed the subject)
>
> On 25.07.2016 20:01, Mich Talebzadeh wrote:
>> Hi,
>>
>> From your reference I can see that you are running in local mode with
>> two cores. But that is not standalone.
>>
>> Can you please clarify whether you started the master and slave processes?
>> Those are for standalone mode.
>>
>> sbin/start-master.sh
>> sbin/start-slaves.sh
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>  
>>
>> LinkedIn
>> / https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/
>>
>>  
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which
>> may arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary
>> damages arising from such loss, damage or destruction.
>>
>>  
>>
>>
>> On 25 July 2016 at 18:21, on <schueler_1234@web.de
>> <mailto:schueler_1234@web.de>> wrote:
>>
>>     Dear all,
>>
>>     I am running spark on one host ("local[2]") doing calculations
>>     like this on a socket stream:
>>
>>     mainStream = socketStream.filter(
>>         lambda msg: msg['header'].startswith('test')
>>     ).map(lambda x: (x['host'], x))
>>     s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
>>     s2 = mainStream.updateStateByKey(
>>         updateSecond, initialRDD=initialMachineStates
>>     ).map(lambda x: (2, x))
>>     out.join(bla2).foreachRDD(no_out)
>>
>>     I evaluated each calculation alone; each has a processing time of about
>>     400 ms, but the processing time of the code above is over 3 sec on average.
>>
>>     I know there are a lot of unknown parameters, but does anybody have
>>     hints on how to tune this code / system? I already changed a lot of
>>     parameters, such as #executors, #cores and so on.
>>
>>     Thanks in advance and best regards,
>>     on
>>
>>     ---------------------------------------------------------------------
>>     To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>     <mailto:user-unsubscribe@spark.apache.org>
>>
>>



