spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eduardo Costa Alfaia <e.costaalf...@unibs.it>
Subject Re: Spark's behavior
Date Sat, 03 May 2014 18:59:15 GMT
Hi TD,

I did a test with 8 workers and 1 word source, the time gap was 27 sec, how can see in the
log files(in attach).

Hi Andrew,
I configured the ntp, all machines are synchronized.
root@computer8:/opt/unibs_test/spark-1.0RC3# for num in {1,8,10,11,13,15,16,18,22}; do ssh
computer$num date; done
Sat May  3 20:57:41 CEST 2014
Sat May  3 20:57:41 CEST 2014
Sat May  3 20:57:41 CEST 2014
Sat May  3 20:57:42 CEST 2014
Sat May  3 20:57:42 CEST 2014
Sat May  3 20:57:42 CEST 2014
Sat May  3 20:57:42 CEST 2014
Sat May  3 20:57:42 CEST 2014
Sat May  3 20:57:42 CEST 2014




On May 3, 2014, at 15:46, Eduardo Costa Alfaia <e.costaalfaia@unibs.it> wrote:

> Hi TD, Thanks for reply
> This last experiment I did with one computer, like local, but I think that time gap grow
up when I add more computer. I will do again now with 8 worker and 1 word source and I will
see what’s go on. I will control the time too, like suggested by Andrew. 
> On May 3, 2014, at 1:19, Tathagata Das <tathagata.das1565@gmail.com> wrote:
> 
>> From the logs, I see that the print() starts printing stuff 10 seconds after the
context is started. And that 10 seconds is taken by the initial empty job (50 map + 20 reduce
tasks) that spark streaming starts to ensure all the executors have started. Somehow the first
empty task takes 7-8 seconds to complete. See if this can be reproduced by running a simple,
empty job in spark shell (in the same cluster) and see if the first task takes 7-8 seconds.

>> 
>> Either way, I didnt see the 30 second gap, but a 10 second gap. And that does not
seem to be a persistent problem as after that 10 seconds, the data is being received and processed.
>> 
>> TD
>> 
>> 
>> On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia <e.costaalfaia@unibs.it>
wrote:
>> Hi TD,
>> 
>> I got the another information today using Spark 1.0 RC3 and the situation remain
the same:
>> <PastedGraphic-1.png>
>> 
>> The lines begin after 17 sec:
>> 
>> 14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140502215225-0005/0
on hostPort computer8.ant-net:57229 with 2 cores, 2.0 GB RAM
>> 14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated: app-20140502215225-0005/0
is now RUNNING
>> 14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
>> 14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Storage level = StorageLevel(false, false,
false, false, 1)
>> 14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
>> 14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Storage level = StorageLevel(false, false,
false, false, 1)
>> 14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
>> 14/05/02 21:52:26 INFO SparkContext: Starting job: collect at ReceiverTracker.scala:270
>> 14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at time 1399060346000
>> 14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
>> 14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
>> 14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at ReceiverTracker.scala:270)
with 20 output partitions (allowLocal=false)
>> 14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0(collect at ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
>> 14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job 1399060346000 ms.0
from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time 1399060346000 ms
>> -------------------------------------------14/05/02 21:52:26 INFO DStreamGraph: Updating
checkpoint data for time 1399060346000 ms
>> 
>> Time: 1399060346000 ms
>> -------------------------------------------
>> 
>> 14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job 1399060346000 ms.0
from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time 1399060346000
ms (execution: 0.024 s)
>> 
>> 
>> 
>> 14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time 1399060362000
ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time 1399060362000
ms
>> 14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job 1399060362000 ms.0
from job set of time 1399060362000 ms
>> 14/05/02 21:52:42 INFO SparkContext: Starting job: take at DStream.scala:593
>> 14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at DStream.scala:593) with 1
output partitions (allowLocal=true)
>> 14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3(take at DStream.scala:593)
>> 14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Missing parents: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Computing the requested partition locally
>> 14/05/02 21:52:42 INFO BlockManager: Found block input-0-1399060360400 locally
>> 14/05/02 21:52:42 INFO CheckpointWriter: Checkpoint for time 1399060361000 ms saved
to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060361000', took 2457 
>> bytes and 507 ms
>> 14/05/02 21:52:42 INFO CheckpointWriter: Saving checkpoint for time 1399060362000
ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060362000'
>> 14/05/02 21:52:42 INFO DStreamGraph: Clearing checkpoint data for time 1399060361000
ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Cleared checkpoint data for time 1399060361000
ms
>> 14/05/02 21:52:42 INFO BlockManagerInfo: Added input-0-1399060360800 in memory on
computer8.ant-net:50052 (size: 238.8 KB, free: 1177.0 MB)
>> 14/05/02 21:52:42 INFO SparkContext: Job finished: take at DStream.scala:593, took
0.107033025 s
>> -------------------------------------------
>> Time: 1399060362000 ms
>> -------------------------------------------
>> The Project Gutenberg EBook of Don Quixote by Miguel de Cervantes This eBook is
>> for the use of anyone anywhere at no cost and with almost no restrictions
>> whatsoever You may copy it give it away or re use it under the terms of the
>> Project Gutenberg License included with this eBook or online at www gutenberg
>> net Title Don Quixote Author Miguel de Cervantes Saavedra Release Date July 27
>> 2004 EBook 996 Language English START OF THIS PROJECT GUTENBERG EBOOK DON
>> QUIXOTE Produced by David Widger DON QUIXOTE Complete by Miguel de Cervantes
>> Saavedra Translated by John Ormsby CONTENTS Volume I CHAPTER I WHICH TREATS OF
>> THE CHARACTER AND PURSUITS OF THE FAMOUS GENTLEMAN DON QUIXOTE OF LA MANCHA
>> CHAPTER II WHICH TREATS OF THE FIRST SALLY THE INGENIOUS DON QUIXOTE MADE FROM
>> ...
>> 
>> 14/05/02 21:52:42 INFO JobScheduler: Finished job streaming job 1399060362000 ms.0
from job set of time 1399060362000 ms
>> 
>> 
>> 
>> On Apr 30, 2014, at 0:56, Tathagata Das <tathagata.das1565@gmail.com> wrote:
>> 
>>> Strange! Can you just do lines.print() to print the raw data instead of doing
word count. Beyond that we can do two things. 
>>> 
>>> 1. Can see the Spark stage UI to see whether there are stages running during
the 30 second period you referred to?
>>> 2. If you upgrade to using Spark master branch (or Spark 1.0 RC3, see different
thread by Patrick), it has a streaming UI, which shows the number of records received, the
state of the receiver, etc. That may be more useful in debugging whats going on .
>>> 
>>> TD 
>>> 
>>> 
>>> On Tue, Apr 29, 2014 at 3:31 PM, Eduardo Costa Alfaia <e.costaalfaia@unibs.it>
wrote:
>>> Hi TD,
>>> We are not using stream context with master local, we have 1 Master and 8 Workers
and 1 word source. The command line that we are using is:
>>> bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount spark://192.168.0.13:7077
>>>      
>>> On Apr 30, 2014, at 0:09, Tathagata Das <tathagata.das1565@gmail.com> wrote:
>>> 
>>>> Is you batch size 30 seconds by any chance? 
>>>> 
>>>> Assuming not, please check whether you are creating the streaming context
with master "local[n]" where n > 2. With "local" or "local[1]", the system only has one
processing slot, which is occupied by the receiver leaving no room for processing the received
data. It could be that after 30 seconds, the server disconnects, the receiver terminates,
releasing the single slot for the processing to proceed. 
>>>> 
>>>> TD
>>>> 
>>>> 
>>>> On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia <e.costaalfaia@unibs.it>
wrote:
>>>> Hi TD,
>>>> 
>>>> In my tests with spark streaming, I'm using JavaNetworkWordCount(modified)
code and a program that I wrote that sends words to the Spark worker, I use TCP as transport.
I verified that after starting Spark, it connects to my source which actually starts sending,
but the first word count is advertised approximately 30 seconds after the context creation.
So I'm wondering where is stored the 30 seconds data already sent by the source. Is this a
normal spark’s behaviour? I saw the same behaviour using the shipped JavaNetworkWordCount
application.
>>>> 
>>>> Many thanks.
>>>> --
>>>> Informativa sulla Privacy: http://www.unibs.it/node/8155
>>>> 
>>> 
>>> 
>>> Informativa sulla Privacy: http://www.unibs.it/node/8155
>>> 
>> 
>> 
>> Informativa sulla Privacy: http://www.unibs.it/node/8155
>> 
> 


-- 
Informativa sulla Privacy: http://www.unibs.it/node/8155

Mime
View raw message