spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Spark's behavior
Date Fri, 02 May 2014 23:26:09 GMT
Also make sure that the clocks on all your servers are synchronized.  Run
"date" on each machine, and if they're not really really close then you
should set up NTP to keep the clocks in sync.
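
For example, a quick way to eyeball all the clocks from one box (hypothetical
hostnames; pdsh or clush would do the same in parallel):

  for h in computer1 computer2 computer8; do ssh $h date; done

If the outputs differ by more than a second or so, running ntpd on each node
(or a one-shot ntpdate against a common server) will keep them in sync.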


On Fri, May 2, 2014 at 4:19 PM, Tathagata Das
<tathagata.das1565@gmail.com> wrote:

> From the logs, I see that the print() starts printing stuff 10 seconds
> after the context is started. That 10 seconds is taken by the initial
> empty job (50 map + 20 reduce tasks) that Spark Streaming runs to ensure
> all the executors have started. Somehow the first empty task takes 7-8
> seconds to complete. See if this can be reproduced by running a simple,
> empty job in the Spark shell (on the same cluster) and checking whether
> the first task also takes 7-8 seconds.
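>
> For example, a minimal sketch to try in bin/spark-shell on the cluster
> (the 50 and 20 below only mimic the task counts of the startup job;
> nothing about the exact numbers matters):
>
>   val t0 = System.nanoTime
>   sc.parallelize(1 to 10000, 50).map(x => (x % 20, 1)).reduceByKey(_ + _, 20).collect()
>   println("first job took " + (System.nanoTime - t0) / 1e9 + " s")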
>
> Either way, I didn't see a 30-second gap, only a 10-second gap. And that
> does not seem to be a persistent problem: after those 10 seconds, the data
> is received and processed.
>
> TD
>
>
> On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia
> <e.costaalfaia@unibs.it> wrote:
>
>> Hi TD,
>>
>> I got more information today, using Spark 1.0 RC3, and the situation
>> remains the same.
>>
>> The lines begin to appear after 17 seconds:
>>
>> 14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID
>> app-20140502215225-0005/0 on hostPort computer8.ant-net:57229 with 2 cores,
>> 2.0 GB RAM
>> 14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated:
>> app-20140502215225-0005/0 is now RUNNING
>> 14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
>> 14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Storage level =
>> StorageLevel(false, false, false, false, 1)
>> 14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated
>> org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
>> 14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Storage level =
>> StorageLevel(false, false, false, false, 1)
>> 14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated
>> org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
>> 14/05/02 21:52:26 INFO SparkContext: Starting job: collect at
>> ReceiverTracker.scala:270
>> 14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at
>> time 1399060346000
>> 14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at
>> 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
>> 14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at
>> ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
>> 14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at
>> ReceiverTracker.scala:270) with 20 output partitions (allowLocal=false)
>> 14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0(collect at
>> ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
>> 14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job
>> 1399060346000 ms.0 from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time
>> 1399060346000 ms
>> -------------------------------------------
>> 14/05/02 21:52:26 INFO DStreamGraph: Updating checkpoint data for time
>> 1399060346000 ms
>>
>> Time: 1399060346000 ms
>> -------------------------------------------
>>
>> 14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job
>> 1399060346000 ms.0 from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time
>> 1399060346000 ms (execution: 0.024 s)
>>
>>
>>
>> 14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time
>> 1399060362000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time
>> 1399060362000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time
>> 1399060362000 ms
>> 14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job
>> 1399060362000 ms.0 from job set of time 1399060362000 ms
>> 14/05/02 21:52:42 INFO SparkContext: Starting job: take at
>> DStream.scala:593
>> 14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at
>> DStream.scala:593) with 1 output partitions (allowLocal=true)
>> 14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3(take at
>> DStream.scala:593)
>> 14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Missing parents: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Computing the requested partition
>> locally
>> 14/05/02 21:52:42 INFO BlockManager: Found block input-0-1399060360400
>> locally
>> 14/05/02 21:52:42 INFO CheckpointWriter: Checkpoint for time
>> 1399060361000 ms saved to file '
>> hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060361000', took
>> 2457
>> bytes and 507 ms
>> 14/05/02 21:52:42 INFO CheckpointWriter: Saving checkpoint for time
>> 1399060362000 ms to file '
>> hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060362000'
>> 14/05/02 21:52:42 INFO DStreamGraph: Clearing checkpoint data for time
>> 1399060361000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Cleared checkpoint data for time
>> 1399060361000 ms
>> 14/05/02 21:52:42 INFO BlockManagerInfo: Added input-0-1399060360800 in
>> memory on computer8.ant-net:50052 (size: 238.8 KB, free: 1177.0 MB)
>> 14/05/02 21:52:42 INFO SparkContext: Job finished: take at
>> DStream.scala:593, took 0.107033025 s
>> -------------------------------------------
>> Time: 1399060362000 ms
>> -------------------------------------------
>> The Project Gutenberg EBook of Don Quixote by Miguel de Cervantes This
>> eBook is
>> for the use of anyone anywhere at no cost and with almost no restrictions
>> whatsoever You may copy it give it away or re use it under the terms of
>> the
>> Project Gutenberg License included with this eBook or online at www
>> gutenberg
>> net Title Don Quixote Author Miguel de Cervantes Saavedra Release Date
>> July 27
>> 2004 EBook 996 Language English START OF THIS PROJECT GUTENBERG EBOOK DON
>> QUIXOTE Produced by David Widger DON QUIXOTE Complete by Miguel de
>> Cervantes
>> Saavedra Translated by John Ormsby CONTENTS Volume I CHAPTER I WHICH
>> TREATS OF
>> THE CHARACTER AND PURSUITS OF THE FAMOUS GENTLEMAN DON QUIXOTE OF LA
>> MANCHA
>> CHAPTER II WHICH TREATS OF THE FIRST SALLY THE INGENIOUS DON QUIXOTE MADE
>> FROM
>> ...
>>
>> 14/05/02 21:52:42 INFO JobScheduler: Finished job streaming job
>> 1399060362000 ms.0 from job set of time 1399060362000 ms
>>
>>
>>
>> On Apr 30, 2014, at 0:56, Tathagata Das <tathagata.das1565@gmail.com>
>> wrote:
>>
>> Strange! Can you just do lines.print() to print the raw data instead of
>> doing the word count (see the sketch below)? Beyond that, we can do two
>> things.
>>
>> 1. Can you check the Spark stage UI to see whether there are stages
>> running during the 30-second period you referred to?
>> 2. If you upgrade to the Spark master branch (or Spark 1.0 RC3; see the
>> separate thread by Patrick), it has a streaming UI, which shows the number
>> of records received, the state of the receiver, etc. That may be more
>> useful in debugging what's going on.
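>>
>> For the lines.print() suggestion above, a minimal sketch in the Scala
>> API (the Java version is analogous; the host and port are placeholders
>> for wherever your word source is listening):
>>
>>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>>
>>   val ssc = new StreamingContext("spark://192.168.0.13:7077",
>>     "NetworkWordCount", Seconds(1))
>>   val lines = ssc.socketTextStream("computer8.ant-net", 9999)
>>   lines.print()  // first 10 elements of each batch, raw, no word count
>>   ssc.start()
>>   ssc.awaitTermination()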
>>
>> TD
>>
>>
>> On Tue, Apr 29, 2014 at 3:31 PM, Eduardo Costa Alfaia
>> <e.costaalfaia@unibs.it> wrote:
>>
>>> Hi TD,
>>> We are not using a streaming context with master "local"; we have 1
>>> master, 8 workers, and 1 word source. The command line that we are using
>>> is:
>>> bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount
>>> spark://192.168.0.13:7077
>>>
>>> On Apr 30, 2014, at 0:09, Tathagata Das <tathagata.das1565@gmail.com>
>>> wrote:
>>>
>>> Is your batch size 30 seconds by any chance?
>>>
>>> Assuming not, please check whether you are creating the streaming
>>> context with master "local[n]" where n > 1. With "local" or "local[1]",
>>> the system has only one processing slot, which is occupied by the
>>> receiver, leaving no room for processing the received data. It could be
>>> that after 30 seconds the server disconnects, the receiver terminates,
>>> and the single slot is released, allowing the processing to proceed.
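>>>
>>> For local testing, that would mean creating the context along these
>>> lines (app name and batch duration are placeholders):
>>>
>>>   // at least 2 slots: one for the receiver, one for the processing
>>>   val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))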
>>>
>>> TD
>>>
>>>
>>> On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia
>>> <e.costaalfaia@unibs.it> wrote:
>>>
>>>> Hi TD,
>>>>
>>>> In my tests with Spark Streaming, I'm using the (modified)
>>>> JavaNetworkWordCount code and a program that I wrote that sends words
>>>> to the Spark worker, using TCP as the transport. I verified that after
>>>> starting, Spark connects to my source, which then starts sending, but
>>>> the first word count is reported approximately 30 seconds after the
>>>> context creation. So I'm wondering where the 30 seconds of data already
>>>> sent by the source are stored. Is this normal Spark behaviour? I saw
>>>> the same behaviour using the shipped JavaNetworkWordCount application.
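>>>>
>>>> The sender is essentially of this shape (a simplified sketch, not the
>>>> actual program): accept one TCP connection, then write words line by
>>>> line.
>>>>
>>>>   import java.io.PrintWriter
>>>>   import java.net.ServerSocket
>>>>
>>>>   val server = new ServerSocket(9999)  // port is a placeholder
>>>>   val out = new PrintWriter(server.accept().getOutputStream, true)
>>>>   // file name is a placeholder for the actual word list
>>>>   scala.io.Source.fromFile("quixote.txt").getLines().foreach(out.println)
>>>>   out.close()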
>>>>
>>>> Many thanks.
>>>> --
>>>> Privacy Notice: http://www.unibs.it/node/8155
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
