spark-user mailing list archives

From Eduardo Costa Alfaia <>
Subject Re: Spark's behavior
Date Fri, 02 May 2014 15:24:23 GMT
Hi TD,

I have collected the first set of information (attached). I will try the second suggestion today.

On Apr 30, 2014, at 0:56, Tathagata Das <> wrote:

> Strange! Can you just do lines.print() to print the raw data instead of doing the word count?
> Beyond that, we can do two things.
> 1. Can you check the Spark stage UI to see whether any stages are running during the 30-second period you referred to?
> 2. If you upgrade to the Spark master branch (or Spark 1.0 RC3, see the separate thread by Patrick), it has a streaming UI, which shows the number of records received, the state of the receiver, etc. That may be more useful in debugging what's going on.
> TD
> On Tue, Apr 29, 2014 at 3:31 PM, Eduardo Costa Alfaia <>
> Hi TD,
> We are not creating the streaming context with master local; we have 1 Master, 8 Workers and 1 word source. The command line that we are using is:
> bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount spark://
> On Apr 30, 2014, at 0:09, Tathagata Das <> wrote:
>> Is your batch size 30 seconds by any chance?
>> Assuming not, please check whether you are creating the streaming context with master "local[n]" where n >= 2. With "local" or "local[1]", the system has only one processing slot, which is occupied by the receiver, leaving no room for processing the received data. It could be that after 30 seconds the server disconnects and the receiver terminates, releasing the single slot so that processing can proceed.
>> TD
>> On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia <>
>> Hi TD,
>> In my tests with Spark Streaming, I'm using a modified JavaNetworkWordCount and a program I wrote that sends words to the Spark worker over TCP. I verified that after Spark starts, it connects to my source, which does start sending, but the first word count is reported approximately 30 seconds after the context is created. So I'm wondering where the 30 seconds of data already sent by the source is stored. Is this normal Spark behaviour? I saw the same behaviour with the shipped JavaNetworkWordCount application.
>> Many thanks.
>> --
>> Privacy notice:

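For readers following the thread: TD's point about "local" or "local[1]" leaving only one processing slot can be illustrated with a small analogy in plain Java. This is only a sketch using a fixed-size thread pool, not Spark's scheduler. The class name SlotDemo, the method name and the 500 ms wait are hypothetical choices made for this illustration. A long-running "receiver" task holds one thread, and short "batch" tasks can run only if another thread is free.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only: a thread pool stands in for Spark's processing slots.
public class SlotDemo {

    // Returns how many batch tasks finished while the receiver still held its slot.
    public static int batchesProcessedWhileReceiving(int slots, int batches)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        CountDownLatch receiverStarted = new CountDownLatch(1);
        CountDownLatch stopReceiver = new CountDownLatch(1);
        CountDownLatch allBatchesDone = new CountDownLatch(batches);
        AtomicInteger processed = new AtomicInteger();

        // The "receiver" occupies one slot until it is told to stop.
        pool.submit(() -> {
            receiverStarted.countDown();
            try {
                stopReceiver.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiverStarted.await();

        // Queue the "batch" tasks; they need a free slot to run.
        for (int i = 0; i < batches; i++) {
            pool.submit(() -> {
                processed.incrementAndGet();
                allBatchesDone.countDown();
            });
        }

        // Wait briefly (assumed 500 ms), then record progress made so far.
        allBatchesDone.await(500, TimeUnit.MILLISECONDS);
        int whileReceiving = processed.get();

        // Release the receiver so the queued work can drain and the pool can exit.
        stopReceiver.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return whileReceiving;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int slots = 1; slots <= 2; slots++) {
            int done = batchesProcessedWhileReceiving(slots, 5);
            System.out.println("slots=" + slots + ": " + done
                    + " of 5 batches processed while the receiver was running");
        }
    }
}
```

With one slot, no batch can run until the receiver stops, which mirrors the delayed first output described in the thread. With two slots, the batches run alongside the receiver. On a real cluster the equivalent check is whether the application has more cores than receivers.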
