spark-user mailing list archives

From gaganbm
Subject Re: Strange behaviour of different SSCs with same Kafka topic
Date Fri, 18 Apr 2014 05:58:42 GMT
It happens with a normal data rate, i.e., let's say 20 records per second.

Apart from that, I am also seeing some more strange behavior. Let me explain.

I set up two SSCs and start them one after another. In each SSC I get the
streams from Kafka sources and do some manipulation, such as adding a
"Record_Name" field to each of the incoming records. This Record_Name is
different for the two SSCs, and I get it from some other class that is not
related to the streams.

Now, the expected behavior is that all records in SSC1 get the field
RECORD_NAME_1 added and all records in SSC2 get the field RECORD_NAME_2
added. The two SSCs have nothing to do with each other as I

However, strangely enough, I find that many records in SSC1 get
RECORD_NAME_2 added, and vice versa. Is it some kind of serialization issue?
Could it be that the class which provides RECORD_NAME gets serialized and
reconstructed, and something goes wrong in the process? I am unable to
figure it out.

So, apart from skewed frequency and volume of records in both the streams,
I am getting this inter-mingling of data among the streams.

Can you help me with how to use some external data to manipulate the RDD
records?
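
To make the question concrete, here is a minimal sketch in plain Python (no Spark involved) of one way such a mix-up could arise. It is only a guess at the cause: it assumes the external class keeps the name in shared, mutable, class-level state that is read each time a record is processed. `NameProvider`, `make_tagger_shared` and `make_tagger_captured` are hypothetical names for illustration, not part of my code or of any Spark API.

```python
# Hypothetical stand-in for the external class that supplies the record name.
# Assumption: the name is stored in shared, mutable, class-level state.
class NameProvider:
    current_name = None


def make_tagger_shared():
    # Reads the shared state every time a record is processed, so it sees
    # whatever value was written last, not the value at setup time.
    return lambda record: (NameProvider.current_name, record)


def make_tagger_captured(name):
    # Copies the name into a local value when the pipeline is built, so
    # later writes to the shared state cannot affect this pipeline.
    return lambda record: (name, record)


# Set up the first pipeline, then the second, one after another.
NameProvider.current_name = "RECORD_NAME_1"
shared_1 = make_tagger_shared()
captured_1 = make_tagger_captured(NameProvider.current_name)

NameProvider.current_name = "RECORD_NAME_2"
shared_2 = make_tagger_shared()
captured_2 = make_tagger_captured(NameProvider.current_name)

# Records are processed only after both pipelines exist.
records = ["a", "b"]
shared_out_1 = [shared_1(r) for r in records]      # tagged with the wrong name
captured_out_1 = [captured_1(r) for r in records]  # tagged with RECORD_NAME_1
captured_out_2 = [captured_2(r) for r in records]  # tagged with RECORD_NAME_2

print(shared_out_1)
print(captured_out_1)
print(captured_out_2)
```

If something like this is what is happening, copying the name into a local value before building the stream transformations, and referring only to that local value inside them, might avoid the mix-up. I have not confirmed that this is the actual cause.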

Thanks and regards

Gagan B Mishra

*560034, Bangalore*

On Tue, Apr 15, 2014 at 4:09 AM, Tathagata Das [via Apache Spark User List]
wrote:

> Does this happen at low event rate for that topic as well, or only for a
> high volume rate?
> TD
> On Wed, Apr 9, 2014 at 11:24 PM, gaganbm <[hidden email]> wrote:
>> I am really at my wits' end here.
>> I have different streaming contexts, let's say 2, both listening to the
>> same Kafka topics. I establish the KafkaStream by setting a different
>> consumer group for each of them.
>> Ideally, I should be seeing the Kafka events in both the streams. But
>> what I am getting is really unpredictable. Only one stream gets a lot of
>> events and the other one gets almost nothing, or very little compared to
>> the other. Also the frequency is very skewed. I get a lot of events in one
>> stream continuously, and after some duration I get a few events in the
>> other one.
>> I don't know where I am going wrong. I can see consumer fetcher threads
>> for both the streams that listen to the Kafka topics.
>> I can give further details if needed. Any help will be great.
>> Thanks
