spark-user mailing list archives

From: <vivek.meghanat...@wipro.com>
Subject: Re: Spark Streaming + Kafka + scala job message read issue
Date: Sun, 27 Dec 2015 05:37:30 GMT
Hi Bryan,
Yes, we are using only 1 thread per topic, as we have only one Kafka server with 1 partition.
Which logs would show what offset the Spark stream is reading from Kafka, or whether it is
resetting something without reading?
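One place to look for the offset question: the receiver-based KafkaUtils.createStream is built on the Kafka 0.8 high-level consumer, which commits its offsets to ZooKeeper under /consumers/<group>/offsets/<topic>/<partition>. The sketch below only builds those paths so they can be inspected (for example with the zookeeper-shell bundled with Kafka); the group and topic names used with it are hypothetical. Since all 8 jobs share one group name, they would also share the same /consumers/<group> subtree.

```scala
// Sketch: ZooKeeper paths where the Kafka 0.8 high-level consumer (used by the
// receiver-based createStream) stores committed offsets.
// Group and topic names passed in are hypothetical placeholders.
object ZkOffsetPaths {
  // Layout: /consumers/<group>/offsets/<topic>/<partition>
  def offsetPath(group: String, topic: String, partition: Int): String =
    s"/consumers/$group/offsets/$topic/$partition"

  // Every partition path a group would commit to, sorted by topic then partition.
  def allPaths(group: String, partitionsByTopic: Map[String, Int]): Seq[String] =
    for {
      (topic, n) <- partitionsByTopic.toSeq.sortBy(_._1)
      p <- 0 until n
    } yield offsetPath(group, topic, p)
}
```

Watching the value at such a path between batches would show whether the committed offset advances past messages the job never logged.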

Regards
Vivek


Sent using CloudMagic Email<https://cloudmagic.com/k/d/mailapp?ct=pa&cv=8.0.67&pv=5.1.1&source=email_footer_2>
On Sun, Dec 27, 2015 at 12:03 am, Bryan <bryan.jeffrey@gmail.com> wrote:

Vivek,

Where you're using numThreads, look at the documentation for createStream. I believe
that number should be the number of partitions to consume.
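Following that suggestion, the topic map passed to createStream could be derived from each topic's partition count rather than from one hard-coded numThreads. This is a minimal sketch assuming the partition counts are already known (for instance from `kafka-topics.sh --describe`); the topic names in the test are hypothetical.

```scala
// Sketch: build createStream's topic -> numThreads map with one thread per partition.
// Partition counts are supplied by the caller (hypothetical input).
object TopicMapFromPartitions {
  def topicMap(partitionsByTopic: Map[String, Int]): Map[String, Int] =
    partitionsByTopic.map { case (topic, partitions) =>
      // Never fewer than one thread per topic; otherwise one per partition.
      topic -> math.max(1, partitions)
    }
}
```

With a single-partition topic, as in this thread, the result is 1 thread, which matches what the job already uses.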

Sent from Outlook Mail<http://go.microsoft.com/fwlink/?LinkId=550987> for Windows 10
phone


From: vivek.meghanathan@wipro.com
Sent: Friday, December 25, 2015 11:39 PM
To: bryan.jeffrey@gmail.com
Cc: duc.was.here@gmail.com; vivek.meghanathan@wipro.com; user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue


Hi Bryan, PhuDuc,

All 8 jobs consume 8 different IN topics. Each of the 8 Scala jobs reads its own topic, and
the topic map shown below specifies only 1 thread. In that case the shared group name should
not be a problem, right?

Here is the complete flow: Spring MVC sends messages to Kafka, Spark Streaming reads them and
sends messages back to Kafka, and in some cases the jobs only update data in Cassandra.
Spring then consumes the response messages.
I can see that the messages always reach Kafka (checked with the console consumer).

Regards
Vivek


On Sat, Dec 26, 2015 at 2:42 am, Bryan <bryan.jeffrey@gmail.com> wrote:

Agreed. I did not see that they were using the same group name.



From: PhuDuc Nguyen <duc.was.here@gmail.com>
Sent: Friday, December 25, 2015 3:35 PM
To: vivek.meghanathan@wipro.com
Cc: user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue

Vivek,

Did you say you have 8 Spark jobs that are consuming from the same topic and all jobs are
using the same consumer group name? If so, each job would get a subset of messages from that
Kafka topic, i.e., each job would get 1 out of 8 messages from that topic. Is that your intent?
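The effect of sharing a consumer group can be illustrated with a simplified model of range-style partition assignment, the scheme the Kafka 0.8 high-level consumer uses by default to split a topic among all consumer threads of one group. This is a sketch of that idea, not Kafka's own code, and the thread ids are hypothetical. In this model the unit being divided is the partition: with 1 partition and 8 threads in the same group on the same topic, one thread gets the partition and the rest get nothing.

```scala
// Simplified model of range-style partition assignment within one consumer group
// for a single topic. Whole partitions are handed out, not individual messages.
object GroupAssignment {
  /** Returns consumer thread id -> assigned partition numbers. */
  def rangeAssign(partitions: Int, consumers: Seq[String]): Map[String, Seq[Int]] = {
    require(consumers.nonEmpty, "a group needs at least one consumer thread")
    val sorted = consumers.sorted
    val perConsumer = partitions / sorted.size
    val extra = partitions % sorted.size // the first `extra` threads get one more
    sorted.zipWithIndex.map { case (c, i) =>
      val start = perConsumer * i + math.min(i, extra)
      val count = perConsumer + (if (i < extra) 1 else 0)
      c -> (start until start + count).toList
    }.toMap
  }
}
```

For example, 8 partitions over threads "a", "b", "c" gives a: 0 to 2, b: 3 to 5, c: 6 to 7.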

regards,
Duc

On Thu, Dec 24, 2015 at 7:20 AM, <vivek.meghanathan@wipro.com> wrote:
We are using the older receiver-based approach. The number of partitions is 1 (we have a
single-node Kafka), and we use a single thread per topic, but we still have the problem. Please
see the API we use below. All 8 Spark jobs use the same group name; is that a problem?

import org.apache.spark.streaming.kafka.KafkaUtils

// numThreads is 1 here: one consumer thread per topic
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val searches = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
  .map(line => parse(line._2).extract[Search])
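If the shared group name turns out to matter, one option is to give each job its own group id. The helper below is a minimal sketch of a hypothetical naming convention ("<base>-<topics>") that could be passed as the group argument of the createStream call above; it is not part of Spark's API. A brand-new group has no committed offsets in ZooKeeper, so where it starts reading is governed by the consumer's auto.offset.reset setting.

```scala
// Sketch: derive a distinct consumer group per job from its topic list.
// The "<base>-<topic1>-<topic2>" scheme is a hypothetical convention.
object PerJobGroup {
  def groupFor(baseGroup: String, topics: String): String = {
    val topicList = topics.split(",").map(_.trim).filter(_.nonEmpty).sorted
    require(topicList.nonEmpty, "at least one topic is required")
    s"$baseGroup-${topicList.mkString("-")}"
  }
}
```

Usage would look like `KafkaUtils.createStream(ssc, zkQuorum, PerJobGroup.groupFor(group, topics), topicMap)`.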


Regards,
Vivek M
From: Bryan [mailto:bryan.jeffrey@gmail.com]
Sent: 24 December 2015 17:20
To: Vivek Meghanathan (WT01 - NEP) <vivek.meghanathan@wipro.com>; user@spark.apache.org
Subject: RE: Spark Streaming + Kafka + scala job message read issue

Are you using a direct stream consumer or the older receiver-based consumer? If the latter,
does the number of partitions you've specified for your topic match the number of partitions
in the topic on Kafka?

That would be a possible cause, as you might receive all data from a given partition while
missing data from other partitions.
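Bryan's question amounts to comparing two sets of numbers: the partition counts a job was configured with and the counts Kafka actually reports. The sketch below does only that comparison; both maps are hypothetical inputs, and how the actual counts are fetched is left open.

```scala
// Sketch: report topics whose configured partition count differs from the
// count Kafka reports, or that Kafka does not report at all.
object PartitionCheck {
  final case class Mismatch(topic: String, configured: Int, actual: Option[Int])

  def mismatches(configured: Map[String, Int], actual: Map[String, Int]): Seq[Mismatch] =
    configured.toSeq.sortBy(_._1).collect {
      // Keep a topic unless Kafka reports exactly the configured count.
      case (topic, n) if !actual.get(topic).contains(n) =>
        Mismatch(topic, n, actual.get(topic))
    }
}
```

An empty result means every configured topic matches what Kafka reports.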

Regards,

Bryan Jeffrey



From: vivek.meghanathan@wipro.com
Sent: Thursday, December 24, 2015 5:22 AM
To: user@spark.apache.org
Subject: Spark Streaming + Kafka + scala job message read issue

Hi All,



We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our Spark Streaming
job (the consumer) is not receiving all the messages sent to a specific topic. It receives about
1 out of ~50 messages (identified by adding logging in the job's stream). We are not seeing any
errors in the Kafka logs and are unable to debug further from the Kafka layer. The console
consumer shows that the INPUT topic messages are received, but they are not reaching the
Spark-Kafka integration stream. Any thoughts on how to debug this issue? Another topic works
fine in the same setup.

We also tried Spark 1.3.0 with Kafka 0.8.1.1, which has the same issue. All these jobs work
fine on our local lab servers.
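The "1 out of ~50" figure comes from logging inside the stream. A more exact way to measure the loss would be to compare what was produced with what arrived. This sketch assumes, hypothetically, that the producer stamps each message with a consecutive sequence number and that the Spark job collects the numbers it parsed; neither is part of the setup described above.

```scala
// Sketch: find which sequence numbers never reached the streaming job.
// Assumes the producer adds consecutive sequence numbers (hypothetical convention).
object GapFinder {
  /** Sequence numbers in [first, last] that the consumer never saw. */
  def missing(first: Long, last: Long, received: Iterable[Long]): Seq[Long] = {
    val seen = received.toSet
    (first to last).filterNot(seen.contains)
  }

  /** Fraction of expected messages that arrived, between 0.0 and 1.0. */
  def deliveryRatio(first: Long, last: Long, received: Iterable[Long]): Double = {
    val expected = last - first + 1
    if (expected <= 0) 1.0
    else received.toSet.count(id => id >= first && id <= last).toDouble / expected
  }
}
```

If the missing numbers form a regular pattern, that may hint that another consumer in the same group is picking them up rather than the messages being dropped.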

Regards,
Vivek M
The information contained in this electronic message and any attachments to this message are
intended for the exclusive use of the addressee(s) and may contain proprietary, confidential
or privileged information. If you are not the intended recipient, you should not disseminate,
distribute or copy this e-mail. Please notify the sender immediately and destroy all copies
of this message and any attachments. WARNING: Computer viruses can be transmitted via email.
The recipient should check this email and any attachments for the presence of viruses. The
company accepts no liability for any damage caused by any virus transmitted by this email.
www.wipro.com<http://www.wipro.com>
