spark-user mailing list archives

From Nikolay Zhebet <phpap...@gmail.com>
Subject Re: multiple spark streaming contexts
Date Mon, 01 Aug 2016 07:38:39 GMT
Hi, if you want to read several Kafka topics in one Spark Streaming job, you can
pass the topic names as a comma-separated string and then read the messages
from all topics in a single stream:

val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap

val lines = KafkaUtils.createStream[String, String, StringDecoder,
StringDecoder](ssc, kafkaParams, topicMap,
StorageLevel.MEMORY_ONLY).map(_._2)
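
The topic map is the only part of the snippet above that can be checked without a running cluster. A minimal sketch of that step on its own follows, assuming `topics` is a comma-separated string and `numThreads` is already an Int. The object and method names are hypothetical, and the trimming and empty-entry filtering are additions that the one-liner above does not do:

```scala
object TopicMapSketch {
  // Build the topic -> thread-count map that is passed to
  // KafkaUtils.createStream as its `topics` argument.
  // Assumption: whitespace around names and empty entries should be ignored.
  def buildTopicMap(topics: String, numThreads: Int): Map[String, Int] =
    topics
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(name => name -> numThreads)
      .toMap
}
```

Calling `TopicMapSketch.buildTopicMap("orders, payments", 2)` would give a map with both topic names pointing at 2 threads; the topic names here are made up for illustration.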


After that you can use the ".filter" function to separate the messages by
source and process each group on its own:

val orders_paid = lines.filter(x => { x("table_name") == "kismia.orders_paid"})

orders_paid.foreachRDD( rdd => { ....
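
Note that the filter above indexes each message by field name, so it assumes the raw strings from `.map(_._2)` have already been parsed into some keyed structure; that parsing step is not shown in this message. A rough sketch of the predicate, assuming each record is a `Map[String, String]` (a hypothetical representation) and using an ordinary Seq as a stand-in for the DStream; `DStream.filter` takes a function of the same `T => Boolean` shape:

```scala
object FilterSketch {
  // Hypothetical record type; the real job may use a JSON object instead.
  type Record = Map[String, String]

  // Returns a predicate that is true only for records whose
  // "table_name" field equals `name`. Records without the field are rejected.
  def isTable(name: String): Record => Boolean =
    rec => rec.get("table_name").contains(name)

  def main(args: Array[String]): Unit = {
    // "kismia.users" is a made-up second table for illustration.
    val batch: Seq[Record] = Seq(
      Map("table_name" -> "kismia.orders_paid", "id" -> "1"),
      Map("table_name" -> "kismia.users", "id" -> "2")
    )
    val ordersPaid = batch.filter(isTable("kismia.orders_paid"))
    println(ordersPaid.map(_("id")))
  }
}
```

Using `rec.get(...)` rather than `rec(...)` avoids an exception when a message has no "table_name" field at all.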


Or you can use an if..else construction to split your messages by
name inside foreachRDD:

lines.foreachRDD((recrdd, time: Time) => {

   recrdd.foreachPartition(part => {

      part.foreach(item_row => {

         if (item_row("table_name") == "kismia.orders_paid") { ...}
         else if (...) {...}

....
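
The per-record branching can also be written as a pattern match, which makes the "no table name" and "unknown table" cases explicit. This is only a sketch under the same assumption that a record is a `Map[String, String]`; the object name, the returned labels, and the fallback cases are hypothetical, and a real job would call a handler in each branch instead of returning a string:

```scala
object RouteSketch {
  // Hypothetical record type, as assumed for the filter-based variant.
  type Record = Map[String, String]

  // Decide which branch a record belongs to, based on its "table_name" field.
  def route(rec: Record): String = rec.get("table_name") match {
    case Some("kismia.orders_paid") => "orders_paid"
    case Some(other)                => s"unhandled:$other"
    case None                       => "missing table_name"
  }
}
```

A function like this could be called from inside `part.foreach(...)` in place of the if..else chain.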


2016-08-01 9:39 GMT+03:00 Sumit Khanna <sumit.khanna@askme.in>:

> Any ideas guys? What are the best practices for multiple streams to be
> processed?
> I could trace a few Stack overflow comments wherein they better recommend
> a jar separate for each stream / use case. But that isn't pretty much what
> I want, as in it's better if one / multiple spark streaming contexts can
> all be handled well within a single jar.
>
> Guys please reply,
>
> Awaiting,
>
> Thanks,
> Sumit
>
> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna <sumit.khanna@askme.in>
> wrote:
>
>> Any ideas on this one guys ?
>>
>> I can do a sample run but can't be sure of imminent problems if any? How
>> can I ensure different batchDuration etc etc in here, per StreamingContext.
>>
>> Thanks,
>>
>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna <sumit.khanna@askme.in>
>> wrote:
>>
>>> Hey,
>>>
>>> Was wondering if I could create multiple spark stream contexts in my
>>> application (e.g instantiating a worker actor per topic and it has its own
>>> streaming context its own batch duration everything).
>>>
>>> What are the caveats if any?
>>> What are the best practices?
>>>
>>> Have googled half heartedly on the same but the air isn't pretty much
>>> demystified yet. I could skim through something like
>>>
>>>
>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>
>>>
>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>
>>> Thanks in Advance!
>>> Sumit
>>>
>>
>>
>
