spark-user mailing list archives

From Sumit Khanna <sumit.kha...@askme.in>
Subject Re: multiple spark streaming contexts
Date Mon, 01 Aug 2016 07:41:22 GMT
Hey Nikolay,

I know the approach, but it doesn't really fit my use case, in which each
topic needs to be logged / persisted to a separate HDFS location.

I am looking for something where a streaming context pertains to a topic
and that topic only, and was wondering if I could have them all in parallel
in one app / jar run.
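
One way to get a separate HDFS location per topic inside a single app is to create one DStream per topic on a shared StreamingContext, rather than one context per topic. The sketch below assumes the Spark 1.x receiver-based Kafka integration (`spark-streaming-kafka`); the ZooKeeper address, group id, base directory, topic names and the `outputPrefix` helper are all placeholders invented for illustration, not values from this thread. As far as the Spark Streaming programming guide describes it, only one StreamingContext can be active in a JVM at a time, so in this layout every topic shares the same batch duration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object PerTopicStreams {
  // Hypothetical helper: builds the saveAsTextFiles prefix for one topic.
  def outputPrefix(baseDir: String, topic: String): String =
    s"${baseDir.stripSuffix("/")}/$topic/batch"

  def main(args: Array[String]): Unit = {
    // Placeholder values, not taken from the thread.
    val zkQuorum = "zk-host:2181"
    val groupId  = "per-topic-logger"
    val baseDir  = "hdfs:///data/kafka"
    val topics   = Seq("topic_a", "topic_b")

    val conf = new SparkConf().setAppName("PerTopicStreams")
    // One context, hence one batch interval shared by every stream below.
    val ssc = new StreamingContext(conf, Seconds(10))

    topics.foreach { topic =>
      // One receiver-based stream per topic; each receiver occupies a core.
      val messages = KafkaUtils
        .createStream(ssc, zkQuorum, groupId, Map(topic -> 1))
        .map(_._2) // keep the message payload, drop the Kafka key
      // Each batch is written under <baseDir>/<topic>/batch-<batch time in ms>.
      messages.saveAsTextFiles(outputPrefix(baseDir, topic))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because each receiver holds a core for as long as the job runs, this layout needs more cores than topics. It does not give each topic its own batch duration; that would still mean separate applications or a per-stream `window` on top of the shared interval.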

Thanks,

On Mon, Aug 1, 2016 at 1:08 PM, Nikolay Zhebet <phpapple@gmail.com> wrote:

> Hi, if you want to read several Kafka topics in one spark-streaming job,
> you can pass the topic names as a comma-separated list and then read all
> messages from all topics in one flow:
>
> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>
> val lines = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](ssc,
>   kafkaParams, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)
>
>
> After that you can use the ".filter" function to split out your topics and
> iterate over the messages separately.
>
> // Assumes each message has already been parsed into a map-like record (parsing not shown).
> val orders_paid = lines.filter(x => x("table_name") == "kismia.orders_paid")
>
> orders_paid.foreachRDD( rdd => { ....
>
>
> Or you can use an if..else construction to split your messages by name
> in foreachRDD:
>
> lines.foreachRDD((recrdd, time: Time) => {
>
>    recrdd.foreachPartition(part => {
>
>       part.foreach(item_row => {
>
>          if (item_row("table_name") == "kismia.orders_paid") { ...} else if (...) {...}
>
> ....
>
>
> 2016-08-01 9:39 GMT+03:00 Sumit Khanna <sumit.khanna@askme.in>:
>
>> Any ideas guys? What are the best practices for multiple streams to be
>> processed?
>> I found a few Stack Overflow comments that recommend a separate jar for
>> each stream / use case. But that isn't quite what I want; I would prefer
>> that one or multiple Spark streaming contexts all be handled within a
>> single jar.
>>
>> Guys please reply,
>>
>> Awaiting,
>>
>> Thanks,
>> Sumit
>>
>> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna <sumit.khanna@askme.in>
>> wrote:
>>
>>> Any ideas on this one, guys?
>>>
>>> I can do a sample run, but I can't be sure what problems might show up
>>> later. How can I set a different batchDuration etc. per StreamingContext?
>>>
>>> Thanks,
>>>
>>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna <sumit.khanna@askme.in>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> I was wondering if I could create multiple Spark streaming contexts in my
>>>> application (e.g. instantiating a worker actor per topic, each with its
>>>> own streaming context, its own batch duration, and so on).
>>>>
>>>> What are the caveats if any?
>>>> What are the best practices?
>>>>
>>>> I have googled this a little, but the picture still isn't clear. So far
>>>> I have skimmed through:
>>>>
>>>>
>>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>>
>>>>
>>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>>
>>>> Thanks in Advance!
>>>> Sumit
>>>>
>>>
>>>
>>
>
