spark-user mailing list archives

From Rohit Rai <ro...@tuplejump.com>
Subject Re: Spark usage patterns and questions
Date Fri, 14 Mar 2014 10:19:32 GMT
>
> 3. In our use case we read from Kafka, do some mapping, and lastly persist
> data to Cassandra as well as push the data over a remote actor for
> realtime updates in a dashboard. I used the approaches below:
>  - First tried the very naive way: stream.map(...).foreachRDD(
> push to actor)
>     It did not work; the stage failed with an Akka exception.
>  - Second tried the
> akka.serialization.JavaSerializer.withSystem(system){...} approach
>      It did not work; the stage failed, BUT without any trace anywhere in
> the logs.
>  - Finally did rdd.collect to collect the output on the driver and then
> pushed it to the actor.
>      It worked.


Have you tried writing to Cassandra using a Hadoop output format instead of
pushing to an actor in the foreachRDD handler? I am sure that will be more
efficient than collecting and sending to an actor.

In Calliope we do a similar thing... You can check out the examples here -
http://tuplejump.github.io/calliope/streaming.html
https://github.com/tuplejump/calliope/blob/develop/src/main/scala/com/tuplejump/calliope/examples/PersistDStream.scala

Hope that helps

Regards,
Rohit


Founder & CEO, Tuplejump, Inc.
____________________________
www.tuplejump.com
The Data Engineering Platform


On Tue, Mar 11, 2014 at 11:39 PM, Sourav Chandra <
sourav.chandra@livestream.com> wrote:

> Hi,
>
> I have some questions regarding usage patterns and debugging in
> spark/spark streaming.
>
> 1. What are some common design patterns for using broadcast variables? In my
> application I created some, and also a scheduled task which
> periodically refreshes the variables. I want to know how people generally
> achieve this efficiently and in a modular way.
>
> 2. Sometimes an uncaught exception in the driver program/worker does not get
> traced anywhere. How can we debug this?
>
> 3. In our use case we read from Kafka, do some mapping, and lastly persist
> data to Cassandra as well as push the data over a remote actor for realtime
> updates in a dashboard. I used the approaches below:
>  - First tried the very naive way: stream.map(...).foreachRDD(
> push to actor)
>     It did not work; the stage failed with an Akka exception.
>  - Second tried the
> akka.serialization.JavaSerializer.withSystem(system){...} approach
>      It did not work; the stage failed, BUT without any trace anywhere in
> the logs.
>  - Finally did rdd.collect to collect the output on the driver and then
> pushed it to the actor.
>      It worked.
>
> I would like to know whether there is a more efficient way of achieving this
> sort of use case.
>
> 4. Sometimes I see failed stages, but when I open the stage details it
> says the stage did not start. What does this mean?
>
> Looking forward to some interesting responses :)
>
> Thanks,
> --
>
> Sourav Chandra
>
> Senior Software Engineer
>
> · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
>
> sourav.chandra@livestream.com
>
> o: +91 80 4121 8723
>
> m: +91 988 699 3746
>
> skype: sourav.chandra
>
> Livestream
>
> "Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd
> Block, Koramangala Industrial Area,
>
> Bangalore 560034
>
> www.livestream.com
>
