spark-user mailing list archives

From "Evo Eftimov" <>
Subject RE: Creating topology in spark streaming
Date Wed, 06 May 2015 09:37:58 GMT
What is called a Bolt in Storm is essentially a combination of [Transformation/Action and DStream
RDD] in Spark. So, to achieve higher parallelism for a specific Transformation/Action on a
specific DStream RDD, simply repartition it to the required number of partitions, which directly
relates to the corresponding number of threads.
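To illustrate the repartitioning suggestion without a cluster, here is a pure-Python model. It is not Spark code. `repartition`, `run_stage` and `slow_score` are hypothetical names, and the thread pool stands in for executor cores. In Spark Streaming the corresponding step would be calling `DStream.repartition(numPartitions)` before the expensive `map`. Each partition becomes one task, and tasks run concurrently only up to the number of available cores, so partitions beyond the core count do not add threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def repartition(records, num_partitions):
    # Hypothetical local stand-in for RDD/DStream.repartition(numPartitions):
    # spread the records round-robin over num_partitions buckets.
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def run_stage(partitions, fn, cores):
    """Apply fn to every record: one task per partition, at most `cores` tasks at once.

    Returns the per-partition results and the peak number of tasks that ran concurrently.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def task(partition):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return [fn(r) for r in partition]
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=cores) as pool:
        results = list(pool.map(task, partitions))
    return results, peak


def slow_score(tweet):
    time.sleep(0.05)  # hypothetical placeholder for the expensive sentiment model
    return len(tweet)


tweets = [f"tweet {i}" for i in range(12)]
for n in (1, 4):
    start = time.perf_counter()
    _, peak = run_stage(repartition(tweets, n), slow_score, cores=8)
    elapsed = time.perf_counter() - start
    print(f"{n} partition(s): peak concurrent tasks={peak}, {elapsed:.2f}s")
```

With one partition, the twelve simulated calls run one after another. With four partitions they run as up to four concurrent tasks, so the stage finishes in roughly a quarter of the time.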


From: anshu shukla [] 
Sent: Wednesday, May 6, 2015 9:33 AM
To: ayan guha
Subject: Re: Creating topology in spark streaming


But the main problem is how to increase the level of parallelism for a particular bolt's logic.


Suppose I want this type of topology.


How can we manage it?


On Wed, May 6, 2015 at 1:36 PM, ayan guha <> wrote:

Every transformation on a DStream creates another DStream. You may want to take a look
at foreachRDD. Also, kindly share your code so people can help better.
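As a rough illustration of the chaining described here, the sketch below models a DStream as an ordered list of micro-batches, with one list standing in for each batch's RDD. It is a toy written for this thread and not the Spark API. `MiniDStream`, `foreach_rdd`, `toy_sentiment`, and the hashtag set are all hypothetical. Real DStreams are lazy and only run after the streaming context is started, while this model is eager. The three bolts from the quoted message map onto `filter` (BOLT1), `map` (BOLT2), and a `foreachRDD`-style output operation (BOLT3). Each step hands a new stream to the next without writing anything to HDFS.

```python
class MiniDStream:
    """Toy model of a DStream: an ordered list of micro-batches (one list per batch/RDD)."""

    def __init__(self, batches):
        self.batches = batches

    def filter(self, pred):
        # Like DStream.filter: returns a NEW stream; the parent is left unchanged.
        return MiniDStream([[x for x in batch if pred(x)] for batch in self.batches])

    def map(self, fn):
        # Like DStream.map: returns a NEW stream; the parent is left unchanged.
        return MiniDStream([[fn(x) for x in batch] for batch in self.batches])

    def foreach_rdd(self, fn):
        # Like DStream.foreachRDD: an output operation that calls fn once per batch.
        for batch in self.batches:
            fn(batch)


HASHTAGS = {"#spark"}  # hypothetical hashtag set
POSITIVE = {"great", "good"}
NEGATIVE = {"slow", "bad"}


def toy_sentiment(text):
    # Hypothetical placeholder for the real sentiment model: +1 per positive word, -1 per negative.
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


source = MiniDStream([["#spark is great", "lunch time"], ["#spark is slow", "#storm rocks"]])
bolt1 = source.filter(lambda t: any(tag in t.split() for tag in HASHTAGS))  # BOLT1: keep tagged tweets
bolt2 = bolt1.map(lambda t: (t, toy_sentiment(t)))                          # BOLT2: score sentiment
database = []                                                               # stand-in for the SQL table
bolt2.foreach_rdd(database.extend)                                          # BOLT3: write each batch out
print(database)
```

Because each transformation returns a new stream object, "emitting" to the next bolt is simply calling the next transformation on the previous result.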

On 6 May 2015 17:54, "anshu shukla" <> wrote:

Please help, guys. Even after going through all the examples given, I have not understood
how to pass the DStreams from one bolt/logic to another (without writing them to HDFS etc.),
just like the emit function in Storm.

Suppose I have a topology with 3 bolts (say):


BOLT1 (parse the tweets and emit those with the given hashtags) =====> BOLT2 (complex logic for
sentiment analysis over tweets) =====> BOLT3 (submit tweets to the SQL database using Spark



Now, since sentiment analysis will take most of the time, we have to increase its level of
parallelism to tune latency. How can we increase the level of parallelism, given that the logic
of the topology is not clear?



Thanks & Regards,
Anshu Shukla

Indian Institute of Sciences



Thanks & Regards,
Anshu Shukla
