spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From anshu shukla <anshushuk...@gmail.com>
Subject Re: Creating topology in spark streaming
Date Wed, 06 May 2015 09:34:59 GMT
Thanks alot Juan,

That  was a great  post, One more thing if u can .Any there any demo/blog
 telling how to configure or create  a topology of different types .. i
mean how we can decide the pipelining model in spark  as done in storm  for
  https://storm.apache.org/documentation/images/topology.png .


On Wed, May 6, 2015 at 2:47 PM, Juan Rodríguez Hortalá <
juan.rodriguez.hortala@gmail.com> wrote:

> Hi,
>
> You can use the method repartition from DStream (for the Scala API) or
> JavaDStream (for the Java API)
>
> defrepartition(numPartitions: Int): DStream
> <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html>
> [T]
>
> Return a new DStream with an increased or decreased level of parallelism.
> Each RDD in the returned DStream has exactly numPartitions partitions.
>
> I think the post
> http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
> on integration of Spark Streaming gives very interesting review on the
> subject, although the integration with Kafka it's not up to date with
> https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
>
> Hope that helps.
>
> Greetings,
>
> Juan
>
> 2015-05-06 10:32 GMT+02:00 anshu shukla <anshushukla0@gmail.com>:
>
>> But main problem is how to increase the level of parallelism  for any
>> particular bolt logic .
>>
>> suppose i  want  this type of topology .
>>
>> https://storm.apache.org/documentation/images/topology.png
>>
>> How we can manage it .
>>
>> On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.ayan@gmail.com> wrote:
>>
>>> Every transformation on a dstream will create another dstream. You may
>>> want to take a look at foreachrdd? Also, kindly share your code so people
>>> can help better
>>> On 6 May 2015 17:54, "anshu shukla" <anshushukla0@gmail.com> wrote:
>>>
>>>> Please help  guys, Even  After going through all the examples given i
>>>> have not understood how to pass the  D-streams  from one bolt/logic to
>>>> other (without writing it on HDFS etc.) just like emit function in storm
.
>>>> Suppose i have topology with 3  bolts(say)
>>>>
>>>> *BOLT1(parse the tweets nd emit tweet using given
>>>> hashtags)=====>Bolt2(Complex logic for sentiment analysis over
>>>> tweets)=======>BOLT3(submit tweets to the sql database using spark SQL)*
>>>>
>>>>
>>>> Now  since Sentiment analysis will take most of the time ,we have to
>>>> increase its level of parallelism for tuning latency. Howe to increase the
>>>> levele of parallelism since the logic of topology is not clear .
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anshu Shukla
>>>> Indian Institute of Sciences
>>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla
>>
>
>


-- 
Thanks & Regards,
Anshu Shukla

Mime
View raw message