spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com>
Subject Re: Creating topology in spark streaming
Date Wed, 06 May 2015 09:17:28 GMT
Hi,

You can use the method repartition from DStream (for the Scala API) or
JavaDStream (for the Java API)

defrepartition(numPartitions: Int): DStream
<https://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html>
[T]

Return a new DStream with an increased or decreased level of parallelism.
Each RDD in the returned DStream has exactly numPartitions partitions.

I think the post
http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/
on integration of Spark Streaming gives very interesting review on the
subject, although the integration with Kafka it's not up to date with
https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html

Hope that helps.

Greetings,

Juan

2015-05-06 10:32 GMT+02:00 anshu shukla <anshushukla0@gmail.com>:

> But main problem is how to increase the level of parallelism  for any
> particular bolt logic .
>
> suppose i  want  this type of topology .
>
> https://storm.apache.org/documentation/images/topology.png
>
> How we can manage it .
>
> On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.ayan@gmail.com> wrote:
>
>> Every transformation on a dstream will create another dstream. You may
>> want to take a look at foreachrdd? Also, kindly share your code so people
>> can help better
>> On 6 May 2015 17:54, "anshu shukla" <anshushukla0@gmail.com> wrote:
>>
>>> Please help  guys, Even  After going through all the examples given i
>>> have not understood how to pass the  D-streams  from one bolt/logic to
>>> other (without writing it on HDFS etc.) just like emit function in storm .
>>> Suppose i have topology with 3  bolts(say)
>>>
>>> *BOLT1(parse the tweets nd emit tweet using given
>>> hashtags)=====>Bolt2(Complex logic for sentiment analysis over
>>> tweets)=======>BOLT3(submit tweets to the sql database using spark SQL)*
>>>
>>>
>>> Now  since Sentiment analysis will take most of the time ,we have to
>>> increase its level of parallelism for tuning latency. Howe to increase the
>>> levele of parallelism since the logic of topology is not clear .
>>>
>>> --
>>> Thanks & Regards,
>>> Anshu Shukla
>>> Indian Institute of Sciences
>>>
>>
>
>
> --
> Thanks & Regards,
> Anshu Shukla
>

Mime
View raw message