spark-user mailing list archives

From Bill Jay <bill.jaypeter...@gmail.com>
Subject Re: spark streaming rate limiting from kafka
Date Sun, 20 Jul 2014 06:51:36 GMT
Hi Tobias,

It seems that repartition can bring in more executors for the stages
that follow data receiving. However, the number of executors is still far
lower than what I need (I specify one core per executor). Looking at
the executor indices in the stage, I find that many numbers are missing in
between. For example, if I call repartition(100), the executor indices may be
1, 3, 5, 10, etc. In the end there may be only 45 executors even though I
request 100 partitions.

Bill
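
One way to read the observation above is that repartition(n) sets the number of partitions, and therefore the number of tasks in the following stage, while the number of executors depends on what the cluster actually granted. The toy model below sketches that distinction in plain Python. It is not Spark's scheduler: the function name simulate_stage, the one-core-per-executor setup, and the task durations are all assumptions made for illustration, and the thread does not confirm that this is the cause of the missing indices.

```python
import heapq
import random


def simulate_stage(num_partitions, num_executors, seed=0):
    """Toy model: one task per partition, one core per executor.

    Each task is handed to the executor that becomes free earliest.
    Returns the sorted ids of executors that ran at least one task.
    """
    rng = random.Random(seed)
    # Min-heap of (time at which the executor is free, executor id).
    free_at = [(0.0, executor_id) for executor_id in range(num_executors)]
    heapq.heapify(free_at)
    used = set()
    for _ in range(num_partitions):
        available, executor_id = heapq.heappop(free_at)
        used.add(executor_id)
        duration = rng.uniform(0.5, 1.5)  # hypothetical task duration
        heapq.heappush(free_at, (available + duration, executor_id))
    return sorted(used)


if __name__ == "__main__":
    # 100 partitions but only 45 executors granted (hypothetical numbers).
    print(len(simulate_stage(100, 45)))
```

In this model the number of distinct executors that do any work is never more than the smaller of the partition count and the executor count, so asking for 100 partitions does not by itself produce 100 executors.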


On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Bill,
>
> are you saying that, after repartition(400), you have 400 partitions on one
> host and the other hosts receive none of the data?
>
> Tobias
>
>
> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeterson@gmail.com>
> wrote:
>
>> I also have an issue consuming from Kafka. When I consume from Kafka,
>> there is always a single executor working on the job. Even if I use
>> repartition, it seems that there is still a single executor. Does anyone
>> have an idea how to add parallelism to this job?
>>
>>
>>
>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song.82@gmail.com>
>> wrote:
>>
>>> Thanks Luis and Tobias.
>>>
>>>
>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <tgp@preferred.jp>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song.82@gmail.com>
>>>> wrote:
>>>>>
>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>> topic-partition (via offset, for example)? Setting this to a small
>>>>> number would force the DStream to read less data initially.
>>>>>
>>>>
>>>> Please see the post at
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCAPH-c_M2ppurJx-n_TEhh0BVqe_6LA-RVgtRF1K-LWrMMe+1gQ@mail.gmail.com%3E
>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>
>>>> Tobias
>>>>
>>>>
>>>
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
