spark-user mailing list archives

From Dhaval Modi <dhavalmod...@gmail.com>
Subject Re: Advice on multiple streaming job
Date Mon, 07 May 2018 09:08:07 GMT
Hi Gerard,

Our source is Kafka, and we are using the standard streaming API (DStreams).

Our requirement is as follows: we have hundreds of Kafka topics, and each topic
sends different messages in a complex JSON format. Topics are structured per
domain, so each topic is independent of the others.
These JSON messages need to be flattened and stored in Hive.

For these hundreds of topics, we currently have hundreds of jobs running
independently, each using a different UI port.



Regards,
Dhaval Modi
dhavalmodi24@gmail.com

On 7 May 2018 at 13:53, Gerard Maas <gerard.maas@gmail.com> wrote:

> Dhaval,
>
> Which Streaming API are you using?
> In Structured Streaming, you are able to start several streaming queries
> within the same context.
>
> kind regards, Gerard.
>
> On Sun, May 6, 2018 at 7:59 PM, Dhaval Modi <dhavalmodi24@gmail.com>
> wrote:
>
>> Hi Susan,
>>
>> Thanks for your response.
>>
>> Will try configuration as suggested.
>>
>> But I am still looking for an answer: does Spark support running multiple
>> jobs on the same port?
>>
>> On Sun, May 6, 2018, 20:27 Susan X. Huynh <xhuynh@mesosphere.io> wrote:
>>
>>> Hi Dhaval,
>>>
>>> Not sure if you have considered this: port 4040 sounds like a driver
>>> UI port. By default Spark will try ports up to 4056, but you can increase
>>> that number with "spark.port.maxRetries"
>>> (https://spark.apache.org/docs/latest/configuration.html). Try setting it
>>> to "32". This would help if the only conflict is among the driver UI ports
>>> (for example, if you have more than 16 drivers running on the same host).
>>>
>>> Susan
>>>
>>> On Sun, May 6, 2018 at 12:32 AM, vincent gromakowski <
>>> vincent.gromakowski@gmail.com> wrote:
>>>
>>>> Use a scheduler that abstracts the network away, with a CNI for instance
>>>> or other mechanisms (Mesos, Kubernetes, YARN). The CNI lets you always
>>>> bind to the same ports because each container will have its own IP. Some
>>>> other solutions, like Mesos and Marathon, can work without CNI, with host
>>>> IP binding, but will manage the ports for you, ensuring there isn't any
>>>> conflict.
>>>>
>>>> On Sat, 5 May 2018 at 17:10, Dhaval Modi <dhavalmodi24@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Need advice on executing multiple streaming jobs.
>>>>>
>>>>> Problem: We have hundreds of streaming jobs. Every streaming job uses a
>>>>> new port. Also, Spark automatically checks ports from 4040 to 4056;
>>>>> after that it fails. One workaround is to provide the port explicitly.
>>>>>
>>>>> Is there a way to tackle this situation, or am I missing anything?
>>>>>
>>>>> Thanking you in advance.
>>>>>
>>>>> Regards,
>>>>> Dhaval Modi
>>>>> dhavalmodi24@gmail.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> Susan X. Huynh
>>> Software engineer, Data Agility
>>> xhuynh@mesosphere.com
>>>
>>
>
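
The port arithmetic discussed in this thread can be sketched roughly as follows. This is only an illustration, not code from any of the posters: the helper names and topic names are hypothetical, and the wrap-around within the non-privileged port range is an assumption about how Spark picks successive ports, not something stated above. It shows why the default "spark.port.maxRetries" of 16 stops at 4056, and how explicit "spark.ui.port" values could be handed out per topic instead.

```python
from typing import Dict, List

MIN_USER_PORT = 1024   # assumed lower bound of the retry range
MAX_PORT = 65536       # one past the highest valid TCP port


def candidate_ports(start_port: int = 4040, max_retries: int = 16) -> List[int]:
    """Ports tried in order: start_port first, then one more per retry.

    The modulo wraps past 65535 back to 1024 (assumed behaviour).
    """
    span = MAX_PORT - MIN_USER_PORT
    return [
        (start_port + offset - MIN_USER_PORT) % span + MIN_USER_PORT
        for offset in range(max_retries + 1)
    ]


def assign_ui_ports(topics: List[str], start_port: int = 4040) -> Dict[str, int]:
    """Give each topic's driver its own explicit UI port (hypothetical helper)."""
    ports = candidate_ports(start_port, max_retries=len(topics) - 1)
    return dict(zip(topics, ports))


if __name__ == "__main__":
    default_ports = candidate_ports()
    # First port, last port, and number of attempts with the default setting.
    print(default_ports[0], default_ports[-1], len(default_ports))
    # Hypothetical topic names, one driver per topic.
    for topic, port in assign_ui_ports(["orders", "payments", "users"]).items():
        print(f"--conf spark.ui.port={port}  # driver for topic '{topic}'")
```

Raising "spark.port.maxRetries" only widens the range each driver scans, while assigning ports explicitly avoids the scan altogether, at the cost of having to track the mapping yourself.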
