spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: kafka and zookeeper set up in prod for spark streaming
Date Fri, 03 Mar 2017 10:33:03 GMT
Thanks all. What about Kafka HA, which is important? Is it best to use
application-specific Kafka delivery or Kafka MirrorMaker?

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 3 March 2017 at 10:22, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

>
> Forwarded conversation
> Subject: kafka and zookeeper set up in prod for spark streaming
> ------------------------
>
> From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
> Date: 3 March 2017 at 08:15
> To: "user @spark" <user@spark.apache.org>
>
>
>
> hi,
>
> In DEV, Kafka and ZooKeeper services can be co-located on the same
> physical hosts.
>
> Moving forward to Prod, do we need to set up ZooKeeper on its own cluster
> rather than sharing it with the Hadoop cluster, or can these services be
> shared within the Hadoop cluster?
>
> What is the best way to set up the ZooKeeper ensemble that Kafka needs for
> use with Spark Streaming?
>
> Thanks
>
> ----------
> From: Jörn Franke <jornfranke@gmail.com>
> Date: 3 March 2017 at 08:29
> To: Mich Talebzadeh <mich.talebzadeh@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
>
>
> I think this highly depends on the risk you want to be exposed to. If you
> have it on dedicated nodes there is less influence from other processes.
>
> I have seen both: on Hadoop nodes and on dedicated ones. On a Hadoop
> cluster I would not recommend putting it on data nodes or other heavily
> utilized nodes.
>
> ZooKeeper does not need many resources (if you do not abuse it), so you
> may think about putting it on a small dedicated infrastructure of several
> nodes.
>
> ----------
> From: vincent gromakowski <vincent.gromakowski@gmail.com>
> Date: 3 March 2017 at 08:29
> To: Mich Talebzadeh <mich.talebzadeh@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
>
>
> Hi,
> Depending on the Kafka version (< 0.8.2 I think), offsets are managed in
> ZooKeeper, and if you have lots of consumers it is recommended to use a
> dedicated ZooKeeper cluster (always with dedicated disks; SSDs are even
> better). On newer versions offsets are managed in special Kafka topics and
> ZooKeeper is only used to store metadata, so you can share it with Hadoop.
> Maybe you can reach a limit depending on the size of your Kafka
> deployment, the number of topics, producers/consumers and so on, but I
> have not heard of that happening yet. Another point is to be careful about
> security on ZooKeeper: sharing a cluster means you get the same security
> level (authentication or not).
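
A minimal Scala sketch of that point, assuming the spark-streaming-kafka-0-10
direct connector (the broker list, topic name and group id below are
placeholders, not values from this thread): consumed offsets are committed
back to Kafka's internal __consumer_offsets topic, so ZooKeeper never sits on
the offset path.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object DirectStreamOffsetsSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-direct-offsets"), Seconds(10))

    // Offsets are tracked by the Kafka consumer group itself; ZooKeeper only
    // carries broker metadata for the cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092", // placeholder brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-streaming-group",              // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)  // commit manually below
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Array("events"), kafkaParams)) // placeholder topic

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      println(s"records in batch: ${rdd.count()}")        // stand-in for real work
      // Commit only after the batch has been processed: at-least-once delivery,
      // with the offsets going to Kafka rather than ZooKeeper.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}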
>
> ----------
> From: vincent gromakowski <vincent.gromakowski@gmail.com>
> Date: 3 March 2017 at 08:31
> To: Jörn Franke <jornfranke@gmail.com>
> Cc: Mich Talebzadeh <mich.talebzadeh@gmail.com>, "user @spark" <
> user@spark.apache.org>
>
>
> I forgot to mention that it also depends on the Spark Kafka connector you
> use. If it is receiver based, I recommend a dedicated ZooKeeper cluster
> because it is used to store offsets. If it is receiver-less, ZooKeeper can
> be shared.
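
For contrast, a minimal sketch of the receiver-based path, assuming the older
spark-streaming-kafka-0-8 artifact (the ZooKeeper quorum, group id and topic
map are placeholders): here the consumer group keeps its offsets in the
ZooKeeper ensemble handed to createStream, which is why a dedicated,
well-provisioned ensemble matters for this connector.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils // spark-streaming-kafka-0-8

object ReceiverStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-receiver-stream"), Seconds(10))

    // The receiver tracks consumed offsets in this ZooKeeper ensemble on
    // every batch, so its disks and latency directly affect the stream.
    val zkQuorum = "zk1:2181,zk2:2181,zk3:2181" // placeholder dedicated ensemble
    val groupId  = "spark-streaming-group"      // placeholder consumer group
    val topics   = Map("events" -> 2)           // placeholder topic -> threads

    val stream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
    stream.map(_._2).count().print()            // message values only; trivial work

    ssc.start()
    ssc.awaitTermination()
  }
}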
>
>
>
