spark-user mailing list archives

From Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Subject Re: Apache Spark data locality when integrating with Kafka
Date Mon, 08 Feb 2016 02:52:06 GMT
We are using Spark in two ways:
1. YARN with Spark support, with Kafka running alongside the data nodes.
2. Spark master and workers running on some of the Kafka brokers.
Data locality is important.
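
To illustrate what locality means in the second setup, here is a minimal sketch of locality-aware placement: each Kafka partition is read by a worker on the same host as the partition's leader broker when one exists, and by any other worker otherwise. The host names and the `assign_partitions` helper are hypothetical and are not part of the Spark or Kafka APIs; this only shows the idea, not how Spark actually schedules tasks.

```python
from typing import Dict, List


def assign_partitions(partition_leaders: Dict[int, str],
                      worker_hosts: List[str]) -> Dict[int, str]:
    """Map each Kafka partition to a worker host, preferring co-located workers.

    partition_leaders: partition id -> host of the broker leading that partition.
    worker_hosts: hosts that run a Spark worker (hypothetical names).
    """
    if not worker_hosts:
        raise ValueError("at least one worker host is required")

    assignment: Dict[int, str] = {}
    fallback = 0  # round-robin index for partitions with no co-located worker
    for partition in sorted(partition_leaders):
        leader = partition_leaders[partition]
        if leader in worker_hosts:
            # Local read: the worker shares a host with the leader broker.
            assignment[partition] = leader
        else:
            # Remote read: spread these partitions over all workers.
            assignment[partition] = worker_hosts[fallback % len(worker_hosts)]
            fallback += 1
    return assignment


if __name__ == "__main__":
    leaders = {0: "node1", 1: "node2", 2: "node4"}
    workers = ["node1", "node2", "node3"]
    print(assign_partitions(leaders, workers))
```

Partitions 0 and 1 are read locally in this example, while partition 2 has to be fetched over the network because no worker runs on its leader's host.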

Regards
Diwakar 


Sent from Samsung Mobile.

-------- Original message --------
From: أنس الليثي <dev.fanooos@gmail.com>
Date: 08/02/2016 02:07 (GMT+05:30)
To: Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com>
Cc: "Yuval.Itzchakov" <yuvalos@gmail.com>, user <user@spark.apache.org>
Subject: Re: Apache Spark data locality when integrating with Kafka

Diwakar,

We have our own servers. We will not use any cloud service such as Amazon's.

On 7 February 2016 at 18:24, Diwakar Dhanuskodi <diwakar.dhanuskodi@gmail.com> wrote:
Fanoos,
Where do you want the solution to be deployed? On premises or in the cloud?

Regards 
Diwakar





-------- Original message --------
From: "Yuval.Itzchakov" <yuvalos@gmail.com>
Date: 07/02/2016 19:38 (GMT+05:30)
To: user@spark.apache.org
Cc:
Subject: Re: Apache Spark data locality when integrating with Kafka

I would definitely try to avoid hosting Kafka and Spark on the same servers. 

Kafka and Spark will be doing a lot of IO between them, so you'll want to
make the most of those resources and not share them on the same server. You'll
want each Kafka broker to be on a dedicated server, as well as your Spark
master and workers. If you're hosting them on Amazon EC2 instances, then
you'll want these to be in the same availability zone, so you can benefit
from low latency within that zone. If you're on dedicated servers,
perhaps you'll want to create a VPC between the two clusters so you can,
again, benefit from low IO latency and high throughput.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-data-locality-when-integrating-with-Kafka-tp26165p26170.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.





-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.rabei@mubasher.info