spark-user mailing list archives

From "jamison.bennett" <jamison.benn...@gmail.com>
Subject Re: Apache Hadoop and Spark
Date Thu, 25 Jan 2018 14:58:42 GMT
Hi Mutahir,

I will try to answer some of your questions.

Q1) Can we use Mapreduce and apache spark in the same cluster
Yes. I run a cluster with both MapReduce2 and Spark, using YARN as the
resource manager.
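As a sketch of what this looks like in practice: on a YARN-managed cluster, MapReduce and Spark jobs are just different YARN applications, so they can be submitted side by side. The jar paths and example classes below are illustrative assumptions; adjust them to your install.

```shell
# Hedged sketch: both engines share the same YARN ResourceManager.
# Jar locations and example names are assumptions; adapt to your cluster.

# A MapReduce2 job submitted to YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /input /output_mr

# A Spark job submitted to the same YARN cluster:
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
```

YARN then arbitrates memory and CPU between the two frameworks, so neither needs a dedicated cluster.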

Q2) is it mandatory to use GPUs for apache spark?
No. My cluster has Spark and does not have any GPUs.

Q3) I read that apache spark is in-memory, will it benefit from SSD / Flash
for caching or persistent storage?
As you noted, Spark is in-memory, but there are a few places where faster
storage may still help, including:
- The HDFS DataNode storage holding the input files that Spark reads
- RDD persistence
(https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence)
when the chosen storage level includes one of the disk options
- The Spark shuffle service: between stages, which process data in-memory,
intermediate results from Spark executors are written to local storage and
served to the next stage by the shuffle service
I don't have any benchmark results for these, but it might be something you
want to look into.
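To make the last two points concrete: disk-backed caching and shuffle files both land under `spark.local.dir`, so pointing that setting at SSD mounts is the usual way to get fast storage into the picture. The mount paths below are assumptions for illustration.

```
# spark-defaults.conf (sketch; /mnt/ssd1 and /mnt/ssd2 are assumed SSD mounts)
# Shuffle files, and RDD partitions spilled to disk by storage levels such as
# MEMORY_AND_DISK, are written under spark.local.dir, so placing it on fast
# local storage affects both the shuffle path and disk-backed caching.
spark.local.dir    /mnt/ssd1/spark,/mnt/ssd2/spark
```

On the caching side, a call such as `rdd.persist(StorageLevel.MEMORY_AND_DISK)` tells Spark to spill partitions that don't fit in memory to those local directories rather than recompute them.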

Thanks,
Jamison




