spark-user mailing list archives

From "jamison.bennett" <>
Subject Re: Apache Hadoop and Spark
Date Thu, 25 Jan 2018 14:58:42 GMT
Hi Mutahir,

I will try to answer some of your questions.

Q1) Can we use MapReduce and Apache Spark in the same cluster?
Yes. I run a cluster with both MapReduce2 and Spark and use YARN as the
resource manager for both.
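As a sketch, submitting one job of each type to the same YARN cluster looks roughly like this (jar names, class name, and paths below are illustrative placeholders, not from the original post; the exact examples jar name varies by Hadoop version):

```shell
# Submit a classic MapReduce2 job to YARN
hadoop jar hadoop-mapreduce-examples.jar wordcount /input /output-mr

# Submit a Spark job to the same YARN cluster;
# --master yarn tells Spark to use YARN as its resource manager
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.WordCount \
  my-spark-app.jar /input /output-spark
```

Both jobs then show up side by side in the YARN ResourceManager UI, sharing the cluster's containers.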

Q2) Is it mandatory to use GPUs for Apache Spark?
No. My cluster has Spark and does not have any GPUs.

Q3) I read that Apache Spark is in-memory; will it benefit from SSD / flash
for caching or persistent storage?
As you noted, Spark processes data in memory, but there are a few places
where faster storage may help:
- Reading the input data into Spark from the HDFS DataNodes
- RDD persistence, when the chosen storage level includes one of the disk
options
- The Spark shuffle service: between Spark stages, which process the data
in-memory, intermediate results from Spark executors are written to local
storage and served to the next stage by the shuffle service
I don't have any benchmark results for these, but it might be something you
want to look into.
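For the middle two points, a minimal PySpark sketch is below. It assumes a working Spark installation; the SSD mount paths are hypothetical, but `spark.local.dir` (scratch space for shuffle and spill files) and `StorageLevel.MEMORY_AND_DISK` are standard Spark settings:

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = (SparkSession.builder
         .appName("ssd-demo")
         # Hypothetical SSD mounts: shuffle files and spilled
         # RDD blocks are written under these directories
         .config("spark.local.dir", "/ssd1/spark,/ssd2/spark")
         .getOrCreate())

rdd = spark.sparkContext.parallelize(range(1000))

# MEMORY_AND_DISK keeps partitions in memory when possible and
# spills the rest to disk, so faster local storage speeds up
# re-reads of the spilled blocks on later actions
rdd.persist(StorageLevel.MEMORY_AND_DISK)

print(rdd.count())
spark.stop()
```

In production the `spark.local.dir` setting usually goes in `spark-defaults.conf` rather than application code, so every job on the node benefits from the faster scratch disks.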

