spark-user mailing list archives

From durga <>
Subject persistent HDFS instance for cluster restarts/destroys
Date Wed, 23 Jul 2014 22:26:27 GMT
Hi All,
I have a question.

For my company, we are planning to use the spark-ec2 scripts to create clusters
for us.

I understand that persistent HDFS keeps the HDFS data available when the cluster
is stopped and restarted.

My questions are:

1) What happens if I destroy and re-create the cluster? Do I lose the data?
    a) If I do lose the data, is the only option to copy it to S3 and copy it
back after launching the new cluster? (Data transfer to and from S3 seems costly.)
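For what it's worth, here is a sketch of one common workflow with the spark-ec2 script, assuming an EBS-backed persistent HDFS; the names my-keypair, my-cluster, my-bucket, and my-data are placeholders. As I understand it, `stop`/`start` preserves the persistent HDFS, while `destroy` terminates the instances and loses it, so a backup to S3 (e.g. via `hadoop distcp`) is needed before destroying:

```shell
# Launch with EBS volumes so persistent-hdfs has durable storage
# (my-keypair / my-cluster are placeholders):
./spark-ec2 -k my-keypair -i my-keypair.pem \
    --ebs-vol-size=100 launch my-cluster

# Stopping and restarting keeps the persistent HDFS data:
./spark-ec2 stop my-cluster
./spark-ec2 -k my-keypair -i my-keypair.pem start my-cluster

# Before a destroy, back the data up to S3 with distcp (run this on
# the master node), then copy it back the same way after relaunching:
hadoop distcp hdfs:///my-data s3n://my-bucket/my-data

# destroy terminates the instances; data on them is gone:
./spark-ec2 destroy my-cluster
```

Note the S3 transfer cost only applies on destroy/re-create; a plain stop/start cycle should not need it.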
2) How would I add or remove machines in the cluster? I mean, I am asking about
cluster management. Does Amazon provide a place to see the machines and perform
the add/remove operations?
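On the visibility part: the EC2 web console (Services → EC2 → Instances) lists every machine, and the same information is available from the AWS CLI. As far as I know, spark-ec2 puts cluster nodes into security groups named `<cluster>-master` and `<cluster>-slaves`, so you can filter on those (my-cluster is a placeholder; the spark-ec2 script itself has no built-in resize action, so adding/removing nodes generally means relaunching at the desired size or managing instances manually):

```shell
# List the slave nodes of a spark-ec2 cluster by security-group name:
aws ec2 describe-instances \
    --filters "Name=instance.group-name,Values=my-cluster-slaves" \
    --query "Reservations[].Instances[].[InstanceId,PublicDnsName,State.Name]" \
    --output table
```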

