spark-user mailing list archives

From Marius Soutier <mps....@gmail.com>
Subject Re: use additional ebs volumes for hdfs storage with spark-ec2
Date Sat, 01 Nov 2014 15:23:54 GMT
Are these /vols formatted? You typically need to format and define a mount point in /mnt for
attached EBS volumes.
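
For example, roughly like this on each worker (just a sketch; the device name /dev/xvdf and the mount point /vol0 are assumptions, check lsblk or df -h for the real ones):

mkfs.ext4 /dev/xvdf                     # format the attached EBS volume (wipes it!)
mkdir -p /vol0                          # create the mount point
mount /dev/xvdf /vol0                   # mount it
echo '/dev/xvdf /vol0 ext4 defaults,noatime 0 0' >> /etc/fstab   # keep the mount across reboots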

I’m not using the ec2 script, so I don’t know what is installed, but there’s usually
an HDFS info service running on port 50070. After changing hdfs-site.xml, you have to restart
the HDFS service. The Cloudera distribution lets you do this from its UI; otherwise, depending
on your version, there should be start/stop scripts under /usr/local/hadoop, /usr/lib/hadoop-hdfs,
or a similar location.
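
If the /root/ephemeral-hdfs layout from the quoted mail below applies, something along these lines on the master should restart HDFS and show whether each datanode picked up the new directories (a sketch, not verified against the ec2 script):

/root/ephemeral-hdfs/bin/stop-dfs.sh               # restart HDFS after editing hdfs-site.xml
/root/ephemeral-hdfs/bin/start-dfs.sh
/root/ephemeral-hdfs/bin/hadoop dfsadmin -report   # each datanode reports its configured capacity
# or open the namenode web UI: http://<master-host>:50070/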

On 31.10.2014, at 05:56, Daniel Mahler <dmahler@gmail.com> wrote:

> Thanks Akhil. I tried changing /root/ephemeral-hdfs/conf/hdfs-site.xml to have
> 
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/vol,/vol0,/vol1,/vol2,/vol3,/vol4,/vol5,/vol6,/vol7,/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>
>   </property>
> 
> and then running
> 
> /root/ephemeral-hdfs/bin/stop-all.sh
> copy-dir  /root/ephemeral-hdfs/conf/
> /root/ephemeral-hdfs/bin/start-all.sh
> 
> to try to make sure the new configuration takes effect on the entire cluster.
> I then ran spark to write to the local hdfs.
> It failed after filling the original /mnt* mounted drives,
> without writing anything to the attached /vol* drives.
> 
> I also tried completely stopping and restarting the cluster,
> but restarting resets /root/ephemeral-hdfs/conf/hdfs-site.xml to the default state.
> 
> thanks
> Daniel
> 
> 
> 
> On Thu, Oct 30, 2014 at 1:56 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
> I think you can check in the core-site.xml or hdfs-site.xml file under /root/ephemeral-hdfs/etc/hadoop/,
> where you can see the data node dir property, which will be a comma-separated list of volumes.

> 
> Thanks
> Best Regards
> 
> On Thu, Oct 30, 2014 at 5:21 AM, Daniel Mahler <dmahler@gmail.com> wrote:
> I started my ec2 spark cluster with 
> 
>     ./ec2/spark-ec2 --ebs-vol-{size=100,num=8,type=gp2} -t m3.xlarge -s 10 launch mycluster
> 
> I see the additional volumes attached but they do not seem to be set up for hdfs.
> How can I check if they are being utilized on all workers,
> and how can I get all workers to utilize the extra volumes for HDFS?
> I do not have experience using hadoop directly, only through spark.
> 
> thanks
> Daniel
> 
> 

