spark-user mailing list archives

From: Xiangrui Meng <mengxr@gmail.com>
Subject: Re: Error: No space left on device
Date: Wed, 16 Jul 2014 07:22:07 GMT
Hi Chris,

Could you also try `df -i` on the master node? How many
blocks/partitions did you set?
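
For reference, the same inode check can be scripted from Python on a
node (a minimal sketch; the mount point is an illustrative assumption,
so check each directory listed in spark.local.dir):

import os

st = os.statvfs("/mnt/spark")  # illustrative path
print "inodes free:", st.f_favail, "of", st.f_files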

In the current implementation, ALS doesn't clean up the shuffle data,
because the operations are chained together. But it shouldn't run out
of disk space on the MovieLens dataset, which is small. The spark-ec2
script sets /mnt/spark and /mnt/spark2 as spark.local.dir by default;
I would recommend leaving this setting at its default value.
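
If you do override it, spark.local.dir has to be set before the
SparkContext starts, since it is read at startup. A minimal sketch (the
app name is illustrative; the paths follow the spark-ec2 defaults):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("ALSJob")  # illustrative name
        .set("spark.local.dir", "/mnt/spark,/mnt/spark2"))
sc = SparkContext(conf=conf)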

Best,
Xiangrui

On Wed, Jul 16, 2014 at 12:02 AM, Chris DuBois <chris.dubois@gmail.com> wrote:
> Thanks for the quick responses!
>
> I used your final -Dspark.local.dir suggestion, but I see this during the
> initialization of the application:
>
> 14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at
> /vol/spark-local-20140716065608-7b2a
>
> I would have expected something in /mnt/spark/.
>
> Thanks,
> Chris
>
>
>
> On Tue, Jul 15, 2014 at 11:44 PM, Chris Gore <cdgore@cdgore.com> wrote:
>>
>> Hi Chris,
>>
>> I've encountered this error when running Spark’s ALS methods too.  In my
>> case, it was because I set spark.local.dir improperly, and every time there
>> was a shuffle, it would spill many GB of data onto the local drive.  What
>> fixed it was setting it to use the /mnt directory, where a network drive is
>> mounted.  For example, set an environment variable:
>>
>> export SPACE=$(mount | grep mnt | awk '{print $3"/spark/"}' | xargs | sed 's/ /,/g')
>>
>> Then add -Dspark.local.dir=$SPACE, or simply
>> -Dspark.local.dir=/mnt/spark/,/mnt2/spark/, when you run your driver
>> application.
>>
>> Chris
>>
>> On Jul 15, 2014, at 11:39 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>> > Check the number of inodes (df -i). The assembly build may create many
>> > small files. -Xiangrui
>> >
>> > On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois <chris.dubois@gmail.com>
>> > wrote:
>> >> Hi all,
>> >>
>> >> I am encountering the following error:
>> >>
>> >> INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No
>> >> space
>> >> left on device [duplicate 4]
>> >>
>> >> For each slave, df -h looks roughly like this, which makes the above
>> >> error surprising:
>> >>
>> >> Filesystem            Size  Used Avail Use% Mounted on
>> >> /dev/xvda1            7.9G  4.4G  3.5G  57% /
>> >> tmpfs                 7.4G  4.0K  7.4G   1% /dev/shm
>> >> /dev/xvdb              37G  3.3G   32G  10% /mnt
>> >> /dev/xvdf              37G  2.0G   34G   6% /mnt2
>> >> /dev/xvdv             500G   33M  500G   1% /vol
>> >>
>> >> I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using the
>> >> spark-ec2 scripts and a clone of spark from today. The job I am running
>> >> closely resembles the collaborative filtering example. This issue
>> >> happens
>> >> with the 1M version as well as the 10 million rating version of the
>> >> MovieLens dataset.
>> >>
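>> >> For reference, that example boils down to roughly this (a minimal
>> >> sketch; the path, rank, and iteration count are illustrative
>> >> assumptions):
>> >>
>> >> from pyspark.mllib.recommendation import ALS
>> >>
>> >> ratings = (sc.textFile("/vol/ml-1m/ratings.dat")
>> >>            .map(lambda line: line.split("::"))
>> >>            .map(lambda f: (int(f[0]), int(f[1]), float(f[2]))))
>> >> model = ALS.train(ratings, 10, 10)  # rank=10, iterations=10
>> >>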
>> >> I have seen previous questions about this error, but their suggestions
>> >> haven't helped yet. For example, I tried setting the Spark tmp directory
>> >> to the EBS volume at /vol/, both by editing the spark conf file (and
>> >> copy-dir'ing it to the slaves) and through the SparkConf. Yet I still
>> >> get the above error. My current Spark config is below. Note that I'm
>> >> launching via ~/spark/bin/spark-submit.
>> >>
>> >> from pyspark import SparkConf, SparkContext
>> >>
>> >> conf = (SparkConf()
>> >>         .setAppName("RecommendALS")
>> >>         .set("spark.local.dir", "/vol/")
>> >>         .set("spark.executor.memory", "7g")
>> >>         .set("spark.akka.frameSize", "100")
>> >>         .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100"))
>> >> sc = SparkContext(conf=conf)
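>> >>
>> >> As a sanity check, printing the setting back from the conf (SparkConf
>> >> has a get method) shows what the driver received:
>> >>
>> >> print conf.get("spark.local.dir")  # should print /vol/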
>> >>
>> >> Thanks for any advice,
>> >> Chris
>> >>
>>
>
