From: Chris DuBois <chris.dubois@gmail.com>
Date: Tue, 15 Jul 2014 23:35:54 -0700
Subject: Error: No space left on device
To: user@spark.apache.org

Hi all,

I am encountering the following error:

INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space left on device [duplicate 4]

For each slave, df -h looks roughly like this, which makes the above error surprising:

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  4.4G  3.5G  57% /
tmpfs           7.4G  4.0K  7.4G   1% /dev/shm
/dev/xvdb        37G  3.3G   32G  10% /mnt
/dev/xvdf        37G  2.0G   34G   6% /mnt2
/dev/xvdv       500G   33M  500G   1% /vol

I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using the spark-ec2 scripts and a clone of Spark from today. The job I am running closely resembles the collaborative filtering example. This issue happens with both the 1M and the 10M-rating versions of the MovieLens dataset.

I have seen previous questions, but they haven't helped yet. For example, I tried setting the Spark tmp directory to the EBS volume at /vol/, both by editing the Spark conf file (and copy-dir'ing it to the slaves) and through the SparkConf, yet I still get the above error. My current Spark config is below. Note that I'm launching via ~/spark/bin/spark-submit.

conf = SparkConf()
conf.setAppName("RecommendALS") \
    .set("spark.local.dir", "/vol/") \
    .set("spark.executor.memory", "7g") \
    .set("spark.akka.frameSize", "100") \
    .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100")
sc = SparkContext(conf=conf)
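For what it's worth, here is a quick sanity check one could run from the driver to confirm the setting was picked up (just a debugging sketch; sc._conf is PySpark's internal handle on the SparkConf, not stable API):

# Sketch: confirm the driver-side view of spark.local.dir.
# Assumes the conf/sc objects defined above; note that spark.local.dir
# must be set before the SparkContext is created for it to take effect.
print(conf.get("spark.local.dir"))      # expect "/vol/"
print(sc._conf.get("spark.local.dir"))  # same check via the context (internal API)

If that shows /vol/ but the workers still fill up /mnt, my suspicion would be SPARK_LOCAL_DIRS from spark-env.sh taking precedence on the executors, since (if I read the docs right) that environment variable overrides spark.local.dir in Spark 1.0 and later.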
Thanks for any advice,
Chris