spark-user mailing list archives

From Ramkumar Chokkalingam <ramkumar...@gmail.com>
Subject Job hangs for larger input size
Date Mon, 14 Oct 2013 19:30:08 GMT
Hello Group,

I'm using Spark 0.8.0 with Scala 2.9.3.

I have two issues:

*1) Job hangs when the number of input files exceeds about 4000*

First, I was using "local" as the Master URL argument, like this:

    val sc = new SparkContext("local", "AnonApp", "/usr/local/spark/")

    // Read all files in a directory
    var t = sc.textFile(fileName)
    t.map(each_line => some_functions(each_line)).saveAsTextFile("/output/" + fileName)

My job runs fine for sample inputs (~20 files), but when the number of
input files increases (~7000 files), execution stops at around 4000 files
(it hung once at 4200 files and once at 4216 files).

This is my console output:
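
For context, here is a minimal sketch of the kind of driver loop this describes. It assumes each file in the input directory is submitted as its own small job, which the "Total input paths to process : 1" lines in the console output suggest. The names AnonAppSketch, inputDir, outputDir and anonymize are hypothetical placeholders, not the original program.

```scala
import java.io.File
import org.apache.spark.SparkContext

object AnonAppSketch {
  // Hypothetical stand-in for the per-line transformation (some_functions in the post).
  def anonymize(line: String): String = line

  def main(args: Array[String]): Unit = {
    val inputDir = new File(args(0)) // assumption: directory holding the input files
    val outputDir = args(1)          // assumption: base directory for the results

    val sc = new SparkContext("local", "AnonApp", "/usr/local/spark/")

    // listFiles returns null for a missing directory, so guard against that.
    val files = Option(inputDir.listFiles).getOrElse(Array.empty[File]).filter(_.isFile)

    // One Spark job per input file; each job writes to an output directory named after the file.
    for (file <- files) {
      sc.textFile(file.getPath)
        .map(line => anonymize(line))
        .saveAsTextFile(new File(outputDir, file.getName).getPath)
    }

    sc.stop()
  }
}
```

With roughly 7000 input files, a loop of this shape runs roughly 7000 separate jobs against the same SparkContext.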

"13/10/14 11:54:08 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_201310141154_0000_m_000000_4212' to
file:/usr/local/spark/ram_examples/AnonApp/sample-ANON/MSC/06/MAZ0320111206074848911831.cdr.gz
13/10/14 11:54:08 INFO mapred.FileInputFormat: Total input paths to process
: 1
13/10/14 11:54:08 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_201310141154_0000_m_000000_4213' to
file:/usr/local/spark/ram_examples/AnonApp/sample-ANON/MSC/06/KBL0320111206154040475980.cdr.gz
13/10/14 11:54:08 INFO mapred.FileInputFormat: Total input paths to process
: 1"
__<Control waits here>

When the job hung, I checked the output folder: the _temporary file had
been created, but I'm not sure why the program hangs there. Control simply
stops and waits at the point shown above.

I saw a post on the user group suggesting that I increase my *ulimit* (on
the number of open files), but my ulimit is already set to unlimited.

*2) Job hangs when I change the Master URL to local[2] (I have 2 cores)*

The program above works fine for sample inputs of 20 files. But when I
change local to local[2] in the SparkContext, the same program hangs in the
same fashion as shown above. When making this change (local -> local[2]),
am I expected to change anything else?


Is there any pattern common to these two failures? Apart from the console
logs, is there a place where I can see logs to understand what is going on
when the program hangs?



Regards,

Ram.
