spark-user mailing list archives

From <>
Subject getting Caused by: org.apache.spark.SparkException: Job failed: Task 1.0:1 failed more than 4 times
Date Wed, 23 Oct 2013 21:02:25 GMT
I have Spark 0.8.0 running in a cluster with 2 workers, each set up with 16 cores and 24GB of memory,
against Hadoop 1.2.1.

I have a CSV with over 1 million records.

My Spark Java program runs as expected with a smaller CSV but fails as follows:

Loading the CSV as text works:
              JavaRDD<String> rawTable = sc.textFile(raw_file_path).cache();

Then applying a map works (the second argument to my helper is truncated here):

              JavaRDD<String> col_values = rawTable.map(
                     new Function<String, String>() {
                            private static final long serialVersionUID = 1L;

                            public String call(String line) throws Exception {
                                   return FileUtil.extractValueForAPositionNo(line,
Getting the distinct values also works:
                                  JavaRDD<String> distinct_col_values = col_values.distinct().cache();

But dumping the contents of the distinct RDD into a List<String> fails:
                                  List<String> list = distinct_col_values.collect();
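For context, here is the same map → distinct → collect pipeline sketched with plain java.util.stream on a tiny in-memory sample, in place of Spark. The helper body is a placeholder: I am assuming FileUtil.extractValueForAPositionNo returns the value at a given comma-separated position, and the column position 0 is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctColumnSketch {
    // Placeholder for FileUtil.extractValueForAPositionNo: assumed to return
    // the value at the given comma-separated position in the line.
    static String extractValueForAPositionNo(String line, int pos) {
        return line.split(",", -1)[pos];
    }

    public static void main(String[] args) {
        // In the real job these lines come from sc.textFile(raw_file_path)
        List<String> rawTable = Arrays.asList("a,1,x", "b,2,y", "a,3,z");

        // map -> distinct -> collect, mirroring the RDD pipeline above
        List<String> distinctColValues = rawTable.stream()
                .map(line -> extractValueForAPositionNo(line, 0))
                .distinct()
                .collect(Collectors.toList());

        System.out.println(distinctColValues); // [a, b]
    }
}
```

The one behavioral difference worth noting: collect() on an RDD ships every distinct value from the workers back to the driver, whereas the local stream never leaves one JVM.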

Any help? The relevant part of the stack trace:

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(
        at org.codehaus.mojo.exec.ExecJavaMojo$
Caused by: org.apache.spark.SparkException: Job failed: Task 1.0:1 failed more than 4 times
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at org.apache.spark.scheduler.DAGScheduler$$anon$
