spark-user mailing list archives

From Stuti Awasthi <stutiawas...@hcl.com>
Subject Not able to write output to local filesystem from Standalone mode.
Date Tue, 24 May 2016 09:26:50 GMT
Hi All,
I have a 3-node Spark 1.6 cluster in Standalone mode with 1 Master and 2 Slaves. I am not using
Hadoop as the filesystem. I am able to launch the shell, read the input file from the local
filesystem, and perform transformations successfully. When I try to write my output to a path
on the local filesystem, I receive the error below.

I searched the web and found a similar Jira: https://issues.apache.org/jira/browse/SPARK-2984
. Even though it is marked as resolved for Spark 1.3+, people have already posted that the same
issue still persists in the latest versions.

ERROR
scala> data.saveAsTextFile("/home/stuti/test1")
16/05/24 05:03:42 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, server1): java.io.IOException:
The temporary job-output directory file:/home/stuti/test1/_temporary doesn't exist!
        at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
        at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

What is the best way to resolve this issue if I do not want to install Hadoop?
Or is it mandatory to have Hadoop in order to write output from a Standalone cluster?

Please suggest.

Thanks & Regards
Stuti Awasthi
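
Note on the question above: one possible explanation, offered as an assumption and not a confirmed diagnosis, is that a plain local path such as /home/stuti/test1 refers to a different disk on each node, so the _temporary directory created on the driver machine is not visible to the executor on server1. Under that assumption, two workarounds that are commonly suggested are (a) pointing saveAsTextFile at a directory on a shared mount that exists at the same path on every node, or (b) collecting a small result to the driver and writing it with plain Java I/O. The sketch below illustrates (b) only. The helper name writeLinesLocally is hypothetical and not part of Spark, and the approach assumes the whole result fits in driver memory.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper (not a Spark API): writes each line, newline-terminated,
// to a single file on the machine where this function runs.
def writeLinesLocally(lines: Seq[String], target: String): Path = {
  val path = Paths.get(target).toAbsolutePath
  // Create missing parent directories so the write does not fail on a new path.
  Option(path.getParent).foreach(p => Files.createDirectories(p))
  val bytes = lines.map(_ + "\n").mkString.getBytes(StandardCharsets.UTF_8)
  Files.write(path, bytes)
}

// Possible use in spark-shell, assuming `data` is the RDD from the message
// and its contents are small enough to collect on the driver:
//   writeLinesLocally(data.collect().map(_.toString).toSeq, "/home/stuti/test1/output.txt")
```

Because the file is written by the driver process, it ends up only on the machine where spark-shell was launched. The file name output.txt in the usage comment is likewise just an illustration.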


