spark-user mailing list archives

From Yong Zhang <java8...@hotmail.com>
Subject RE: Not able to write output to local filesystem from Standalone mode.
Date Fri, 27 May 2016 18:29:52 GMT
I am not familiar with that particular piece of code, but Spark's concurrency comes from
multithreading. One executor uses multiple threads to process tasks, and those tasks share
the executor's JVM memory. So it shouldn't be surprising that Spark needs some
blocking/synchronization around memory in a few places.
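A minimal sketch of Yong's point in plain Scala, not Spark code: a fixed thread pool stands in for one executor, each submitted `Runnable` stands in for a task, and all tasks update one object on the shared JVM heap, so the update is wrapped in `synchronized`. `SharedTotals` and `ExecutorMemoryModel` are hypothetical names chosen for illustration.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical per-executor state on the shared JVM heap (not a Spark class).
final class SharedTotals {
  private var records = 0L

  // `synchronized` makes the read-modify-write atomic; without it,
  // concurrent task threads could lose updates.
  def add(n: Long): Unit = synchronized { records += n }
  def total: Long = synchronized { records }
}

object ExecutorMemoryModel {
  // Runs `numTasks` tasks on a pool of `threads` threads (one "executor");
  // each task adds 1 to the shared totals `perTask` times.
  def simulate(threads: Int, numTasks: Int, perTask: Int): Long = {
    val totals = new SharedTotals
    val pool = Executors.newFixedThreadPool(threads)
    for (_ <- 1 to numTasks)
      pool.execute(new Runnable {
        def run(): Unit = for (_ <- 1 to perTask) totals.add(1L)
      })
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    totals.total
  }

  def main(args: Array[String]): Unit =
    println(simulate(threads = 4, numTasks = 100, perTask = 1000)) // 100000
}
```

Because every task runs inside the same JVM, they all see the same `SharedTotals` instance; the lock is what keeps the final count exact.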
Yong

> Date: Fri, 27 May 2016 20:21:46 +0200
> Subject: Re: Not able to write output to local filesystem from Standalone mode.
> From: jacek@japila.pl
> To: java8964@hotmail.com
> CC: mathieu@closetwork.org; stutiawasthi@hcl.com; user@spark.apache.org
> 
> Hi Yong,
> 
> It makes sense...almost. :) I'm not sure how relevant it is, but just
> today I was reviewing the BlockInfoManager code, with its locks for
> reading and writing, and my understanding of the code is that Spark is
> fine when there are multiple attempts to write new memory blocks
> (pages), using a mere synchronized code block. See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L324-L325
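The general pattern described here, several task threads attempting to write the same new block while one `synchronized` block decides the winner, can be sketched as follows. This is an illustration under assumptions, not the `BlockInfoManager` code at the linked lines: `NewBlockRegistry`, `tryRegister`, `NewBlockRace`, and the `"rdd_0_0"` id are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical registry of the blocks held by one executor (not Spark's BlockInfoManager).
final class NewBlockRegistry {
  private val owners = mutable.HashMap.empty[String, Int]

  // Check-then-insert under a single lock: the first task to register a
  // block id wins; every later attempt sees the entry and returns false.
  def tryRegister(blockId: String, taskId: Int): Boolean = synchronized {
    if (owners.contains(blockId)) false
    else { owners.update(blockId, taskId); true }
  }
}

object NewBlockRace {
  // Lets `attempts` task threads race to register the same new block and
  // returns how many of them succeeded.
  def winners(attempts: Int): Int = {
    val registry = new NewBlockRegistry
    val wins = new AtomicInteger(0)
    val start = new CountDownLatch(1)
    val pool = Executors.newFixedThreadPool(attempts)
    for (taskId <- 0 until attempts)
      pool.execute(new Runnable {
        def run(): Unit = {
          start.await() // hold every thread until all tasks are submitted, to maximise contention
          if (registry.tryRegister("rdd_0_0", taskId)) wins.incrementAndGet()
        }
      })
    start.countDown()
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    wins.get
  }

  def main(args: Array[String]): Unit = println(winners(8)) // 1
}
```

The check and the insert happen under the same lock, so no matter how the threads interleave, exactly one attempt registers the block.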
> 
> With that in mind, it's not as simple as saying "that just makes sense".
> 
> P.S. The more I know, the fewer things "just make sense" to me.
> 
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
> On Fri, May 27, 2016 at 3:42 AM, Yong Zhang <java8964@hotmail.com> wrote:
> > That just makes sense, doesn't it?
> >
> > The only place it can happen is the driver. Otherwise, the executors would
> > contend over which of them should create the directory.
> >
> > Only the coordinator (the driver, in this case) is the right place to do it.
> >
> > Yong
> >
> > ________________________________
> > From: mathieu@closetwork.org
> > Date: Wed, 25 May 2016 18:23:02 +0000
> > Subject: Re: Not able to write output to local filesystem from Standalone mode.
> > To: jacek@japila.pl
> > CC: stutiawasthi@hcl.com; user@spark.apache.org
> >
> >
> > Experience. I don't use Mesos, YARN, or Hadoop, so I don't know about those.
> >
> >
> > On Wed, May 25, 2016 at 2:51 AM Jacek Laskowski <jacek@japila.pl> wrote:
> >
> > Hi Mathieu,
> >
> > Thanks a lot for the answer! I did *not* know it's the driver that
> > creates the directory.
> >
> > You said "standalone mode"; is this also the case for the other modes,
> > YARN and Mesos?
> >
> > P.S. Did you find it in the code, or...just from experience? #curious
> >
> > Regards,
> > Jacek Laskowski
> >
> >
> > On Tue, May 24, 2016 at 4:04 PM, Mathieu Longtin <mathieu@closetwork.org> wrote:
> >> In standalone mode, executors assume they have access to a shared file
> >> system. The driver creates the directory and the executors write the
> >> files, so executors on other machines end up not writing anything,
> >> since the directory does not exist on their local disk.
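A simplified model of the failure described above, assuming each machine's local disk can be represented by its own temporary directory. It is not Hadoop's `FileOutputCommitter`; `LocalDiskModel`, `setupJob`, and `writeTask` are hypothetical, and only the error text is copied from the log later in this thread. Job setup creates `<output>/_temporary` on the driver's disk only, so a task running on another machine finds no such directory on its own disk.

```scala
import java.io.IOException
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Simplified model of the behaviour described above (NOT Hadoop's FileOutputCommitter).
// Each machine's local disk is modelled as its own temporary directory.
object LocalDiskModel {
  // Driver side: job setup creates <output>/_temporary, but only on the driver's disk.
  def setupJob(driverDisk: Path, output: String): Path =
    Files.createDirectories(driverDisk.resolve(output).resolve("_temporary"))

  // Executor side: a task expects _temporary to exist on *its own* disk and
  // fails with the same message as the log in this thread when it does not.
  def writeTask(executorDisk: Path, output: String, part: Int, lines: Seq[String]): Path = {
    val tmp = executorDisk.resolve(output).resolve("_temporary")
    if (!Files.isDirectory(tmp))
      throw new IOException(s"The temporary job-output directory $tmp doesn't exist!")
    Files.write(tmp.resolve(f"part-$part%05d"), lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val master  = Files.createTempDirectory("master")  // the driver's machine
    val server1 = Files.createTempDirectory("server1") // a worker with a separate disk
    setupJob(master, "test1")
    writeTask(master, "test1", 0, Seq("a")) // works: same disk as the driver
    try writeTask(server1, "test1", 1, Seq("b"))
    catch { case e: IOException => println(e.getMessage) } // prints the "doesn't exist!" message
  }
}
```

On a genuinely shared filesystem both "disks" would be the same directory, which is why the same job succeeds once the output path is shared.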
> >>
> >> On Tue, May 24, 2016 at 8:01 AM Stuti Awasthi <stutiawasthi@hcl.com> wrote:
> >>>
> >>> hi Jacek,
> >>>
> >>> The parent directory is already present; it's my home directory. I'm
> >>> using a 64-bit Linux (Red Hat) machine.
> >>> I also noticed that the "test1" folder is created on my master with an
> >>> empty "_temporary" subdirectory, but on the slaves no such directory
> >>> is created under /home/stuti.
> >>>
> >>> Thanks
> >>> Stuti
> >>> ________________________________
> >>> From: Jacek Laskowski [jacek@japila.pl]
> >>> Sent: Tuesday, May 24, 2016 5:27 PM
> >>> To: Stuti Awasthi
> >>> Cc: user
> >>> Subject: Re: Not able to write output to local filesystem from Standalone mode.
> >>>
> >>> Hi,
> >>>
> >>> What happens when you create the parent directory /home/stuti? I think
> >>> the failure is due to missing parent directories. What's the OS?
> >>>
> >>> Jacek
> >>>
> >>> On 24 May 2016 11:27 a.m., "Stuti Awasthi" <stutiawasthi@hcl.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I have a 3-node Spark 1.6 cluster in standalone mode, with 1 master and
> >>> 2 slaves. I'm not using Hadoop as the filesystem. I'm able to launch the
> >>> shell, read the input file from the local filesystem, and perform
> >>> transformations successfully. When I try to write my output to a local
> >>> filesystem path, I receive the error below.
> >>>
> >>>
> >>>
> >>> I searched the web and found a similar JIRA:
> >>> https://issues.apache.org/jira/browse/SPARK-2984. Even though it is
> >>> marked as resolved for Spark 1.3+, people have already reported that the
> >>> same issue still persists in the latest versions.
> >>>
> >>>
> >>>
> >>> ERROR
> >>>
> >>> scala> data.saveAsTextFile("/home/stuti/test1")
> >>>
> >>> 16/05/24 05:03:42 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, server1): java.io.IOException: The temporary job-output directory file:/home/stuti/test1/_temporary doesn't exist!
> >>>         at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> >>>         at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
> >>>         at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> >>>         at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
> >>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
> >>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
> >>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> >>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
> >>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> >>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>         at java.lang.Thread.run(Thread.java:745)
> >>>
> >>>
> >>>
> >>> What is the best way to resolve this issue if I don't want to install
> >>> Hadoop? Or is Hadoop mandatory for writing output from a standalone
> >>> cluster?
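Two workarounds that avoid a Hadoop installation, sketched under stated assumptions rather than taken from this thread: (1) if the output is small enough for the driver, fetch it with `RDD.toLocalIterator` (which pulls partitions back one at a time) and write a single file on the driver's local disk; (2) if some directory is mounted at the same path on the driver and on every worker (for example an NFS share; `/mnt/shared` is a hypothetical path), `saveAsTextFile` with a `file://` URI on that mount gives the executors the shared filesystem they expect. `SaveOnDriver`, `saveOnDriver`, and `/tmp/test1.txt` are illustrative names.

```scala
import java.io.PrintWriter
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SaveOnDriver {
  // Workaround 1 (assumes the output fits on the driver): pull partitions back
  // one at a time and write a single file on the driver's local disk.
  def saveOnDriver(rdd: RDD[String], path: String): Unit = {
    val out = new PrintWriter(path, "UTF-8")
    try rdd.toLocalIterator.foreach(line => out.println(line))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // local[2] only so the sketch runs on one machine; in spark-shell, reuse `sc`.
    val sc = new SparkContext(new SparkConf().setAppName("save-on-driver").setMaster("local[2]"))
    try {
      val data = sc.parallelize(Seq("a", "b", "c"), numSlices = 2)
      saveOnDriver(data, "/tmp/test1.txt")

      // Workaround 2 (assumes /mnt/shared is mounted at the same path on the
      // driver and on every worker, e.g. via NFS):
      // data.saveAsTextFile("file:///mnt/shared/test1")
    } finally {
      sc.stop()
    }
  }
}
```

In `spark-shell`, `sc` already exists, so only the `saveOnDriver(data, ...)` call would be needed; the trade-off is that everything flows through the driver, which does not scale to large outputs.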
> >>>
> >>>
> >>>
> >>> Please suggest.
> >>>
> >>>
> >>>
> >>> Thanks & Regards
> >>>
> >>> Stuti Awasthi
> >>>
> >>
> >> --
> >> Mathieu Longtin
> >> 1-514-803-8977
> >