spark-user mailing list archives

From "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com>
Subject RE: sc.textFile() on windows cannot access UNC path
Date Tue, 10 Mar 2015 14:07:48 GMT
This sounds like the right approach. Is there any sample code showing how to use sc.newAPIHadoopFile?
I am new to Spark and don't know much about Hadoop. I just want to read a text file from a
UNC path into an RDD.

Thanks


From: Akhil Das [mailto:akhil@sigmoidanalytics.com]
Sent: Tuesday, March 10, 2015 9:14 AM
To: java8964
Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: Re: sc.textFile() on windows cannot access UNC path

You can create your own input format/record reader (using java.nio.*) and pass it to sc.newAPIHadoopFile
when reading.
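For a plain text file, a minimal sketch of that call, assuming a live SparkContext `sc` and Hadoop's stock TextInputFormat (the UNC path is the one from this thread), would be:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// newAPIHadoopFile yields (key, value) = (byte offset, line of text);
// keep only the line and convert the Hadoop Text to a String.
val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
    "file:////10.196.119.230/folder1/abc.txt")
  .map { case (_, line) => line.toString }
```

Note that with the stock TextInputFormat the path still goes through Hadoop's Path handling, so for a UNC path you would likely need to substitute a custom InputFormat built on java.nio, as suggested above.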


Thanks
Best Regards

On Tue, Mar 10, 2015 at 6:28 PM, java8964 <java8964@hotmail.com> wrote:
I think the workaround is clear.

Using JDK 7, implement your own saveAsRemoteWinText() using java.nio.file.

Yong
________________________________
From: ningjun.wang@lexisnexis.com
To: java8964@hotmail.com; user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path
Date: Tue, 10 Mar 2015 03:02:37 +0000


Hi Yong



Thanks for the reply. Yes, it works with a local drive letter. But I really need to use a UNC path
because the path is provided as input at runtime, and I cannot dynamically assign a drive letter to an arbitrary
UNC path at runtime.



Is there any workaround so that I can use a UNC path with sc.textFile(…)?





Ningjun





From: java8964 [mailto:java8964@hotmail.com]
Sent: Monday, March 09, 2015 5:33 PM
To: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path



This is a Java problem, not really a Spark one.



From this page: http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u



You can see that using java.nio.* on JDK 7 fixes this issue. But the Path class in Hadoop
uses java.io.*, not java.nio.
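As a sketch of the java.nio route on JDK 7, bypassing Hadoop's Path handling entirely by reading the share from the driver and parallelizing the lines (assumes a live SparkContext `sc`, that the share is reachable from the driver, and that the file fits in driver memory; for large files a custom InputFormat would be preferable):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// java.nio.file resolves paths with the platform's own rules, so on
// Windows a UNC path like \\10.196.119.230\folder1\abc.txt is handled
// correctly, unlike Hadoop's java.io-based Path.
val lines = Files.readAllLines(
  Paths.get("""\\10.196.119.230\folder1\abc.txt"""),
  StandardCharsets.UTF_8).asScala
val rdd = sc.parallelize(lines, 4)
```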



Otherwise, you need to manually map your Windows remote share to a local drive letter, like "Z:" (for example with the "net use" command); then it should
work.



Yong

________________________________

From: ningjun.wang@lexisnexis.com
To: user@spark.apache.org
Subject: sc.textFile() on windows cannot access UNC path
Date: Mon, 9 Mar 2015 21:09:38 +0000

I am running Spark on Windows 2008 R2. When I use sc.textFile() to load a text file using a UNC path,
it does not work.



sc.textFile(raw"file:////10.196.119.230/folder1/abc.txt", 4).count()



Input path does not exist: file:/10.196.119.230/folder1/abc.txt

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/10.196.119.230/tar/Enron/enron-207-short.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
    at org.apache.spark.rdd.RDD.count(RDD.scala:910)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply$mcV$sp(IndexTest.scala:104)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
    at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
    at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
    at org.scalatest.Suite$class.run(Suite.scala:1424)
    at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
    at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
    at ltn.analytics.tests.IndexTest.org$scalatest$BeforeAndAfterAll$$super$run(IndexTest.scala:15)
    at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
    at ltn.analytics.tests.IndexTest.run(IndexTest.scala:15)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
    at org.scalatest.tools.Runner$.run(Runner.scala:883)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:137)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)





The path is correct; I can open Windows Explorer and enter the following path to open the text
file:

\\10.196.119.230\folder1\abc.txt



I have tried using three slashes and two slashes, and always got the same error:



sc.textFile(raw"file:///10.196.119.230/folder1/abc.txt", 4).count()

sc.textFile(raw"file://10.196.119.230/folder1/abc.txt", 4).count()



Please advise.

Ningjun
