spark-user mailing list archives

From "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com>
Subject RE: sc.textFile() on windows cannot access UNC path
Date Wed, 11 Mar 2015 14:15:24 GMT
Thanks for the reference. Is the following procedure correct?

1.            Copy the Hadoop source code org.apache.hadoop.mapreduce.lib.input.TextInputFormat.java
as my own class, e.g. UncTextInputFormat.java
2.            Modify UncTextInputFormat.java to handle UNC paths
3.            Call sc.newAPIHadoopFile(…) with

sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat]("file:////10.196.119.230/folder1/abc.txt",
    classOf[UncTextInputFormat],
    classOf[LongWritable],
    classOf[Text], conf)
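
Spelled out with the imports it would need, I assume step 3 looks roughly like this (UncTextInputFormat is the class I would create in steps 1 and 2, which I have not written yet, so this is only a sketch):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}

val conf = new Configuration()
// newAPIHadoopFile(path, fClass, kClass, vClass, conf) returns an RDD[(LongWritable, Text)]
val pairs = sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat](
  "file:////10.196.119.230/folder1/abc.txt",
  classOf[UncTextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)
// Keep only the line text, which is what I ultimately want in the RDD
val lines = pairs.map { case (_, text) => text.toString }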

Ningjun

From: Akhil Das [mailto:akhil@sigmoidanalytics.com]
Sent: Wednesday, March 11, 2015 2:40 AM
To: Wang, Ningjun (LNG-NPV)
Cc: java8964; user@spark.apache.org
Subject: Re: sc.textFile() on windows cannot access UNC path

I don't have a complete example for your use case, but you can see a lot of code showing how
to use newAPIHadoopFile here: https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93

Thanks
Best Regards

On Tue, Mar 10, 2015 at 7:37 PM, Wang, Ningjun (LNG-NPV) <ningjun.wang@lexisnexis.com> wrote:
This sounds like the right approach. Is there any sample code showing how to use sc.newAPIHadoopFile?
I am new to Spark and don't know much about Hadoop. I just want to read a text file from a
UNC path into an RDD.

Thanks


From: Akhil Das [mailto:akhil@sigmoidanalytics.com]
Sent: Tuesday, March 10, 2015 9:14 AM
To: java8964
Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: Re: sc.textFile() on windows cannot access UNC path

You can create your own input reader (using java.nio.*) and pass it to sc.newAPIHadoopFile
when reading.


Thanks
Best Regards

On Tue, Mar 10, 2015 at 6:28 PM, java8964 <java8964@hotmail.com> wrote:
I think the workaround is clear.

Use JDK 7 and implement your own saveAsRemoteWinText() using java.nio.file.Path.
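
For the reading side that Ningjun actually needs, something like this on the driver might be enough (an untested sketch, and only practical when the file is small enough to read on the driver):

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// Untested sketch: java.nio.file on JDK 7 understands UNC paths, so read the remote
// file on the driver and hand the lines to Spark, bypassing Hadoop's Path class.
val uncPath = Paths.get("""\\10.196.119.230\folder1\abc.txt""")
val lines = Files.readAllLines(uncPath, StandardCharsets.UTF_8).asScala
val rdd = sc.parallelize(lines, 4)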

Yong
________________________________
From: ningjun.wang@lexisnexis.com
To: java8964@hotmail.com; user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path
Date: Tue, 10 Mar 2015 03:02:37 +0000


Hi Yong



Thanks for the reply. Yes, it works with a local drive letter. But I really need to use a UNC path
because the path is provided as input at runtime. I cannot dynamically assign a drive letter to an
arbitrary UNC path at runtime.



Is there any workaround that lets me use a UNC path with sc.textFile(…)?





Ningjun





From: java8964 [mailto:java8964@hotmail.com]
Sent: Monday, March 09, 2015 5:33 PM
To: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: RE: sc.textFile() on windows cannot access UNC path



This is a Java problem, not really a Spark one.



From this page: http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u



You can see that using java.nio.* on JDK 7 fixes this issue. But the Path class in Hadoop
uses java.io.* instead of java.nio.



You need to manually mount your Windows remote share as a local drive, like "Z:"; then it should
work.
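
For example (net use is the standard Windows command; the drive letter and share below are only illustrative):

// Run once in a Windows command prompt, with whatever drive letter is free:
//   net use Z: \\10.196.119.230\folder1
// After that the mounted drive works with the normal local-file form:
sc.textFile("file:///Z:/abc.txt", 4).count()

If the job runs on more than one machine, every worker node needs the same drive mapping.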



Yong

________________________________

From: ningjun.wang@lexisnexis.com
To: user@spark.apache.org
Subject: sc.textFile() on windows cannot access UNC path
Date: Mon, 9 Mar 2015 21:09:38 +0000

I am running Spark on Windows 2008 R2. I use sc.textFile() to load a text file using a UNC path,
and it does not work.



sc.textFile(raw"file:////10.196.119.230/folder1/abc.txt<file:///\\10.196.119.230\folder1\abc.txt>",
4).count()



Input path does not exist: file:/10.196.119.230/folder1/abc.txt

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/10.196.119.230/tar/Enron/enron-207-short.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
    at org.apache.spark.rdd.RDD.count(RDD.scala:910)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply$mcV$sp(IndexTest.scala:104)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
    at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
    at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
    at org.scalatest.Suite$class.run(Suite.scala:1424)
    at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
    at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
    at ltn.analytics.tests.IndexTest.org$scalatest$BeforeAndAfterAll$$super$run(IndexTest.scala:15)
    at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
    at ltn.analytics.tests.IndexTest.run(IndexTest.scala:15)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
    at org.scalatest.tools.Runner$.run(Runner.scala:883)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:137)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)





The path is correct; I can open Windows Explorer and enter the following path to open the text
file:

\\10.196.119.230\folder1\abc.txt



I have tried using 3 slashes and 2 slashes, and always got the same error:



sc.textFile(raw"file:///10.196.119.230/folder1/abc.txt<file:///\\10.196.119.230\folder1\abc.txt>",
4).count()

sc.textFile(raw"file://10.196.119.230/folder1/abc.txt<file:///\\10.196.119.230\folder1\abc.txt>",
4).count()



Please advise.

Ningjun

