spark-user mailing list archives

From Sunny Khatri <sunny.k...@gmail.com>
Subject Re: Cannot read from s3 using "sc.textFile"
Date Tue, 07 Oct 2014 16:51:36 GMT
I'm not sure that is supposed to work. Could you try newAPIHadoopFile()
instead, passing in the required configuration object?
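
A minimal sketch of what I mean, using only the JDK so it runs on its own.
The class and method names (S3ConfSketch, s3Credentials) are made up for
illustration. It assumes the property names follow the
fs.<scheme>.awsAccessKeyId / fs.<scheme>.awsSecretAccessKey pattern used by
the jets3t-based s3 and s3n filesystems, so the keys always match the scheme
of the path being read. The commented lines show where the resulting
properties would go, assuming Spark and Hadoop are on the classpath.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class S3ConfSketch {

    // Hypothetical helper: builds the credential properties for the scheme
    // (s3 or s3n) found in the given path.
    static Map<String, String> s3Credentials(String path, String accessKey, String secretKey) {
        String scheme = URI.create(path).getScheme();
        if (!"s3".equals(scheme) && !"s3n".equals(scheme)) {
            throw new IllegalArgumentException("expected an s3:// or s3n:// path, got: " + path);
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs." + scheme + ".awsAccessKeyId", accessKey);
        props.put("fs." + scheme + ".awsSecretAccessKey", secretKey);
        return props;
    }

    public static void main(String[] args) {
        String path = "s3://bucket/test/testdata";
        Map<String, String> props = s3Credentials(path, "XXXXXX", "XXXXXX");
        props.forEach((k, v) -> System.out.println(k + " = " + v));

        // With Spark and Hadoop available, the properties would be copied
        // into a Configuration and passed explicitly:
        //
        //   Configuration conf = new Configuration(sc.hadoopConfiguration());
        //   props.forEach(conf::set);
        //   JavaPairRDD<LongWritable, Text> lines = sc.newAPIHadoopFile(
        //       path, TextInputFormat.class, LongWritable.class, Text.class, conf);
        //
        // TextInputFormat here is the new-API class,
        // org.apache.hadoop.mapreduce.lib.input.TextInputFormat.
    }
}
```

This only makes the configuration explicit. Whether it gets rid of the 404
is something you would have to try.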

On Tue, Oct 7, 2014 at 4:20 AM, Tomer Benyamini <tomer.ben@gmail.com> wrote:

> Hello,
>
> I'm trying to read from S3 using a simple Spark Java app:
>
> ---------------------
>
> SparkConf sparkConf = new SparkConf().setAppName("TestApp");
> sparkConf.setMaster("local");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");
>
> String path = "s3://bucket/test/testdata";
> JavaRDD<String> textFile = sc.textFile(path);
> System.out.println(textFile.count());
>
> ---------------------
> But I'm getting this error:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://bucket/test/testdata
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
> at org.apache.spark.rdd.RDD.count(RDD.scala:861)
> at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
> at org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)
> ....
>
> Looking at the debug log, I see that
> org.jets3t.service.impl.rest.httpclient.RestS3Service returned a 404
> error when trying to locate the file.
>
> A simple Java program using
> com.amazonaws.services.s3.AmazonS3Client reads the same file just fine.
>
> Any idea?
>
> Thanks,
> Tomer
>