spark-user mailing list archives

From Enno Shioji <eshi...@gmail.com>
Subject Re: ReceiverInputDStream#saveAsTextFiles with a S3 URL results in double forward slash key names in S3
Date Tue, 23 Dec 2014 14:29:50 GMT
I filed a new issue, HADOOP-11444. According to HADOOP-10372, s3 is likely
to be deprecated anyway in favor of s3n.
Also, the comment section notes that Amazon has implemented an EmrFileSystem
for S3, which is built on the AWS SDK rather than JetS3t.
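For reference, the difference between the two stores boils down to whether the leading slash of the URI path is stripped before it is used as an S3 key. The native store's mapping can be sketched standalone like this (the class name and the plain String parameter are illustrative stand-ins for the real method, which takes a Hadoop Path):

```java
// Sketch of the key mapping Jets3tNativeFileSystemStore applies, which
// avoids the double slash. Jets3tFileSystemStore instead returns the URI
// path verbatim, leading slash included, producing keys like "//1234".
public class PathToKeySketch {
    static String pathToKey(String uriPath) {
        // A URI like s3n://mybucket has an empty path and refers to the
        // bucket root
        if (uriPath.isEmpty()) {
            return "";
        }
        if (!uriPath.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uriPath);
        }
        String ret = uriPath.substring(1); // remove initial slash
        // drop a trailing slash, except for a top-level key like "dir/"
        if (ret.endsWith("/") && ret.indexOf("/") != ret.length() - 1) {
            ret = ret.substring(0, ret.length() - 1);
        }
        return ret;
    }

    public static void main(String[] args) {
        // The URI path of s3://fake-test/1234 is "/1234"; the buggy store
        // would emit "/1234", yielding the key "s3://fake-test//1234"
        System.out.println(pathToKey("/1234")); // prints "1234"
    }
}
```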




On Tue, Dec 23, 2014 at 2:06 PM, Enno Shioji <eshioji@gmail.com> wrote:

> Hey Jay :)
>
> I tried "s3n" which uses the Jets3tNativeFileSystemStore, and the double
> slash went away.
> As far as I can see, it does look like a bug in hadoop-common; I'll file a
> ticket for it.
>
> Hope you are doing well, by the way!
>
> PS:
>  Jets3tNativeFileSystemStore's implementation of pathToKey is:
> ======
>   private static String pathToKey(Path path) {
>     if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
>       // allow uris without trailing slash after bucket to refer to root,
>       // like s3n://mybucket
>       return "";
>     }
>     if (!path.isAbsolute()) {
>       throw new IllegalArgumentException("Path must be absolute: " + path);
>     }
>     String ret = path.toUri().getPath().substring(1); // remove initial slash
>     if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
>       ret = ret.substring(0, ret.length() - 1);
>     }
>     return ret;
>   }
> ======
>
> whereas Jets3tFileSystemStore uses:
> ======
>   private String pathToKey(Path path) {
>     if (!path.isAbsolute()) {
>       throw new IllegalArgumentException("Path must be absolute: " + path);
>     }
>     return path.toUri().getPath();
>   }
> ======
>
>
>
>
>
>
> On Tue, Dec 23, 2014 at 1:07 PM, Jay Vyas <jayunit100.apache@gmail.com>
> wrote:
>
>> Hi Enno. It might be worthwhile to cross-post this on dev@hadoop...
>> A simple Spark way to test this would be to change the URI to write to
>> hdfs:// or file://, and confirm that the extra slash goes away.
>>
>> - If it's indeed a JetS3t issue, we should add a new unit test for it,
>> given that the HCFS tests pass for Jets3tFileSystem yet this error still
>> exists.
>>
>> - To learn how to run HCFS tests against any FileSystem , see the wiki
>> page : https://wiki.apache.org/hadoop/HCFS/Progress (see the July 14th
>> entry on that page).
>>
>> - Is there another S3FileSystem implementation for AbstractFileSystem, or
>> is JetS3t the only one? That would be an easy way to test this, and also a
>> good workaround.
>>
>> I'm also wondering why Jets3tFileSystem is the AbstractFileSystem used by
>> so many - is it the standard implementation for S3 storage via the
>> AbstractFileSystem interface?
>>
>> On Dec 23, 2014, at 6:06 AM, Enno Shioji <eshioji@gmail.com> wrote:
>>
>> Is anybody experiencing this? It looks like a bug in JetS3t to me, but
>> thought I'd sanity check before filing an issue.
>>
>>
>> ================
>> I'm writing to S3 using ReceiverInputDStream#saveAsTextFiles with an S3
>> URL ("s3://fake-test/1234").
>>
>> The code does write to S3, but with double forward slashes (e.g.
>> "s3://fake-test//1234/-1419334280000/").
>>
>> I did some debugging, and it seems the culprit is
>> Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..."
>> for the input "s3://fake-test/1234/...", when it should strip off the
>> leading forward slash. However, I couldn't find any bug report for JetS3t
>> about this.
>>
>> Am I missing something, or is this likely a JetS3t bug?
>> ================
>>
>>
>>
>
