hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Completing a bulk load from HFiles stored in S3
Date Thu, 31 Oct 2019 16:25:24 GMT
Short answer: no, it will not work; you need to copy the HFiles to HDFS first.

IIRC, the bulk load code ultimately calls a filesystem rename from 
the path you provided to the proper location under the hbase.rootdir's 
filesystem. I don't believe an `fs.rename` will work across 
filesystems, because it cannot be done atomically there, whereas HDFS 
guarantees atomicity for the rename method [1].
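
For what it's worth, here is a minimal, untested sketch of the copy-then-load 
approach. The bucket, staging path and table name are made up, it assumes the 
s3a credentials are already configured, and it uses HBase 2.x's BulkLoadHFiles 
tool (older releases would use LoadIncrementalHFiles instead):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;

public class CopyThenBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Hypothetical paths -- substitute your own bucket and staging directory.
    Path src = new Path("s3a://my-bucket/hfiles");
    Path dst = new Path("hdfs:///tmp/hfiles-staging");

    FileSystem srcFs = src.getFileSystem(conf);
    FileSystem dstFs = dst.getFileSystem(conf);

    // A cross-filesystem move has to be a copy; rename only works within one FS.
    FileUtil.copy(srcFs, src, dstFs, dst, false, conf);

    // The HFiles now live on the same filesystem as hbase.rootdir, so the
    // server-side rename into the region directories can succeed.
    BulkLoadHFiles.create(conf).bulkLoad(TableName.valueOf("my_table"), dst);
  }
}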

Additionally, for Kerberos-secured clusters, the server-side bulk load 
logic expects that the filesystem hosting your hfiles is HDFS (in order 
to read the files with the appropriate authentication). This fails right 
now, but is something our PeterS is looking at.

[1] 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29

On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
> I believe you can specify your S3 path for the HFiles directly, as the
> Hadoop FileSystem API does support the s3a scheme, but you would need to
> add your S3 access and secret keys to your completebulkload configuration.
> 
> On Wed, Oct 30, 2019 at 19:43, Gautham Acharya <
> gauthama@alleninstitute.org> wrote:
> 
>> If I have HFiles stored in S3, can I run CompleteBulkLoad and provide an
>> S3 endpoint as a single command, or do I need to copy the S3
>> HFiles to HDFS first? The documentation is not very clear.
>>
> 
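
On Wellington's point above about passing s3a credentials, here is a hedged 
sketch of pointing the bulk load directly at S3. The bucket, table name and 
key values are placeholders, the credential properties are the standard 
hadoop-aws s3a settings, and it again assumes HBase 2.x's BulkLoadHFiles; 
per the discussion above, the cross-filesystem rename may still make this fail:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;

public class BulkLoadDirectFromS3 {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Standard s3a credential properties from hadoop-aws; values are placeholders.
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");

    // Hypothetical S3 location of the HFiles and target table.
    Path hfiles = new Path("s3a://my-bucket/hfiles");

    // Caveat: the server-side rename from s3a into the hbase.rootdir
    // filesystem is where this can fall over, so copying to HDFS first
    // remains the safe route.
    BulkLoadHFiles.create(conf).bulkLoad(TableName.valueOf("my_table"), hfiles);
  }
}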
