hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Completing a bulk load from HFiles stored in S3
Date Tue, 12 Nov 2019 18:32:06 GMT
Thanks for the info, Austin. I'm guessing that's how 1.x works since you 
mention EMR?

I think this code has changed in 2.x, with the SecureBulkLoad stuff 
moving into "core" (instead of living externally as a coprocessor 
endpoint).
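
For reference, the client-side entry point in 2.x looks roughly like 
this (a minimal sketch, assuming HBase 2.2+ where BulkLoadHFiles is the 
replacement for LoadIncrementalHFiles; the table name and path are made 
up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.tool.BulkLoadHFiles;

    public class BulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // In 2.x the secure bulk load path ships in core, so no
        // coprocessor endpoint has to be registered on the table.
        BulkLoadHFiles loader = BulkLoadHFiles.create(conf);
        // The source directory holds one subdirectory per column
        // family, each containing the HFiles to load.
        loader.bulkLoad(TableName.valueOf("my_table"),
            new Path("hdfs:///tmp/hfiles"));
      }
    }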

On 11/12/19 10:39 AM, Austin Heyne wrote:
> Sorry for the late reply. You should be able to bulk load files from 
> S3: the load will detect that they're not on the same filesystem and 
> have the regionservers copy the files locally and then up to HDFS. 
> This is related to a problem I reported a while ago when using HBase 
> on S3 with EMR.
> https://issues.apache.org/jira/browse/HBASE-20774
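> On 1.x, roughly, this is the shape of it (a sketch, not tested here; 
> the bucket and table name are made up):
>
>   // org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles (1.x)
>   Configuration conf = HBaseConfiguration.create();
>   LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
>   // The loader sees the source FS (s3a) is not the rootdir FS and
>   // stages copies of the files instead of renaming them in place.
>   loader.run(new String[] { "s3a://my-bucket/hfiles", "my_table" });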
> -Austin
> On 11/1/19 8:04 AM, Wellington Chevreuil wrote:
>> Ah yeah, didn't realise it would assume the same FS internally. 
>> Indeed, there's no way to have rename work across different FSes.
>> On Thu, Oct 31, 2019 at 16:25, Josh Elser <elserj@apache.org> wrote:
>>> Short answer: no, it will not work and you need to copy it to HDFS 
>>> first.
>>> IIRC, the bulk load code is ultimately calling a filesystem rename from
>>> the path you provided to the proper location in the hbase.rootdir's
>>> filesystem. I don't believe that an `fs.rename` is going to work across
>>> filesystems, because a cross-filesystem rename can't be atomic, while
>>> atomicity is something HDFS guarantees for its rename method [1].
>>> Additionally, for Kerberos-secured clusters, the server-side bulk load
>>> logic expects that the filesystem hosting your hfiles is HDFS (in order
>>> to read the files with the appropriate authentication). This fails right
>>> now, but is something our PeterS is looking at.
>>> [1]
>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
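>>> In code terms, "copy it first" is just a cross-FS copy into the
>>> rootdir filesystem ahead of the load. A sketch (paths are made up;
>>> FileUtil.copy streams the bytes, so unlike rename it works across
>>> filesystems but is not atomic):
>>>
>>>   // org.apache.hadoop.fs.FileUtil, org.apache.hadoop.fs.Path
>>>   Configuration conf = HBaseConfiguration.create();
>>>   Path src = new Path("s3a://my-bucket/hfiles");
>>>   Path dst = new Path("hdfs:///tmp/hfiles");
>>>   FileUtil.copy(src.getFileSystem(conf), src,
>>>       dst.getFileSystem(conf), dst, false, conf);
>>>   // Then point the bulk load at hdfs:///tmp/hfiles as usual.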

>>> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
>>>> I believe you can specify your s3 path for the hfiles directly, as the
>>>> Hadoop FileSystem API does support the s3a scheme, but you would need to
>>>> add your S3 access and secret keys to your completebulkload configuration.
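>>>> Something like this on the loader's configuration, for example (a
>>>> sketch; the property names assume the Hadoop s3a connector, and a
>>>> credential provider beats plaintext keys in any real setup):
>>>>
>>>>   Configuration conf = HBaseConfiguration.create();
>>>>   // Standard Hadoop s3a credential properties.
>>>>   conf.set("fs.s3a.access.key", "<access-key>");
>>>>   conf.set("fs.s3a.secret.key", "<secret-key>");
>>>>   // Pass this conf to the bulk load tool when invoking it.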
>>>> On Wed, Oct 30, 2019 at 19:43, Gautham Acharya <
>>>> gauthama@alleninstitute.org> wrote:
>>>>> If I have HFiles stored in S3, can I run CompleteBulkLoad against an
>>>>> S3 endpoint as a single command, or do I need to copy the S3 HFiles
>>>>> to HDFS first? The documentation is not very clear.
