hbase-user mailing list archives

From Austin Heyne <ahe...@ccri.com>
Subject Re: Completing a bulk load from HFiles stored in S3
Date Tue, 12 Nov 2019 19:32:56 GMT
Yes, that's correct. I've never tried bulk loading from S3 on 2.x.
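For anyone landing on this thread later: where direct loading from S3 doesn't work, the copy-to-HDFS-first route Josh describes below can be sketched roughly like this (the bucket, paths, and table name are hypothetical placeholders, and the class name is the 1.x one; 2.x relocated the tool):

```shell
# Sketch only -- bucket, paths, and table name are placeholders.
# 1. Copy the HFiles from S3 into the cluster's HDFS:
hadoop distcp s3a://my-bucket/hfiles hdfs:///tmp/hfiles

# 2. Bulk load from the HDFS copy (1.x-era class name):
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/hfiles my_table
```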


On 11/12/19 1:32 PM, Josh Elser wrote:
> Thanks for the info, Austin. I'm guessing that's how 1.x works since 
> you mention EMR?
> I think this code has changed in 2.x with the SecureBulkLoad stuff 
> moving into "core" (instead of external as a coproc endpoint).
> On 11/12/19 10:39 AM, Austin Heyne wrote:
>> Sorry for the late reply. You should be able to bulk load files from 
>> S3: HBase will detect that they're not on the same filesystem and have 
>> the regionservers copy the files locally and then up to HDFS. This is 
>> related to a problem I reported a while ago when using HBase on S3 
>> with EMR.
>> https://issues.apache.org/jira/browse/HBASE-20774
>> -Austin
>> On 11/1/19 8:04 AM, Wellington Chevreuil wrote:
>>> Ah yeah, didn't realise it would assume the same FS internally. 
>>> Indeed, there's no way to have rename working between different FSes.
>>> On Thu, Oct 31, 2019 at 16:25, Josh Elser <elserj@apache.org> 
>>> wrote:
>>>> Short answer: no, it will not work and you need to copy it to HDFS 
>>>> first.
>>>> IIRC, the bulk load code is ultimately calling a filesystem rename 
>>>> from
>>>> the path you provided to the proper location in the hbase.rootdir's
>>>> filesystem. I don't believe that an `fs.rename` is going to work 
>>>> across
>>>> filesystems because you can't do this atomically, which HDFS 
>>>> guarantees
>>>> for the rename method [1]
>>>> Additionally, for Kerberos-secured clusters, the server-side bulk load
>>>> logic expects that the filesystem hosting your hfiles is HDFS (in 
>>>> order
>>>> to read the files with the appropriate authentication). This fails 
>>>> right
>>>> now, but is something our PeterS is looking at.
>>>> [1]
>>>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
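The cross-filesystem limitation Josh cites also shows up at the shell level; a quick sketch, with the bucket and paths as hypothetical placeholders:

```shell
# Sketch only -- bucket and paths are placeholders.
# FsShell's mv is a FileSystem.rename underneath, so it refuses
# to move files between filesystems:
hadoop fs -mv s3a://my-bucket/hfiles hdfs:///tmp/hfiles

# A copy has to be used instead (and, unlike an HDFS rename,
# it is not atomic):
hadoop fs -cp s3a://my-bucket/hfiles hdfs:///tmp/hfiles
```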

>>>> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
>>>>> I believe you can specify your s3 path for the hfiles directly, as 
>>>>> the Hadoop FileSystem API does support the s3a scheme, but you would 
>>>>> need to add your s3 access and secret key to your completebulkload 
>>>>> configuration.
>>>>> On Wed, Oct 30, 2019 at 19:43, Gautham Acharya <
>>>>>> gauthama@alleninstitute.org> wrote:
>>>>>> If I have HFiles stored in S3, can I run CompleteBulkLoad and 
>>>>>> provide an
>>>>>> S3 endpoint to run a single command, or do I need to copy the S3
>>>>>> HFiles to HDFS first? The documentation is not very clear.
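Wellington's suggestion above (point the tool at the s3a path and supply credentials in the configuration) might look something like the sketch below. The class name is the 1.x-era one, the bucket, table name, and credential values are hypothetical placeholders, and `fs.s3a.access.key`/`fs.s3a.secret.key` are the standard s3a credential properties:

```shell
# Sketch only -- bucket, table name, and credentials are placeholders.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  -Dfs.s3a.access.key=MY_ACCESS_KEY \
  -Dfs.s3a.secret.key=MY_SECRET_KEY \
  s3a://my-bucket/hfiles my_table
```

Note that per the rest of this thread, this path is reported to work on 1.x on EMR but may fail on 2.x or on Kerberos-secured clusters.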
